Google BigQuery

Google BigQuery is a fully managed, serverless data warehouse that supports Structured Query Language (SQL), enabling you to derive meaningful insights from your data.

With Hevo Activate, you can configure Google BigQuery as a Data Warehouse to load data to your desired Target CRM application.

Organization of Data in BigQuery

Google BigQuery is hosted on the Google Cloud Platform (GCP). GCP uses projects to organize resources; an organization can have multiple projects associated with it. Each project has a set of users, a set of APIs, and a billing account. Once you enable the BigQuery API, you can store your data in BigQuery tables. BigQuery organizes the tables into units called datasets, which in turn are stored in your GCP project.
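
If you want to explore this hierarchy programmatically, the following is a minimal sketch that uses the google-cloud-bigquery Python client to list the datasets in a project and the tables within each dataset. The project ID my-gcp-project is a placeholder, not a value from this document.

    # A minimal sketch of the project -> dataset -> table hierarchy,
    # using the google-cloud-bigquery client.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")  # placeholder ID

    # Datasets are the units BigQuery uses to group tables in a project.
    for dataset in client.list_datasets():
        print(f"Dataset: {dataset.dataset_id}")
        # Each dataset in turn contains the data tables.
        for table in client.list_tables(dataset.dataset_id):
            print(f"  Table: {table.table_id}")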

Note: If your dataset’s location is set to a value other than US, the regional or multi-regional Cloud Storage bucket must be in the same region as the dataset. You can read more about this here.

To optimize the SQL queries you run on the BigQuery data warehouse, you can partition or cluster the Source fields, as sketched below. To know when to use clustering over partitioning and vice versa, read this.
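
As an illustration, here is a minimal sketch that creates a table partitioned on a date field and clustered on a customer ID, using the google-cloud-bigquery client. The table name, schema, and field choices are placeholders for your own Source fields.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")  # placeholder ID

    schema = [
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ]
    table = bigquery.Table("my-gcp-project.my_dataset.events", schema=schema)

    # Partitioning lets queries scan only the date partitions they need...
    table.time_partitioning = bigquery.TimePartitioning(field="event_date")
    # ...while clustering sorts rows within each partition by these fields.
    table.clustering_fields = ["customer_id"]

    client.create_table(table)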

Permissions

To load data into BigQuery, Hevo needs the following permissions to read from and write to both BigQuery and GCS. These permissions are assigned to the account you use to authenticate Hevo on BigQuery. Refer to Google Account Authentication Methods for more information.

  • View and manage your data in Google BigQuery.

  • Manage your data in Google Cloud Storage.

The files written to GCS are deleted after seven days to free up your storage.
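
To check that an account actually has this access, a minimal sketch such as the following, with placeholder project and bucket names, exercises both permissions: reading from BigQuery and writing to GCS.

    from google.cloud import bigquery, storage

    bq = bigquery.Client(project="my-gcp-project")    # placeholder ID
    gcs = storage.Client(project="my-gcp-project")

    # Read access to BigQuery: listing datasets fails without it.
    print([d.dataset_id for d in bq.list_datasets()])

    # Write access to GCS: round-trip a throwaway staging object.
    blob = gcs.bucket("my-staging-bucket").blob("hevo_access_check.txt")
    blob.upload_from_string("ok")
    blob.delete()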

User Authentication

You can connect to the BigQuery data warehouse via User accounts (OAuth).
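
Hevo runs this OAuth flow for you when you add your account in a later step. For reference, the following is a minimal sketch of the equivalent user-account flow using the google-auth-oauthlib library; client_secret.json is a placeholder for an OAuth client file downloaded from the Google Cloud console.

    from google.cloud import bigquery
    from google_auth_oauthlib.flow import InstalledAppFlow

    # Scopes matching the BigQuery and Cloud Storage permissions above.
    SCOPES = [
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/devstorage.read_write",
    ]

    flow = InstalledAppFlow.from_client_secrets_file(
        "client_secret.json", SCOPES  # placeholder OAuth client file
    )
    credentials = flow.run_local_server(port=0)  # opens a consent page

    client = bigquery.Client(project="my-gcp-project", credentials=credentials)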


Prerequisites

  • The authenticating user has the BigQuery Admin and Storage Admin roles for the Google BigQuery project. These roles are required irrespective of whether the user has the Owner or Admin role within the project. Read Google Account Authentication Methods for the steps to configure these roles.

  • An active Billing account is associated with your BigQuery project.


Perform the following steps to configure your BigQuery data warehouse:

Retrieve the Project ID

  1. Go to the Google API Console, and in the top left corner, beside the hamburger menu icon, click the drop-down to view all the projects.

    The names and IDs for all the projects you are a member of are displayed.

  2. Copy the ID of the required project.

    List of projects
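
Alternatively, if you have Application Default Credentials configured locally, a minimal sketch such as the following retrieves the active project ID without the console; the printed value is whatever project your credentials default to.

    import google.auth

    # google.auth.default() returns the credentials and default project.
    credentials, project_id = google.auth.default()
    print(project_id)  # e.g. "my-gcp-project"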


Get your Dataset ID and Location

  1. Go to your BigQuery instance.

  2. Select the Project ID.

  3. From the list of datasets for the project, choose the one you want the dataset ID and location for.

  4. Copy the Dataset ID and Data Location. The dataset ID is displayed as project-name:dataset-ID. Copy only the dataset ID. For example, in the image below, the dataset ID is test-dataset.

    Select the dataset
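
The same two values can also be read programmatically. The following is a minimal sketch using the google-cloud-bigquery client; my_dataset is a placeholder for your own dataset ID.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")  # placeholder ID
    dataset = client.get_dataset("my_dataset")          # placeholder ID

    print(dataset.dataset_id)  # the value to enter as the Dataset ID
    print(dataset.location)    # the value to enter as the Data Location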


Get your GCS Bucket and Bucket Location

  1. Go to Storage in Google Cloud Platform. In the Storage browser, check for the bucket Name and Location.

    GCS bucket name and location

Note: The bucket name must be globally unique within the Google Cloud Storage namespace.
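
The bucket name and location can likewise be read with the google-cloud-storage client, as in this minimal sketch; my-staging-bucket is a placeholder.

    from google.cloud import storage

    client = storage.Client(project="my-gcp-project")  # placeholder ID
    bucket = client.get_bucket("my-staging-bucket")    # placeholder name

    print(bucket.name)      # the value to enter as the GCS Bucket
    print(bucket.location)  # must be compatible with the dataset location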


Add Billing Details to your Project

Read this document to set up the billing details of your project.


Configure Google BigQuery as a Data Warehouse

Skip this step if you already have a BigQuery data warehouse set up.

You can modify only some of the settings that you provide here once the Destination is created. Refer to the section Modifying BigQuery Data Warehouse Configuration below for more information.

  1. Do one of the following:

    • Click ACTIVATE in the Asset Palette, and:

      1. In the ACTIVATIONS tab, click + CREATE ACTIVATION.

        Create Activation

      2. In the Select Warehouse page, click ADD WAREHOUSE.

        Add Warehouse

      3. In the Select Warehouse Type page, select Google BigQuery.

    • Click DESTINATIONS in the Asset Palette, and:

      1. Click + CREATE in the Destinations List View.

      2. In the Add Destination page, select Google BigQuery as the Destination type.

  2. In the Configure your Google BigQuery Account page,

    1. Click + ADD GOOGLE BIGQUERY ACCOUNT.

    2. Sign in as a user with BigQuery Admin and Storage Admin permissions.

    3. Click Allow to authorize Hevo to access your data.

  3. In the Configure your Google BigQuery Warehouse page, specify the following details:

    Google BigQuery settings

    • Warehouse Name: A unique name for your data warehouse.

    • Project ID: The Project ID of your BigQuery instance, as retrieved in Step 1 above, for which you have granted permissions in the above steps.

    • Dataset ID: Name of the dataset, as retrieved in Step 2 above, into which you want to sync your data.

    • GCS Bucket: The cloud storage bucket, as retrieved in Step 3 above, where files must be staged before being uploaded to BigQuery.

  4. Click TEST CONNECTION to test and SAVE DESTINATION to complete the setup.
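
Hevo’s TEST CONNECTION logic is internal to the product, but a minimal sketch of the kind of checks it must perform, using placeholder names, looks like this: run a trivial query against the project and stage a file in the GCS bucket.

    from google.cloud import bigquery, storage

    bq = bigquery.Client(project="my-gcp-project")   # placeholder ID
    gcs = storage.Client(project="my-gcp-project")

    # Query access: a trivial query that reads no table data.
    assert list(bq.query("SELECT 1").result())[0][0] == 1

    # Staging access: round-trip a small object through the bucket.
    blob = gcs.bucket("my-staging-bucket").blob("connection_check.txt")
    blob.upload_from_string("ok")
    blob.delete()
    print("connection checks passed")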


Modifying BigQuery Data Warehouse Configuration

You can modify some of the settings after the BigQuery data warehouse is created. However, changing any configuration affects all the Activations and Pipelines that are using this data warehouse. Refer to the section Potential Impacts of Modifying the Configuration Settings below to know the impacts.

To modify the BigQuery Data Warehouse configuration:

  1. Navigate to the Destination Detailed View.

  2. Do one of the following:

    • Click the Settings icon next to the Destination name, and then click the Edit icon.

    • Click the kebab menu on the right and click Edit.

  3. Update the required field values in the Edit Google BigQuery Destination connection settings page.

  4. Click SAVE WAREHOUSE / SAVE DESTINATION, as applicable.

    Modify BQ settings

Potential Impacts of Modifying the Configuration Settings

  • Changing any configuration setting impacts all Activations using the data warehouse.

  • Changing the Project ID or Dataset ID may lead to data getting split between the old and the new dataset, because the data in the old dataset is not copied to the new one. Further, new tables for all the existing Activations that use this data warehouse are created in the new dataset.

  • Changing the GCS bucket may lead to data loss: any data not yet copied from the old bucket is not copied to the dataset.


Limitations

None.



Revision History

Refer to the following table for the list of key updates made to this page:

Date          Release   Description of Change
Jul-12-2021   1.67      New document.