Google BigQuery

Google BigQuery is a fully-managed, server-less data warehouse that supports Structured Query Language (SQL) to derive meaningful insights from the data. Read BigQuery documentation for more information.

With Hevo Activate, you can configure Google BigQuery as a Data Warehouse to load data to your desired Target CRM application.


Organization of Data in BigQuery

Google BigQuery is hosted on Google’s Cloud Platform (GCP). GCP uses projects to organize resources. An organization can have multiple projects associated with it. Each project has a set of users, a set of APIs, and a billing account. Once you enable your BigQuery API, you can store your data in BigQuery tables. BigQuery organizes the data tables into units called datasets. These datasets are in turn stored in your GCP project.

Note: If your dataset’s location is other than US, the regional or multi-regional Cloud Storage bucket must be in the same region as the dataset. You can read more about this here.

To optimize the SQL queries you run on the BigQuery data warehouse, you can use partitioning or clustering the Source fields. To know when to use clustering over partitioning and vice versa, read this.


Permissions

To load data in into BigQuery, Hevo needs the permission to:

  • View and manage your data in Google BigQuery.

  • Manage your data in Google Cloud Storage (GCS).

These permissions must be assigned to the account you use to authenticate Hevo on BigQuery. Read Google Account Authentication Methods for more information.

Note: The files written to GCS are deleted after seven days to free up your storage.


User Authentication

You can connect to the BigQuery data warehouse via User accounts (OAuth).


Prerequisites

  • The authenticating user has BigQuery Admin and Storage Admin roles for the Google BigQuery project. These roles are irrespective of having Owner or Admin roles within the project. Read Google Account Authentication Methods for steps to configure these roles.

  • An active Billing account is associated with your BigQuery project.


Perform the following steps to configure your BigQuery data warehouse:

Retrieve the Project ID

  1. Go to the Google API Console, and in the top left corner, besides the hamburger menu icon, click the drop-down to view all the projects.

    The names and IDs for all the projects you are a member of are displayed.

  2. Copy the ID of the required project.

    List of projects


Get your Dataset ID and Location

  1. Go to your BigQuery instance.

  2. Select the Project ID.

  3. From the list of datasets for the project, choose the one you want the dataset ID and location for.

  4. Copy the Dataset ID and Data Location. The dataset ID is displayed as project-name:dataset-ID. Copy only the dataset ID. For example, in the image shown below, the dataset ID is test-dataset.

    Select the dataset


Get your GCS Bucket and Bucket Location

Go to Storage in Google Cloud Platform. In the Storage browser, check for the bucket Name and Location.

GCS bucket name and location

Note: The bucket name has to be unique across all global namespaces within Google Cloud Storage.


Add Billing Details to your Project

Read Enable billing for a project to set up the billing details of your project.


Configure Google BigQuery as a Data Warehouse

Skip this step if you already have a BigQuery data warehouse set up.

You can modify only some of the settings that you provide here once the Destination is created. Refer to the section, Modifying BigQuery Destination Configuration below for more information.

  1. Do one of the following:

    • Click ACTIVATE in the Asset Palette, and:

      1. In the ACTIVATIONS tab, click + CREATE ACTIVATION.

        Create Activation

      2. In the Select Warehouse page, click ADD WAREHOUSE.

        Add Warehouse

      3. In the Select Warehouse Type page, select Google BigQuery.

    • Click DESTINATIONS in the Asset Palette, and:

      1. Click + CREATE in the Destinations List View.

      2. In the Add Destination page select Google BigQuery as the Destination type.

  2. In the Configure your Google BigQuery Account page,

    1. Click + ADD GOOGLE BIGQUERY ACCOUNT.

    2. Sign in as a user with BigQuery Admin and Storage Admin permissions.

    3. Click Allow to authorize Hevo to access your data.

  3. In the Configure your Google BigQuery Warehouse page, specify the following details:

    Google BigQuery Warehouse settings

    • Warehouse Name: A unique name for your data warehouse.

    • Project ID: The Project ID of your BigQuery instance, as retrieved in Step 1 above, for which you have granted permissions in the above steps.

    • Dataset ID: Name of the dataset, as retrieved in Step 2 above, into which you want to synchronize your data.

    • GCS Bucket: The cloud storage bucket, as retrieved in Step 3 above, where files must be staged before being uploaded to BigQuery.

  4. Click TEST CONNECTION to test and SAVE DESTINATION or SAVE WAREHOUSE, as applicable, to complete the setup.

Any user with BigQuery admin and Storage admin permissions on the configured data warehouse can modify, create, and run Activations. Such a user will have access to the hevo dataset maintained by Activate for Bookkeeping.


Modifying BigQuery Data Warehouse Configuration

You can modify some of the settings post-creation of the BigQuery data warehouse. However, changing any configuration would affect all the Activations and Pipelines that are using this data warehouse. Refer to the section, Potential Impacts of Modifying the Configuration Settings to know the impacts.

To modify the BigQuery Data Warehouse configuration:

  1. Navigate to the Destination Detailed View.

  2. Do one of the following:

    • Click the Settings icon next to the Destination name and then, click the Edit icon.

    • Click the kebab menu on the right and click Edit.

  3. Update the required field values in the Edit Google BigQuery Destination connection settings page.

  4. Click SAVE WAREHOUSE or SAVE DESTINATION, as applicable.

    Modify BQ settings

Potential Impacts of Modifying the Configuration Settings

  • Changing any configuration setting impacts all Activations using the data warehouse.

  • Changing the Project ID or Dataset ID may lead to data getting split between the old and the new Dataset. This is because the data in the old Dataset are not copied to the new one. Further, new tables for all the existing Activations that are using this data warehouse are created in the new Dataset.

  • Changing the Bucket ID may lead to data getting lost. The data not yet copied from the old bucket is not copied to the Dataset.


Limitations

None.


Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Sep-09-2021 1.71 Added a note at end of the section, Configure Google BigQuery as a Data Warehouse about permissions required for executing and modifying Activations.
Jul-12-2021 1.67 New document.
Last updated on 12 Oct 2021