Google BigQuery

Google BigQuery is a fully managed, serverless data warehouse that supports Structured Query Language (SQL) for deriving meaningful insights from your data. Read the BigQuery documentation for more information.

With Hevo Pipelines, you can configure Google BigQuery as a Destination for loading data from any Source system.

Note: For a select range of Sources, Hevo provides a pre-configured and fully managed Google BigQuery data warehouse Destination. Read Hevo Managed Google BigQuery.

Organization of data in BigQuery

Google BigQuery is hosted on the Google Cloud Platform (GCP). GCP uses projects to organize resources. An organization can have multiple projects associated with it. Each project has a set of users, a set of APIs, and a billing account. Once you enable the BigQuery API, you can store your data in BigQuery tables. BigQuery organizes the data tables into units called datasets, which in turn are stored in your GCP project.

Each dataset has a location associated with it. By default, the Pipeline loads the Source data to your Google Cloud Storage (GCS) buckets every five minutes. GCS is the storage service offered by GCP. From the GCS buckets, the data is loaded into BigQuery in batches using BigQuery's Batch Load feature. For this to happen successfully, the GCS buckets must be in the same location as the dataset.

Note: If your dataset’s location is other than US, the regional or multi-regional Cloud Storage bucket must be in the same region as the dataset. You can read more about this here.
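
Hevo manages the staging buckets and batch loads for you. Purely as an illustration of the mechanism (not Hevo's internal code), the following Python sketch batch-loads a newline-delimited JSON file from a GCS bucket into a BigQuery table; the project, dataset, table, and file names are hypothetical:

    # Minimal sketch: batch-load a staged file from GCS into BigQuery.
    # The bucket and the dataset must be in the same location for the load to succeed.
    from google.cloud import bigquery

    client = bigquery.Client(project="westeros-153019")  # hypothetical project

    gcs_uri = "gs://hevo_bucket_westeros_153019/orders/batch_0001.json"  # hypothetical staged file
    table_id = "westeros-153019.hevo_dataset_westeros_153019.orders"     # hypothetical table

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    load_job.result()  # waits for the batch load job to complete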

To optimize the SQL queries that you run on the BigQuery Destination, you can partition or cluster the Destination tables on the Source fields. To know when to use clustering over partitioning and vice versa, read this.
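
As an illustration only (the table and field names below are hypothetical), this Python sketch creates a BigQuery table that is partitioned on a timestamp field and clustered on another frequently filtered field:

    # Minimal sketch: create a day-partitioned, clustered BigQuery table.
    from google.cloud import bigquery

    client = bigquery.Client()

    table = bigquery.Table(
        "my-project.my_dataset.events",  # hypothetical table
        schema=[
            bigquery.SchemaField("event_time", "TIMESTAMP"),
            bigquery.SchemaField("customer_id", "STRING"),
            bigquery.SchemaField("amount", "NUMERIC"),
        ],
    )
    table.time_partitioning = bigquery.TimePartitioning(field="event_time")  # daily partitions
    table.clustering_fields = ["customer_id"]  # cluster on a frequently filtered field

    client.create_table(table)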

Permissions

To load data into BigQuery, Hevo needs permissions to read from and write to both BigQuery and GCS. The files written to GCS are deleted after seven days to free up your storage. These permissions are assigned to the account that you use to authenticate Hevo on BigQuery. Read Google Account Authentication Methods for more information.
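
Hevo performs this cleanup for you. For illustration only, the following Python sketch shows an equivalent seven-day delete rule applied to a GCS bucket using the google-cloud-storage client; the bucket name is hypothetical:

    # Minimal sketch: add a lifecycle rule that deletes objects older than 7 days.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket("hevo_bucket_westeros_153019")  # hypothetical bucket

    bucket.add_lifecycle_delete_rule(age=7)  # delete objects 7 days after creation
    bucket.patch()  # persist the updated lifecycle configuration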

User Authentication

You can connect to the BigQuery Destination via User accounts (OAuth) or Service accounts. One service account can be mapped to your entire team. Read Google Account Authentication Methods to know how to set up a service account if you do not already have one.


Prerequisites

  • The authenticating user has the BigQuery Admin and Storage Admin roles for the Google BigQuery project. These roles are required irrespective of whether the user has the Owner or Admin role within the project. Read Google Account Authentication Methods for steps to configure these roles.

  • An active Billing account is associated with your BigQuery project.


Perform the following steps to configure your BigQuery Destination:

Add Billing Details to your Project

Read this document to set up the billing details of your project.


Configure Google BigQuery as a Destination

You can modify only some of the settings that you provide here once the Destination is created. Refer to the section, Modifying BigQuery Destination Configuration.

  1. Click DESTINATIONS in the Asset Palette.

  2. Click + CREATE in the Destinations List View.

  3. In the Add Destination page, select Google BigQuery as the Destination type.

  4. In the Configure your Google BigQuery Account page, select the authentication method for connecting to BigQuery.

    Google account authentication methods

  5. Do one of the following:

    • To connect using a Service Account:

      1. Attach the Service Account Key file.

        Note: Hevo supports only JSON format for the key file.

      2. Click CONFIGURE GOOGLE BIGQUERY ACCOUNT.

    • To connect using a User Account:

      1. Click + ADD GOOGLE BIGQUERY ACCOUNT.

      2. Sign in as a user with BigQuery Admin and Storage Admin permissions.

      3. Click Allow to authorize Hevo to access your data.

        Authorize Hevo

  6. In the Configure your Google BigQuery Warehouse page, specify the following details:

    BigQuery Settings

    • Destination Name: A unique name for your Destination.

    • Project ID: Select the Project ID of your BigQuery instance.

    • Dataset:

      You can allow Hevo to create a dataset for you, or manually choose from the list of available datasets for your project ID.

      • Automatically Create a Dataset:

        Auto Create Dataset

        If enabled, Hevo checks if you have the required permissions to create a dataset.

        On a successful check, Hevo automatically creates a dataset with the name hevo_dataset_<Project_ID>. For example, if your Project ID is westeros-153019, Hevo creates a dataset with the name hevo_dataset_westeros_153019.

        On an unsuccessful check, Hevo warns you about the insufficient permissions. In that case, you must grant the required permissions from the GCP console. Read Assigning roles to a user account or Assigning roles to a service account for the steps to do this. After granting the permissions, click Check Again in the Hevo UI.

        Auto Create Dataset Disabled

        If you disable the Automatically create a Dataset option, you can select your desired dataset from the Dataset drop-down or create one by clicking + Create “<Dataset_Name>” as a new Dataset. For example, suppose you are searching for a dataset named test_dataset and cannot find it in the drop-down because it has not been created yet. You can then click + Create “test_dataset” as a new Dataset, and Hevo creates the dataset for you.

    • GCS Bucket:

      You can allow Hevo to create a bucket for you, or manually choose from the list of available buckets for your project ID.

      • Automatically create GCS Bucket:

        Auto create bucket

        If enabled, Hevo checks if you have the required permissions to create a GCS bucket.

        On a successful check, Hevo automatically creates a bucket with the name hevo_bucket_<Project_ID>. For example, if your Project ID is westeros-153019, Hevo creates a bucket with the name hevo_bucket_westeros_153019.

        On an unsuccessful check, Hevo warns you about the insufficient permissions. In that case, you must grant the required permissions from the GCP Console. Read Assigning roles to a user account or Assigning roles to a service account for the steps to do this. After granting the permissions, click Check Again in the Hevo UI.

        Auto create bucket disabled

        If you disable the Automatically create GCS Bucket option, you can select your desired bucket from the GCS Bucket drop-down or create one by clicking + Create “<Bucket_Name>” as a new Bucket. For example, suppose you are searching for a bucket named test_bucket and cannot find it in the drop-down because it has not been created yet. You can then click + Create “test_bucket” as a new GCS Bucket, and Hevo creates the bucket for you.

    • Advanced Settings:

      • Populate Loaded Timestamp: Enable this option to append the __hevo_loaded_at column to the Destination table to indicate the time when the Event was loaded to the Destination.

      • Sanitize Table/Column Names: Enable this option to replace all non-alphanumeric characters and spaces in the table and column names with underscores (_). Read Name Sanitization. A minimal sketch of this transformation is included after these steps.

        Note: This setting cannot be modified later.

      • Enable Streaming Inserts: Enable this option to stream data to your BigQuery Destination as soon as it is received from the Source, instead of loading it as per the defined Pipeline schedule. Read Near Real-time Data Loading using Streaming.

        Note: This setting cannot be modified later.

    Read Loading Data to a Data Warehouse.

  7. Click TEST CONNECTION.

  8. Click SAVE & CONTINUE.
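
The Sanitize Table/Column Names option mentioned in the Advanced Settings above rewrites identifiers so that they contain only alphanumeric characters and underscores. The Python sketch below illustrates that kind of transformation; refer to Name Sanitization for the exact rules that Hevo applies:

    # Illustrative sketch only: replace characters that are not letters or digits
    # with an underscore. Read Name Sanitization for Hevo's exact rules.
    import re

    def sanitize_name(name: str) -> str:
        return re.sub(r"[^A-Za-z0-9]", "_", name)

    print(sanitize_name("order details (2022)"))  # prints: order_details__2022_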


Handling Source Data with Different Data Types

For teams created in or after Hevo Release 1.56, Hevo automatically modifies the data type of Google BigQuery table columns to accommodate Source data with different data types. Data type promotion is performed on tables whose size is less than 50 GB. Read Handling Different Data Types in Source Data.

Note: Your Hevo release version is mentioned at the bottom of the Asset Palette.


Handling JSON Fields

Read Parsing Nested JSON Fields in Events to know how Hevo parses the JSON data and makes it available at the Destination. Because BigQuery has built-in support for nested and repeated columns, JSON data is neither split nor compressed, but based on the Source type, arrays may be collapsed into JSON strings or passed as-is to the Destination. Read JSON functions in Standard SQL to know about parsing JSON strings back to arrays.
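
For instance, if an array has been collapsed into a JSON string column, a query such as the one in this Python sketch converts it back to an array using the JSON_EXTRACT_ARRAY function; the table and column names are hypothetical:

    # Minimal sketch: parse a JSON-string column back into an array in BigQuery.
    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
        SELECT
            id,
            JSON_EXTRACT_ARRAY(line_items, '$') AS line_items_array
        FROM `my-project.my_dataset.orders`
    """

    for row in client.query(query).result():
        print(row["id"], row["line_items_array"])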

The fields of a JSON object are shown as a struct (for RECORD) or an array (mode: REPEATED) in the Schema Mapper. The nested fields are not editable.


Modifying BigQuery Destination Configuration

You can modify some of the settings after creating the BigQuery Destination. However, changing any configuration affects all the Pipelines that are using this Destination. Refer to the section, Potential Impacts of Modifying the Configuration Settings, to know the impacts.

To modify the BigQuery Destination configuration:

  1. Navigate to the Destination Detailed View.

  2. Do one of the following:

    • Click the Settings icon next to the Destination name and then, click the Edit icon.

      Modify BQ settings

    • Click the kebab menu on the right and click Edit.

  3. Update the required field values in the Edit Google BigQuery Destination connection settings page. You can modify the following settings:

    • Destination name

    • Project ID

    • Dataset ID

    • GCS Bucket

    • Populate Loaded Timestamp
      Read the following section to know the impact of making these changes.

  4. Click SAVE DESTINATION.

Potential impacts of modifying the configuration settings

  • Changing any configuration setting impacts all Pipelines using the Destination.

  • Changing the Project ID or Dataset ID may lead to data getting split between the old and the new Dataset. This is because the Events in the old Dataset are not copied to the new one. Further, new tables for all the existing Pipelines that are using this Destination are created in the new Dataset.

  • Changing the Bucket ID may lead to Events getting lost. Events that have not yet been copied from the old bucket are not copied to the Dataset.


Using Google Account Authentication

You can use two types of Google accounts in Hevo to connect to GCP-hosted services such as Google BigQuery. You must assign the following roles to the user or service account for it to access the required resources:

  • The Storage Admin role, for accessing the GCS resources of the required project of the GCP-hosted service.

  • The appropriate role for connecting to the GCP-hosted service. For example, to connect to the BigQuery APIs for loading data to the BigQuery Destination, you need the BigQuery Admin role.

Refer to the sections below for steps to assign these roles.

Using User Accounts

User accounts are individual Google accounts that you can configure for each team member to connect to the GCP-hosted services such as Google BigQuery and G-Suite applications such as Google Sheets.

Creating a user account

Perform the following steps to create a Google user account:

  1. Log in to your Google Cloud Platform with an Owner Role.

    Click here and check the role against your name. If you are not an Owner, contact your account Owner to perform the following steps.

  2. In the left navigation pane, click IAM & Admin.

  3. Click IAM.

    Select IAM

  4. Select the Name of the project you want to grant roles to.

    Select the project

  5. Click + ADD in the IAM & Admin page.

    Edit permissions

  6. Add the following details of the new user:

    • Member: The Email ID of the user.

    • Name: The full name of the user

    • Role: Select the required role for the user.

Assigning roles to a user account

As a user on GCP with an Owner role, you can assign roles to user accounts for connecting to and accessing data for respective GCP-hosted services. Roles are assigned on specific projects where the data is stored or written.

Perform the following steps to assign the required roles to the Google user account:

  1. Navigate to the IAM & Admin, IAM page.

  2. Click the PERMISSIONS tab.

  3. Click the Edit icon for the user to whom the roles are to be assigned.

    Edit permissions

  4. In the Select a role drop-down, select Cloud Storage in the left pane and then select the Storage Admin role. This role is required for accessing the GCS resources for the project.

    Storage Admin permission

  5. Click SAVE and then, + ADD ANOTHER ROLE.

  6. In the Select a role drop-down, select the role applicable for the GCP-hosted service. For example, for connecting to the BigQuery APIs, select BigQuery in the left pane and then select BigQuery Admin.

    Assign BigQuery Admin role

  7. Click SAVE.


Using Google Service Accounts

As a user with the Owner role, you can create service accounts for your team members to access GCP-hosted services such as Google BigQuery and Google Suite applications such as Google Sheets. Once you have done this, you must download the Key file in JSON format, which is used to authenticate you in Hevo.

Creating a Google service account

If you have already created a service account for your team, skip to section Assigning Roles to the Service Account below to assign the required roles to it. Else, perform the following steps to create a Google service account:

  1. Log in to your Google Cloud Platform with an Owner Role.

    Click here and check your role against your name. If you are not an Owner, contact your account Owner to perform the following steps.

  2. In the left navigation menu, click IAM & Admin.

  3. Under IAM & Admin, click Service Accounts.

    Click Service Accounts

  4. Select the project in the drop-down at the top of the page.

  5. Click + CREATE SERVICE ACCOUNT.

  6. In the Service account details page, enter a Service account name and description and click CREATE.

    The Service account ID is automatically created based on the service account name.

    Service account properties

Assigning roles to a Google service account

For GCP-hosted services, you must assign the appropriate roles to the service account to enable access to the service and its resources on GCP.

Perform the following steps to assign roles to a service account:

  • For an existing service account:

    1. In your GCP console, under IAM & Admin, click IAM.

    2. In the PERMISSIONS tab, check the Role column to see the assigned roles.

      View assigned roles

    3. If the required roles are not assigned, click the Edit icon for the service account.

    4. In the Edit permissions dialog that opens up, click + ADD ANOTHER ROLE to add a new row.

      Edit account permissions

    5. Select the required role from the Role drop-down.

    6. Click SAVE.

  • For a newly created account:

    1. After you have created the service account, continue to the Grant this service account access to project section.

      Grant access to project

    2. Select the role for the related GCP-hosted service. For example, to connect to BigQuery, select BigQuery Admin in the Role drop-down.

    3. Click + ADD ANOTHER ROLE to add a row.

    4. Select Storage Admin in the Role drop-down. This role is required for accessing the GCS resources for the project.

    5. Click CONTINUE.

    6. Click DONE.

Enabling API access for a Google service account

For Google Suite applications, you must enable the respective API to allow the service account to read your data.

To do this:

  1. Access your Google Cloud Platform console.

  2. In the left navigation pane, click APIs & Services and then, Library.

    API Library

  3. In the Library page, search for the API using the search box and click on it.

    Search Sheets API

  4. Click ENABLE to enable the API.

    Enable API

Downloading the key file

Service account credentials are usually stored as a Key file in JSON format. You need the key file to connect to the BigQuery Destination in Hevo.

To download the file:

  1. Log in to the Google Cloud Platform as an Owner.

  2. Navigate to the IAM, Service Accounts page.

  3. Click the newly created service account.

  4. Navigate to the KEYS tab.

  5. Click ADD KEY and then, Create new key.

  6. Select the Key type as JSON and click CREATE. This downloads the key file.

    Note: Hevo supports only JSON format for the key file.

    JSON Key file

You can use this key file to set up the Google BigQuery Destination in Hevo.
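
Outside of Hevo, the same key file can also be used to authenticate API clients directly, which is a quick way to verify that the service account has the required access. The Python sketch below (with a hypothetical file path) creates BigQuery and Cloud Storage clients from the key file and lists the datasets the account can see:

    # Minimal sketch: authenticate with the downloaded JSON key file and verify access.
    from google.cloud import bigquery, storage

    key_path = "/path/to/service-account-key.json"  # hypothetical path to the key file

    bq_client = bigquery.Client.from_service_account_json(key_path)
    gcs_client = storage.Client.from_service_account_json(key_path)

    # List the datasets visible to the service account as a basic access check.
    for dataset in bq_client.list_datasets():
        print(dataset.dataset_id)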


Limitations

  • Hevo does not map arrays of type struct in the Source data to arrays of type struct in the BigQuery Destination.



See Also


Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Aug-24-2022 NA Updated section, Limitations to add the limitation regarding Hevo not mapping arrays of type struct in the Source to arrays of type struct in the BigQuery Destination.
Apr-27-2022 NA - Removed sections, Retrieve the Project ID, Get your Dataset ID and Location, and Get your GCS Bucket and Bucket Location.
- Updated section, Configure Google BigQuery as a Destination.
Apr-11-2022 NA Updated screenshot in the Configure Google BigQuery as a Destination section to reflect the latest UI.
Aug-09-2021 1.69 Updated the section, Handling JSON Fields.
May-19-2021 NA Added the section Modifying BigQuery Destination Configuration.
Apr-20-2021 1.61 - Updated the page overview section and added sub-section User Authentication.
- Updated section Configure Google BigQuery as a Destination to include the option to connect to BigQuery using service accounts.
- Moved the section Assign BigQuery Admin and Storage Admin Roles to the User to Google Account Authentication Methods.
Apr-06-2021 1.60 Updated the overview to inform customers when to use clustering over partitioning and vice versa.
Feb-22-2021 1.57 Added the Enable Streaming Writes setting in Step 7 of section Configure Google BigQuery as a Destination.
Feb-11-2021 NA Updated the page overview to inform customers that the files written to GCS are deleted after seven days to free up their Google Cloud storage.
Last updated on 24 Aug 2022
