Google BigQuery

Google BigQuery is Google's fully managed, columnar enterprise data warehouse. You can run SQL queries against it through its web UI or command-line tools to derive meaningful insights. You can find this article helpful to get started with your BigQuery account.
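
For instance, you can run a query programmatically with the google-cloud-bigquery Python client library. This is a minimal sketch, assuming the library is installed and credentials are already configured; the project ID is a placeholder:

    # A minimal sketch: query a BigQuery public dataset with the Python client.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project-id")  # placeholder project ID

    query = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        GROUP BY name
        ORDER BY total DESC
        LIMIT 5
    """
    for row in client.query(query).result():
        print(row["name"], row["total"])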

This article will focus on how to connect Google BigQuery with Hevo as a Destination.

Prerequisites

Hevo connects to BigQuery as a user that you specify. This user must have the BigQuery Admin and Storage Admin roles on the existing project. These roles are required even if the user already has the Owner or other Admin roles.
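
As a quick sanity check, you can confirm that this user's credentials can reach both services. This is a hedged sketch, assuming the google-cloud-bigquery and google-cloud-storage Python libraries are installed; note that a successful listing only proves basic access, not that the full Admin roles are in place:

    # A minimal sketch: confirm the credentials can reach BigQuery and Cloud Storage.
    from google.cloud import bigquery, storage

    project_id = "my-project-id"  # placeholder

    # BigQuery access: listing datasets should succeed.
    bq_client = bigquery.Client(project=project_id)
    print([d.dataset_id for d in bq_client.list_datasets()])

    # Cloud Storage access: listing buckets should succeed.
    gcs_client = storage.Client(project=project_id)
    print([b.name for b in gcs_client.list_buckets()])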

Setup Guide

Here, you create a Destination by granting Hevo access to the BigQuery project.

  1. Select Destinations in the left pane under Admin.
  2. Click Add Destination and select Destination Type as Google BigQuery.
  3. You'll be asked to add a new BigQuery account. Click Add New Account.
  4. If you aren't signed in to your Google account, you'll be prompted to Sign in as the same user to whom you granted the BigQuery Admin and Storage Admin roles in the Prerequisites section.
  5. After you sign in, you’ll see a list of the permissions requested by Hevo:
    1. View and manage your data in Google BigQuery.
    2. Insert data into Google BigQuery.
    3. Manage (Read and Write access) your data in Google Cloud Storage.
  6. To authorize, click the Allow button.
  7. Once you authorize, you are redirected back to Hevo. Click the Continue button and fill in the following fields:
    1. Destination Name: A unique name for this Destination
    2. Project ID: Project ID of the BigQuery instance for which you granted permissions in the Prerequisites section.
    3. Dataset Name: Name of the dataset in which you want to sync your data.
    4. GCS Bucket: Bucket where files are staged before being loaded into BigQuery. A sketch for verifying the dataset and bucket is shown after these steps.
    5. Load Interval: Interval at which the data will be loaded into BigQuery.
  8. Configure Advanced Settings:
    • Populate Loaded Timestamp: Enable this toggle option to append the __hevo_loaded_at column to the Destination table to indicate the time when the Event was loaded to the Destination. See Loading Data to a Data Warehouse for more information.
  9. Click Test Connection to test the configuration, and then click Save Destination to complete the setup.
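
If the dataset or GCS bucket that you plan to enter above does not exist yet, you can create them in advance. A minimal Python sketch, with placeholder project, dataset, bucket, and location values:

    # A minimal sketch: create the dataset and staging bucket if missing.
    from google.cloud import bigquery, storage

    project_id = "my-project-id"       # placeholder
    dataset_name = "hevo_dataset"      # placeholder
    bucket_name = "hevo-staging-bucket"  # placeholder
    location = "US"                    # placeholder

    # Create the dataset if it does not already exist.
    bq_client = bigquery.Client(project=project_id)
    dataset = bigquery.Dataset(f"{project_id}.{dataset_name}")
    dataset.location = location
    bq_client.create_dataset(dataset, exists_ok=True)

    # Create the staging bucket in the same location if it does not already exist.
    gcs_client = storage.Client(project=project_id)
    if gcs_client.lookup_bucket(bucket_name) is None:
        gcs_client.create_bucket(bucket_name, location=location)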

Handling of JSON Fields

This section describes how JSON data is parsed and made available at the Destination. Because BigQuery has built-in support for nested and repeated columns, JSON data is neither split nor compressed, but passed as is to the Destination for all Pipelines created henceforth with BigQuery as a Destination.

The fields of a JSON object are shown as a struct (type: RECORD) or an array (mode: REPEATED) in the Schema Mapper section. The nested fields are not editable.
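
For illustration, the sketch below shows how a nested JSON object can be represented with a RECORD (struct) column in REPEATED mode, using the google-cloud-bigquery Python library; the field and table names are hypothetical:

    # A minimal sketch: JSON such as {"order_id": 1, "items": [{"sku": "A1", "qty": 2}]}
    # can be modeled as a repeated RECORD (struct) column in a BigQuery schema.
    from google.cloud import bigquery

    schema = [
        bigquery.SchemaField("order_id", "INTEGER"),
        bigquery.SchemaField(
            "items",
            "RECORD",
            mode="REPEATED",
            fields=[
                bigquery.SchemaField("sku", "STRING"),
                bigquery.SchemaField("qty", "INTEGER"),
            ],
        ),
    ]
    table = bigquery.Table("my-project-id.hevo_dataset.orders", schema=schema)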

Limitations

  • The Load Interval is independent of the Pipeline schedule. Hevo writes data to the GCS Bucket before loading it to BigQuery. To stay within the quotas and limits that BigQuery enforces on load jobs, Hevo waits for an interval of 30 minutes by default before loading the data into BigQuery.
  • If your dataset’s location is set to a value other than US, the regional or multi-regional Cloud Storage bucket must be in the same region as the dataset. You can read more about this here.
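
One way to confirm that the bucket and dataset locations are compatible is to compare them with the client libraries. A minimal sketch with placeholder names:

    # A minimal sketch: compare the dataset and bucket locations.
    from google.cloud import bigquery, storage

    dataset = bigquery.Client().get_dataset("my-project-id.hevo_dataset")
    bucket = storage.Client().get_bucket("hevo-staging-bucket")

    print("Dataset location:", dataset.location)  # e.g. "EU"
    print("Bucket location:", bucket.location)    # must be compatible with the dataset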

How it works

You can check this documentation to learn how loading data from Cloud Storage into BigQuery works.
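
In essence, files staged in the GCS bucket are loaded into BigQuery through load jobs. The sketch below shows what such a load job looks like with the Python client library; the URI, table, and file format are placeholders and do not reflect Hevo's internal implementation:

    # A minimal sketch: load a staged file from Cloud Storage into a BigQuery table.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project-id")  # placeholder project ID

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
    )
    load_job = client.load_table_from_uri(
        "gs://hevo-staging-bucket/staged/events.json",   # placeholder URI
        "my-project-id.hevo_dataset.events",             # placeholder table
        job_config=job_config,
    )
    load_job.result()  # wait for the load job to complete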


Last updated on 24 Aug 2020