Google BigQuery

Last updated on Sep 03, 2024

Starting Release 2.20, Hevo has stopped supporting the streaming inserts feature in Google BigQuery Destinations. As a result, you will not be able to configure a new Google BigQuery Destination with streaming inserts enabled. Any existing Destinations created with this feature will continue to load data in near real-time. Additionally, you can create new Pipelines to load data to your existing Destination.

Google BigQuery is a fully-managed, serverless data warehouse that supports Structured Query Language (SQL) to derive meaningful insights from the data. With Hevo Pipelines, you can ingest data from any Source system and load it to a Google BigQuery Destination.

Hevo Pipelines stage the Source data in Google Cloud Storage (GCS) buckets before loading it to your BigQuery Destination. Buckets are the basic units of storage in GCS, and they hold your data until Hevo loads it into your BigQuery tables. Hevo can transfer data to a GCS bucket only if it is in the same location as the dataset, except when the dataset is in the US multi-region. Read Location considerations for further details about the requirements for transferring data from GCS buckets to BigQuery tables.
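For example, before configuring the Destination, you can verify that your staging bucket and dataset are co-located. The following is a minimal sketch using the google-cloud-bigquery and google-cloud-storage Python clients; the project, dataset, and bucket names are placeholders, and the US multi-region check is a simplification of BigQuery's location rules.

```python
from google.cloud import bigquery, storage

# Placeholder identifiers; replace them with your own project, dataset, and bucket.
PROJECT_ID = "my-gcp-project"
DATASET_ID = "my_dataset"
BUCKET_NAME = "my-staging-bucket"

bq_client = bigquery.Client(project=PROJECT_ID)
gcs_client = storage.Client(project=PROJECT_ID)

dataset_location = bq_client.get_dataset(DATASET_ID).location
bucket_location = gcs_client.get_bucket(BUCKET_NAME).location

# Hevo can stage data only if the bucket and the dataset are in the same
# location, except when the dataset is in the US multi-region.
if dataset_location.upper() == bucket_location.upper() or dataset_location.upper() == "US":
    print(f"Locations are compatible: dataset={dataset_location}, bucket={bucket_location}")
else:
    print(f"Location mismatch: dataset={dataset_location}, bucket={bucket_location}")
```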

Note: For a select range of Sources, Hevo provides a pre-configured and fully managed Google BigQuery data warehouse Destination. Read Hevo Managed Google BigQuery.

Roles and permissions

Hevo requires the following predefined roles to be assigned to the connecting Google account in the configured GCP project, so that it can read data from and write data to BigQuery and the GCS buckets:

BigQuery

  • BigQuery Data Editor: Grants Hevo permission to:

    - Create new datasets in the project.

    - Read a dataset’s metadata and list the tables in the dataset.

    - Create, update, get, and delete tables and views from a dataset.

    - Read and update data and metadata for the tables and views in a dataset.

  • BigQuery Job User: Grants Hevo permission to perform actions such as loading data, exporting data, copying data, and querying data in a project.

Google Cloud Storage

  • Storage Admin: Grants Hevo permission to list, create, view, and delete:

    - Objects in GCS buckets.

    - GCS buckets.

If you do not want to assign the Storage Admin role, you can create a custom role at the project or organization level, add the following permissions to it, and assign this role to the connecting Google account to grant Hevo access to GCS resources:

  • resourcemanager.projects.get

  • storage.buckets.create

  • storage.buckets.delete

  • storage.buckets.get

  • storage.buckets.list

  • storage.buckets.update

  • storage.multipartUploads.abort

  • storage.multipartUploads.create

  • storage.multipartUploads.list

  • storage.multipartUploads.listParts

  • storage.objects.create

  • storage.objects.delete

  • storage.objects.get

  • storage.objects.list

  • storage.objects.update

Note: Hevo uses its own GCS bucket if the connecting Google account is assigned neither the Storage Admin role nor a custom role with the above permissions.
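If you go with the custom role option, you can create the role in the GCP console or programmatically. The sketch below is one possible approach using the IAM API through the google-api-python-client library with Application Default Credentials; the project ID, role ID, and role title are placeholders. You would then assign the role to the connecting Google account, for example, from the IAM page of the GCP console.

```python
from googleapiclient import discovery

# Placeholder values; replace them with your own project and a role ID of your choice.
PROJECT_ID = "my-gcp-project"
ROLE_ID = "hevoGcsAccess"

# Permissions that Hevo needs on GCS resources (see the list above).
PERMISSIONS = [
    "resourcemanager.projects.get",
    "storage.buckets.create",
    "storage.buckets.delete",
    "storage.buckets.get",
    "storage.buckets.list",
    "storage.buckets.update",
    "storage.multipartUploads.abort",
    "storage.multipartUploads.create",
    "storage.multipartUploads.list",
    "storage.multipartUploads.listParts",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
    "storage.objects.update",
]

# Builds an IAM API client using Application Default Credentials.
iam_service = discovery.build("iam", "v1")

# Creates a project-level custom role containing the permissions above.
role = iam_service.projects().roles().create(
    parent=f"projects/{PROJECT_ID}",
    body={
        "roleId": ROLE_ID,
        "role": {
            "title": "Hevo GCS Access",
            "description": "Custom role granting Hevo access to GCS resources.",
            "includedPermissions": PERMISSIONS,
            "stage": "GA",
        },
    },
).execute()

print("Created role:", role["name"])
```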

Google account authentication

You can connect to your BigQuery Destination with a user account (OAuth) or a service account. One service account can be mapped to your entire team. You must assign the required roles to whichever type of account you use, so that Hevo can access GCP-hosted services such as BigQuery and the GCS buckets. Read Authentication for GCP-hosted Services to know how to assign these roles to your user account or service account.
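As a quick check outside of Hevo, you can use the same service account key to authenticate a BigQuery client and confirm that the account has the expected access. This is a minimal sketch using the google-cloud-bigquery Python client; the key file path is a placeholder.

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Placeholder path to the service account key file downloaded from the GCP console.
credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account-key.json"
)

client = bigquery.Client(credentials=credentials, project=credentials.project_id)

# List the datasets that the service account can see in the project.
for dataset in client.list_datasets():
    print(dataset.dataset_id)
```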

The following image illustrates the key steps that you need to complete to configure Google BigQuery as a Destination in Hevo:

BigQuery Destination Setup


Handling Source Data with Different Data Types

For teams created in or after Hevo Release 1.56, Hevo automatically modifies the data type of Google BigQuery table columns to accommodate Source data with different data types. Data type promotion is performed on tables whose size is less than 50 GB. Read Handling Different Data Types in Source Data.

Note: Your Hevo release version is mentioned at the bottom of the Navigation Bar.
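If you want to see which of your tables fall under this threshold, you can inspect table sizes with the BigQuery Python client, as in the sketch below. The dataset name is a placeholder, and the 50 GB limit is assumed to be measured in binary gigabytes.

```python
from google.cloud import bigquery

client = bigquery.Client()
DATASET_ID = "my_dataset"  # Placeholder dataset name.
SIZE_LIMIT_BYTES = 50 * 1024 ** 3  # Assumed 50 GB threshold for data type promotion.

for table_item in client.list_tables(DATASET_ID):
    table = client.get_table(table_item.reference)
    size_gb = table.num_bytes / 1024 ** 3
    eligible = "eligible" if table.num_bytes < SIZE_LIMIT_BYTES else "not eligible"
    print(f"{table.table_id}: {size_gb:.1f} GB ({eligible} for data type promotion)")
```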


Handling JSON Fields

Read Parsing Nested JSON Fields in Events to know how Hevo parses the JSON data and makes it available at the Destination. Because BigQuery has built-in support for nested and repeated columns, JSON data is neither split nor compressed. However, depending on the Source type, arrays may be collapsed into JSON strings or passed as-is to the Destination. Read JSON functions in Standard SQL to know about parsing JSON strings back to arrays.

The fields of a JSON object are shown as a struct (for RECORD) or an array (mode: REPEATED) in the Schema Mapper. The nested fields are not editable.
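As an illustration, if an array from your Source was collapsed into a JSON string, you can expand it again at query time with BigQuery's JSON functions. The sketch below runs such a query through the BigQuery Python client; the project, dataset, table, and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table in which the `tags` column holds an array that was
# collapsed into a JSON string such as '["red", "green", "blue"]'.
query = """
    SELECT
        id,
        JSON_EXTRACT_ARRAY(tags) AS tags_array
    FROM `my-gcp-project.my_dataset.my_table`
"""

for row in client.query(query).result():
    print(row["id"], row["tags_array"])
```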


Modifying BigQuery Destination Configuration

You can modify some of the settings after creating the BigQuery Destination. However, any configuration change affects all the Pipelines that are using this Destination. Refer to the section Potential Impacts of Modifying the Configuration Settings below to know the impacts.

To modify the BigQuery Destination configuration:

  1. Navigate to the Destination Detailed View.

  2. Do one of the following:

    • Click the Settings icon next to the Destination name, and then click the Edit icon.

      Modify BQ settings

    • Click the kebab menu on the right and click Edit.

  3. Update the required field values in the Edit Google BigQuery Destination connection settings page. You can modify the following settings:

    • Destination name

    • Project ID

    • Dataset ID

    • GCS Bucket

    • Populate Loaded Timestamp

    Read the following section to know the impact of making these changes.

  4. Click SAVE DESTINATION.

Potential impacts of modifying the configuration settings

  • Changing any configuration setting impacts all Pipelines using the Destination.

  • Changing the Project ID or Dataset ID may lead to data getting split between the old and the new Dataset. This is because the Events in the old Dataset are not copied to the new one. Further, new tables for all the existing Pipelines that are using this Destination are created in the new Dataset.

  • Changing the GCS Bucket may lead to Events getting lost. Events that are not yet copied from the old bucket are not loaded into the Dataset.


Destination Considerations

  • Hevo supports loading Events in Geometry formats (such as Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection). However, if you are loading any latitude values to your BigQuery Destination, these values must lie between -90 and +90, as that is the range that BigQuery supports. If the data lies outside this range, you can either change the data type in the Source or create a Transformation to change the data type to varchar, as in the sketch below.
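The following is a minimal sketch of such a Python-based Transformation, assuming Hevo's transform(event) interface and a hypothetical latitude field; adapt the field name and the condition to your Source data.

```python
from io.hevo.api import Event

def transform(event):
    properties = event.getProperties()

    # Hypothetical field name; replace it with the latitude field in your Source data.
    latitude = properties.get('latitude')

    # If the latitude lies outside the range that BigQuery supports (-90 to +90),
    # convert the value to a string so that it is loaded as a varchar column
    # instead of failing the load.
    if latitude is not None and not -90 <= latitude <= 90:
        properties['latitude'] = str(latitude)

    return event
```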

Limitations

  • Hevo currently does not map arrays of type struct in the Source data to arrays of type struct in the BigQuery Destination. Instead, any struct arrays in the Source are mapped as varchar arrays. If needed, you can parse these varchar arrays in the Destination to change them back to struct. Read Strategy: Replicate JSON structure as is while collapsing arrays into strings.

  • Hevo replicates a maximum of 4096 columns to each BigQuery table, of which six are Hevo-reserved metadata columns used during data replication. Therefore, your Pipeline can replicate up to 4090 (4096-6) columns for each table. Read Limits on the Number of Destination Columns.

    To circumvent this limitation, you can use Transformations to remove the columns you do not need.
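To see how close your existing tables are to this limit, you can count columns with the BigQuery Python client, as in the sketch below; the dataset name is a placeholder.

```python
from google.cloud import bigquery

client = bigquery.Client()
DATASET_ID = "my_dataset"  # Placeholder dataset name.
COLUMN_LIMIT = 4090  # Columns available to your Pipeline after Hevo's 6 metadata columns.

for table_item in client.list_tables(DATASET_ID):
    table = client.get_table(table_item.reference)
    column_count = len(table.schema)
    print(f"{table.table_id}: {column_count} columns "
          f"({COLUMN_LIMIT - column_count} remaining before the limit)")
```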



Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Aug-13-2024 NA Updated sections, Set up the Google Cloud Platform (Optional) and Configure Google BigQuery as a Destination as per the latest Google BigQuery and Hevo UI.
Feb-05-2024 2.20 - Added a warning container at the top of the page to mention the removal of support for streaming inserts.
- Updated section, Configure Google BigQuery as a Destination to remove information about streaming inserts.
Nov-06-2023 NA Merged the content of subsections, Organization of data in BigQuery and Data partitioning and clustering to Structure of Data in the Google BigQuery Data Warehouse page.
Sep-21-2023 NA Added section, Destination Considerations.
Aug-11-2023 NA Added limitation about the number of columns supported by Hevo in BigQuery tables.
Jul-25-2023 NA Added the process flow diagram in the page overview section.
Jun-19-2023 NA Updated sections, Create a Google Cloud project and Configure Google BigQuery as a Destination to add information about enabling billing on the project to use streaming inserts.
Mar-09-2023 NA Updated section, Configuring Google BigQuery as a Destination to add a note about switching your authentication method post-Pipeline creation.
Dec-20-2022 NA - Added content in the Roles and Permissions section to explain the permissions needed by Hevo.
- Updated the Prerequisites section to remove BigQuery Admin and Storage Admin roles requirement.
Nov-08-2022 2.01 Updated screenshot in the Configure Google BigQuery as a Destination section to reflect the latest UI.
Oct-17-2022 NA - Added the Set up the Google Cloud Platform section to help set up GCP.
- Modified the Prerequisites to add the requirement of an existing GCP project.
Aug-24-2022 NA Updated section, Limitations to add the limitation regarding Hevo not mapping arrays of type struct in the Source to arrays of type struct in the BigQuery Destination.
Apr-27-2022 NA - Removed sections, Retrieve the Project ID, Get your Dataset ID and Location, and Get your GCS Bucket and Bucket Location.
- Updated section, Configure Google BigQuery as a Destination.
Apr-11-2022 NA Updated screenshot in the Configure Google BigQuery as a Destination section to reflect the latest UI.
Aug-09-2021 1.69 Updated the section, Handling JSON Fields.
May-19-2021 NA Added the section Modifying BigQuery Destination Configuration.
Apr-20-2021 1.61 - Updated the page overview section and added sub-section User Authentication.
- Updated section Configure Google BigQuery as a Destination to include the option to connect to BigQuery using service accounts.
- Moved the section Assign BigQuery Admin and Storage Admin Roles to the User to Google Account Authentication Methods.
Apr-06-2021 1.60 Updated the overview to inform customers when to use clustering over partitioning and vice versa.
Feb-22-2021 1.57 Added the Enable Streaming Writes setting in Step 7 of section Configure Google BigQuery as a Destination.
Feb-11-2021 NA Updated the page overview to inform customers that the files written to GCS are deleted after seven days to free up their Google Cloud storage.
