Recreating Partitioned Tables from BigQuery Console

Last updated on Aug 29, 2023

In BigQuery, you can partition a table only while creating it; partition keys cannot be added to an existing table. Hence, to add a partition key, you must recreate the table.

Hevo supports creating ingestion time-based and time-unit (date or timestamp) column-based partitioned tables from its user interface. Read Partitioning in BigQuery for the steps to do this.

This page describes the steps to convert an existing non-partitioned table into one partitioned by a time-unit column or an integer range.

From Release 2.16, Hevo supports loading data to such partitioned tables, both new and existing. Because Hevo writes data directly to the affected partitions, it does not need to scan the entire table to identify the impacted rows. This reduces the cost of the SQL queries that Hevo runs in your data warehouse for loading data.
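
For example, when a query filters on the partitioning column, BigQuery prunes all partitions that do not match the filter. The following illustrative query, which uses hypothetical table and column names, scans only the partitions for the given date rather than the full table:

  -- Hypothetical example: because the filter is on the partitioning column,
  -- only the partition(s) for Aug 28, 2023 are scanned, not the entire table.
  SELECT *
  FROM `my-project.my_dataset.orders`
  WHERE DATE(created_at) = '2023-08-28';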




Recreating Non-Partitioned Tables as Partitioned Tables from the BigQuery Console

Perform the following steps to recreate an existing table and add a partition key:

  1. Pause the Pipeline loading data to the table you want to recreate. This prevents new data from being written to it.

  2. Convert the non-partitioned table to a partitioned table from the BigQuery console. To do this, you must:

    1. Copy your original non-partitioned table to a new one partitioned using a time-unit or integer column from your Source table as the partition key.

    2. Copy the contents from the Description field of each column in your original table to the corresponding column in the partitioned table. This is done to retain the Hevo-generated metadata and primary keys in your new table.

    3. Drop the original non-partitioned table.

    4. Rename the partitioned table to the original table’s name.

  3. Resume the Pipeline to start ingesting data and loading it to your partitioned table.

The following image illustrates this process. The steps are also described in detail in the sections below.

Create Partitioned Tables from BigQuery Console

1. Pause the Pipeline

Note: You can only pause non-webhook Pipelines. In the case of webhook Pipelines, ensure that no new data is pushed to your Pipeline.

  1. Click PIPELINES in the Navigation Bar to open the Pipelines List View.

  2. Search for and select the Pipeline you want to pause.

  3. Click the Pause icon in the Pipeline Summary bar.

    Pause the Pipeline

    The Pipeline status changes to PAUSED, and all tasks associated with the Pipeline are also paused.

2. Convert your non-partitioned table to a partitioned table

Note: Replace the placeholder values in all the commands with your own. For example, replace <original_table_name> with mynonpartitionedtable.

  1. Sign in to the Google Cloud console and go to the BigQuery page.

  2. In the Welcome to your SQL Workspace! page, click + COMPOSE A NEW QUERY to open the query editor.

    BigQuery SQL Workspace

  3. In the query editor, enter one of the following commands to create a table partitioned based on:

    • A Timestamp value:

      CREATE TABLE `<project_id>.<dataset_name>.<original_table_name>_copy`
      PARTITION BY
        DATE(<timestamp_column_name>)
      AS
        SELECT * FROM `<project_id>.<dataset_name>.<original_table_name>`;
      
    • A Date value:

      CREATE TABLE `<project_id>.<dataset_name>.<original_table_name>_copy`
      PARTITION BY
        <date_column_name>
      AS
        SELECT * FROM `<project_id>.<dataset_name>.<original_table_name>`;
      
    • A DateTime value (BigQuery does not allow partitioning directly on a DATETIME column; it must be wrapped in DATETIME_TRUNC() or DATE()):

      CREATE TABLE `<project_id>.<dataset_name>.<original_table_name>_copy`
      PARTITION BY
        DATETIME_TRUNC(<datetime_column_name>, DAY)
      AS
        SELECT * FROM `<project_id>.<dataset_name>.<original_table_name>`;
      
    • An Integer range:

      CREATE TABLE `<project_id>.<dataset_name>.<original_table_name>_copy`
      PARTITION BY
        RANGE_BUCKET(<integer_column_name>, GENERATE_ARRAY(<start>, <end>, <interval>))
      AS
        SELECT * FROM `<project_id>.<dataset_name>.<original_table_name>`;
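
    For example, the timestamp-based command with the placeholders filled in using hypothetical names might look like this:

      -- Copies my_dataset.orders into a new table partitioned on the date
      -- portion of its created_at timestamp column.
      CREATE TABLE `my-project.my_dataset.orders_copy`
      PARTITION BY
        DATE(created_at)
      AS
        SELECT * FROM `my-project.my_dataset.orders`;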
      
  4. Click RUN to run the SQL statement and create your desired partitioned table type.

    Run SQL statements

  5. Copy the contents from the Description field of the columns in your original table. To do this:

    Note: You can skip to step 6 if the Description fields of all columns in your original table are empty.

    1. In the Explorer pane, expand your project and then the dataset in which both your original and partitioned tables reside.

      BigQuery Explorer Pane

    2. Click the original table’s name to open it in the Details pane.

    3. Log in to the BigQuery console from a new browser window, and repeat steps 1 and 2 above for your partitioned table.

    4. In the respective browser windows, click EDIT SCHEMA to display the Current schema of each table.

      Open Original Table Schema

    5. From the Edit Schema window of the original table, copy the values in the Description field of each column to the Description field of the corresponding columns in your partitioned table.

    6. In the Edit Schema window of the partitioned table, click SAVE to save the changed schema.

      Add Description fields and Save Schema

    7. In the Details pane, verify the Description field values and close the browser window.

    8. In the Edit Schema window of the original table, click CLOSE.
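
    To confirm that the descriptions were copied correctly, you can also compare them in SQL rather than visually. A minimal check, using the same placeholder names, queries the dataset's INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view:

      -- Lists the description of every column in the partitioned table. Run
      -- the same query with table_name = '<original_table_name>' and compare.
      SELECT field_path, description
      FROM `<project_id>.<dataset_name>.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
      WHERE table_name = '<original_table_name>_copy';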

  6. In the Details pane, click the + icon to open the query editor. Enter the following command to drop the original table:

    DROP TABLE `<project_id>.<dataset_name>.<original_table_name>`;
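
    If you want to verify that the copy contains all of your data before running the DROP statement, a minimal row-count comparison using the same placeholder names is:

      -- Both counts should match before the original table is dropped.
      SELECT
        (SELECT COUNT(*)
         FROM `<project_id>.<dataset_name>.<original_table_name>`) AS original_rows,
        (SELECT COUNT(*)
         FROM `<project_id>.<dataset_name>.<original_table_name>_copy`) AS copied_rows;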
    
  7. Rename your partitioned table to the original table’s name. To do this:

    1. In the Explorer pane, click the partitioned table’s name to open it in the Details pane.

    2. In the Details pane, click COPY.

      Create a Copy

    3. In the Copy Table dialog window, provide your original table’s name in the Table field and click COPY. You have now recreated your original table as a partitioned table.

      Rename the Partitioned Table
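
    Note: If you prefer SQL to the console dialog, a table copy statement achieves the same result; the following sketch uses the same placeholder names:

      -- Recreates the original table name from the partitioned copy. A table
      -- copy preserves the partitioning specification and column descriptions.
      CREATE TABLE `<project_id>.<dataset_name>.<original_table_name>`
      COPY `<project_id>.<dataset_name>.<original_table_name>_copy`;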

  8. Click the + icon in the Details pane to open the query editor. Enter the following command to drop the partitioned table you created in step 3:

    DROP TABLE `<project_id>.<dataset_name>.<original_table_name>_copy`;
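
    To verify that the recreated table is partitioned as expected, you can query the dataset's INFORMATION_SCHEMA.COLUMNS view, which flags the partition key:

      -- The partition key column should show YES under is_partitioning_column.
      SELECT column_name, is_partitioning_column
      FROM `<project_id>.<dataset_name>.INFORMATION_SCHEMA.COLUMNS`
      WHERE table_name = '<original_table_name>';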
    

3. Resume the Pipeline

  1. Click PIPELINES in the Navigation Bar.

  2. From the Pipelines List View, select the Pipeline you had paused earlier.

  3. Click the Resume icon in the Pipeline Summary bar.

    Resume the Pipeline

    The Pipeline status changes to ACTIVE, and all tasks associated with the Pipeline are resumed.


Limitations

  • Hevo stores information about the primary keys, which it uses to perform deduplication, in the Description field of a BigQuery table. Hence, to avoid duplicate Events in your data, you must copy the contents of the Description field from your non-partitioned table to the newly created partitioned table.

  • If the partition key is not mapped, Hevo loads NULL values to the partitioning column in the Destination table, and data is placed in the NULL partition. In such a case, to avoid duplicate Events in your Destination table, you must create only one Pipeline per partitioned table.
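
You can check whether any Events have landed in the NULL partition by querying the dataset's INFORMATION_SCHEMA.PARTITIONS view; the following sketch uses the same placeholder names as above:

  -- Returns a row only if the table holds data in its NULL partition.
  SELECT partition_id, total_rows
  FROM `<project_id>.<dataset_name>.INFORMATION_SCHEMA.PARTITIONS`
  WHERE table_name = '<original_table_name>'
    AND partition_id = '__NULL__';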



Revision History

Refer to the following table for the list of key updates made to this page:

Date          Release   Description of Change
Aug-28-2023   2.16      New document.
