Partitioning in BigQuery

Google BigQuery is truly a serverless way to handle massive datasets and a partitioned table is exactly the right way to handle that amount of data. It is a table that is divided into segments to make it easier to manage and query data. Partitions can improve query performance, and control costs by reducing the number of bytes read by a query.

There are two types of table partitioning in BigQuery:

  • Ingestion-Time partitioned tables: Tables partitioned based on the data’s ingestion (load) date or arrival date.

  • Field-based partitioned table: Tables that are partitioned based on the timestamp/date column.

Ingestion-time partitioned tables

When you create a table partitioned by ingestion time, it automatically loads data into date-based partitions that denotes the date when the data arrived.

The tables have a pseudo column named _PARTITIONTIME that contains a date-based timestamp for data that is loaded into the table. By using _PARTITIONTIME in your query, you can restrict the number of partitions scanned. However, some uses of pseudo-columns do not limit the number of partitions scanned, you can find more details on that here.

Field-based partitioned table

These tables allow you to provide any TIMESTAMP or DATE columns of your choice. The data is automatically sent to the appropriate partition based on the date value(in UTC). We don’t need a _PARTITIONTIME pseudo column instead the partitioning column can be used to restrict the amount of data scanned.

When you create field-based partitioned tables, two special partitions are created:

  • The __NULL__ partition representing rows with NULL values in the partitioning column
  • The __UNPARTITIONED__ partition representing data that exists outside the allowed range of dates

For information on partitioned tables and its quotas/limit please go through their official documentation.

Using BigQuery Partitioning at Hevo

While creating tables in Schema Mapper or Models, you can select Partitioning Style for a table. By default, No Partition is selected. To partition the table, select the style and field accordingly. Do remember that you can only partition the table on a field which is of date or timestamp data type.

Last updated on 24 Aug 2020