Partitioning in BigQuery
ON THIS PAGE
Google BigQuery is truly a serverless way to handle massive datasets and a partitioned table is exactly the right way to handle that amount of data. It is a table that is divided into segments to make it easier to manage and query data. Partitions can improve query performance, and control costs by reducing the number of bytes read by a query.
There are two types of table partitioning in BigQuery:
Ingestion-Time partitioned tables: Tables partitioned based on the data’s ingestion (load) date or arrival date.
Field-based partitioned table: Tables that are partitioned based on the timestamp/date column.
Ingestion-time partitioned tables
When you create a table partitioned by ingestion time, it automatically loads data into date-based partitions that denotes the date when the data arrived.
The tables have a pseudo column named
_PARTITIONTIME that contains a date-based timestamp for data that is loaded into the table. By using
_PARTITIONTIME in your query, you can restrict the number of partitions scanned. However, some uses of pseudo-columns do not limit the number of partitions scanned, you can find more details on that here.
Field-based partitioned table
These tables allow you to provide any TIMESTAMP or DATE columns of your choice. The data is automatically sent to the appropriate partition based on the date value(in UTC). We don’t need a
_PARTITIONTIME pseudo column instead the partitioning column can be used to restrict the amount of data scanned.
When you create field-based partitioned tables, two special partitions are created:
__NULL__partition representing rows with NULL values in the partitioning column
__UNPARTITIONED__partition representing data that exists outside the allowed range of dates
For information on partitioned tables and its quotas/limit please go through their official documentation.
Using BigQuery Partitioning at Hevo
While creating tables in Schema Mapper or Models, you can select Partitioning Style for a table. By default, No Partition is selected. To partition the table, select the style and field accordingly. Do remember that you can only partition the table on a field which is of date or timestamp data type.