Partitioning in BigQuery
On This Page
Google BigQuery offers a serverless way to handle massive datasets through the use of partitioned tables. These are tables that are divided into segments to make it easier to manage and query data. Partitions can improve query performance, and control costs by reducing the number of bytes read by a query.
There are two types of table partitioning in BigQuery:
Ingestion time based: Tables are partitioned based on the data’s ingestion (load) date or arrival date.
Field based: Tables that are partitioned based on the timestamp/date column.
Ingestion-time Partitioned Tables
When you create a table partitioned by ingestion time, it automatically loads data into date-based partitions that denotes the date when the data arrived.
The tables have a pseudo column named
_PARTITIONTIME that contains a date-based timestamp for data that is loaded into the table. By using
_PARTITIONTIME in your query, you can restrict the number of partitions scanned. However, some uses of pseudo-columns do not limit the number of partitions scanned, you can find more details on that here.
Field-based Partitioned Tables
These tables allow you to provide any TIMESTAMP or DATE columns of your choice. The data is automatically sent to the appropriate partition based on the date value(in UTC). We don’t need a
_PARTITIONTIME pseudo column instead the partitioning column can be used to restrict the amount of data scanned.
When you create field-based partitioned tables, two special partitions are created:
__NULL__partition representing rows with NULL values in the partitioning column
__UNPARTITIONED__partition representing data that exists outside the allowed range of dates
For information on partitioned tables and its quotas/limit please go through their official documentation.
- BigQuery allows you to create a maximum of 4000 partitions per partitioned table. Read Quotas and limits: Partitioned Tables.
Using BigQuery Partitioning in Hevo
You can enable partitioning while creating the tables in Schema Mapper or Models. By default, No Partition is selected. To partition a table:
Select the Partition Style.
Select the Partition Type. Default value: Day. From Release 1.67 onwards, you can specify a different partitioning type, such as Hour, Month, and Year. Do remember that you can only partition the table on a field that which is of date or timestamp data type.
Refer to the following table for the list of key updates made to this page:
|Date||Release||Description of Change|
|Aug-09-2021||1.67||Updated the section, Using BigQuery Partitioning in Hevo.|
|May-19-2021||NA||Added section, Destination Considerations.|