Loading Data to a Data Warehouse
The data ingested from the Source is loaded to the Destination warehouse at each run of your Pipeline. By default, Hevo maintains in the Destination tables any primary keys that are defined in the Source data.
You can load data both with and without primary keys:
Data without Primary Keys
If primary keys are not present in the Destination tables, Hevo appends the data directly to the target tables. This can result in duplicate Events in the Destination, but the data loading process adds no resource overhead.
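The effect of append-only loading can be sketched with a small in-memory example. This is an illustration only, not Hevo's implementation: the `orders` table, its columns, and the `append_events` helper are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3

def append_events(conn, events):
    """Append Events as-is; no key check is made, so repeats are kept."""
    conn.executemany("INSERT INTO orders (id, status) VALUES (?, ?)", events)
    conn.commit()

conn = sqlite3.connect(":memory:")
# Hypothetical Destination table with no primary key defined.
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")

append_events(conn, [(1, "new"), (2, "new")])
append_events(conn, [(1, "new")])  # the same Event arrives again in a later run

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
duplicate_ids = [row[0] for row in conn.execute(
    "SELECT id FROM orders GROUP BY id HAVING COUNT(*) > 1")]
print(row_count, duplicate_ids)
```

The load itself is a plain insert, which is why it is cheap, but the table now holds two rows for `id` 1.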
Data with Primary Keys
If primary keys are present in the Source data but cannot be enforced on the Destination warehouse, as is the case with Google BigQuery, Amazon Redshift, and Snowflake, the uniqueness of data cannot be ensured by default. Hevo works around this lack of primary key enforcement and guarantees that no duplicate data is loaded to or exists in the Destination tables by:
Adding temporary Hevo-internal meta columns to the tables to identify eligible Events.
Using specific queries to cleanse the data of any duplicate and stale Events.
Adding metadata information to each Event to uniquely identify its ingestion and loading time.
Note: These steps consume your Destination system’s resources: CPU for running the queries, and additional storage for as long as the data is being processed.
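The idea behind these steps can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Hevo's actual queries: the column names `_ingested_at` and `_loaded_at` and the `deduplicate` helper are hypothetical, and ingestion times are plain integers.

```python
from datetime import datetime, timezone

def deduplicate(existing, incoming, key="id",
                ingested_col="_ingested_at", loaded_col="_loaded_at"):
    """Keep only the most recently ingested Event for each primary key."""
    latest = {row[key]: row for row in existing}
    load_time = datetime.now(timezone.utc).isoformat()
    for event in incoming:
        current = latest.get(event[key])
        # An Event is eligible if its key is new or it is at least as
        # recent as the row already held for that key.
        if current is None or event[ingested_col] >= current[ingested_col]:
            latest[event[key]] = {**event, loaded_col: load_time}
    return sorted(latest.values(), key=lambda row: row[key])

existing = [{"id": 1, "status": "new", "_ingested_at": 100,
             "_loaded_at": "2024-01-01T00:00:00+00:00"}]
incoming = [
    {"id": 1, "status": "paid", "_ingested_at": 200},  # newer: replaces the row
    {"id": 1, "status": "new", "_ingested_at": 150},   # stale: discarded
    {"id": 2, "status": "new", "_ingested_at": 210},   # new key: inserted
]
result = deduplicate(existing, incoming)
```

In a warehouse, the comparison and cleanup run as queries against the tables, which is where the CPU and temporary storage costs mentioned in the note come from.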
Atomicity of Data Loaded to the Destination
Amazon Redshift: Since the data loading steps of updating, inserting, and deleting Events use distinct queries, any system consuming data from the Destination table may see changes while Hevo is still applying the updates. Further, an update is a two-step process: the stale Destination record is deleted first, and the more current record is then inserted. If data is queried from the Destination between the two steps, the results may momentarily be inconsistent.
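The window of inconsistency can be reproduced in miniature. The sketch below is an illustration rather than the queries Hevo runs: SQLite in autocommit mode stands in for the warehouse so that each statement takes effect on its own, and the `orders` table and `count_order` helper are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so every
# statement is committed separately, like distinct load queries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'new')")

def count_order(conn, order_id):
    """Return how many rows a consumer would see for the given id."""
    cursor = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE id = ?", (order_id,))
    return cursor.fetchone()[0]

# Step 1: delete the stale record.
conn.execute("DELETE FROM orders WHERE id = 1")
seen_between_steps = count_order(conn, 1)  # a query here finds no row at all

# Step 2: insert the more current record.
conn.execute("INSERT INTO orders VALUES (1, 'paid')")
seen_after = count_order(conn, 1)
print(seen_between_steps, seen_after)
```

A consumer that queries between the two steps sees the record as missing, even though it was only being updated.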
Google BigQuery and Snowflake: Hevo uses the SQL MERGE statement, which handles all three operations (deletions, insertions, and updates) in a single query. This makes the operation faster and atomic, so you have a consistent view of the data.
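As a rough illustration of how one statement can cover all three operations, the helper below assembles a generic MERGE statement as text. It is a sketch and not the query Hevo issues: `build_merge_sql`, the staging table, and the `is_deleted` flag are hypothetical, and the exact syntax differs slightly between warehouses.

```python
def build_merge_sql(target, staging, key, columns, deleted_flag="is_deleted"):
    """Build a generic MERGE statement covering delete, update, and insert.

    All names are illustrative and assumed to be trusted identifiers.
    `deleted_flag` is a hypothetical staging column marking Events that
    were deleted in the Source.
    """
    set_clause = ", ".join(f"{col} = s.{col}" for col in columns)
    all_cols = [key, *columns]
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{col}" for col in all_cols)
    return "\n".join([
        f"MERGE INTO {target} AS t",
        f"USING {staging} AS s",
        f"ON t.{key} = s.{key}",
        f"WHEN MATCHED AND s.{deleted_flag} THEN DELETE",
        f"WHEN MATCHED THEN UPDATE SET {set_clause}",
        f"WHEN NOT MATCHED AND NOT s.{deleted_flag} THEN",
        f"  INSERT ({insert_cols}) VALUES ({insert_vals})",
    ])

merge_sql = build_merge_sql(
    "orders", "orders_staging", "id", ["status", "amount"])
print(merge_sql)
```

Because the three branches belong to one statement, the warehouse applies them together, and a consumer does not observe a state in which only some of them have taken effect.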
Data Loading Process
The following diagram illustrates the typical process followed by Hevo to upload your data to a data warehouse.
Refer to the data loading process of each data warehouse to understand how Hevo determines the eligible records to load and to view the queries run at each step.
Refer to the following table for the list of key updates made to this page:
| Date | Release | Description of Change |