Loading Data to a Data Warehouse

Last updated on May 30, 2023

The data ingested from the Source is loaded to the Destination warehouse at each run of your Pipeline. By default, Hevo retains in the Destination tables any primary keys that are defined in the Source data.

You can load data both with and without primary keys:

Data without Primary Keys

If primary keys are not present in the Destination tables, Hevo directly appends the data to the target tables. This can result in duplicate Events in the Destination, but the data loading process itself adds no resource overhead.

Data with Primary Keys

If primary keys are present in the Source data but cannot be enforced on the Destination data warehouse, as is the case with Amazon Redshift, Azure Synapse Analytics, Google BigQuery, and Snowflake, uniqueness of data cannot be ensured by default. Hevo works around this lack of primary key enforcement and guarantees that no duplicate data is loaded to or exists in the Destination tables by:

  • Adding temporary Hevo-generated metadata columns to the tables to identify eligible Events.

  • Using specific queries to cleanse the data of any duplicate and stale Events.

  • Adding metadata information to each Event to uniquely identify its ingestion and loading time.

Note: These steps use your Destination system’s resources: CPU for running the queries, and additional storage while the data is being processed.
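The deduplication idea behind these steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the queries Hevo actually runs: the function name deduplicate and the metadata field ingested_at are hypothetical, and the sketch works on an in-memory list rather than on warehouse tables. It keeps, for each primary key, only the most recently ingested Event.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def deduplicate(events: List[Event], primary_key: str,
                ingested_at: str = "ingested_at") -> List[Event]:
    """Keep only the most recently ingested Event for each primary key.

    `ingested_at` is a hypothetical metadata field used for illustration.
    """
    latest: Dict[Any, Event] = {}
    for event in events:
        key = event[primary_key]
        current = latest.get(key)
        # Replace the stored Event when this one is newer
        # (on a tie, the later arrival wins).
        if current is None or event[ingested_at] >= current[ingested_at]:
            latest[key] = event
    return list(latest.values())


# Two Events share the primary key 1; the older one is stale.
staged = [
    {"id": 1, "name": "Ada", "ingested_at": 100},
    {"id": 2, "name": "Bob", "ingested_at": 101},
    {"id": 1, "name": "Ada L.", "ingested_at": 105},
]
deduped = deduplicate(staged, "id")
```

In a warehouse, the equivalent work is done with queries against the staged and target tables, which is why it consumes CPU and temporary storage on the Destination.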

Atomicity of Data Loaded to the Destination

  • Amazon Redshift and Azure Synapse Analytics: Since the data loading steps of updating, inserting, and deleting Events use distinct queries, any system consuming data from the Destination table may see changes while the updates are being applied by Hevo. Further, updating is a two-step process that first deletes the stale Destination record, and then inserts the more current record. If data is queried from the Destination in between the two steps, results may momentarily be inconsistent.

  • Google BigQuery and Snowflake: Hevo uses the SQL MERGE statement, which handles deletions, insertions, and updates in a single query. This makes the operation faster and atomic, so you always have a consistent view of the data.
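The difference between the two approaches can be illustrated with a toy in-memory model in Python. It is an analogy only and assumes nothing about how any of these warehouses execute queries: the table is a plain dictionary, and the names update_in_two_steps, merge_in_one_step, and reader are hypothetical. The first function shows what a reader can observe between the delete and the insert; the second builds the merged result separately so that it can be swapped in at once.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Table = Dict[int, Row]


def update_in_two_steps(table: Table, key: int, new_row: Row,
                        reader: Callable[[Table], Table]) -> Table:
    """Delete the stale row, then insert the current one, as separate steps.

    Returns what a reader saw when it ran between the two steps.
    """
    del table[key]
    seen_between = reader(table)  # a consumer querying mid-update
    table[key] = new_row
    return seen_between


def merge_in_one_step(table: Table, changes: Table) -> Table:
    """Build the merged result on a copy; the original is never half-updated."""
    merged = dict(table)
    merged.update(changes)
    return merged


# Two-step update: the reader briefly sees the row missing.
table = {1: {"name": "Ada"}}
seen = update_in_two_steps(table, 1, {"name": "Ada L."},
                           reader=lambda t: dict(t))

# Single-step merge: readers see either the old snapshot or the merged one.
snapshot = {1: {"name": "Ada"}}
merged = merge_in_one_step(snapshot, {1: {"name": "Ada L."},
                                      2: {"name": "Bob"}})
```

Here seen is an empty dictionary because the reader ran after the delete and before the insert, while snapshot stays unchanged until merged replaces it.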

Data Loading Process

The following diagram illustrates the typical process followed by Hevo to upload your data to a data warehouse.

Illustration of steps for loading data into a data warehouse

Refer to the data loading process of each data warehouse to understand how Hevo determines the records eligible for loading and to view the queries run at each step.

Revision History

Refer to the following table for the list of key updates made to this page:

Date | Release | Description of Change
Mar-10-2023 | NA | Updated the sections, Data with Primary Keys and Atomicity of Data Loaded to the Destination to add information for Azure Synapse Analytics.
Apr-11-2022 | NA | Reorganized content.
