Loading Data to a Data Warehouse

Last updated on Nov 15, 2023

The data ingested from the Source is loaded to the Destination warehouse at each run of your Pipeline. Hevo ensures that any primary keys that are defined in the Source data are retained in the Destination tables.

You can load both types of data:

Data without Primary Keys

If primary keys are not present in the Destination tables, Hevo directly appends the data into the target tables. While this can result in duplicate Events, there is no resource overhead from the data loading process.

Data with Primary Keys

If primary keys are present in the Source data but not enforceable on the Destination data warehouse, the uniquenes of data cannot be ensured by default. This can happen in the case of Destinations like Amazon Redshift, Azure Synapse Analytics, Google BigQuery, and Snowflake. Hevo circumvents this lack of primary key enforcement and guarantees that no duplicate data is loaded to or exists in the Destination tables by:

  • Adding temporary Hevo-generated metadata columns to the tables to identify eligible Events.

  • Using specific queries to remove any duplicate and stale Events.

  • Adding metadata information to each Event to uniquely identify its ingestion and loading time.

Note: These steps utilize your Destination system’s resources. For example, CPU usage for running the queries and additional storage utilization for processing the data.

Atomicity of Data Loaded to the Destination

  • Amazon Redshift and Azure Synapse Analytics: The steps of updating, inserting, and deleting Events use distinct queries. Therefore, any system consuming data from the Destination table may see changes while Hevo applies the updates. Further, updating is a two-step process that first deletes the stale Destination record, and then inserts the more current record. If data is queried from the Destination in between the two steps, results may momentarily be inconsistent.

  • Google BigQuery and Snowflake: For these Destinations, the Merge SQL statement is used to handle all three operations of deletion, insertion, and update through a single query. This makes the operation faster and atomic, giving you a consistent view of the data.

Data Loading Process

The following diagram illustrates the typical process followed by Hevo to upload your data to a data warehouse.

Illustration of steps for loading data into a data warehouse

Refer to the data loading process of each data warehouse to know how Hevo identifies the records eligible to be loaded and view the queries performed at each step of this process.

Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Mar-10-2023 NA Updated the sections, Data with Primary Keys and Atomicity of Data Loaded to the Destination to add information for Azure Synapse Analytics.
Apr-11-2022 NA Reorganized content.

Tell us what went wrong