Loading Data in a Database Destination

Hevo replicates the data from your Source system to the database Destination based on the primary keys defined in the Destination table. As part of loading the data, deduplication is done to ensure that only unique records are replicated and duplicates are dropped. This deduplication is done using the value of the primary key.

Deduplicating Data Loaded to the Database

The key steps involved in the deduplication process are:

  1. Load data into a temporary table.

  2. Adding metadata fields to the Destination schema.

  3. Apply the data from the temporary table to the Destination table in one of the following ways:

    • Update existing rows with ingested Events if the primary key already exists.

    • Append the ingested Events as new rows if primary key does not exist.

      Note: Deletion of an Events is handled as an update, by setting the value of __hevo_marked_delete field to True.

  4. Delete the temporary table.

The Data Loading Process is illustrated below.

Note: The CPU time and storage space of the Destination is consumed for the duration of the data loading process.

Additions to the Destination Schema

Hevo adds the following columns to the database tables as part of the data loading process:

Metadata Column Description
__hevo_ingested_at A timestamp that is applied to each Event during ingestion. This timestamp helps to verify that the ingested Event is more current than what already exists in the Destination. For example, by the time a failed Event is resolved and replayed, a more recent Event may already have been loaded to the Destination. Comparing the ingestion timestamp, the stale record can be discarded from the ingested data. The timestamp is retained in the Destination table also.
__hevo_loaded_at A timestamp to indicate when data was inserted/updated/deleted (delete flag updated) in the Destination table. The difference between __hevo_ingested_at and __hevo_loaded_at measures the total time taken by Hevo to process the respective Event, and can be used to identify latency.
__hevo__marked_deleted A column to logically represent a deleted Event. When an Event is deleted in the Source, it is not physically deleted from the Destination table during the data loading process. Instead, a logical column, __hevo__marked_deleted, is set to True for it.

Data Loading Process

The following diagram illustrates the process of loading data to a Database Destination:

Loading Data to a database

Last updated on 09 Nov 2020