Data Replication

Last updated on Sep 25, 2024

Edge Pipeline is currently available under Early Access. You can request access to evaluate and test its features.

Data replication combines the tasks of ingesting data from your Source and loading it to your Destination. The entire process of fetching the Source data, collating or mapping it to Destination table fields, and then loading it to the Destination is performed through a Pipeline job.

The process of fetching existing and new data from a Source database is called Data Ingestion. Hevo fetches the data based on the sync frequency of the Pipeline.
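
As a mental model, incremental ingestion can be pictured as a scheduled fetch of everything that changed since the last successful run. The sketch below illustrates that loop in Python; fetch_changes_since, load_to_destination, and the cursor-style position marker are hypothetical stand-ins under that assumption, not Hevo's actual implementation.

```python
import time

def run_ingestion(fetch_changes_since, load_to_destination, sync_frequency_secs):
    """Fetch new and changed records from the Source at the Pipeline's
    sync frequency and hand them off for loading."""
    position = None  # None means "fetch all existing data" on the first run
    while True:
        # Hypothetical callable: returns changed records plus a new cursor.
        records, position = fetch_changes_since(position)
        if records:
            load_to_destination(records)  # hypothetical loading step
        time.sleep(sync_frequency_secs)   # wait for the next scheduled sync
```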

Once the data has been ingested, it can be loaded to the Destination. Data loading involves the following considerations:

  • Identifying where to load: You can configure the Destination while creating your Pipeline, or create one directly beforehand and select it.

  • Mapping the data objects and fields: Once you configure your Source settings, Hevo fetches all the objects available in the Source, and you can select the ones you want to load to the Destination. The objects available for selection depend on the Source configuration. For example, in PostgreSQL Sources, only the objects included in the publication you defined during Source configuration are available for ingestion and loading; see the publication sketch after this list. By default, all the fields of the selected objects are mapped to the respective Destination table columns, but you can change the field selection during or after Pipeline creation.

  • Deduplicating the data: If you select the Merge load mode during Pipeline creation, Hevo uses the primary keys defined in the Destination table to prevent duplicate records while loading your data. You can define the primary key in the Object Configuration tab of your Pipeline; once it is defined, you cannot change it. If you select the Append load mode, Hevo loads the data as new rows in the Destination table without any deduplication, and primary keys play no role. We recommend the Append load mode only when you want to track every update at a granular level. The sketch after this list contrasts the two modes.

  • Optimizing the cost, time, and Events quota consumption: Many factors in the data replication process determine the overall cost of loading data to the Destination. For example, an unnecessarily high ingestion frequency can increase the cost through needless re-ingestion of some Events. Your warehouse provider also charges you for the compute and storage you use.
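
To make the PostgreSQL publication example above concrete, here is a minimal sketch of how a publication restricts the objects available to a Pipeline. It assumes the psycopg2 driver; the connection string, the publication name hevo_publication, and the table names orders and customers are placeholders for the values in your own Source configuration.

```python
import psycopg2

# Placeholder connection details; use your own Source database credentials.
conn = psycopg2.connect("dbname=mydb user=replicator host=localhost")
conn.autocommit = True

with conn.cursor() as cur:
    # A publication covering only the tables you want replicated. Only these
    # objects will be available for selection in the Pipeline.
    cur.execute("CREATE PUBLICATION hevo_publication FOR TABLE orders, customers;")

    # Verify which tables the publication exposes.
    cur.execute(
        "SELECT schemaname, tablename FROM pg_publication_tables WHERE pubname = %s;",
        ("hevo_publication",),
    )
    for schema, table in cur.fetchall():
        print(f"{schema}.{table}")

conn.close()
```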
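
The two load modes can also be contrasted in miniature. The sketch below models a Destination table as an in-memory structure purely to show upsert-by-primary-key (Merge) versus blind insertion (Append); all names are illustrative, not part of any Hevo API.

```python
def load_merge(table, rows, primary_key):
    """Merge mode: upsert each row on its primary key, so a re-delivered or
    updated Event replaces the existing record instead of duplicating it."""
    for row in rows:
        key = tuple(row[col] for col in primary_key)
        table[key] = row  # insert if new, overwrite if the key already exists

def load_append(table, rows):
    """Append mode: every Event becomes a new row; primary keys play no role,
    so each update to a record remains visible as a separate row."""
    table.extend(rows)

# Two Events for the same record: an insert followed by an update.
batch = [{"id": 1, "status": "created"}, {"id": 1, "status": "shipped"}]

merged = {}
load_merge(merged, batch, primary_key=["id"])
print(len(merged))    # 1 -- the second Event overwrote the first

appended = []
load_append(appended, batch)
print(len(appended))  # 2 -- both Events kept, useful for granular tracking
```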
