Types of Data Synchronization (Edge)
Edge Pipeline is currently available under Early Access. You can request access to evaluate and test its features.
Data synchronization is categorized based on the timeline of the data ingested from the Source. Hevo creates different types of jobs to replicate this data. For example, historical jobs replicate existing data available at the time of Pipeline creation, while incremental jobs replicate new or updated data generated after that point.
Historical Data
Historical data is the data that exists in the Source before the Pipeline is created. To ingest this data, Hevo runs a historical job that retrieves the pre-existing data and loads it into the Destination before any incremental data is loaded.
All Sources in Hevo support historical data ingestion. Events ingested as part of this load are not billed.
After the historical data is loaded into your Destination table, the job is marked as Completed. It does not run again unless you resync the object or the Pipeline. Resyncing either one re-ingests all existing data, and these Events are not billed. For a Pipeline, this means re-ingesting data for all active objects in it.
If primary keys are defined in the Source, Hevo uses them to deduplicate data during replication in the Merge load mode. Primary keys must uniquely identify each data record and be defined on non-nullable columns. If primary keys are not defined or not auto-selected, Hevo prompts you to set one for each table during Pipeline creation. An object without a primary key is marked as Disabled, and Hevo does not replicate data from it.

In the Append load mode, Hevo inserts all incoming Events into the Destination as new rows without deleting or updating existing ones. Primary keys play no role in this mode, so defining one is not required.
Regardless of the load mode, existing primary keys cannot be altered for an object.
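To make the difference between the two load modes concrete, here is a minimal Python sketch using in-memory stand-ins for Destination tables. The record structure and function names are hypothetical illustrations, not Hevo's implementation.

```python
def merge_load(destination: dict, events: list[dict], primary_key: str) -> None:
    """Merge mode: deduplicate on the primary key; a later Event overwrites the earlier row."""
    for event in events:
        destination[event[primary_key]] = event   # upsert keyed on the primary key

def append_load(destination: list, events: list[dict]) -> None:
    """Append mode: every incoming Event becomes a new row; nothing is updated or deleted."""
    destination.extend(events)

events = [
    {"id": 1, "status": "created"},
    {"id": 1, "status": "shipped"},   # an update to the same record
]

merged: dict = {}
merge_load(merged, events, primary_key="id")
print(merged)      # {1: {'id': 1, 'status': 'shipped'}} -> one deduplicated row

appended: list = []
append_load(appended, events)
print(appended)    # both Events kept as separate rows
```

This is also why Merge mode requires a primary key while Append mode does not: without a unique, non-nullable key, there is no reliable way to decide which existing row an incoming Event should overwrite.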
Enabling Historical Load for a Pipeline
By default, historical data is ingested the first time you run the Pipeline. This happens when you select the Replicate existing data and ongoing changes (Historical and Incremental) replication mode. In this mode, Hevo first loads all existing data from the selected Source objects and then begins replicating new and updated Events. Hevo recommends this replication mode because it ensures that both historical data and ongoing changes are available in the Destination.
If you do not want Hevo to ingest historical data, select Replicate data changes only (Incremental only) while configuring the Pipeline. This mode replicates Events only from the time the Pipeline is created. However, if you want to ingest historical data after creation, you can resync the Pipeline. Additionally, if you move an object from an inactive state, such as Disabled, Skipped, or Inconsistent, to the Active state, Hevo triggers a resync of the complete historical data, regardless of the Pipeline’s replication mode.
Note: You cannot change the replication mode after the Pipeline is created. The selected replication mode applies uniformly to all objects in the Pipeline. Additionally, you cannot configure individual objects to replicate only historical or only incremental data.
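As a rough illustration of how the replication mode determines which jobs a new Pipeline runs, consider the Python sketch below. The enum values and the plan_jobs helper are hypothetical stand-ins, not Hevo's configuration format.

```python
from enum import Enum

class ReplicationMode(Enum):
    HISTORICAL_AND_INCREMENTAL = "Replicate existing data and ongoing changes"
    INCREMENTAL_ONLY = "Replicate data changes only"

def plan_jobs(mode: ReplicationMode) -> list[str]:
    """Returns the jobs a newly created Pipeline would run under each mode."""
    if mode is ReplicationMode.HISTORICAL_AND_INCREMENTAL:
        return ["historical job", "incremental jobs"]
    return ["incremental jobs"]   # Incremental only: Events from Pipeline creation onward

print(plan_jobs(ReplicationMode.HISTORICAL_AND_INCREMENTAL))  # ['historical job', 'incremental jobs']
print(plan_jobs(ReplicationMode.INCREMENTAL_ONLY))            # ['incremental jobs']
```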
Parallel Historical and Incremental Ingestion
Hevo runs historical and incremental ingestion jobs in parallel. This avoids delays and prevents the accumulation of unprocessed data in the Source.
When a Pipeline is created, Hevo captures the starting offset for incremental ingestion and begins reading new data immediately. Simultaneously, it initiates the historical job to ingest existing data. Incremental jobs are staged until the historical job is complete. Once the historical job completes, Hevo loads the data ingested by the staged incremental jobs into the Destination in the order in which it was ingested.
A parallel incremental job is created only when the historical job is in the In Progress state. This ensures that historical and incremental jobs do not interfere with each other’s processing and loading order.
Note:

- When a historical job and a parallel incremental job are running for a Pipeline, canceling the historical job automatically cancels all associated incremental jobs. However, a parallel incremental job cannot be canceled manually using the CANCEL JOB action.

- If the historical job fails, all subsequent incremental jobs also fail.

- During parallel ingestion, the Sync Now action and Pipeline edits are not available.
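The following Python sketch simulates the sequencing described above: the incremental reader starts immediately but only stages its batches, while the historical job loads first and the staged batches are flushed afterward in ingestion order. All names, data, and timings are illustrative stand-ins, not Hevo internals.

```python
import threading
import time
from queue import Queue

# Stand-ins for the Source's pre-existing data and its ongoing change stream.
EXISTING_ROWS = ["row-1", "row-2", "row-3"]
CHANGES = ["change-A", "change-B"]

destination: list[str] = []
staged: Queue = Queue()      # incremental batches held back until the historical job completes

def incremental_job() -> None:
    """Begins reading new changes immediately, but only stages them."""
    for change in CHANGES:
        staged.put(change)   # stage the batch; do not load it yet
        time.sleep(0.01)     # simulate changes arriving over time

def historical_job() -> None:
    """Ingests all pre-existing data and loads it into the Destination."""
    for row in EXISTING_ROWS:
        destination.append(row)
        time.sleep(0.02)     # simulate a long-running historical load

# Both jobs run in parallel, as described above.
jobs = [threading.Thread(target=incremental_job), threading.Thread(target=historical_job)]
for job in jobs:
    job.start()
for job in jobs:
    job.join()

# Only after the historical job completes are the staged batches loaded,
# in the order in which they were ingested.
while not staged.empty():
    destination.append(staged.get())

print(destination)  # ['row-1', 'row-2', 'row-3', 'change-A', 'change-B']
```

Staging rather than loading directly is what preserves ordering: historical rows always land first, and incremental Events follow in the sequence they were read from the Source.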
Incremental Data
Incremental data is the changed data that is fetched continuously after Pipeline creation, for example, entries from database logs.
After the historical job is complete, Hevo runs incremental jobs for each object at the defined sync frequency. During incremental ingestion, Hevo maintains an internal offset to track the exact position of the last successful sync. This ensures that only new and updated Events are ingested.
Incremental loads are efficient because they ingest only the changed data instead of re-ingesting the entire dataset for each object. All Events ingested through incremental jobs are billed.
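Here is a minimal Python sketch of the offset mechanism described above, assuming a simple position-based change log. The log structure and the run_incremental_sync helper are hypothetical stand-ins, not Hevo's internal implementation.

```python
# Hypothetical position-based change log standing in for a Source's database log.
source_log = [
    {"position": 1, "event": "insert id=1"},
    {"position": 2, "event": "update id=1"},
    {"position": 3, "event": "insert id=2"},
]

def run_incremental_sync(log: list[dict], last_offset: int) -> int:
    """Ingests only the Events after last_offset and returns the new offset."""
    new_events = [entry for entry in log if entry["position"] > last_offset]
    for entry in new_events:
        print("ingesting:", entry["event"])   # stand-in for loading to the Destination
    return log[-1]["position"] if log else last_offset

offset = 0
offset = run_incremental_sync(source_log, offset)   # first sync ingests all three Events

source_log.append({"position": 4, "event": "update id=2"})
offset = run_incremental_sync(source_log, offset)   # next sync ingests only the new Event
```

Because the offset advances only after a successful sync, a failed run re-reads from the last recorded position, so no Events are skipped between runs.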