Pausing a Log-based Pipeline

What is a log-based pipeline?

A log-based pipeline moves data from a source to a destination by reading through the source database's logs, which consist of events describing the changes made to the database. Data is ingested from these logs at a fixed interval.

These logs are generally maintained for replication or recovery of data.
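
As a rough illustration of the idea, the sketch below models a log as an ordered list of change events and picks up only the events recorded after the last position that was read. The names `ChangeEvent`, `read_new_events`, and `last_position` are hypothetical and chosen for this example; they are not part of Hevo or of any database's API.

```python
from dataclasses import dataclass


@dataclass
class ChangeEvent:
    position: int   # offset of the event within the log (hypothetical field)
    operation: str  # "insert", "update", or "delete"
    row: dict       # the row data carried by the event


def read_new_events(log, last_position):
    """Return the events recorded after last_position, in log order."""
    return [event for event in log if event.position > last_position]


# A tiny in-memory stand-in for a database log.
log = [
    ChangeEvent(1, "insert", {"id": 1, "name": "a"}),
    ChangeEvent(2, "update", {"id": 1, "name": "b"}),
    ChangeEvent(3, "delete", {"id": 1}),
]

# One ingestion run: read everything after position 1, then remember
# the position of the last event that was read for the next run.
batch = read_new_events(log, last_position=1)
new_position = batch[-1].position if batch else 1
```

Each run at the fixed interval would repeat this step, starting from the position saved by the previous run.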

Hevo supports the following log-based pipelines:

  1. A MySQL source configured using BinLog mode.
  2. A PostgreSQL source configured using Logical Replication mode.
  3. A DynamoDB source.
  4. A MongoDB source configured using either Change Stream or OpLog mode.
  5. An Aurora source configured using BinLog mode.
  6. A SQL Server source with individual jobs configured with `Change Tracking` as the query mode.

Consequences of pausing a log-based pipeline

Each log has a retention period, which specifies how long the log is kept before it is automatically deleted. Logs are purged periodically to prevent them from taking up too much disk space. Pausing a pipeline pauses ingestion. If a log-based pipeline stays paused for longer than the retention period, log entries that have not yet been ingested may be deleted before the pipeline resumes, which results in data loss.

Note: The same applies to individual jobs configured with `Change Tracking` as the query mode in a SQL Server source.
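
The retention risk described above can be sketched as a simple comparison between how long the pipeline is paused and how long the source keeps its logs. This is a simplified model under stated assumptions: the function name `pause_risks_data_loss` is hypothetical, the retention period is whatever is configured on your source, and real purge schedules vary by database.

```python
from datetime import datetime, timedelta


def pause_risks_data_loss(paused_at, resume_at, retention):
    """Return True if the pause lasts longer than the log retention period.

    Simplified model: an event written right after the pause begins is
    assumed to be purged once it is older than `retention`.
    """
    return (resume_at - paused_at) > retention


# Example values are assumptions for illustration only.
retention = timedelta(days=7)
paused_at = datetime(2024, 1, 1)

short_pause = pause_risks_data_loss(paused_at, datetime(2024, 1, 4), retention)
long_pause = pause_risks_data_loss(paused_at, datetime(2024, 1, 11), retention)
```

With a seven-day retention period, the three-day pause stays within the window, while the ten-day pause outlasts it and some events may no longer be available when ingestion resumes.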