Ingestion Modes

Log-based

In log-based incremental replication, Hevo uses database log files to identify any record alterations such as inserts, updates, and deletes.

According to the Source type, Hevo allows you to select one of the following Log modes:

Log Mode Source Type
BinLog MySQL Source types
Change Tracking MS SQL Source types
Logical Replication Postgre SQL Source types
OpLog MongoDB Source types
Redo Log Oracle Source Types

BinLog (Binary Logging)

The BinLog mode is applicable for MySQL Source types. In this mode, data is read using MySQL’s BinLog. This mode is useful when you are looking to replicate the complete database, as is, to the Destination. This mode is very efficient in replicating but leaves you with less control and manageability over data ingestion.

After a historical load of the initial state of the MySQL database, the BinLog is continuously streamed. The first time the data is ingested, the table definitions are directly read from the schema and not the BinLog. Post that, all of the schema updates are read from the BinLog for near real-time replication at scale. This approach supports both deletions and table alterations leading to exactly one-to-one replication. It does not require locks or affect the performance of the database. It is also consistent with the stream processing paradigm allowing near real-time performance.


Change Tracking

The Change Tracking mode is applicable for MS SQL Source types. It makes use of the Change Tracking feature provided by SQL Server to track the changes made to the Source objects. It captures the records (rows) that were inserted, updated, or deleted in these objects.

Before selecting this mode, you must use an SQL client tool to enable Change Tracking at the database or object level. Once this is done, Hevo automatically sets Change Tracking as the query mode for the objects that you want to track. The Change Tracking period should be more than the replication frequency of your Hevo Pipeline to avoid any data loss.

Note: Automatic selection of Change Tracking as the query mode occurs only for new Pipelines.


Logical Replication

Logical Replication is applicable for the Postgres Source types. In this mode, data is replicated using Postgres Write Ahead Log (WAL) set at a logical level (available on Postgres version 9.4 and above). This mode is useful when you are looking to replicate the complete database as it is. Please note that Hevo will create a new Replication Slot for the Pipeline which may lead to higher disk consumption in your Postgres Database.

Note: Hevo creates a new Replication Slot for the Pipeline, which may lead to higher disk consumption in your Postgres Database.

Read Set up Log-based Incremental Replication for steps to set up WAL for logical replication.


OpLog

OpLog is applicable to the MongoDB Source. In this mode, data is polled using MongoDB’s OpLog. The OpLog is a collection of individual, transaction-level details which help replicas sync data from the primary instance. Read more about MongoDB’s OpLog and OpLog Alerts in Hevo.


Redo Log

This mode is applicable for the Oracle Source Types. It uses Oracle Logminer to incrementally ingest the data from Oracle Redo Logs. This is the recommended mode for replicating data from an Oracle database Source. Read Using Logminer. For Pipelines created after Release 1.96, Hevo supports the RedoLog ingestion mode for Oracle Database 19c and higher. Refer to the documentation for the Oracle Source variants for instructions to set up Redo Log for an Oracle database. Also read Pipeline failure due to Redo Log expiry to follow the troubleshooting steps in case the Redo Log expires.

Note: This mode does not support user-defined data types for fields. Read Generic Oracle.


Table

In the Table mode, your tables are read individually at a fixed frequency. Use this mode to fetch data from multiple tables in your database, while maintaining control over the ingestion for every table individually. You can fetch data using different query modes.

In Table mode:

  • Hevo does not fetch Views automatically from your database. As a workaround, you can create individual Pipelines in the Custom SQL mode to fetch each View. However, some limitations may arise based on the type of data synchronization, the query mode, or the number of Events. Contact Hevo Support for more detail.

  • Hevo does not update the Destination tables for any deletes that may have occurred in the Source data. In log-based replication, deletes can be identified by the field hevo_marked_deleted being _True for an Event. However, in the Table mode, data is fetched using SQL queries, and these do not offer any mechanism to determine the deletes.

  • For MS SQL Source types, Hevo automatically sets Change Tracking as the query mode if you have enabled Change Tracking on the Source objects.


Custom SQL

The Custom SQL mode allows you to fetch data in a different structure than how it is stored in your Source tables. Custom SQL mode allows you to fetch data using a custom query at a fixed frequency. Based on the query mode and the parameter/column name specified in the query mode configuration, Hevo fetches data from the Source tables/views.

For example, if you want to fetch data from the view or table named some_table, then, you can simply write a query like the following:

SELECT * FROM some_table

Hevo runs the following query to fetch the data periodically:

SELECT *
FROM   some_table
WHERE  updated_timestamp_column > last_polled_time
     AND updated_timestamp_column < Now() - delay
ORDER  BY updated_timestamp_column ASC
LIMIT  500000

Note: Aliased columns cannot directly be used in the job configuration fields. Your query must be written as a table expression before the aliased column can be used. Moreover, if your query results in several columns with the same name, they must be aliased uniquely to disambiguate.

Suppose you have two tables with these columns:

user (id, name, updated_ts)

employee (user_id, dept_name, updated_ts)

And, you want to fetch data using the following query and query mode as Delta - Timestamp and timestamp column name as updated_ts (from the table employee):

SELECT u.id,
   u.name,
   u.updated_ts,
   e.user_id,
   e.dept_name,
   e.updated_ts
FROM   user u
   INNER JOIN employee e
           ON u.id = e.id

Then, you must specify the query as:

SELECT *
FROM   (SELECT u.id,
             u.name,
             u.updated_ts AS user_updated_ts,
             e.user_id,
             e.dept_name,
             e.updated_ts AS employee_updated_ts
       FROM   user u
             INNER JOIN employee e
                     ON u.id = e.id)TABLE_ALIAS

with timestamp column name being employee_updated_ts.

The corresponding Hevo query would be:

SELECT *
FROM   (SELECT u.id,
             u.name,
             u.updated_ts AS user_updated_ts,
             e.user_id,
             e.dept_name,
             e.updated_ts AS employee_updated_ts
      FROM   user u
             INNER JOIN employee e
                     ON u.id = e.id)TABLE_ALIAS
WHERE  employee_updated_ts > last_polled_time
     AND employee_updated_ts < Now() - delay
ORDER  BY employee_updated_ts ASC
LIMIT  5000000

Pipelines created in the Custom SQL mode do not have any primary keys defined by default even though the selected Source columns have these. You need to manually define the primary keys to avoid duplicates, even if Auto Mapping is enabled.

You can either do this by setting the primary keys as part of creating transformations or by creating them in the Destination table manually. Read Handling of Updates.



See Also


Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Nov-08-2022 NA Moved under Ingestion Modes and Query Modes.
Sep-16-2022 NA Categorised the various log-based ingestion modes under the section, Log-based.
Dec-06-2021 1.77 Updated sections, Redo Log and See Also with a link to the Pipeline failure due to Redo Log expiry page.
Nov-26-2021 NA Updated section, Redo Log to add that Redo Logs are not supported for Oracle Database 19c.
Nov-22-2021 NA Added a note in the section, Redo Log.
Sep-09-2021 1.71 - Updated the section, BinLog Alerts to include the error message shown on BinLog expiry.
- Provided a See Also link to the troubleshooting page.
- Added a list item in the section, Table.
- Updated the section, Change Tracking.
May-19-2021 NA Added section, OpLog
Mar-30-2021 NA Added a bullet point in the section, Table to explain that Hevo does not handle Source record deletions in the Destination table.
Mar-09-2021 1.58 Added Change Tracking as a distinct Pipeline mode for MS SQL Source Types in Hevo.
Last updated on 08 Nov 2022

Tell us what went wrong