Pipeline Modes

For RDBMS Sources, you can select one of the following Pipeline Modes to define how Hevo reads data from your database.

  • Table: In this mode, Hevo reads each table individually at a fixed frequency. Use this mode to fetch data from multiple tables in your database while controlling the ingestion of each table individually. You can fetch data using different query modes. In Table mode, Hevo does not automatically fetch Views from your database. As a workaround, you can create a separate Pipeline in Custom SQL mode for each View. However, limitations may apply depending on the type of data synchronisation, the query mode, or the number of Events. Contact Hevo Support for more details.

  • Custom SQL: Use this mode if you want to fetch data in a structure different from how it is stored in your Source tables. Custom SQL mode fetches data using a custom query at a fixed frequency. Based on the query mode and the parameter/column name specified in the query mode configuration, Hevo fetches data from the Source tables/views.

    For example, to fetch data from a view or table named some_table, you can write a query like the following:

    SELECT * FROM some_table

    With the Delta - Timestamp query mode and updated_timestamp_column set as the timestamp column, Hevo periodically runs the following query to fetch the data:

    SELECT *
    FROM   some_table
    WHERE  updated_timestamp_column > last_polled_time
         AND updated_timestamp_column < Now() - delay
    ORDER  BY updated_timestamp_column ASC
    LIMIT  500000
    

    Note that aliased columns cannot be used directly in the job configuration fields. To use an aliased column, write your query as a table expression (a derived table in the FROM clause) and refer to the alias from the outer query. Moreover, if your query returns several columns with the same name, alias each of them uniquely to disambiguate them.

    Suppose you have two tables with these columns:

    user (id, name, updated_ts)

    employee (user_id, dept_name, updated_ts)

    And you want to fetch data using the following query, with the query mode set to Delta - Timestamp and the timestamp column name set to updated_ts from the employee table:

    SELECT u.id,
           u.name,
           u.updated_ts,
           e.user_id,
           e.dept_name,
           e.updated_ts
    FROM   user u
           INNER JOIN employee e
                   ON u.id = e.user_id

    Then, you must specify the query as:

    SELECT *
    FROM   (SELECT u.id,
                   u.name,
                   u.updated_ts AS user_updated_ts,
                   e.user_id,
                   e.dept_name,
                   e.updated_ts AS employee_updated_ts
            FROM   user u
                   INNER JOIN employee e
                           ON u.id = e.user_id) TABLE_ALIAS

    Set the timestamp column name to employee_updated_ts.

    The corresponding Hevo query would be:

    SELECT *
    FROM   (SELECT u.id,
                   u.name,
                   u.updated_ts AS user_updated_ts,
                   e.user_id,
                   e.dept_name,
                   e.updated_ts AS employee_updated_ts
            FROM   user u
                   INNER JOIN employee e
                           ON u.id = e.user_id) TABLE_ALIAS
    WHERE  employee_updated_ts > last_polled_time
           AND employee_updated_ts < Now() - delay
    ORDER  BY employee_updated_ts ASC
    LIMIT  5000000

    Pipelines created in Custom SQL mode do not have primary keys defined by default, even if the selected Source columns are primary keys in the Source tables. To avoid duplicates, you must define the primary keys manually, even if Auto Mapping is enabled.

    You can do this either by setting the primary keys while creating Transformations or by creating them manually in the Destination table. Read Handling of Updates.

  • BinLog: This mode applies to MySQL Sources. In this mode, data is read using MySQL’s BinLog. This mode is useful when you want to replicate the complete database, as is, to the Destination. It replicates data very efficiently but gives you less control over data ingestion. Read BinLog Replication and BinLog Alerts.

  • Logical Replication: This mode applies to the Postgres Source. In this mode, data is replicated using the Postgres Write-Ahead Log (WAL) with the WAL level set to logical (available in Postgres version 9.4 and above). This mode is useful when you want to replicate the complete database as is. Note that Hevo creates a new Replication Slot for the Pipeline. Because a replication slot retains WAL segments until they are consumed, this may lead to higher disk consumption in your Postgres database. Refer to the instructions to set up WAL for logical replication.

  • Redo Log: This mode applies to the Oracle Source. It uses Oracle LogMiner to incrementally ingest data from Oracle redo logs and is the recommended way to replicate data from Oracle. Refer to Oracle’s LogMiner documentation to read more about LogMiner. Refer to the Oracle Source variants documented in this section for instructions to set up the Redo Log for an Oracle database.
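
The Delta - Timestamp polling described under Custom SQL mode can be sketched in Python with the standard-library sqlite3 module. This is a hypothetical illustration, not Hevo’s implementation. The in-memory database, the integer timestamps, and the make_demo_db and poll_once helpers are assumptions. The watermark is also assumed to advance to the largest timestamp fetched in each batch. The sketch runs the user/employee query from the example, wrapped as a table expression so that the outer query can filter and sort on the aliased employee_updated_ts column.

```python
import sqlite3

# The join is wrapped in a derived table (TABLE_ALIAS) so the outer query
# can filter and sort on the aliased column employee_updated_ts.
POLL_QUERY = """
    SELECT *
    FROM   (SELECT u.id,
                   u.name,
                   u.updated_ts AS user_updated_ts,
                   e.user_id,
                   e.dept_name,
                   e.updated_ts AS employee_updated_ts
            FROM   "user" u
                   INNER JOIN employee e
                           ON u.id = e.user_id) TABLE_ALIAS
    WHERE  employee_updated_ts > ?
           AND employee_updated_ts < ?
    ORDER  BY employee_updated_ts ASC
    LIMIT  ?
"""


def make_demo_db() -> sqlite3.Connection:
    """Build a hypothetical in-memory copy of the user/employee example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT, updated_ts INTEGER);
        CREATE TABLE employee (user_id INTEGER, dept_name TEXT, updated_ts INTEGER);
        """
    )
    conn.executemany('INSERT INTO "user" VALUES (?, ?, ?)',
                     [(1, "Ana", 100), (2, "Bo", 100)])
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                     [(1, "Sales", 110), (2, "Ops", 130)])
    return conn


def poll_once(conn, last_polled_time, now, delay, batch_limit):
    """Fetch one batch of changed rows and return (rows, new_watermark).

    Only rows strictly between last_polled_time and now - delay are fetched;
    newer rows are left for a later poll.
    """
    rows = conn.execute(
        POLL_QUERY, (last_polled_time, now - delay, batch_limit)
    ).fetchall()
    # Assumption: the watermark advances to the largest timestamp fetched
    # (employee_updated_ts is the sixth column, index 5).
    new_watermark = rows[-1][5] if rows else last_polled_time
    return rows, new_watermark


if __name__ == "__main__":
    demo = make_demo_db()
    batch, watermark = poll_once(demo, last_polled_time=0, now=200,
                                 delay=60, batch_limit=500000)
    print(batch, watermark)
```

The upper bound of now - delay leaves very recent rows for the next poll. A common reason for this pattern is to avoid missing rows from transactions that commit late. The exact reason Hevo uses a delay is not stated here.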

BinLog Replication

After a historical load of the initial state of the MySQL database, the BinLog is continuously streamed. During the first ingestion, table definitions are read directly from the schema, not from the BinLog. After that, all schema updates are read from the BinLog, enabling near real-time replication at scale. This approach supports both deletions and table alterations, resulting in exact one-to-one replication. It does not require locks or affect the performance of the database. It also fits well with the stream-processing paradigm, allowing near real-time performance.
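
A minimal Python sketch can show why replaying a change log in order gives one-to-one replication, deletes included. This is not Hevo’s implementation and does not parse the real MySQL BinLog format. The ChangeEvent type and the apply_events helper are hypothetical. The sketch also assumes that events arrive in log order and that every row has a primary key.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical, simplified row-change event.

    Real BinLog row events carry much more metadata; only the fields this
    sketch needs are modelled here.
    """
    op: str                               # "insert", "update", or "delete"
    pk: Any                               # primary-key value of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_events(replica: Dict[Any, dict],
                 events: Iterable[ChangeEvent]) -> Dict[Any, dict]:
    """Apply events in log order so the replica mirrors the source row for row."""
    for ev in events:
        if ev.op in ("insert", "update"):
            # Upsert keyed by primary key: replaying an event overwrites the
            # row instead of creating a duplicate.
            replica[ev.pk] = dict(ev.row)
        elif ev.op == "delete":
            # Deletes are recorded in the log, so they can be mirrored too.
            replica.pop(ev.pk, None)
        else:
            raise ValueError(f"unknown operation: {ev.op!r}")
    return replica


if __name__ == "__main__":
    log = [
        ChangeEvent("insert", 1, {"id": 1, "name": "Ana"}),
        ChangeEvent("insert", 2, {"id": 2, "name": "Bo"}),
        ChangeEvent("update", 1, {"id": 1, "name": "Ana B."}),
        ChangeEvent("delete", 2),
    ]
    print(apply_events({}, log))
```

Keying the replica on the primary key is what keeps replays from producing duplicate rows. For the same reason, Custom SQL Pipelines need primary keys defined manually.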

BinLog Alerts

If the Pipeline mode is BinLog and the BinLog retention period is set to less than 24 hours, Hevo displays a one-time warning to increase the retention period, provided Hevo has sufficient permissions to read the BinLog configuration: “BinLog retention is set to hours. We recommend having BinLog retention of at least 24 hours to avoid disruption in replication due to spikes.”

Refer to MySQL’s replication reference guide to learn about the options available for replication and binary logging. The mysql.rds_set_configuration page provides information on BinLog retention in RDS instances.
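
The alert rule described above can be written as a small check. This is a hypothetical sketch of the documented behaviour, not Hevo’s code. It warns once when the retention is below 24 hours and the setting could be read. The function name should_warn_about_retention and its parameters are illustrative.

```python
from typing import Optional

# Threshold taken from the alert text above.
RECOMMENDED_MIN_RETENTION_HOURS = 24


def should_warn_about_retention(retention_hours: Optional[int],
                                already_warned: bool) -> bool:
    """Return True if the one-time low-retention warning should be shown.

    retention_hours is None when the setting could not be read, for example
    because of missing permissions; no warning is raised in that case.
    """
    if retention_hours is None or already_warned:
        return False
    return retention_hours < RECOMMENDED_MIN_RETENTION_HOURS
```

On Amazon RDS for MySQL, you can view the current value with CALL mysql.rds_show_configuration; and raise it with CALL mysql.rds_set_configuration('binlog retention hours', 24);.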


Last updated on 18 Jan 2021