Query Modes for Relational Databases

This article talks about the Query Mode that can be used in relational databases such as Aurora, Redshift, SQL Server, MySQL, Postgres while loading data to Hevo.

Hevo performs SQL queries on the source database to get data. Querying options are particularly useful cases where you don’t want to or don’t have an option to set up DB’s replication logs. Query mode defines the strategy on how to query the source DB.

Below are the few Query Modes supported by Hevo:

Full Load

Full load lets you load all the data that is present in the table. The mode can be used only when there are less than 100,000 rows in your table. Even if the table is less than 100,000 rows, Hevo will poll the table only once in 6 hours.

Hevo runs the following query at the configured frequency.

SELECT * FROM table;

Delta - Append Only

The mode is useful when the rows in the table are immutable and the data is only being appended to the table.

Auto-Incrementing Column: Hevo will use this column to keep a track of the last polled id that it recorded.

Stop on Gap in Auto-Incrementing Column: If this option is enabled, Hevo will stop polling whenever there is a gap in the incrementing ids. The polling will start again after the configured frequency. This option should be used when the rows in the table are being written as a part of a long-running transaction. Using this option allows Hevo to wait for the transaction to commit.

For this query mode, Hevo runs the following query at the configured frequency:

SELECT * FROM table 
WHERE `id_column` > last_polled_id
ORDER BY `id_column` ASC 
LIMIT 500;

Delta - Timestamp

This mode is similar to Delta - Append Only. However, instead of an incrementing id, Hevo will ask you for a Timestamp Column and will query your source table with the entered timestamp column. The mode is useful when the data when:

a) The table is appended and a timestamp is maintained

b) Data in the table is updated and the updated timestamp is maintained

Update Timestamp Column: Hevo will use this column to query the source table and to keep track of last polled timestamp.

Timestamp Column Delay: This configuration can be used when you want to replicate data with a delay which is useful when there are long-running transactions or it is a design requirement. Enter the delay (in milliseconds) that you want to introduce to the query.

Hevo runs the following query at the configured frequency.

SELECT * FROM table 
WHERE `updated_timestamp_column` > last_polled_time 
  AND `updated_timestamp_column` < now() - delay
ORDER BY `time_column` ASC 
LIMIT 500;

Change Data Capture

This mode is a mixture of both Delta - Append Only and Delta - Timestamp. The mode is useful to capture both inserts and updates.

Auto-Incrementing Column: Hevo will use this column to keep a track of the last polled id it recorded.

Update Timestamp Column: Hevo will use this column to query the source table and to keep track of last polled timestamp.

Timestamp Column Delay: This configuration can be used when you want to replicate data with a delay which is useful when there are long-running transactions or it is a design requirement. Enter the delay (in milliseconds) you want to introduce to the query.

Hevo runs the following query at the configured frequency.

SELECT * FROM table
WHERE (`updated_timestamp_column` < now() - delay AND 
    ((`updated_timestamp_column` = last_polled_time 
        AND `id_column` > last_polled_id) 
      OR `updated_timestamp_column` > last_polled_time))
    OR (`updated_timestamp_column` is NULL 
      AND `id_column` > last_polled_null_id)
ORDER BY `updated_timestamp_column`, `id_column` ASC 
LIMIT 500;

Change Tracking

This mode makes use of the Change Tracking feature provided by the SQL Server. You may follow the instructions in this guide to enable change tracking for the tables. The change retention period should be more than the replication frequency of your Hevo data pipeline.