Query Modes for Ingesting Data from Relational Databases

Hevo performs SQL queries on the Source database to get data. You can select different modes to query for the data to be ingested from relational databases such as Amazon Aurora MySQL, Amazon Redshift, SQL Server, MySQL, PostgreSQL.

Querying options are particularly useful where you do not want to or do not have the option to set up the database replication logs.

If you change the query mode:

  • The ingestion job restarts from the beginning. For example, if you change the query mode from Unique Incrementing Append Only to Change Data Capture, the entire data is ingested again from the beginning.

  • The re-ingested Events are considered as historical data and are not billed.

Query Modes

Hevo supports the following query modes:

Full Load

Full load query lets you ingest all the data that is present in the table. Once the entire table is loaded, Hevo waits for six hours to polls the table again.

Prior to Release 1.61, this mode could be used when there were less than 5,000,000 (5 Million) rows in the table. The following query was used to fetch the full load:

SELECT * FROM table;

As of Release 1.61, you can fetch any number of rows from the Source table. To do this, Hevo uses indexed columns, if any are defined in the table, to query data in an incremental manner with a default limit of 1 Million rows till the entire table is ingested, as follows:

SELECT * FROM table
  WHERE `indexed_column` > last_polled_id
  ORDER BY `indexed_column` ASC
  LIMIT 1000000;

Hevo looks for the indexed columns in the following order of priority:

  • Primary Key

  • The first available unique key

  • From the indices defined on the table, the first available index that does not have any value whose frequency is more than 1 million

If no indexed column is available, Hevo reverts to using the original query with an ingestion limit of 5 Million rows.

Delta - Append Only

The mode is useful when the rows in the table are immutable and the data is only being appended to the table.

Auto-Incrementing Column: Hevo will use this column to keep a track of the last polled ID that it recorded.

Stop on Gap in Auto-Incrementing Column: If this option is enabled, Hevo will stop polling whenever there is a gap in the incrementing IDs. The polling will start again after the configured frequency. This option should be used when the rows in the table are being written as a part of a long-running transaction. Using this option allows Hevo to wait for the transaction to commit.

For this query mode, Hevo runs the following query at the configured frequency:

SELECT * FROM table
WHERE `id_column` > last_polled_id
ORDER BY `id_column` ASC
LIMIT 500;

Delta - Timestamp

This mode is similar to Delta - Append Only. However, instead of an incrementing ID, Hevo asks you for a Timestamp Column and queries your Source table with the entered timestamp column. The mode is useful when:

a) The table is appended and a timestamp is maintained

b) Data in the table is updated and the updated timestamp is maintained

Update Timestamp Column: Hevo will use this column to query the Source table and to keep track of last polled timestamp.

Timestamp Column Delay: This configuration can be used when you want to replicate data with a delay which is useful when there are long-running transactions or it is a design requirement. Enter the delay (in milliseconds) that you want to introduce to the query.

Hevo runs the following query at the configured frequency.

SELECT * FROM table
WHERE `updated_timestamp_column` > last_polled_time
  AND `updated_timestamp_column` < now() - delay
ORDER BY `time_column` ASC
LIMIT 500;

Change Data Capture

This mode is a mixture of both Delta - Append Only and Delta - Timestamp. The mode is useful to capture both inserts and updates.

Auto-Incrementing Column: Hevo will use this column to keep a track of the last polled ID it recorded.

Update Timestamp Column: Hevo will use this column to query the Source table and to keep track of last polled timestamp.

Timestamp Column Delay: This configuration can be used when you want to replicate data with a delay which is useful when there are long-running transactions or it is a design requirement. Enter the delay (in milliseconds) you want to introduce to the query.

Hevo runs the following query at the configured frequency.

SELECT * FROM table
WHERE (`updated_timestamp_column` < now() - delay AND
    ((`updated_timestamp_column` = last_polled_time
        AND `id_column` > last_polled_id)
      OR `updated_timestamp_column` > last_polled_time))
    OR (`updated_timestamp_column` is NULL
      AND `id_column` > last_polled_null_id)
ORDER BY `updated_timestamp_column`, `id_column` ASC
LIMIT 500;

Change Tracking

This mode makes use of the Change Tracking feature provided by SQL Server. You may follow the instructions in this guide to enable change tracking for the Source objects. The change retention period should be more than the replication frequency of your Hevo Pipeline to avoid any data loss.



Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Jun-28-2021 1.61 Updated the Full Load section with information about ingesting more than 5 Million rows.
Jun-01-2021 NA Updated the page overview to explain the Hevo app behavior in case changes are made to the query mode.
Last updated on 24 Jun 2021