Query Modes

Last updated on Jul 14, 2023

Hevo queries the Source to ingest data. Querying can differ depending on the type of Source. Read the following sections to know how Hevo performs queries in case of SaaS, and Relational database Sources.


Relational Database Sources

Hevo performs SQL queries on the Source database to get data. You can select different modes to query for the data to be ingested from relational databases such as Amazon Aurora MySQL, Amazon Redshift, SQL Server, MySQL, and PostgreSQL. Hevo allows you to choose the query mode for every object as shown in the image below:

Query Modes

Note: From Release 2.14 onwards, for Table mode-based Pipelines created with a PostgreSQL Source, the query mode defaults to XMIN.

Querying options are particularly useful where you do not want to or do not have the option to set up the database replication logs.

If you change the query mode:

  • The ingestion job restarts from the beginning. For example, if you change the query mode from Unique Incrementing Append Only to Change Data Capture, the entire data is ingested again from the beginning.

  • The re-ingested Events are considered as historical data and are not billed.

Hevo supports the following query modes:

Full Load

In Full Load, Hevo ingests all the data from the Source table on each ingestion run and loads it to the Destination. Once the entire table is loaded, Hevo waits for six hours to ingest the table again.

Prior to Release 1.61, this mode could be used when there were less than 5 Million rows in the table. The following query was used to fetch the Full Load data:

SELECT * FROM table;

As of Release 1.61, you can fetch any number of rows from the Source table. To do this, Hevo uses indexed columns, if any are defined in the table, to query data in an incremental manner with a default limit of 1 Million rows till the entire table is ingested, as follows:

SELECT * FROM table
  WHERE `indexed_column` > last_polled_id
  ORDER BY `indexed_column` ASC
  LIMIT 1000000;

Hevo looks for the indexed columns in the following order of priority:

  • Primary Key

  • The first available unique key

  • From the indices defined on the table, the first available index that does not have any value whose frequency is more than 1 million

If no indexed column is available, Hevo reverts to using the original query with an ingestion limit of 5 Million rows.

In log mode-based Pipelines, the query mode is used only to ingest the historical data. If any index or timestamp column is available for a table, that is used to fetch the data. Hence, the Full Load mode is not made available for selection even if you try to edit a table’s query mode post-Pipeline creation.

For the PostgreSQL Source, as of Release 2.07, Hevo provides XMIN as an alternative to the Full Load query mode. In the case of the SQL Server Source, if change tracking is enabled for a table, Full Load mode is not shown.


Unique Incrementing Append Only

This mode is useful when the rows in the table are immutable and the data is only being appended to the table.

Auto-Incrementing Column: Hevo uses this column to keep a track of the last ingested ID that it recorded.

Note: This column should not contain any null values.

For this query mode, Hevo runs the following query at the configured frequency:

SELECT * FROM table
WHERE `id_column` > last_polled_id
ORDER BY `id_column` ASC
LIMIT 500;

Example:

Consider the following snapshot of a Source object with two records:

Existing Source Object UIAO

Here, User Id is the unique incrementing column. Once these records are ingested, Hevo stores the highest value of User Id as the last_polled_id, which in this case is 2.

Updated Source Object UIAO

Now, suppose record 1 is changed, and record 3 is added in the Source. When Hevo performs the SQL query, record 1 is ignored, record 3 is picked up and the value of last_polled_id is modified to 3.

Updated Destination Object UIAO

Note: If no new records are added to the Source object since the last fetch, then the UIAO query mode does not return a result.


Delta - Timestamp

This mode is similar to Unique Incrementing Append Only. However, instead of an incrementing ID, in this case, Hevo asks you for a Timestamp Column and queries your Source table based on it. This mode is useful when:

  • The table is appended and a timestamp is maintained.

  • Data in the table is updated and the updated timestamp is maintained.

Note: If your tables do not have a timestamp column, you can use Log-based ingestion mode for tracking updates to existing records in the table.

Update Timestamp Column: Hevo uses this column to query the Source table and to keep track of the updated_timestamp_column value.

For this query mode, Hevo runs the following query to ingest the data from your Source at the configured frequency:

SELECT * FROM table
WHERE `updated_timestamp_column` > last_polled_time
  AND `updated_timestamp_column` < now()
ORDER BY `time_column` ASC
LIMIT 500;

Example:

Consider the following snapshot of a Source object with two records:

Existing Source Object DT

Here, Update TS is the updated_timestamp_column. Once these records are ingested, Hevo stores the latest timestamp of Update TS as the updated_timestamp_column value, which in this case is Fri, 16 Apr 2021 06:10:51 GMT.

Updated Source Object DT

Now, suppose record 1 is changed, and records 3 is added. When Hevo performs the SQL query, record 1 is picked up, record 3 is ignored and the value of updated_timestamp_column is modified to Fri, 16 Apr 2021 07:10:51 GMT.

Updated Destination Object DT


Change Data Capture

This mode is a mixture of both Unique Incrementing Append Only and Delta - Timestamp modes. The mode is useful for capturing both inserts and updates.

Auto-Incrementing Column: Hevo uses this column to keep a track of the last ingested ID it recorded.

Update Timestamp Column: Hevo uses this column to query the Source table and to keep track of updated_timestamp_column value.

Timestamp Column Delay: This configuration can be used when you want to replicate data with a delay which is useful when there are long-running transactions or it is a design requirement. Enter the delay (in milliseconds) you want to introduce to the query.

Hevo runs the following query at the configured frequency.

SELECT * FROM table
WHERE (`updated_timestamp_column` < now() - delay AND
    ((`updated_timestamp_column` = last_polled_time
        AND `id_column` > last_polled_id)
      OR `updated_timestamp_column` > last_polled_time))
    OR (`updated_timestamp_column` is NULL
      AND `id_column` > last_polled_null_id)
ORDER BY `updated_timestamp_column`, `id_column` ASC
LIMIT 500;

Example:

Consider the following snapshot of a Source object with two records:

Existing Source Object CDC

Here, User Id and Update TS are the last_polled_id and updated_timestamp_column respectively. Once these records are ingested, Hevo stores the latest values of User Id and Update TS as the last_polled_id and updated_timestamp_column value, which in this case are 2 and Fri, 16 Apr 2021 06:10:51 GMT respectively.

Updated Source object CDC

Now, suppose record 1 is changed, and record 3 is added in the Source. When Hevo performs the SQL query, records 1 and 3 are picked up and the values of last_polled_id and updated_timestamp_column are modified to 3 and Fri, 16 Apr 2021 07:10:51 GMT respectively.

Updated Destination Object CDC


Change Tracking

This mode makes use of the Change Tracking feature provided by SQL Server. Change Tracking should be enabled for the database objects in order to use it. You may follow this guide to enable change tracking for the Source objects. The change retention period should be more than the replication frequency of your Hevo Pipeline to avoid any data loss.

Unlike other query modes, Change Tracking can also be enabled/disabled at the Table level.

Note: Use an SQL client tool to enable change tracking for the objects you want to track, and subsequently select Change Tracking in the Query Mode drop-down.

Hevo uses Change Tracking (CT) instead of Change Data Capture (CDC) for some Sources as CT requires lesser data to be managed and stored at the Source. Change Tracking replicates the historical data changes and Hevo pushes these changes to the Destination in near real-time, compared to CDC which maintains all the historical changes to the data at the Source.


XMIN

The XMIN mode is available only for PostgreSQL Sources, as it uses PostgreSQL’s system-generated incremental column XMIN to query data. From Release 2.14 onwards, Hevo selects XMIN as the default query mode for all Table mode-based Pipelines. For existing Table-mode Pipelines, you can change the query mode to XMIN for any object included in the Pipeline. In such a case, the historical load is re-ingested for that object.

The XMIN column stores the transaction ID of the INSERT transactions for each row in a table. This value gets incremented if any change is made in the row. The XMIN query mode reads this value to identify the inserts and updates made to the table and syncs the data with the Destination.

To do this, Hevo runs the following query at the configured frequency:

SELECT * from table
WHERE xmin::text::bigint >= last_polled_id
ORDER BY xmin::text::bigint ASC
LIMIT 500000

This query fetches the starting 500,000 records having an XMIN value greater than the last_polled_id, and similarly, the remaining records in lots of 500,000 in each subsequent run of the Pipeline. Here, the last_polled_id is equal to the maximum XMIN value available in the past run of the Pipeline.

Example:

Consider the following snapshot of a Source object with three records. Here, XMIN is the hidden, system-generated column. Once these records are ingested, Hevo stores the latest value of XMIN as the last_polled_id, which in this case is 3.

Sample Source Object

Now, suppose records 1 and 3 are updated in one transaction, and record 4 is added in the next transaction in the Source. With XMIN, all the updates or inserts that happen in a single transaction get the same XMIN value. Hence, XMIN values of records 1 and 3 get incremented to 4, which is the next available value. Since the record 4 is added in the next transaction, it gets an XMIN value of 5.

When Hevo runs the SQL query to fetch the Source data, records 1, 3, and 5 are picked up as XMIN values for these records are more than the last_polled_id, 3. Also, the last_polled_id is updated to 5 for the next run of the Pipeline.

Sample Updated Source Object

After the successful run of the Pipeline, you get the data with all new updates and inserts in the Destination table as shown in the following snapshot.

Sample Updated Destination Object

When to use XMIN?

XMIN can be used as an alternative to the Full Load query mode if the Pipeline mode is Table but the Source tables do not have any timestamp or indexed column.

Due to the absence of any timestamp or indexed column in the Source table, you cannot use the UIAO, Delta- Timestamp, or CDC query modes. This makes Full Load the only query mode available for fetching data. However, Full Load mode may not be optimal in this scenario, as the whole table would be ingested again in each run of your Pipeline. Moreover, in Full Load, if the table has more than 5 Million rows, the query starts timing out. Thus, Hevo recommends the XMIN query mode in this scenario.

Read Optimizing Query Modes for Objects to modify the query mode of an existing Pipeline from Full Load to XMIN.

Limitations of the XMIN query mode

  • The XMIN query mode cannot track row deletions because the deleted rows and their XMIN values no longer exist in the Source table.

  • The XMIN column is not indexed. Therefore, this query mode scans the whole table to identify updated rows, which slows it down and causes processing overhead on your PostgreSQL server.

Due to these limitations, Hevo recommends that you create the Pipeline in the Logical Replication mode, if possible.



SaaS Sources

To extract data from a SaaS Source, Hevo uses the respective Source’s API. Therefore, selection of query mode does not apply to a SaaS Source. The specifications of the Source API affect how data is queried for different objects in the Source.

For example, in case of certain objects, the Source API may allow ingestion of incremental data, that is, any new and updated Event. For other objects, it may allow Hevo to perform only a Full Load of the object, which means that the entire data in the object is replicated on each ingestion. The objects Videos, Channels, and Playlists in YouTube Analytics, and Audience in AdRoll are examples of Full Load objects.

For a few SaaS Sources, to reduce the Events quota consumption of Full Load objects, Hevo filters the data and loads only the new and updated data of the object to the Destination.


See Also


Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Jul-14-2023 NA Updated section, Unique Incrementing Append Only to add note regarding non-nullable auto-incrementing column.
Jun-28-2023 2.14 Updated section, XMIN to add information about the default query mode selected for Pipelines created for PostgreSQL Sources.
Feb-10-2023 NA Updated section, Full Load to specify how this mode is used for different Sources and ingestion modes.
Feb-07-2023 2.07 Updated section, XMIN for better clarity.
Jan-10-2023 NA Updated section, Delta-Timestamp to add information about using Log-based ingestion to track updates to records in your table in case it does not have a timestamp column.
Nov-08-2022 NA Moved under Ingestion Modes and Query Modes.
Apr-25-2022 NA Updated section, Relational Database Sources.
Apr-11-2022 NA Added section, SaaS Sources to include information about Full Load objects for Saas Sources.
Dec-20-2021 NA Updated section, Full Load to add a note about XMIN, an alternative to Full Load query mode.
- Updated section, XMIN to add a link to the FAQ about changing the query mode.
Nov-09-2021 1.75 Added the section, XMIN, to explain the new XMIN query mode available for PostgreSQL Sources.
Sep-09-2021 1.71 - Renamed Delta - Append Only to Unique Incrementing Append Only as per UI.
- Updated the section, Change Tracking.
- Added examples for all query modes.
Aug-09-2021 NA Updated the section, Change Tracking.
Jun-28-2021 1.61 Updated the Full Load section with information about ingesting more than 5 Million rows.
Jun-01-2021 NA Updated the page overview to explain the Hevo app behavior in case changes are made to the query mode.

Tell us what went wrong