Data Ingestion

Last updated on Nov 28, 2022

Data ingestion involves the process of accessing and fetching the data in your Source application or database. Many SaaS Sources provide their own APIs for this purpose.

Hevo supports three ingestion modes: Log-based, Table, and Custom SQL. Each mode offers different settings, such as the type of data and the query mode, that let you control the type and amount of data to be ingested:

Pipeline Mode | Description
Log-based | Data is read from the logs that the Source maintains for each transaction, such as the addition, deletion, or update of records. This mode applies only to database Sources.
Table | Data is read from tables.
Custom SQL | Data is ingested based on the SQL queries provided by the user.
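
To illustrate how Log-based mode differs from reading tables directly, here is a minimal sketch that replays a list of transaction log entries onto a copy of the data. It is not Hevo's implementation: the entry shape (operation, key, row) and the name apply_log are assumptions made only for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (operation, primary key, row data or None for deletes).
LogEntry = Tuple[str, int, Optional[dict]]


def apply_log(target: Dict[int, dict], log: List[LogEntry]) -> Dict[int, dict]:
    """Replay insert/update/delete entries, in order, onto a copy of target."""
    result = {key: dict(row) for key, row in target.items()}
    for operation, key, row in log:
        if operation == "insert":
            result[key] = dict(row)
        elif operation == "update":
            # Merge the changed fields into the existing row, if any.
            result[key] = {**result.get(key, {}), **row}
        elif operation == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return result


# Made-up log covering the three transaction types named in the table.
log = [
    ("insert", 1, {"name": "Ada"}),
    ("insert", 2, {"name": "Bob"}),
    ("update", 1, {"name": "Ada L."}),
    ("delete", 2, None),
]
replica = apply_log({}, log)
```

Because every change is recorded in the log, the deletion of record 2 is carried over to the copy, which a plain read of the current table contents would not reveal.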

Types of Data

Your Hevo Pipeline fetches the data already present in your Source when you create the Pipeline, along with any new data created afterwards. The existing data is called Historical data and the new data is called Incremental data. Hevo uses different Data Synchronization methods to fetch both types of data. For some Sources, Hevo also does a regular data refresh to ensure there is no data loss. You can customize the settings for some of these tasks while configuring the Source during Pipeline creation.
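
The distinction between Historical and Incremental data can be sketched with a simple cursor. This is an illustration under stated assumptions, not a description of Hevo's synchronization methods: the row shape, the updated_at field, and the fetch function are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical row shape: (id, value, updated_at as epoch seconds).
Row = Tuple[int, str, int]


def fetch(rows: List[Row], cursor: Optional[int]) -> Tuple[List[Row], Optional[int]]:
    """Return the rows changed after cursor, plus the new cursor.

    A cursor of None stands for the first run, which returns everything
    already present (the historical load).
    """
    if cursor is None:
        batch = list(rows)
    else:
        batch = [row for row in rows if row[2] > cursor]
    # Keep the old cursor when nothing new was found.
    new_cursor = max((row[2] for row in batch), default=cursor)
    return batch, new_cursor


source = [(1, "a", 100), (2, "b", 200)]
historical, cursor = fetch(source, None)      # first run: existing data
source.append((3, "c", 300))
incremental, cursor = fetch(source, cursor)   # later run: only the new row
```

In this sketch the first call returns both existing rows, and the second returns only the row added afterwards.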

Data Objects and Query Modes

For database Sources, you can load data from one, several, or all of the databases. For each selected object or table, you can use the Query Mode to specify how the records to be ingested are queried.

Similarly, for SaaS Sources, you can select the objects or tables to be fetched or the reports data to be retrieved, and specify the query mode.

At any time, you can include objects you had skipped earlier or vice versa. Read Managing Objects in Pipelines.

In case of webhook Pipelines, if you want to skip some objects or fields, you can do this using Transformations or the Schema Mapper page.

Ingestion Frequency

Your Pipeline schedule determines how frequently Hevo ingests the data from the Source. This is also called the Ingestion Frequency. Each Source has a minimum, maximum, and default ingestion frequency. For most Sources, you can change the default frequency to a value based on the custom range available for the Source.

However, Hevo may pause the ingestion briefly in case any ingestion limits imposed by the Source APIs are being breached.
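
As a rough illustration of how a minimum, maximum, and default frequency interact, the sketch below resolves a requested value against a Source's range. The FrequencyRange type, the resolve_frequency function, and the numbers are hypothetical. Clamping out-of-range values is an assumption made for this example; the product may instead reject them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrequencyRange:
    """Hypothetical per-Source limits, in minutes."""
    minimum: int
    maximum: int
    default: int


def resolve_frequency(limits: FrequencyRange, requested: Optional[int]) -> int:
    """Use the default when nothing is requested, else clamp to the range."""
    if requested is None:
        return limits.default
    return max(limits.minimum, min(requested, limits.maximum))


# Made-up values, not taken from any real Source.
limits = FrequencyRange(minimum=5, maximum=1440, default=60)
every_30 = resolve_frequency(limits, 30)
too_fast = resolve_frequency(limits, 1)
unset = resolve_frequency(limits, None)
```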

Hevo also provides you the option to prioritize Pipelines, if needed.

Re-ingestion of Data

Data may be re-ingested either because you manually restart the historical load for an object or because of a Source limitation.

For example:

  • The log file of a database Source may expire and the log-based Pipeline may be unable to read the data. In such a case, you can restart the historical load for the configured historical sync duration or change the position to load the historical data from a particular record onwards. You can do this for all the objects you selected for ingestion or for specific ones.

  • For Sources such as Google Sheets, the entire data is re-ingested each time there is a change to the data in the Source, as Google Sheets does not provide a worksheet or row-level timestamp to identify the changes.
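
The second case can be sketched as follows. Without a row-level timestamp, a reader can tell that something changed but not which rows changed, so the whole sheet is loaded again. The fingerprint comparison here is an assumption chosen for illustration, not a description of how Hevo detects changes. The names fingerprint and sync are hypothetical.

```python
import hashlib
import json
from typing import List, Optional, Tuple

Sheet = List[List[str]]


def fingerprint(sheet: Sheet) -> str:
    """Hash the full sheet contents; any cell change alters the result."""
    return hashlib.sha256(json.dumps(sheet).encode("utf-8")).hexdigest()


def sync(sheet: Sheet, last_fingerprint: Optional[str]) -> Tuple[Sheet, str]:
    """Return every row if anything changed, otherwise no rows."""
    current = fingerprint(sheet)
    if current == last_fingerprint:
        return [], current
    # No per-row timestamps exist, so the only safe choice is a full reload.
    return [list(row) for row in sheet], current


sheet = [["id", "name"], ["1", "Ada"], ["2", "Bob"]]
first_load, seen = sync(sheet, None)
no_change, seen = sync(sheet, seen)
sheet[1][1] = "Ada L."                 # edit a single cell
after_edit, seen = sync(sheet, seen)   # all rows come back, not just the edited one
```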

Data Ingestion Status

Once you create the Pipeline, you can refer to the Pipeline Activity section of the Pipeline Overview page to see if the ingestion of data is successful at the Pipeline level.

[Image: Ingestion success]

You can also view the ingestion status for each object to know its progress and success. In case Events are not found in the Source system at any time, Hevo may defer the ingestion for a short while. Read Data Ingestion Statuses.

[Image: Ingestion at object-level]

Limiting Data Ingestion

After creating your Pipeline, you can set a daily data ingestion threshold to limit the number of Events ingested and loaded for that Pipeline. This ensures that your Pipeline does not consume more Events than the defined threshold and that your Events quota is not exhausted. You can also select the option to pause the Pipeline if the number of Events ingested and loaded for it exceeds the threshold. Hevo sends you an alert on the UI, by email, and on Slack (if enabled), after which you can take the desired action. Read Data Spike Alerts for steps to set the threshold and enable the alert.

Note: The option to pause the Pipeline after it exceeds the threshold is not available for log-based Pipelines. This is because pausing a log-based Pipeline may cause the logs to expire, which can lead to data loss.
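
The threshold behavior described above, including the exception for log-based Pipelines, can be summarized in a short sketch. The PipelineState type, the record_events function, the mode labels, and the numbers are all hypothetical and only model the rules stated in this section.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    mode: str              # illustrative labels: "log-based", "table", "custom-sql"
    events_today: int = 0
    paused: bool = False


def record_events(state: PipelineState, count: int, threshold: int,
                  pause_on_breach: bool) -> bool:
    """Add count to the daily total; return True if an alert should be raised."""
    state.events_today += count
    breached = state.events_today > threshold
    # Log-based Pipelines are never paused, since their logs could expire.
    if breached and pause_on_breach and state.mode != "log-based":
        state.paused = True
    return breached


table_pipeline = PipelineState(mode="table")
under_limit = record_events(table_pipeline, 900, threshold=1000, pause_on_breach=True)
over_limit = record_events(table_pipeline, 200, threshold=1000, pause_on_breach=True)

log_pipeline = PipelineState(mode="log-based")
log_alert = record_events(log_pipeline, 2000, threshold=1000, pause_on_breach=True)
```

In this sketch both Pipelines raise an alert once the threshold is exceeded, but only the table Pipeline is paused.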

Revision History

Refer to the following table for the list of key updates made to this page:

Date | Release | Description of Change
Oct-17-2022 | 1.99 | Added the section Limiting Data Ingestion to describe data spike alerts.
Mar-21-2022 | NA | New document.
