Data ingestion is the process of accessing and fetching the data from your Source application or database. Many SaaS Sources provide their own APIs for this purpose.
Hevo supports three ingestion modes: Log-based, Table, and Custom SQL. Each mode offers different settings and configurations, such as the type of data and the query modes, that let you control the type and amount of data to be ingested:
|Ingestion Mode|Description|
|---|---|
|Log-based|Data is read from the logs maintained by the Source for each transaction, such as the addition, deletion, or updating of records. This mode applies only to database Sources.|
|Table|Data is read from tables.|
|Custom SQL|Data is ingested based on the SQL queries provided by the user.|
Types of Data
Your Hevo Pipeline fetches the data already present in your Source when you create the Pipeline, along with any new data created thereafter. The existing data is called Historical data, and the new data is called Incremental data. Hevo uses different Data Synchronization methods to fetch each type of data. For some Sources, Hevo also performs a periodic data refresh to ensure there is no data loss. You can customize the settings for some of these tasks while configuring the Source during Pipeline creation.
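As a rough illustration of the difference between the two types of data, the sketch below performs a full historical load on the first run (no cursor yet) and fetches only incremental records on later runs, using an updated-at cursor. All names here (`fetch_records`, `updated_at`) are hypothetical, not Hevo internals.

```python
from datetime import datetime, timezone

def ingest(fetch_records, cursor=None):
    """Fetch historical data on the first run (cursor is None),
    then only incremental data on subsequent runs."""
    records = fetch_records(since=cursor)  # None => full historical load
    new_cursor = cursor
    for rec in records:
        ts = rec["updated_at"]
        if new_cursor is None or ts > new_cursor:
            new_cursor = ts
    return records, new_cursor

# Hypothetical Source: two records exist when the Pipeline is created
# (historical); a third is added before the next sync (incremental).
store = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
]

def fetch_records(since=None):
    return [r for r in store if since is None or r["updated_at"] > since]

historical, cursor = ingest(fetch_records)           # full load: 2 records
store.append({"id": 3, "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)})
incremental, cursor = ingest(fetch_records, cursor)  # delta: only record 3
```

The cursor advances with each run, so records already ingested are never fetched again unless re-ingestion is explicitly triggered.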
Data Objects and Query Modes
In the case of database Sources, you can load data from one, multiple, or all of the databases. For each selected object or table, you can specify how the records to be ingested must be queried, through the Query Mode.
Similarly, for SaaS Sources, you can select the objects or tables to be fetched or the reports data to be retrieved, and specify the query mode.
At any time, you can include objects you had skipped earlier or vice versa. Read Managing Objects in Pipelines.
In the case of webhook Pipelines, if you want to skip some objects or fields, you can do so using Transformations or the Schema Mapper page.
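As a generic illustration of what a query mode controls, the sketch below builds the ingestion query for an object from its mode. The mode names and parameters used here (a full load versus a delta on a timestamp column) are illustrative only, not Hevo's exact API; see Query Modes for Ingesting Data for the modes Hevo actually supports.

```python
def build_query(table, mode, delta_column=None, last_value=None):
    """Build the SELECT statement used to ingest one object,
    based on its query mode (illustrative mode names only)."""
    if mode == "full_load":
        # Re-read the entire table on every run.
        return f"SELECT * FROM {table}"
    if mode == "delta_timestamp":
        # Read only rows changed since the last ingested position.
        return (f"SELECT * FROM {table} "
                f"WHERE {delta_column} > '{last_value}' "
                f"ORDER BY {delta_column}")
    raise ValueError(f"unknown query mode: {mode}")

q1 = build_query("orders", "full_load")
q2 = build_query("orders", "delta_timestamp",
                 delta_column="updated_at",
                 last_value="2024-01-01 00:00:00")
```

The choice of query mode determines how much data each run fetches: a full load re-reads everything, while a delta mode fetches only what changed since the last run.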
Your Pipeline schedule determines how frequently Hevo ingests the data from the Source. This is also called the Ingestion Frequency. Each Source has a minimum, maximum, and default ingestion frequency. For most Sources, you can change the default frequency to a value within the custom range available for the Source.
However, Hevo may pause the ingestion briefly if any ingestion limits imposed by the Source APIs are being breached.
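The sketch below shows one common way such a pause can work: retrying with exponential backoff when the Source API signals a rate limit. The `RateLimitError` class and the retry parameters are hypothetical illustrations, not Hevo's implementation.

```python
import time

class RateLimitError(Exception):
    """Hypothetical error raised when the Source API's rate limit is hit."""

def fetch_with_backoff(call_api, max_retries=5, base_delay=1.0):
    """Pause briefly and retry with exponential backoff when the
    Source API reports that an ingestion limit has been breached."""
    for attempt in range(max_retries):
        try:
            return call_api()
        except RateLimitError:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError("rate limit persisted across retries")

# Hypothetical API that rejects the first two calls with a rate-limit error.
calls = {"n": 0}
def flaky_api():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return ["event"]

result = fetch_with_backoff(flaky_api, base_delay=0.01)  # succeeds on 3rd try
```

Backing off exponentially gives the Source API time to reset its limits without abandoning the ingestion run entirely.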
Hevo also provides you the option to prioritize Pipelines, if needed.
Re-ingestion of Data
Re-ingestion of data may result from a manual action by you, such as restarting the historical load for an object, or from a Source limitation.
For example, the log file of a database Source may expire, leaving the log-based Pipeline unable to read the data. In such a case, you can restart the historical load for the configured historical sync duration, or change the position to load the historical data from a particular record onwards. You can do this for all the objects you selected for ingestion or only for specific ones.
For Sources such as Google Sheets, the entire dataset is re-ingested each time the data changes in the Source, because Google Sheets does not provide a worksheet-level or row-level timestamp to identify the changes.
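When no timestamp is available, changes between two full snapshots can only be detected by comparing row content. The sketch below fingerprints each row with a hash to find the rows that actually changed; this is a generic illustration of the idea, not Hevo's internal mechanism.

```python
import hashlib
import json

def row_key(row):
    """Stable fingerprint of a row's content (generic illustration)."""
    blob = json.dumps(row, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def changed_rows(previous_snapshot, current_snapshot):
    """Rows in the new full ingest whose content differs from the last one."""
    seen = {row_key(r) for r in previous_snapshot}
    return [r for r in current_snapshot if row_key(r) not in seen]

# Two full re-ingestions of a hypothetical worksheet: only one row changed.
old = [{"name": "a", "qty": 1}, {"name": "b", "qty": 2}]
new = [{"name": "a", "qty": 1}, {"name": "b", "qty": 3}]
diff = changed_rows(old, new)
```

Hashing the full row content makes the comparison independent of row order and requires no Source-side timestamps.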
Data Ingestion Status
Once you create the Pipeline, you can refer to the Pipeline Activity section of the Pipeline Overview page to see if the ingestion of data is successful at the Pipeline level.
You can also view the ingestion status for each object to track its progress and success. If Events are not found in the Source system at any time, Hevo may defer the ingestion for a short while. Read Data Ingestion Statuses.
- Articles in this section
- Ingestion Modes
- Types of Data Synchronization
- Ingestion and Loading Frequency
- Ingestion Frequency and Data Synchronization
- Data Ingestion Statuses
- Deferred Data Ingestion
- Query Modes for Ingesting Data
- Handling of Primary Keys
- Handling of Updates
- Handling of Deletes
- Hevo-generated Metadata