Ingestion and Loading Frequency
Hevo Pipelines ingest data from your Sources via the Source Connectors and load it to a Destination of your choice using the Consumers. The Source Connectors run multiple ingestion tasks on a schedule to fetch data from your Source. This schedule is the Ingestion frequency, and its default value depends on the Source type.
Similarly, to load the ingested data to your Destinations, the Consumers run several tasks on a schedule, termed the Loading frequency. The loading frequency affects all Pipelines using the Destination, and its default value depends on the Destination type.
The process of accessing and fetching data from a Source is termed Data Ingestion. Hevo ingests data in one of the following ways:
Pull-based ingestion: In pull-based ingestion, Hevo’s ingestion task pulls data from your configured Source at a regular interval. This regular interval is termed the ingestion frequency. Hevo fetches data from most SaaS Sources and database Sources configured with the Table or Custom SQL ingestion mode using this technique.
Push-based ingestion: In push-based ingestion, Hevo acts as a receiver and it is the responsibility of the Source to send or post data. Hevo ingests data from webhook Sources using this method.
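The two ingestion modes can be sketched as follows. This is an illustrative sketch, not Hevo's implementation; the function and class names are hypothetical.

```python
import time
from typing import Callable, Dict, List


def pull_ingest(fetch: Callable[[], List[Dict]], interval_s: float, cycles: int) -> List[Dict]:
    """Pull-based: an ingestion task polls the Source every `interval_s` seconds."""
    ingested: List[Dict] = []
    for _ in range(cycles):
        ingested.extend(fetch())  # the task pulls records from the Source
        time.sleep(interval_s)    # wait one ingestion interval before polling again
    return ingested


class PushReceiver:
    """Push-based: the Source posts records (e.g., a webhook); the receiver only accepts them."""

    def __init__(self) -> None:
        self.ingested: List[Dict] = []

    def receive(self, records: List[Dict]) -> None:
        self.ingested.extend(records)
```

In the pull case the interval is the ingestion frequency; in the push case there is no interval at all, which is why webhook Sources have no configurable frequency.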
The ingestion frequency is driven by factors such as the API rate limits, the network throughput, and the Source throughput. As a result, most Sources have a:
Minimum ingestion frequency: The least amount of time that Hevo waits before it polls the Source for new data.
Default ingestion frequency: The interval at which Hevo polls the Source for new data, unless you change the Pipeline schedule.
Maximum ingestion frequency: The maximum amount of time that Hevo can wait before polling the Source for new data.
Custom ingestion frequency: You can change the ingestion frequency to a minimum or maximum depending on the allowed range for the Source.
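In effect, choosing a custom frequency clamps the requested value to the Source's allowed range. A minimal sketch, with hypothetical minute values:

```python
def custom_ingestion_frequency(requested_min: int, source_min: int, source_max: int) -> int:
    """Clamp a requested ingestion frequency (in minutes) to the Source's allowed range."""
    return max(source_min, min(requested_min, source_max))
```

For a hypothetical Source that allows between 15 minutes and 24 hours, requesting 5 minutes would resolve to the 15-minute minimum.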
You can change the ingestion frequency for most Sources by changing the Pipeline schedule, except the following:
Webhooks: As these Sources ingest data in real-time, you cannot change the ingestion frequency for them.
Log-based Sources: For RDBMS Sources that use logical replication, Hevo ingests data at the default ingestion frequency. You cannot change the ingestion frequency for these Sources because the database logs retain data only for a limited time; polling less frequently could cause the logs to expire before Hevo reads them.
Kafka: A Kafka Source ingests data in real-time; as a result, you cannot change its ingestion frequency.
SaaS Sources: Some SaaS Sources impose strict API limits to prevent data from being read too frequently. As a result, you cannot modify the ingestion frequency for such SaaS Sources.
For all other Pipelines, you can schedule the ingestion to run:
Daily: The ingestion runs as per a fixed schedule every day. For example, you may want to ingest data every two hours, after peak hours.
At a fixed interval: The ingestion may be scheduled to ingest data every n minutes or n hours, where n is an integer value. For example, you may want to ingest data from your Facebook Ads every two hours instead of the default one hour.
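Computing the next ingestion run under these two schedule types might look like this. A sketch only, not Hevo's scheduler:

```python
from datetime import datetime, timedelta


def next_run_fixed_interval(last_run: datetime, every: timedelta) -> datetime:
    """Fixed interval: run again n minutes or n hours after the last run."""
    return last_run + every


def next_run_daily(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Daily: run at a fixed time each day; roll over to tomorrow if that time has passed."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

For example, a two-hour fixed interval after a 10:00 run schedules the next run at 12:00, while a daily 22:00 schedule checked at 23:00 rolls over to 22:00 the next day.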
The Pipeline ingestion frequency directly impacts the Events quota and consumption. Read Pipeline Frequency.
The process of writing the ingested data to the Destination is termed Data Loading. Hevo loads data to database Destinations in real-time; for data warehouse Destinations, it syncs data as per the loading frequency.
In the case of real-time loading, data is read from the messaging queue and written to the Destination tables. This process takes place via in-memory caching. However, the time taken to load the data to the Destination may exceed the retention period of the messaging queues. In that case, data is written to temporary files in Hevo’s Amazon S3 bucket and then loaded to the Destination tables from there.
For each data warehouse Destination, Hevo writes data at a default loading frequency, which is optimized to reduce the cost of synchronizing data with the Destination tables. The loading task checks for files to be read from the staging location, and if there is no new data to be synced, it skips the loading. You can set a higher loading frequency to suit your business requirements. However, depending on the Destination, increasing the loading frequency may increase the cost of your load queries.
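The skip-if-empty behavior of the loading task can be sketched as follows; the names here are illustrative, not Hevo's internals:

```python
from typing import Callable, List


def run_loading_task(staging_files: List[str], load: Callable[[List[str]], None]) -> bool:
    """Load staged files into the Destination; skip the run (and its query cost) if none exist."""
    if not staging_files:
        return False  # no new data to sync, so the load is skipped
    load(staging_files)
    return True
```

Skipping empty runs is what makes a high loading frequency relatively cheap when a Pipeline is idle: only runs that actually find staged files trigger load queries against the warehouse.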
Most data warehouse Destinations allow you to set a custom loading schedule, except Google BigQuery Destinations configured to load data in streaming mode.
You can schedule the data loading to run:
Daily: Set up a fixed daily schedule. For example, you may want to load data every four hours during peak hours, and every hour after peak hours.
At a fixed interval: Schedule the data to be loaded every n minutes or n hours, where n is an integer value. For example, you may want to sync data with your high-availability BigQuery instance every 15 minutes instead of the default one hour.
Effect of Ingestion frequency on Data Loading
The ingestion frequency defined for the Source does not directly affect the Destination or the loading frequency. However, if data is loaded to the Destination at a much slower rate than it is ingested, the ingested data accumulates at the staging location. This also applies to log-based Pipelines: when data is written to the Destination slowly, it remains at the staging location.
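As a rough model, the backlog at the staging location grows with the gap between the ingestion rate and the loading rate. All numbers below are hypothetical:

```python
def staging_backlog(ingest_rate: float, load_rate: float, hours: float) -> float:
    """Events accumulated at the staging location when loading lags ingestion.

    Rates are in Events per hour; the backlog cannot go below zero.
    """
    return max(0.0, (ingest_rate - load_rate) * hours)
```

For instance, ingesting 10,000 Events per hour while loading only 8,000 per hour leaves 4,000 Events staged after two hours; once loading keeps pace with ingestion, the backlog stops growing.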
Refer to the following table for the list of key updates made to this page:
| Date | Release | Description of Change |
| --- | --- | --- |