Pipeline Frequency

Last updated on May 30, 2023

Pipeline frequency, or ingestion frequency, is the frequency at which Hevo ingests data from the Source. Although only the Events that are loaded to a Destination count towards your Events quota consumption, the ingestion frequency can directly affect that count. Consider the following aspects when deciding the optimal ingestion frequency.

Note: You can modify the ingestion frequency at any time post-creation of the Pipeline. Read Scheduling a Pipeline for more information.

Pipeline Frequency and Data Synchronization

The Pipeline frequency does not affect the quota consumed for historical data as all historical loads are free in Hevo. Moreover, the historical data is ingested once when you create the Pipeline and is not ingested on every subsequent run of the Pipeline unless you restart it.

Incremental data ingestion impacts your quota consumption depending on the type of the Source:

  • In case of webhook Sources, Sources using log-based replication, and some SaaS Sources, only new and modified Events are ingested in each run of the Pipeline. Therefore, Pipeline frequency does not impact your quota consumption.

  • In case of some other SaaS Sources, a mismatch between the Source API’s frequency and the Pipeline frequency may cause Events that were already ingested to be ingested again, and excess quota may get consumed.

    Example:

    In Microsoft Ads, the smallest granularity available in the API request is Daily (Hourly is not supported). This means that every time Hevo makes a request to extract the data from Microsoft Ads, it receives the data for the entire day. Thus, Hevo must ingest the incremental data for the entire day on every run of the Pipeline. So, along with the new Events, any Events that were already ingested previously are re-ingested and count towards your quota consumption.
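
The re-ingestion described in this example can be sketched in a few lines of Python. This is only an illustration, not Hevo's implementation: the function name and the Event counts are hypothetical, and it assumes that every run within the day returns all the Events created so far that day.

```python
def runs_for_daily_granularity(new_events_per_run):
    """Return (run, new, re_ingested, counted) for each Pipeline run in one day.

    Assumption: the Source API only supports Daily granularity, so each run
    returns every Event created so far that day, not just the new ones.
    """
    already_ingested = 0
    rows = []
    for run, new in enumerate(new_events_per_run, start=1):
        # Events counted towards quota = new Events + Events fetched again.
        rows.append((run, new, already_ingested, new + already_ingested))
        already_ingested += new
    return rows


# Hypothetical counts of new Events that arrive before each run.
rows = runs_for_daily_granularity([100, 50, 75])
for run, new, re_ingested, counted in rows:
    print(f"run {run}: new={new} re-ingested={re_ingested} counted={counted}")
```

In this sketch, the share of re-ingested Events grows with every additional run in the same day, which is why the number of runs matters for such Sources.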

In case of periodic data refresh, which is done for ad-based Sources, the Events ingested during the data refresh are counted towards your Events quota consumption. As the data refresh happens with each run of the Pipeline, the Pipeline frequency you choose directly affects your quota consumption.


Pipeline Frequency and Business Requirement

To optimize the Events quota consumption, the Pipeline frequency you configure must also factor in how you want to use the data. For example, while a low frequency may extend the time your Events quota lasts and generate cost savings, a high frequency may be necessary based on your business needs.

Defining a suitable frequency enables you to optimize the performance of your Pipelines on different parameters such as:

  • Immediacy: A high Pipeline frequency may be useful if you need near-real-time data.

    Example: An organization manages its customer contacts by using the Zendesk omni-channel solution that supports Zendesk Chat, Support, and Talk. The company uses the customer data as soon as it is generated to provide accurate, near-real-time responses and maintain a live dashboard that displays real-time information. In this case, a high Pipeline frequency such as 5 Minutes is recommended.

  • Cost optimization: Aligning the Pipeline frequency with the Source API frequency can help you reduce the number of re-ingested Events and consume less quota.

    Example: Suppose you have a SaaS Source that gets 1 Million new Events every hour.

    Source API frequency = Daily.

    Scenario 1:

    • Pipeline frequency = 1 Hour.

    • Number of Events ingested on the first Pipeline run of the day = 1 Million.

    • Number of Events ingested on the second run = 2 Million (1 Million new + 1 Million old).

    • Total Events ingested in two hours = 3 Million (1 Million + 2 Million).

    Scenario 2:

    • Pipeline frequency = 2 Hours.

    • Number of Events ingested on the first Pipeline run of the day = 2 Million.

    • Total Events ingested in two hours = 2 Million.

    So, after 2 Hours:

    Events ingested in Scenario 1 = 3 Million.

    Events ingested in Scenario 2 = 2 Million.

  • Responsiveness: A high Pipeline frequency may be needed to provide results and recommendations or to gather data in near-real-time.

    Examples:

    • An online news publication wants to personalize content and serve relevant news stories so that users spend more time on the website. For this, the company needs to analyze the content a user is currently reading and identify the next articles they may be most interested in.
    • A company uses Intercom to keep track of its customers by storing data such as the customers’ names, phone numbers, and how many times they have visited the company website. This enables the company to send targeted messages and provide meaningful support based on the customers’ behavior, as quickly as possible. A high Pipeline frequency (30 Minutes) is suitable in this case. Having a high frequency also ensures the data generated is not lost before it can be analyzed.
  • Analysis: A low frequency can suffice when the data is used for analysis and decision-making over a longer duration.

    Example: A chain of restaurants needs data at the end of every day to calculate the revenue and other financial metrics of its branches for the day. In this case, a high Pipeline frequency is not needed, since there is no need to process the revenue data every few hours. A more efficient option is to set a lower Pipeline frequency, say, 24 Hours, and schedule the Pipeline close to the restaurants’ closing time.
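
The arithmetic in the cost optimization example above can be reproduced with a short Python sketch. It is a simplified model, not Hevo's billing logic: the function name and parameters are hypothetical, and it assumes a constant rate of new Events, a window that starts at the beginning of the day, and a Source API with Daily granularity, so that each run re-reads every Event created so far that day.

```python
def events_ingested(events_per_hour, pipeline_frequency_hours, window_hours):
    """Total Events ingested within a window that starts at the beginning of the day.

    Assumption: each run returns all Events created since the start of the
    day, so a run that happens after `elapsed` hours ingests
    events_per_hour * elapsed Events.
    """
    if window_hours % pipeline_frequency_hours != 0:
        raise ValueError("window_hours must be a multiple of pipeline_frequency_hours")
    total = 0
    # One run at the end of every interval within the window.
    for elapsed in range(pipeline_frequency_hours, window_hours + 1,
                         pipeline_frequency_hours):
        total += events_per_hour * elapsed
    return total


scenario_1 = events_ingested(1_000_000, pipeline_frequency_hours=1, window_hours=2)
scenario_2 = events_ingested(1_000_000, pipeline_frequency_hours=2, window_hours=2)
print(scenario_1)  # 3000000, as in Scenario 1
print(scenario_2)  # 2000000, as in Scenario 2
```

Under the same assumptions, extending the window to a full day widens the gap: `events_ingested(1_000_000, 1, 24)` gives 300 Million Events, whereas `events_ingested(1_000_000, 24, 24)` gives 24 Million.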

You can modify the schedule post-creation of the Pipeline. Read Scheduling a Pipeline for more information.



Revision History

Refer to the following table for the list of key updates made to this page:

Date | Release | Description of Change
Feb-10-2023 | NA | Merged content of the page, Ingestion Frequency and Data Synchronization into this document.
Nov-10-2021 | NA | New document.
