Pipeline Frequency and Events Quota Consumption

Pipeline frequency is the frequency at which Hevo ingests data from the Source. It affects how your Events quota is consumed.

If your Events quota is exhausted, the Events in the Pipeline are stored in a data warehouse until you purchase additional Events, after which they are replayed. Read the data replication section of your Source to know its replication strategy.


Pipeline Frequency and Data Synchronization Methods

Pipeline frequency does not affect the quota consumed for historical or refresh data ingestion. Historical data is ingested once when you create the Pipeline and is not ingested on every subsequent run of the Pipeline. Similarly, while all the Events ingested during a data refresh count towards your Events quota, the refresh frequency is different from the frequency of your Pipeline. The refresh frequency varies by Source; it is usually 24 Hours, and you cannot modify it.

Incremental data ingestion, on the other hand, works differently, depending on the type of the Source:

  • In the case of webhook Sources, Sources using log-based replication, and some SaaS Sources, only new and modified Events are ingested in each run of the Pipeline. Therefore, Pipeline frequency does not impact your quota consumption.

  • However, for some SaaS Sources, you may end up ingesting duplicate Events, depending on how the Source API’s frequency compares with the Pipeline frequency. This can lead to excessive quota usage.

    Example:

    In Microsoft Ads, the smallest granularity available in the API request is Daily (Hourly is not supported). This means that every time Hevo makes a request to extract the data from Microsoft Ads, it receives the data for the entire day. Thus, Hevo must ingest the incremental data for the entire day on every run of the Pipeline. So, along with the new Events, any Events that were already ingested previously are re-ingested and count towards your quota consumption.
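
    The re-ingestion described above can be illustrated with a minimal sketch. It assumes a Source whose API always returns the data for the entire day; the function name and the figures are hypothetical and are not part of Hevo or the Microsoft Ads API.

```python
def split_new_and_repeated(events_so_far_today, events_already_ingested):
    """Split the Events ingested in one run into new and re-ingested counts.

    Assumption: the Source API returns every Event for the current day,
    so all Events ingested by earlier runs today are fetched again.
    """
    new = events_so_far_today - events_already_ingested
    repeated = events_already_ingested
    return new, repeated


# Hypothetical figures: 500 Events existed at the previous run, 800 exist now.
new, repeated = split_new_and_repeated(800, 500)
print(new, repeated)  # 300 500
```

    Under this assumption, all 800 Events in the run count towards the quota, although only 300 of them are new.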


Selecting the Optimal Pipeline Frequency

You should configure your Pipeline frequency depending on how you want to utilize the data.

For example, while a low frequency may extend the time your Events quota lasts and generate cost savings, a high frequency may be necessary based on your business needs.

Defining a suitable frequency enables you to optimize the performance of your Pipelines on different parameters such as:

  • Immediacy: A high Pipeline frequency may be useful if you need near-real-time data.

    Example: An organization manages its customer contacts by using the Zendesk omni-channel solution that supports Zendesk Chat, Support, and Talk. The company uses the customer data as soon as it is generated to provide accurate, near-real-time responses and maintain a live dashboard that displays real-time information. In this case, a high Pipeline frequency such as 5 Minutes is recommended.

  • Cost optimization: Aligning the Pipeline frequency with the Source API frequency can help you reduce the number of re-ingested Events and consume less quota.

    Example:

    Suppose you have a SaaS Source that receives 1 Million new Events every hour.

    Source API frequency = Daily.

    Scenario 1:

    • Pipeline frequency = 1 Hour.

    • Number of Events ingested on the first Pipeline run of the day = 1 Million.

    • Number of Events ingested on the second run = 2 Million (1 Million new + 1 Million old).

    • Total Events ingested in two hours = 3 Million (1 + 2 Million).

    Scenario 2:

    • Pipeline frequency = 2 Hours.

    • Number of Events ingested on the first Pipeline run of the day = 2 Million.

    • Total Events ingested in two hours = 2 Million.

    So, after 2 Hours:

    Events ingested in Scenario 1 = 3 Million.

    Events ingested in Scenario 2 = 2 Million.

  • Responsiveness: High Pipeline frequency may be needed to provide results and recommendations or gather data in near-real-time.

    Example:

    • An online news publication wants to personalize content and serve up relevant news stories so that users spend more time on the website. For this, the company needs to analyze the content a user is currently reading and identify the next articles they may be most interested in.

    • A company uses Intercom to keep track of its customers by storing data such as the customers’ names, phone numbers, and how many times they have visited the company website. This enables the company to send targeted messages and provide meaningful support based on the customers’ behavior, as quickly as possible. A high Pipeline frequency (30 Minutes) is suitable in this case. Having a high frequency also ensures the data generated is not lost before it can be analyzed.

  • Analysis: A low frequency can suffice when the data is used for analysis and decision-making over a longer duration.

    Example: A chain of restaurants needs data at the end of every day to calculate the revenue and other financial metrics of its different branches for the day. In this case, a high Pipeline frequency is not needed, since there is no need to process the revenue data every few hours. A more efficient option is to set a lower Pipeline frequency, say, 24 Hours, and schedule the Pipeline close to the restaurants’ closing time.
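
    The arithmetic in the cost optimization example above can be reproduced with a short sketch. It is a simplified model, not Hevo's actual billing logic: it assumes a constant arrival rate, a window that starts at the beginning of the day, a Pipeline run at the end of every interval, and a Source API that returns the entire day's data on each run. The function and parameter names are hypothetical.

```python
def total_events_ingested(events_per_hour, frequency_hours, window_hours):
    """Total Events ingested in a window that starts at the beginning of the day.

    Assumptions: Events arrive at a constant rate, the Pipeline runs at the
    end of every interval, and each run fetches all of the day's Events so far.
    """
    total = 0
    for run_hour in range(frequency_hours, window_hours + 1, frequency_hours):
        # Each run re-ingests everything generated since the start of the day.
        total += events_per_hour * run_hour
    return total


scenario_1 = total_events_ingested(1_000_000, frequency_hours=1, window_hours=2)
scenario_2 = total_events_ingested(1_000_000, frequency_hours=2, window_hours=2)
print(scenario_1, scenario_2)  # 3000000 2000000
```

    Under the same assumptions, the gap widens over a full day: hourly runs would ingest 300 Million Events in 24 hours, whereas a single run at the end of the day would ingest 24 Million.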

You can modify the schedule after creating the Pipeline. Read Scheduling a Pipeline for more information.



Revision History

Refer to the following table for the list of key updates made to this page:

Date         Release  Description of Change
Nov-10-2021  NA       New document.
Last updated on 19 Nov 2021