Loading Events in Batches

For some Destinations, Events may be loaded in batches of files to improve performance. This applies especially to data warehouse Destinations such as the following:

  • Amazon Redshift

  • Google BigQuery

  • Snowflake

  • S3

Writing to a warehouse involves scanning the tables to de-duplicate the Events, which incurs costs for users. Major cloud-based data warehouses, such as Amazon Redshift and Google BigQuery, recommend loading data from files, in batches. Batches provide much better performance at a much lower cost than direct, individual writes to the tables.
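To illustrate how batching replaces many individual writes with a few bulk loads, here is a minimal Python sketch of a micro-batcher that flushes on either a size or an age threshold. It is not Hevo's implementation: the class `EventBatcher`, its `load_batch` callback, and the `max_events` and `max_age_seconds` thresholds are hypothetical, and their default values are placeholders.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]


class EventBatcher:
    """Hypothetical micro-batcher: buffers Events and loads them as one batch.

    A batch is handed to `load_batch` when it holds `max_events` Events, or
    when its oldest Event has waited `max_age_seconds`, whichever comes first.
    """

    def __init__(
        self,
        load_batch: Callable[[List[Event]], None],
        max_events: int = 10_000,
        max_age_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.load_batch = load_batch  # e.g. stage one file, then run one bulk load
        self.max_events = max_events
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self._buffer: List[Event] = []
        self._oldest: Optional[float] = None  # arrival time of the first buffered Event

    def add(self, event: Event) -> None:
        if not self._buffer:
            self._oldest = self.clock()
        self._buffer.append(event)
        if len(self._buffer) >= self.max_events:
            self.flush()  # size threshold reached

    def poll(self) -> None:
        """Call periodically so a partially filled batch is not held forever."""
        if self._oldest is not None and self.clock() - self._oldest >= self.max_age_seconds:
            self.flush()  # age threshold reached

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer, self._oldest = self._buffer, [], None
        self.load_batch(batch)  # one warehouse write for the whole batch
```

With `max_events=3`, adding seven Events triggers two loads of three Events each, and the seventh Event waits until `poll()` finds that it has aged past `max_age_seconds`. The age threshold is what bounds how long an Event can sit in the buffer, which is the source of the loading delay described below.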

Advantages of Loading Events in Batches

  • Batching allows Hevo to load millions of Events into the warehouse without consuming a lot of resource bandwidth.

  • Loading in batches is faster at scale than direct inserts.

  • Deduplication runs fewer times when Events are loaded in batches than when records are written individually.
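One way to see why batching reduces deduplication work is to collapse each batch to one row per primary key before it reaches the warehouse, so a single de-duplicating merge covers the whole batch. The following is a minimal sketch under that assumption, not Hevo's method; the function `dedupe_batch` and the field names `id` and `updated_at` are hypothetical.

```python
from typing import Any, Dict, Hashable, Iterable, List

Event = Dict[str, Any]


def dedupe_batch(
    events: Iterable[Event], key: str = "id", version: str = "updated_at"
) -> List[Event]:
    """Collapse a batch to one Event per primary key, keeping the newest version.

    Hypothetical sketch: `key` and `version` are assumed field names. On a tie
    in `version`, the Event that arrived later in the batch wins.
    """
    latest: Dict[Hashable, Event] = {}
    for event in events:
        k = event[key]
        current = latest.get(k)
        if current is None or event[version] >= current[version]:
            latest[k] = event
    return list(latest.values())
```

For example, a batch of four Events touching only two keys reduces to two rows, and the warehouse then scans its table once for that batch rather than once per Event.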

Disadvantages of Loading Events in Batches

  • The batching process introduces some delay in loading the data, usually between 5 and 15 minutes. This means that once a Hevo Pipeline ingests an Event, provided the Event is mapped and does not encounter any other failure, it should be visible in the Destination within this time range.
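The stated range translates into a simple visibility window per Event. The sketch below encodes it as arithmetic, for instance to decide when to start investigating a missing Event. The constants and the helpers `expected_visibility_window` and `is_overdue` are hypothetical, and the 5 to 15 minute values reflect typical behavior, not a guarantee.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Typical batching delay stated on this page; usual behavior, not a guarantee.
TYPICAL_MIN_DELAY = timedelta(minutes=5)
TYPICAL_MAX_DELAY = timedelta(minutes=15)


def expected_visibility_window(ingested_at: datetime) -> Tuple[datetime, datetime]:
    """Typical earliest and latest times a mapped, failure-free Event appears."""
    return ingested_at + TYPICAL_MIN_DELAY, ingested_at + TYPICAL_MAX_DELAY


def is_overdue(ingested_at: datetime, now: datetime) -> bool:
    """True if the Event has waited longer than the typical maximum delay."""
    return now > ingested_at + TYPICAL_MAX_DELAY
```

An Event ingested at 10:00 would typically be visible between 10:05 and 10:15; only after 10:15 would `is_overdue` report it.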

If you have stricter data latency SLAs, reach out to Hevo Support over chat, and we can advise you on whether and how your Pipeline can be configured for lower latency.

Last updated on 24 Aug 2020