Kafka

Kafka is open-source software that provides a framework for storing, reading, and analyzing streaming data.

In Kafka, a Topic is a category or a common name used to store and publish a particular stream of data. Topics in Apache Kafka are similar to tables in a database. Hevo reads your data from the topics created in your Apache Kafka instance.

In Kafka, the server on which topics are hosted is called a Broker. A Kafka Cluster is a group of one or more Brokers that together host the cluster's topics. Therefore, a Kafka Cluster may consist of many Kafka Brokers on many servers.

Bootstrap servers are a comma-separated list of host and port pairs, for example, broker1:9092,broker2:9092, that represent the addresses of the Brokers. In the context of your Hevo Pipeline, these are the servers that Hevo connects to first to establish the initial connection with the cluster. If one of them is unavailable, the others are used. Once connected, the bootstrap servers provide the addresses of the other Brokers in the cluster, so subsequent connections go directly to the appropriate Brokers and the list does not need to include every Broker.
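To make the format concrete, the following Python sketch splits a bootstrap server string into host and port pairs. It is not Hevo's implementation: the function name parse_bootstrap_servers and the sample host names are hypothetical. In Kafka client configurations, the same comma-separated string is typically supplied as the bootstrap.servers property.

```python
from typing import List, Tuple


def parse_bootstrap_servers(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated 'host:port' list into (host, port) tuples.

    Hypothetical helper for illustration only.
    """
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas, such as a trailing comma
        # rpartition splits on the last ':' so only the port is separated off
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    return servers


print(parse_bootstrap_servers("broker1.example.com:9092, broker2.example.com:9092"))
# [('broker1.example.com', 9092), ('broker2.example.com', 9092)]
```

Rejecting entries without a port early makes a malformed list fail with a clear message instead of a connection timeout later.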

Hevo supports two variants of Kafka as a Source: Apache Kafka and Confluent Cloud. Both variants offer the same functionality; Confluent Cloud is a fully managed version of Apache Kafka.

Refer to the Apache Kafka and Confluent Cloud Source pages for the steps to configure each of these as a Source in your Hevo Pipeline.


Data Replication

Default Pipeline Frequency | Minimum Pipeline Frequency | Maximum Pipeline Frequency
5 Mins                     | 5 Mins                     | 1 Hr

Note: The custom frequency must be set in hours, as an integer value. For example, 1, 2, or 3, but not 1.5 or 1.75.
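The integer-hours rule can be expressed as a small validation check. The following Python sketch uses a hypothetical function name, is_valid_custom_frequency, and checks only the integer rule stated in the note; it does not enforce the minimum and maximum Pipeline frequencies.

```python
def is_valid_custom_frequency(hours: object) -> bool:
    """Return True if `hours` is a whole, positive number of hours.

    Hypothetical check for illustration; it enforces only the integer rule.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(hours, bool) or not isinstance(hours, int):
        return False
    return hours > 0


for value in (1, 2, 3, 1.5, 1.75):
    print(value, is_valid_custom_frequency(value))
```

Checking the type rather than rounding means a value such as 1.5 is rejected outright instead of being silently changed to 1 or 2.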

Note: Hevo supports only the JSON format for Kafka.

  • Incremental Data: Once the Pipeline is created, Hevo fetches new and updated data from your Kafka cluster at the Pipeline frequency, which is every five minutes by default.

If you restart an object via the Pipeline UI, Hevo ingests all the data available at that time in the Source.
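Kafka consumers generally track their progress through a topic partition with an offset, the position of the next record to read. The following Python sketch models that general idea with an in-memory list standing in for a partition: each scheduled poll reads only the records added since the previous poll, and a restart rewinds the offset so that all available data is read again. The PartitionReader class is hypothetical and does not describe Hevo's internals; it only illustrates the incremental and restart behavior described above.

```python
from typing import List


class PartitionReader:
    """Hypothetical reader that tracks an offset into one partition."""

    def __init__(self, log: List[dict]) -> None:
        self.log = log      # in-memory stand-in for a Kafka partition
        self.offset = 0     # position of the next record to read

    def poll(self) -> List[dict]:
        # Incremental read: return only records appended since the last poll.
        batch = self.log[self.offset:]
        self.offset = len(self.log)
        return batch

    def restart(self) -> None:
        # Restart: rewind to the earliest record, which in this in-memory
        # model is always index 0 because nothing is ever deleted.
        self.offset = 0


log = [{"name": "John"}]
reader = PartitionReader(log)
print(reader.poll())   # [{'name': 'John'}]
log.append({"name": "Jack"})
print(reader.poll())   # [{'name': 'Jack'}]
reader.restart()
print(reader.poll())   # [{'name': 'John'}, {'name': 'Jack'}]
```

In a real Kafka cluster, older records may already have been removed by the topic's retention settings, so a restart can only read what is still stored in the Source at that time.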

If a record is structured as a list of records, Hevo ingests each record in the list as an individual Event. Each of these child Events contains a common field, ref_id, which identifies the parent record they belong to.


Sample Source Event

[{ "name": "John", "age": 25 }, { "name": "Jack", "occupation": "chef" } ]

Sample Ingested Events

Event 1:

 { "__hevo_id": "abc1", "ref_id": "abcdef", "name": "John", "age": 25 }

Event 2:

 { "__hevo_id": "abc2", "ref_id": "abcdef", "name": "Jack", "occupation": "chef" }
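The splitting of a list of records into child Events, as in the sample above, can be sketched in Python as follows. This is an illustration under assumptions, not Hevo's implementation: the function name split_message is hypothetical, the message value is assumed to be UTF-8 encoded JSON, and uuid4 stands in for however Hevo actually generates the __hevo_id and ref_id values, which this page does not specify. Because the page does not describe the fields added to a single-record message, the sketch passes such messages through unchanged.

```python
import json
import uuid
from typing import List


def split_message(value: bytes) -> List[dict]:
    """Turn one JSON message value into a list of events (hypothetical sketch)."""
    # Only JSON is supported; decoding or parsing raises ValueError otherwise.
    payload = json.loads(value.decode("utf-8"))
    if not isinstance(payload, list):
        # A single record passes through unchanged in this sketch.
        return [payload]
    # A list of records: each child event gets its own id and a shared ref_id
    # pointing back to the common parent message.
    parent_id = uuid.uuid4().hex
    events = []
    for record in payload:
        if not isinstance(record, dict):
            raise ValueError(f"expected a JSON object, got {record!r}")
        events.append({"__hevo_id": uuid.uuid4().hex, "ref_id": parent_id, **record})
    return events


message = b'[{ "name": "John", "age": 25 }, { "name": "Jack", "occupation": "chef" } ]'
for event in split_message(message):
    print(event)
```

Generating the parent id once, before the loop, is what guarantees that every child Event carries the same ref_id.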


Revision History

Refer to the following table for the list of key updates made to this page:

Date        | Release | Description of Change
Oct-25-2021 | NA      | Added the Pipeline frequency information in the Data Replication section.
Last updated on 22 Oct 2021