Elasticsearch

Last updated on Feb 03, 2023

Elasticsearch is a distributed, RESTful search and analytics engine that centrally stores your data so that you can search, index, and analyze data of all shapes and sizes. Elasticsearch relies on indices to search and fetch documents, and it preempts operations that may cause memory issues by stopping them with exceptions. Hevo parses some of these exceptions and recommends corrective actions. Read Configuration Changes in Elasticsearch to learn about these actions.

Hevo connects to your Elasticsearch cluster using the Elasticsearch Transport Client and synchronizes the data in the cluster's indices with your preferred data warehouse. Currently, Hevo supports the following variants:

  • Generic Elasticsearch
  • AWS Elasticsearch

Data Replication

Default Pipeline Frequency   Minimum Pipeline Frequency   Maximum Pipeline Frequency   Custom Frequency Range (Hrs)
15 Mins                      15 Mins                      24 Hrs                       1-24

  • Historical Data: In the first run of the Pipeline, Hevo ingests all the data available in your Elasticsearch database.

  • Incremental Data: Once the historical load is complete, all new and updated data is synchronized with your Destination as per the ingestion frequency.

    Note: A maximum of 500 Events is ingested in each call to the database to limit the processing load on your cluster. Contact Hevo Support if you want to modify this limit.
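
The following is a minimal Python sketch of how a per-call cap of 500 Events splits one ingestion run into several calls. The `fetch(offset, limit)` callable and the offset-based paging are hypothetical simplifications chosen to keep the example short; they are not Hevo's actual ingestion mechanism.

```python
from typing import Any, Callable, Dict, Iterator, List

# Per-call cap from the note above; contact Hevo Support to change it.
MAX_EVENTS_PER_CALL = 500

Doc = Dict[str, Any]


def ingest_in_batches(
    fetch: Callable[[int, int], List[Doc]],
    limit: int = MAX_EVENTS_PER_CALL,
) -> Iterator[List[Doc]]:
    """Yield batches from a hypothetical fetch(offset, limit) callable,
    never requesting more than `limit` documents in a single call."""
    offset = 0
    while True:
        batch = fetch(offset, limit)
        if not batch:
            break  # nothing left to read in this run
        yield batch
        offset += len(batch)
        if len(batch) < limit:
            break  # a short batch means the source is drained


if __name__ == "__main__":
    source = [{"_id": str(i)} for i in range(1201)]

    def fetch(offset: int, limit: int) -> List[Doc]:
        return source[offset:offset + limit]

    print([len(b) for b in ingest_in_batches(fetch)])  # [500, 500, 201]
```

With 1,201 documents, the run makes three calls of at most 500 documents each. The same loop applies to the historical load and to each incremental run; only the set of documents that `fetch` can return differs.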


Source Considerations

  • Elasticsearch does not expose a record of each document modification. Therefore, to ingest data incrementally, Hevo requires at least one incrementing field of a sortable type. If more than one document has the same value in this sortable field, the identity column is used as the tiebreaker.

    If you do not specify an identity column, the _id field that Elasticsearch creates by default is used.
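
The tiebreaker rule above can be sketched as follows. This is a minimal in-memory illustration, assuming a hypothetical sortable field named `updated_at` and the default `_id` field as the identity column; it is not Hevo's implementation. Documents are ordered by the pair (sortable value, identity value), and the cursor stores that pair for the last document read, so documents that share a sortable value are neither skipped nor read twice.

```python
from typing import Any, Dict, List, Optional, Tuple

Doc = Dict[str, Any]
# (sortable field value, identity field value) of the last document read
Cursor = Tuple[Any, Any]


def next_batch(
    docs: List[Doc],
    sort_field: str,
    cursor: Optional[Cursor],
    id_field: str = "_id",
    limit: int = 500,
) -> Tuple[List[Doc], Optional[Cursor]]:
    """Return up to `limit` documents ordered by (sort_field, id_field)
    that come strictly after `cursor`, plus the cursor to resume from."""

    def key(d: Doc) -> Cursor:
        return (d[sort_field], d[id_field])

    ordered = sorted(docs, key=key)
    if cursor is not None:
        # Comparing the full pair, not just sort_field, is the tiebreaker:
        # documents sharing a sortable value are ordered by identity value.
        ordered = [d for d in ordered if key(d) > cursor]
    batch = ordered[:limit]
    new_cursor = key(batch[-1]) if batch else cursor
    return batch, new_cursor


if __name__ == "__main__":
    docs = [
        {"_id": "b", "updated_at": 2},
        {"_id": "a", "updated_at": 1},
        {"_id": "c", "updated_at": 2},
        {"_id": "d", "updated_at": 3},
    ]
    batch, cursor = next_batch(docs, "updated_at", None, limit=2)
    print([d["_id"] for d in batch], cursor)  # ['a', 'b'] (2, 'b')
    batch, cursor = next_batch(docs, "updated_at", cursor, limit=2)
    print([d["_id"] for d in batch], cursor)  # ['c', 'd'] (3, 'd')
```

The second call still returns `c`, even though it has the same `updated_at` value as `b`, because the identity value breaks the tie. A cursor on `updated_at` alone (`updated_at > 2`) would have skipped `c`. Against a live cluster, the same idea is commonly expressed by sorting on both fields and passing the last hit's sort values in the `search_after` parameter of a search request; whether Hevo does this is not stated here.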


Limitations

  • Only Native Realm authentication is supported.

  • Hevo currently does not support deletes. Therefore, any data deleted in the Source may continue to exist in the Destination.
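
Because only Native Realm authentication is supported, it can help to confirm which realm a user authenticates against before you configure the Pipeline. The sketch below assumes a self-managed cluster with Elasticsearch security features enabled and reachable over HTTP(S). It calls the Elasticsearch `_security/_authenticate` API and checks that `authentication_realm.type` in the response is `native`. The function names are hypothetical, this is not how Hevo validates the connection, and this API may not be available on AWS-hosted clusters.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value for a realm user."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def is_native_realm(auth_response: Dict[str, Any]) -> bool:
    """True if an _authenticate response reports the native realm."""
    return auth_response.get("authentication_realm", {}).get("type") == "native"


def check_user(base_url: str, username: str, password: str) -> bool:
    """Hypothetical helper: authenticate against the cluster and report
    whether the user belongs to the native realm. urlopen raises
    urllib.error.HTTPError (for example, 401) if authentication fails."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/_security/_authenticate",
        headers={"Authorization": basic_auth_header(username, password)},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return is_native_realm(json.load(resp))
```

For example, `check_user("https://localhost:9200", "hevo_user", "<password>")` returns True only for a native realm user; the host and user name here are placeholders.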


Revision History

Refer to the following table for the list of key updates made to this page:

Date          Release   Description of Change
Nov-22-2022   NA        Updated section, Limitations, to add information about Hevo not capturing deletes.
Aug-24-2022   NA        Updated sections, Data Replication and Configure Elasticsearch Connection Settings, to restructure the content for better understanding and coherence.
Jun-09-2022   NA        Added a reference to the Configuration Changes in Elasticsearch page in the Overview section.
Apr-11-2022   1.86      Added a note in the Connection Settings about setting up a reverse proxy server for connecting to an AWS Elasticsearch Source.
Feb-21-2022   1.82      Added section, (Optional) Connect to Elasticsearch hosted inside a Virtual Private Cloud (VPC).
Jan-03-2022   1.79      Updated the description of the Include New Tables in the Pipeline advanced setting in the Configure Elasticsearch Connection Settings section.
Jul-26-2021   1.68      Added a note for the Database Host field.
Jul-12-2021   1.67      Added the field Include New Tables in the Pipeline under Source configuration settings.
Jun-01-2021   1.64      Updated the Configure Elasticsearch Connection Settings section to include the Connect Through HTTPS setting.
