Databricks

Last updated on Jul 22, 2024

Databricks is a data lakehouse platform built on Delta Lake, an open-source storage layer. The lakehouse architecture provides data warehousing performance at data lake costs: Databricks runs on top of your existing data lake and is fully compatible with Apache Spark APIs. Apache Spark is an open-source data analytics engine that can perform analytics and data processing on very large data sets. Read A Gentle Introduction to Apache Spark on Databricks.
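For context, the snippet below is a minimal PySpark sketch of the kind of processing Spark runs on Databricks; the table and column names are hypothetical and are not part of Hevo's setup:

```python
# Minimal PySpark sketch (hypothetical table and column names) showing a
# typical Spark aggregation as it would run in a Databricks notebook.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In a Databricks notebook a SparkSession named `spark` already exists;
# getOrCreate() simply returns the active session.
spark = SparkSession.builder.getOrCreate()

# Read an example table from the lakehouse and compute a simple aggregate.
orders = spark.read.table("sales.orders")  # hypothetical schema.table
daily_revenue = (
    orders
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("revenue"))
)
daily_revenue.show()
```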

Hevo can load data from any of your Sources into a Databricks data warehouse. You can set up the Databricks Destination on the fly, while creating the Pipeline, or independently from the Navigation bar. The ingested data is first staged in Hevo’s S3 bucket before it is batched and loaded to the Databricks Destination.

Hevo supports Databricks on the AWS, Azure, and GCP platforms. You can connect your Databricks warehouse hosted on any of these platforms to Hevo either through Databricks Partner Connect (the recommended method) or by manually configuring the Destination with your Databricks credentials.

The following image illustrates the key steps that you need to complete to configure Databricks as a Destination in Hevo:

Databricks Destination Setup
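If you configure the Destination manually, you can sanity-check the warehouse credentials you plan to give Hevo (server hostname, HTTP path, and a personal access token) outside Hevo first. The sketch below uses the databricks-sql-connector Python package; the placeholder values are assumptions, not values from this page:

```python
# A minimal sketch to verify Databricks SQL warehouse credentials before
# entering them in Hevo. Placeholder values are assumptions; replace them
# with the details from your own workspace.
from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # workspace host
    http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",          # SQL warehouse HTTP path
    access_token="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX",           # personal access token
) as connection:
    with connection.cursor() as cursor:
        # A trivial query; success confirms the host, path, and token are valid.
        cursor.execute("SELECT 1")
        print(cursor.fetchall())
```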


Limitations

  • Hevo currently does not support Databricks as a Destination in the US-GCP region.



Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Jul-22-2024 NA Updated section, Connect Using the Databricks Partner Connect to reflect the latest Databricks UI.
Jun-04-2024 NA - Added the process flow diagram in the page overview section.
- Updated section, Configure Databricks as a Destination to clarify the Schema Name field.
Nov-28-2023 NA - Renamed section Connect your Databricks Warehouse to Create a Databricks Cluster or Warehouse.
- Updated section, Obtain Databricks Credentials to add subsections, Obtain cluster details and Obtain SQL warehouse details.
Aug-10-2023 NA - Added a prerequisite about adding Hevo IP addresses to an access list.
- Added the subsection Allow connections from Hevo IP addresses to the Databricks workspace for the steps to create an IP access list.
Apr-25-2023 2.12 Updated section, Connect Using the Databricks Partner Connect (Recommended Method) to add information that you must specify all fields to create a Pipeline.
Nov-23-2022 2.02 - Added section, Connect Using the Databricks Partner Connect to mention the Databricks Partner Connect integration.
- Updated screenshots in the page to reflect the latest Databricks UI.
Oct-17-2022 NA Updated section, Limitations to add limitation regarding Hevo not supporting Databricks on Google Cloud.
Jan-03-2022 1.79 New document.
