Databricks

Last updated on May 30, 2023

Databricks is a cloud data platform that lets you operate a lakehouse architecture, built on the open-source Delta Lake storage layer, to get data warehousing performance at data lake cost. Databricks runs on top of your existing data lake and is fully compatible with Apache Spark APIs. Apache Spark is an open-source analytics engine for processing and analyzing very large data sets. For more information, read A Gentle Introduction to Apache Spark on Databricks.

Hevo can load data from any of your Sources into Databricks. You can set up the Databricks Destination on the fly as part of the Pipeline creation process, or independently of it. Hevo first stages the ingested data in its S3 bucket, and then batches and loads it to the Databricks Destination. Hevo supports Databricks on the AWS, Azure, and GCP platforms.
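The stage-then-batch flow described above can be pictured with a minimal sketch. It is illustrative only and is not Hevo's implementation: the in-memory list stands in for the staging bucket, and the names `stage`, `batches`, `load`, and `write_batch` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def stage(records: Iterable[Record]) -> List[Record]:
    """Collect ingested records in a staging area.

    Assumption: a plain list stands in for the staging bucket.
    """
    return list(records)


def batches(staged: List[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield staged records in fixed-size batches; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(staged), batch_size):
        yield staged[start:start + batch_size]


def load(staged: List[Record], batch_size: int,
         write_batch: Callable[[List[Record]], None]) -> int:
    """Hand each batch to a writer (hypothetical Destination loader); return the batch count."""
    count = 0
    for batch in batches(staged, batch_size):
        write_batch(batch)
        count += 1
    return count
```

For example, staging five records and loading them with a batch size of two calls the writer three times, with the final batch holding the single leftover record.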

You can connect your Databricks warehouse to Hevo using one of the following methods:

  • Using the Databricks credentials:
    Hevo allows you to configure Databricks as a Destination using the credentials obtained from your Databricks account. You can connect through either a Databricks cluster or a Databricks SQL warehouse, both of which can be created within a workspace. A workspace refers to your Databricks deployment in the cloud service account.

  • Using the Databricks Partner Connect (Recommended Method):
    In collaboration with Databricks, Hevo allows you to configure Databricks as a Destination from the Databricks Partner Connect page. Refer to the section, Connect Using the Databricks Partner Connect, for the steps to do this.
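For the credentials-based method, it can help to check the connection details before entering them. The sketch below is a hedged illustration, not Hevo's configuration schema: the class and field names are hypothetical, and the HTTP path prefixes are an assumption based on the path shapes Databricks commonly shows for SQL warehouses (`/sql/1.0/warehouses/...`) and clusters (`sql/protocolv1/...`).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatabricksConnection:
    """Hypothetical holder for the details copied from a Databricks workspace."""
    server_hostname: str
    http_path: str
    access_token: str
    port: int = 443

    def compute_kind(self) -> str:
        """Guess whether the HTTP path points at a SQL warehouse or a cluster.

        Assumption: the prefixes below match the usual Databricks path shapes.
        """
        path = self.http_path.lstrip("/")
        if path.startswith(("sql/1.0/warehouses/", "sql/1.0/endpoints/")):
            return "sql_warehouse"
        if path.startswith("sql/protocolv1/"):
            return "cluster"
        return "unknown"

    def missing_fields(self) -> List[str]:
        """Return the names of required text fields that are empty or blank."""
        required = ("server_hostname", "http_path", "access_token")
        return [name for name in required if not getattr(self, name).strip()]
```

A connection whose `missing_fields()` list is empty and whose `compute_kind()` is not `"unknown"` is at least well formed; whether the credentials are valid can only be confirmed by testing the connection itself.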


Destination Considerations

None.


Limitations

  • Hevo currently does not support Databricks as a Destination in the US-GCP region.



Revision History

Refer to the following table for the list of key updates made to this page:

Date | Release | Description of Change
Apr-25-2023 | 2.12 | Updated section, Connect Using the Databricks Partner Connect (Recommended Method), to state that you must specify all fields to create a Pipeline.
Nov-23-2022 | 2.02 | Added section, Connect Using the Databricks Partner Connect, to describe the Databricks Partner Connect integration; updated screenshots on the page to reflect the latest Databricks UI.
Oct-17-2022 | NA | Updated section, Limitations, to add the limitation that Hevo does not support Databricks on Google Cloud.
Jan-03-2022 | 1.79 | New document.
