Creating a Database Pipeline

A database Pipeline is a Pipeline that you create with a relational database (RDBMS) as the Source. For example, MySQL, Oracle, PostgreSQL, and their supported variants are database Sources. You can configure any database or data warehouse as the Destination, based on your business requirements.

Let us create a Pipeline to move data from a PostgreSQL Source to an Amazon Redshift Destination.


Prerequisite


Configure the Source

  1. Log in to your Hevo account. By default, PIPELINES is selected in the Asset Palette.

  2. Select PostgreSQL from the list of displayed Sources (these are the Sources you selected while setting up your Hevo account).

    Create Pipeline

    If PostgreSQL is not available in the displayed list, click View All, search for PostgreSQL, and then select it from the Select Source Type page.

    Source Type

  3. Select the PostgreSQL variant. Here, we are selecting Generic PostgreSQL.

    Select Postgres Variant

  4. In the Configure your PostgreSQL Source page, specify the following connection settings:

    Note: The fields may vary depending on the Source type. Read Sources.

    Configure Postgres

    • Pipeline Name: A unique name for your Pipeline. For example, PostgreSQL Source 2.

    • Database Host: The PostgreSQL host’s IP address or DNS. For example, demodb.hevo.me.

      The following table lists a few examples of PostgreSQL hosts:

      | Variant                 | Host                                     |
      |-------------------------|------------------------------------------|
      | Amazon RDS PostgreSQL   | postgresql-rds-1.xxxxx.rds.amazonaws.com |
      | Azure PostgreSQL        | postgres.database.azure.com              |
      | Generic PostgreSQL      | 10.123.10.001 or postgresql.westeros.inc |
      | Google Cloud PostgreSQL | 35.220.150.0                             |

      Note: For URL-based hostnames, exclude the http:// or https:// part. For example, if the hostname URL is https://postgres.database.azure.com, enter postgres.database.azure.com.

    • Database Port: The port number on which your PostgreSQL server listens for connection requests from Hevo. Default value: 5432.

    • Database User: The read-only user who has permission to read the tables in your database. For example, root.

    • Database Password: The password for the read-only user.

    • Select an Ingestion Mode: The mode by which you want Hevo to read data from the Source. The available ingestion modes for PostgreSQL are Logical Replication, Table, and Custom SQL.

      • For the Table ingestion mode, read Object Settings for the steps to configure the objects to be replicated.

      • For the Logical Replication ingestion mode, follow the steps provided in each PostgreSQL variant document to set up logical replication.

      Here, we are selecting Logical Replication to replicate the complete database as is.

    • Database Name: The database that you wish to replicate. For example, demo.

    • Connection Settings:

      • Connect through SSH: Enable this option to connect to Hevo using an SSH tunnel, instead of directly connecting your PostgreSQL database host to Hevo. This provides an additional level of security to your database by not exposing your PostgreSQL setup to the public. Read Connecting Through SSH.

        If this option is disabled, you must whitelist Hevo’s IP addresses. Refer to the content for your PostgreSQL variant for steps to do this.

      • Use SSL: Enable this option to use an SSL-encrypted connection. Also enable it if you are using a Heroku PostgreSQL database. If you enable this option, specify the following:

        • CA File: The file containing the SSL server certificate authority (CA).

        • Client Certificate: The client public key certificate file.

        • Client Key: The client private key file.

    • Advanced Settings

      • Load Historical Data: Applicable for Pipelines with Logical Replication mode. If this option is enabled, the entire table data is fetched during the first run of the Pipeline. If disabled, Hevo loads only the data that was written in your database after the Pipeline creation.

      • Merge Tables: Applicable for Pipelines with Logical Replication mode. If this option is enabled, Hevo merges tables with the same name from different databases while loading the data to the warehouse, and includes the Database Name field with each record. If disabled, the database name is prefixed to each table name. Read How does the Merge Tables feature work?.

      • Include New Tables in the Pipeline: Applicable for all Ingestion modes except Custom SQL.

        If enabled, Hevo automatically ingests data from tables created in the Source after the Pipeline has been built. These may include completely new tables or previously deleted tables that are re-created in the Source.

        If disabled, new and re-created tables are not ingested automatically. They are added in the SKIPPED state in the objects list on the Pipeline Overview page. You can update their status to INCLUDED to ingest data.

        Note: You can change this setting later.

  5. Click TEST & CONTINUE. The objects available in your database are displayed. If the test fails, fix the errors, re-test, and continue with the configuration.
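Before clicking TEST & CONTINUE, you can sanity-check the same connection settings locally. The sketch below assembles them into a standard libpq key=value connection string; the host, user, password, and database values are hypothetical placeholders, and the psycopg2 call in the comment is only a suggestion for verifying access from your own machine (Hevo performs its own connectivity test).

```python
# Hypothetical values mirroring the Source settings above -- replace with
# your own. This is only a local sanity check; Hevo handles the actual
# connection internally.
params = {
    "host": "demodb.hevo.me",  # Database Host (no http:// or https:// prefix)
    "port": 5432,              # Database Port (PostgreSQL default)
    "user": "hevo_reader",     # read-only Database User (hypothetical)
    "password": "secret",      # Database Password (placeholder)
    "dbname": "demo",          # Database Name
    "sslmode": "require",      # include only if Use SSL is enabled
}

def build_dsn(settings):
    """Assemble a libpq-style key=value connection string."""
    return " ".join(f"{key}={value}" for key, value in settings.items())

dsn = build_dsn(params)
print(dsn)

# With the psycopg2 driver installed, the same string can verify access:
#   import psycopg2
#   conn = psycopg2.connect(dsn)
```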


Select Source Objects

  • Select the objects that you want to replicate in your Destination and click CONTINUE. By default, all the objects are selected.

    Note: If your objects or tables do not have a primary key, you must select a query mode for them on the next screen. If you do not define a query mode, bulk query is applied by default.
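To know in advance which objects will need a query mode, you can list the tables that lack a primary key. A minimal sketch, assuming the psycopg2 driver and the read-only credentials from the previous section; the query uses only standard information_schema views:

```python
# Standard SQL to list base tables that have no PRIMARY KEY constraint.
NO_PK_SQL = """
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
  ON  c.table_schema    = t.table_schema
  AND c.table_name      = t.table_name
  AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('pg_catalog', 'information_schema')
  AND c.constraint_name IS NULL
ORDER BY t.table_schema, t.table_name;
"""

def tables_without_pk(conn):
    """Return (schema, table) pairs that will need a query mode."""
    with conn.cursor() as cur:
        cur.execute(NO_PK_SQL)
        return cur.fetchall()
```

With an open psycopg2 connection, `tables_without_pk(conn)` returns the objects to watch for on the query-mode screen.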


Select and Configure the Destination

  1. In the Select Destination Type page, select your Destination. Here, we are selecting Amazon Redshift.

    Select Destination

  2. In the Configure your Amazon Redshift Destination page, specify the Amazon Redshift database settings.

    Note: The fields may vary depending on the Destination type. Read Destinations.

    Select Destination

  3. Click SAVE & CONTINUE.

  4. In the Destination Table Prefix field, provide a prefix if you want to modify the Destination table or partition names; otherwise, leave the field blank.

  5. Click CONTINUE.
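The effect of the Destination Table Prefix can be pictured with a small sketch. The naming scheme below is an assumption for illustration, not Hevo's exact implementation:

```python
def destination_table(table_name, prefix=""):
    """Illustrative only: prepend the configured prefix, if any."""
    return f"{prefix}{table_name}" if prefix else table_name

print(destination_table("orders", prefix="pg_"))  # pg_orders
print(destination_table("orders"))                # orders
```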


Final Settings

  • Optionally, as part of the final settings, you can set up Transformations to cleanse or enrich the Source data. You can also view the Source-to-Destination field mappings via the Schema Mapper and edit them, if needed.

    Your first Pipeline with the database Source is created and the data ingestion starts.


Revision History

Refer to the following table for the list of key updates made to this page:

| Date        | Release | Description of Change                                             |
|-------------|---------|-------------------------------------------------------------------|
| May-02-2022 | NA      | Updated section, Configure the Source according to the latest UI. |
| Mar-21-2022 | NA      | New document.                                                     |
Last updated on 06 May 2022