PostgreSQL

Hevo supports the following variations of PostgreSQL as a Source:

Click each variant above to know the prerequisites and pre-configuration setups specific to it. Once you complete the setups, follow the steps mentioned below to configure it as a Source in your Hevo Pipeline and start ingesting data.


Select the Source Type

To select PostgreSQL as the Source:

  1. Click PIPELINES in the Asset Palette.

  2. Click + CREATE in the Pipelines List View.

  3. In the Select Source Type page, select the PostgreSQL variant.


Specify PostgreSQL Connection Settings

  1. In the Configure your PostgreSQL Source page, specify the following:

    PostgreSQL settings

    • Pipeline Name: A unique name for your Pipeline.

    • Database Host: The PostgreSQL host’s IP address or DNS.

      The following table lists few examples of PostgreSQL hosts:

      Variant Host
      Amazon RDS PostgreSQL postgresql-rds-1.xxxxx.rds.amazonaws.com
      Azure PostgreSQL postgres.database.azure.com
      Generic PostgreSQL 10.123.10.001 or postgresql.westeros.inc
      Google Cloud PostgreSQL 35.220.150.0

      Note: For URL-based hostnames, exclude the http:// or https:// part. For example, if the hostname URL is https://postgres.database.azure.com, enter postgres.database.azure.com.

    • Database Port: The port on which your PostgreSQL server is listening for connections. Default value: 5432.

    • Database User: The read-only user who has the permissions to read tables in your database.

    • Database Password: The password for the read-only user.

    • Select an Ingestion Mode: The desired mode by which you want to ingest data from the Source. The available Ingestion Modes are Logical Replication, Table, and Custom SQL. Read Ingestion Modes.

      • For Ingestion mode as Table, Refer to section, Object Settings for steps to configure the objects to be replicated.

      • For Ingestion mode as Logical Replication, follow the steps provided in each PostgreSQL variant document to set up logical replication.

      Note: PostgreSQL does not support logical replication for the read replica.

    • Database Name: The database that you wish to replicate.

    • Connection Settings:

      • Connect through SSH: Enable this option to connect to Hevo using an SSH tunnel, instead of directly connecting your PostgreSQL database host to Hevo. This provides an additional level of security to your database by not exposing your PostgreSQL setup to the public. Read Connecting Through SSH.

        If this option is disabled, you must whitelist Hevo’s IP addresses. Refer to the content for your PostgreSQL variant for steps to do this.

      • Use SSL: Enable it to use SSL encrypted connection. You should also enable this if you are using Heroku PostgreSQL databases. To enable this, specify the following:

        • CA File: The file containing the SSL server certificate authority (CA).

        • Client Certificate: The client public key certificate file.

        • Client Key: The client private key file.

    • Advanced Settings

      • Load Historical Data: Applicable for Pipelines with Logical Replication mode. If this option is enabled, the entire table data is fetched during the first run of the Pipeline. If disabled, Hevo loads only the data that was written in your database after the time of creation of the Pipeline.

      • Merge Tables: Applicable for Pipelines with Logical Replication mode. If this option is enabled, Hevo merges tables with the same name from different databases while loading the data to the warehouse. Hevo loads the Database Name field with each record. If disabled, the database name is prefixed to each table name. Read How does the Merge Tables feature work?.

      • Include New Tables in the Pipeline: Applicable for all Ingestion modes except Custom SQL.

      If enabled, Hevo automatically ingests data from tables created in the Source after the Pipeline has been built. These may include completely new tables or previously deleted tables that have been re-created in the Source.

      If disabled, new and re-created tables are not ingested automatically. They are added in SKIPPED state in the objects list, in the Pipeline Overview page. You can update their status to INCLUDED to ingest data.

      You can change this setting later.

  2. Click TEST & CONTINUE to proceed for setting up the Destination.


Object Settings

Object settings must be configured if the Ingestion mode is Table.

To do this:

  • Once your respective Source connection settings are specified in Step 2 above, select the objects to be replicated in the SELECT the Objects you want to replicate page, and then click CONTINUE.

    Object Selection

    Note: Each object represents a table in your database.

  • In the CONFIGURE SOURCE OBJECTS page, specify the query mode to be used for each selected object.


Source Considerations

Pausing the Pipeline

For Pipelines created with Logical Replication ingestion mode, Hevo replicates the data using the log generated by the Source. Pausing a log-based Pipeline for more than 24 hours may lead to data loss, as a result of the log being deleted. The log can get deleted due to the expiry of its retention period or limited disk storage space in the case of large log files.

In case there is a loss of data after resuming a paused Pipeline, restart the historical load for all the objects to ingest the lost data. To do so, in the Pipeline Overview page:

  1. Select the Objects check box to select all the objects in the Pipeline. You can also select specific objects by selecting the check box next to their names.

    Restart historical load for the Pipeline

  2. Select the Restart option from the MORE drop-down to start the historical data ingestion.

The historical load starts immediately. The re-ingested data does not count towards your quota consumption and is not billed.

Deleting the Pipeline

For Pipelines with Logical Replication as the Pipeline mode, Hevo creates a replication slot in the Source to record any changes. If you delete the Pipeline, Hevo automatically drops this replication slot. However, sometimes, due to issues at the PostgreSQL Source end, the slot might not get dropped automatically. If that happens, use the following command to manually drop the slot:

SELECT pg_drop_replication_slot('<slot_name>');

For example, to drop a slot named test_postgreSQL_slot, use the following query:

SELECT pg_drop_replication_slot('test_postgreSQL_slot');



See Also


Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Apr-21-2022 1.86 - Updated sections, Specify PostgreSQL Connection Settings and Object Settings.
- Removed section, Select the Pipeline Mode.
Mar-07-2022 1.83 Updated the section, Select the Pipeline Mode with a note about the logical replication.
Jan-24-2022 1.80 Added information about the impact of pausing a Pipeline in the Source Considerations section.
Jan-03-2022 1.79 Updated the description of the Include New Tables in the Pipeline advance setting in the Specify PostgreSQL Connection Settings section.
Dec-06-2021 1.77 Added section, Source Considerations.
Jul-26-2021 1.68 Added a note for the Database Host field.
Jul-12-2021 1.67 Added the field Include New Tables in the Pipeline under Source configuration settings.
Last updated on 05 May 2022