Hevo can replicate tables from your Postgres database, either hosted by you or in Amazon RDS. The replication can happen at any frequency that you would like. Following replication modes are supported:
- Full dump and load - Full tables are replicated at a set frequency.
- Incremental load for append-only data - Only new rows are replicated from tables. Suitable for append-only tables
- Incremental load for mutable data - Updated or new rows are replicates from tables. Suitable for mutable data.
- Write Ahead Log(WAL) - Replication is performed using Postgres's Write Ahead Log set at a logical level.
1. Create a new Pipeline
Click on PIPELINES option in the left navigation bar and click on Create New Pipeline.
2. Select Source Type
Select Postgres from the list on Select Source Type Screen.
3. Select Ingestion Mode
On this page, you will have 2 options for Ingestion mode. This will define how you want Hevo to read your data from Postgres source.
- Table: In this mode, your tables will be polled individually at a fixed frequency. You can use this mode when you are looking to fetch data from multiple tables in your database and you would like to have control over ingestion for every table individually.
- Custom SQL: If you are looking to fetch data in a different structure than how it is stored in your source tables, you can use this mode. It will allow you to write a custom query and data will be polled using that query at a fixed frequency.
- Write Ahead Log(WAL): In this mode, data will be polled using Postgres Write Ahead Log(WAL) set at a logical level(available on Postgres version 9.4 and above). This mode is useful when you are looking to replicate the complete database as it is. This mode is very efficient in replicating but leaves you with less control and manageability over data ingestion. Read about the instructions to set up the WAL here.
Select the Ingestion Mode and hit CONTINUE.
4. Provide Connection Settings
Provide your Postgres database connection details on Postgres Connection Settings page. You will have the following options in the connection details block:
- Postgres Host - Postgres host's IP address or DNS
- Postgres Port - The port on which your Postgres server is listening for connections (default is 5432)
- Postgres User - The read-only user that can read the tables in your database.
- Postgres Password - Password for the read-only user
- Database Name - The database that you wish to replicate
- If you want to connect to Hevo using an SSH tunnel, check How to Connect through SSH. Else, you will have to whitelist Hevo's IP addresses as listed here.
- Schema Name - The name of the Schema in the Database
- Replicate JSON fields as Strings - If disabled Hevo will replicate JSON fields into multiple tables depending on the level of nesting while keeping the reference to the parent row.
- Connect through SSH - If you want to connect to Hevo using an SSH tunnel, check How to Connect through SSH. Also, you will have to whitelist Hevo's IP addresses as listed here.
- Use SSL - Enable it to use SSL encrypted connection. You should also enable this if you are using Heroku Postgres databases.
Alternatively, you copy details from an existing source of Postgres type. Please note that it will create an independent copy of the selected source.
Click TEST CONNECTION to test the credentials and click CONTINUE once test succeeds.
5. Provide Job Settings
You will land on this screen if you selected Table or Custom SQL mode on the last screen.
In the case of Table mode, you will be presented with the list of tables in your Postgres database. You can deselect the table that you don't want to replicate. For every table, you will have an option to provide replication mode.
In case of Query mode, you will be presented with Query editor, where you can write the custom SQL query using which you want to load the data. Select the Replication mode after writing the query.
Replication mode defines how Hevo will read data from your tables or query in every run. You have the following options for replication:
- Full Load: Complete result set from the Query or Table will be replicated at a set frequency.
- Delta - Append-only: Specify an auto-increment column (usually the Primary key). Only new results from the Query result set or Table will be replicated on the basis of specified auto-increment id.
- Delta - Timestamp: Specify update timestamp column name in 'Timestamp column' field. New or updated data from the Query or Table will be replicated.
- Change Data Capture: This mode will take care of both incrementing id and update timestamp for detecting delta change in data while replicating Query or Table.
After entering the details, click CONTINUE.
6. Select the Destination
Select the Destination where you want to replicate Postgres Data or Click on NEW DESTINATION to create a new Destination. Check out How to add Destination tutorial for the detailed walkthrough on steps needed for adding new Destination.
7. Pipeline Created
Your Pipeline will be created when you enter this page and you will have an option to see Sample Data and Map Schema.
While Hevo tries to load your schemas, you can select CONTINUE IN BACKGROUND if it is taking too much time. Click on CREATE SCHEMA MAPPING to map Source and Destination Schemas, check out Introduction to Schema Mapper to learn about Schema Mapper or you can select DO IT LATER to directly head to Pipeline page. You can map schemas later on Schema Mapper page in your pipeline.
Please note that your data will not start replicating in Destination tables until you map source and Destination schemas.