Hevo can replicate tables from your Amazon Redshift database at any frequency you choose. The following replication modes are supported:
- Full dump and load - Full tables are replicated at a set frequency.
- Incremental load for append-only data - Only new data is replicated from tables. Suitable for append-only tables.
- Incremental load for mutable data - New or updated data is replicated from tables. Suitable for mutable data.
1. Create a new Pipeline
Click the PIPELINES option in the left navigation bar, then click Create New Pipeline.
2. Select Source Type
Select Amazon Redshift from the list on the Select Source Type screen.
3. Provide Connection Settings
Provide your Redshift database connection details on the Redshift Connection Settings page. The connection details block has the following options:
- Source Name - A unique name for this source
- Database Host - Redshift host's IP address or DNS name
- Database Port - The port on which your Redshift server is listening for connections (default is 5439 for Redshift)
- Database User - A read-only user that can read the tables in your database
- Database Password - Password for the read-only user
- Database Name - The database that you wish to replicate
- If you want to connect to Hevo through an SSH server, see How to Connect through SSH. Otherwise, you will have to whitelist Hevo's IP addresses, which are highlighted on the screen.
Alternatively, you can copy the details from an existing source of Redshift type. Please note that this creates an independent copy of the selected source.
Click TEST CONNECTION to test the credentials, and click CONTINUE once the test succeeds.
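The connection fields above can be sketched as a small data structure. This is only an illustration of how the settings fit together, not Hevo's implementation: the class name RedshiftSourceSettings, its methods, and the sample values are hypothetical. The dsn() output uses the PostgreSQL keyword/value connection string format, on the assumption that your client library speaks the PostgreSQL-compatible protocol that Redshift exposes.

```python
from dataclasses import dataclass, field

# 5439 is the default Redshift port mentioned in the settings list.
REDSHIFT_DEFAULT_PORT = 5439


@dataclass
class RedshiftSourceSettings:
    """Hypothetical container mirroring the fields on the settings page."""
    source_name: str
    database_host: str
    database_user: str
    database_password: str = field(repr=False)  # keep the password out of logs
    database_name: str
    database_port: int = REDSHIFT_DEFAULT_PORT

    def problems(self):
        """Return a list of human-readable issues; empty means the form looks complete."""
        issues = []
        required = [
            ("Source Name", self.source_name),
            ("Database Host", self.database_host),
            ("Database User", self.database_user),
            ("Database Password", self.database_password),
            ("Database Name", self.database_name),
        ]
        for label, value in required:
            if not value.strip():
                issues.append(f"{label} is required")
        if not 1 <= self.database_port <= 65535:
            issues.append("Database Port must be between 1 and 65535")
        return issues

    def dsn(self):
        """Build a keyword/value connection string without the password."""
        return (
            f"host={self.database_host} port={self.database_port} "
            f"dbname={self.database_name} user={self.database_user}"
        )


# Sample values are made up for illustration.
settings = RedshiftSourceSettings(
    source_name="sales",
    database_host="10.0.0.12",
    database_user="hevo_reader",
    database_password="secret",
    database_name="analytics",
)
print(settings.problems())
print(settings.dsn())
```

Checking the fields locally before clicking TEST CONNECTION can catch an empty field or a mistyped port early; it does not replace the connection test itself.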
4. Select Ingestion Mode
On this page, you have two options for Ingestion Mode. This defines how Hevo reads data from your Redshift source.
- Table: In this mode, your tables are polled individually at a fixed frequency. Use this mode when you want to fetch data from multiple tables in your database and control ingestion for each table individually.
- Custom SQL: Use this mode when you want to fetch data in a different structure from how it is stored in your source tables. It lets you write a custom query, and data is polled using that query at a fixed frequency.
Select the Ingestion Mode and hit CONTINUE.
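As an illustration of what "a different structure" can mean in Custom SQL mode, the sketch below runs a query that joins and aggregates two tables into one row per customer. Everything in it is an assumption made for the example: the customers and orders tables, their columns, and the use of Python's built-in sqlite3 module as a local stand-in for Redshift so the query can be tried without a cluster.

```python
import sqlite3

# Hypothetical custom query: one summary row per customer instead of raw order rows.
CUSTOM_SQL = """
SELECT c.id          AS customer_id,
       c.name        AS customer_name,
       COUNT(o.id)   AS order_count,
       SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.id
"""


def demo_connection():
    """Create an in-memory database with made-up sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 30), (11, 1, 20), (12, 2, 5);
        """
    )
    return conn


def run_custom_query(conn):
    """Return the rows the custom query would produce on each poll."""
    return conn.execute(CUSTOM_SQL).fetchall()


for row in run_custom_query(demo_connection()):
    print(row)
```

In Table mode the two tables would be replicated as they are; the query above instead delivers a single reshaped result set, which is the case Custom SQL mode is meant for.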
5. Provide Job Settings
If you selected Table mode on the previous screen, you will be presented with the list of tables in your Redshift database. You can deselect the tables that you don't want to replicate. For every table, you can choose a replication mode.
If you selected Custom SQL mode, you will be presented with a query editor, where you can write the custom SQL query that will be used to load the data. Select the replication mode after writing the query.
The replication mode defines how Hevo reads data from your tables or query in every run. You have the following options for replication:
- Full Load: The complete result set from the Query or Table is replicated at a set frequency.
- Delta - Append-only: Specify an auto-increment column (usually the primary key). Only new rows from the Query result set or Table are replicated, based on the specified auto-increment ID.
- Delta - Timestamp: Specify the name of the update timestamp column in the 'Timestamp column' field. New or updated data from the Query or Table is replicated.
- Change Data Capture: This mode uses both the incrementing ID and the update timestamp to detect changed data while replicating the Query or Table.
After entering the details, click CONTINUE.
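The difference between the four replication modes can be sketched with a small in-memory example. This is a simplified model under stated assumptions, not Hevo's actual logic: the Bookmark class, the mode strings, the id and updated_at column names, and the rule that Change Data Capture picks a row when either its ID or its timestamp is past the bookmark are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    """Hypothetical record of how far the previous run got."""
    last_id: int = 0
    last_ts: int = 0


def rows_to_replicate(rows, mode, bookmark):
    """Pick the rows a single run would read under each replication mode."""
    if mode == "full_load":
        # Everything, every run.
        return list(rows)
    if mode == "delta_append_only":
        # Only rows whose auto-increment ID is beyond the last one seen.
        return [r for r in rows if r["id"] > bookmark.last_id]
    if mode == "delta_timestamp":
        # Rows created or updated after the last timestamp seen.
        return [r for r in rows if r["updated_at"] > bookmark.last_ts]
    if mode == "change_data_capture":
        # Assumption: a row qualifies if either signal shows a change.
        return [
            r for r in rows
            if r["id"] > bookmark.last_id or r["updated_at"] > bookmark.last_ts
        ]
    raise ValueError(f"unknown replication mode: {mode}")


# Made-up table contents; timestamps are plain integers for brevity.
table = [
    {"id": 1, "updated_at": 100},  # unchanged since the last run
    {"id": 2, "updated_at": 250},  # existing row, updated since the last run
    {"id": 3, "updated_at": 300},  # new row
    {"id": 4, "updated_at": 0},    # new row whose timestamp was never set
]
previous_run = Bookmark(last_id=2, last_ts=200)

for mode in ("full_load", "delta_append_only", "delta_timestamp", "change_data_capture"):
    ids = [r["id"] for r in rows_to_replicate(table, mode, previous_run)]
    print(mode, ids)
```

In this model, append-only mode misses the update to row 2 and timestamp mode misses row 4, which is why append-only suits tables that are never updated and timestamp mode depends on the timestamp column being maintained for every row.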
6. Select the Destination
Select the Destination to which you want to replicate Redshift data, or click NEW DESTINATION to create a new Destination. See the How to add Destination tutorial for a detailed walkthrough of the steps needed to add a new Destination.
7. Pipeline Created
Your Pipeline is created when you reach this page, and you will have the option to see Sample Data and Map Schema.
Hevo will try to load your schemas; if this takes too long, you can select CONTINUE IN BACKGROUND. Click CREATE SCHEMA MAPPING to map the Source and Destination schemas (see Introduction to Schema Mapper to learn about Schema Mapper), or select DO IT LATER to go directly to the Pipeline page. You can map schemas later on the Schema Mapper page in your Pipeline.
Please note that your data will not start replicating to the Destination tables until you map the Source and Destination schemas.