Amazon Redshift is a fully managed, reliable cloud data warehouse service that offers large-scale storage and analysis of data sets and performs large-scale database migrations. It is part of the larger cloud computing platform, Amazon Web Services (AWS).
Hevo can load data from any of your Pipelines into an Amazon Redshift data warehouse. You can set up the Redshift Destination on the fly, as part of the Pipeline creation process, or independently. The ingested data is first staged in Hevo’s S3 bucket before it is batched and loaded to the Amazon Redshift Destination.
If you are new to AWS and Redshift, you can follow the steps listed below to create an AWS account and then create an Amazon Redshift database to which the Hevo Pipeline will load the data. Alternatively, you can create users and assign them the required permissions to set up and manage databases within Amazon Redshift. Read AWS Identity and Access Management for more details.
Prerequisites
Set up an Amazon Redshift Instance (Optional)
Note: The following steps must be performed by an Admin user or a user with permissions to create an instance on AWS. Permissions and roles are managed through the IAM page in AWS.
1. Log in to your AWS instance
Do one of the following:
- Sign in to your existing AWS account.
- Create a new account:
  - On the Complete sign-up page, click Complete your AWS Registration.
  - Complete the required steps to create your AWS account.
2. Connect to Amazon Redshift
- On the AWS Console Home page, click the Services icon at the top left.
- Select Analytics in the left navigation pane, and then select Amazon Redshift in the right navigation pane.
Create a Database (Optional)
The core component of the Amazon Redshift data warehouse is a Cluster. Your databases are created within a cluster.
1. Create a Cluster
- Log in to the Amazon Redshift dashboard.
- Click the hamburger menu at the top left corner of the dashboard.
- In the left navigation pane, click Clusters.
- In the In my account tab, Clusters section, click Create cluster.
- On the Create cluster page, Cluster configuration section, specify the following:
  - Cluster identifier: A unique name for your cluster.
  - Node type: The primary data processing purpose for which the node is to be used.
    - Select RA3 if you want the best of both storage and compute capabilities.
    - Select Dense Compute 2 (DC2) if you want to prioritize compute capabilities.
    Each node type has a different cost depending on the storage and compute resources, as described in the Configuration summary.
  - AZ configuration: The deployment setting for your cluster across different availability zones (AZ).
    - Select Single-AZ if you want to deploy your cluster in a single availability zone.
    - Select Multi-AZ if you want to deploy your cluster in two availability zones.
  - Number of nodes: The number of nodes, based on the amount of data or computational processing you require. Select 1 to start with; you can increase this value later.
    Note: For the RA3 node type, you must select at least two nodes.
- In the Database configurations section, select Manually add the admin password, and specify the admin user name and password for your Redshift database.

- In the Additional configurations section, you can use the default settings provided by AWS or disable the Use defaults option and specify any additional settings you need.

- Click Create cluster. You can view the new cluster in the Clusters section. Once the cluster configuration is complete, the status changes to Available.
- (Optional) Click the cluster to view its details.
2. Make your Redshift cluster publicly accessible
This is required to be able to connect to your Redshift cluster and create a database.
- Log in to the Amazon Redshift dashboard.
- Click the hamburger menu at the top left corner of the dashboard.
- In the left navigation pane, click Clusters.
- In the In my account tab, Clusters section, click the cluster you want to connect to Hevo.
- On the <Cluster Name> page, click the Actions drop-down, and select Modify publicly accessible setting.
- On the Edit publicly accessible pop-up window, select the Turn on Publicly accessible check box and click Save changes.
3. Connect to the Redshift cluster
You can use any tool that connects to Amazon Redshift, such as DBeaver or Postico, to create your Redshift database, schemas, and tables. These tools allow you to query your database and manage your data efficiently. Alternatively, use the Amazon Redshift Query Editor V2.
Note: The steps in this section have been performed using DBeaver.
- On the <Cluster Name> page, do the following:
  - Click the copy icon corresponding to the Endpoint to copy it. Remove the :<portnumber>/<databasename> part from the endpoint to obtain the database host. For example, if exampleendpoint.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev is the endpoint, use exampleendpoint.abc123xyz789.us-west-2.redshift.amazonaws.com as the database host.
  - Navigate to the Properties tab and copy the Port.
- Access DBeaver.
- Click the New Database Connection icon at the top left, and then search for and select Redshift.
- Click Next.
- In the Connection Settings window, specify the following:
  - Host/Instance: The database host you obtained above.
  - Port: The port number you obtained above.
  - Database: The Redshift database to which you want to connect. The default database created by AWS for Redshift is dev.
  - Username: The admin user you created for the Redshift database.
  - Password: The password for the admin user.
- Click Finish.
4. Create the Redshift database
- In the left pane under Database Navigator, right-click your Redshift cluster, and click SQL Editor.
- Enter the following SQL query to create the database:
  CREATE DATABASE <database_name>;
  -- Example
  CREATE DATABASE test_analytics;
- Press CMD + Enter (Mac) or CTRL + Enter (Windows) to execute the SQL query.
- In the left pane, under Database Navigator, right-click your Redshift cluster, and click Edit Connection.
- Update the Database name to the newly created database, and then click OK.
- In the Connection changed pop-up dialog, click Yes to access the new database.
- Run the following SQL query to create tables in the database:
  CREATE TABLE <table_name> (
    column1_name data_type constraints,
    column2_name data_type constraints,
    ...
  );
  -- Example
  CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    department VARCHAR(50),
    salary DECIMAL(10, 2)
  );
- Run the following SQL query to insert a new row:
  INSERT INTO <table_name> (column1_name, column2_name, ...)
  VALUES (value1, value2, ...);
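  For example, to add a row to the employees table created above (the values are illustrative):
  -- Example
  INSERT INTO employees (employee_id, name, department, salary)
  VALUES (1, 'Jane Doe', 'Engineering', 95000.00);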
- Run the following SQL query to add a new column to an existing table:
  ALTER TABLE <table_name>
  ADD COLUMN <column_name> <data_type>;
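  For example, to add a hypothetical hire_date column to the employees table created above:
  -- Example
  ALTER TABLE employees
  ADD COLUMN hire_date DATE;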
- The columns you define determine the schema of the table. Similarly, create all the tables that you require.
5. View the database in your Amazon Redshift cluster
- Log in to the Amazon Redshift dashboard.
- Click the hamburger menu at the top left corner of the dashboard.
- In the left navigation pane, click Clusters.
- In the In my account tab, Clusters section, click the cluster you want to connect to Hevo.
- On the <Cluster Name> page, click the Databases tab, and then click Connect to database.
- On the Connect to database page, specify the following:
  - Connection: Select Create a new connection.
  - Authentication: Select Temporary credentials if you want to generate your access credentials based on your assigned IAM role. Temporary credentials must be generated each time you connect to the database. To use a previously saved secret (password), select AWS Secrets Manager.
  - Database name: The database you are connecting to. For example, test_analytics.
  - Database user: The user you want to connect as. For example, root.
- Click Connect.
You can access the different tabs of the Database Objects section to view the objects, tables, and views within the database.
Allowlist Hevo IP addresses for your region
You need to allowlist the Hevo IP address(es) for your region to enable Hevo to connect to your Amazon Redshift database.
To do this:
- Log in to the Amazon Redshift dashboard.
- Click the hamburger menu at the top left corner of the dashboard.
- In the left navigation pane, click Clusters.
- In the In my account tab, Clusters section, click the cluster you want to connect to Hevo.
- On the <Cluster Name> page, click the Properties tab.
- Scroll down to the Network and security settings section and click the link under VPC security group.
- On the Security Groups page, select the check box for your Security group ID, and from the Actions drop-down, click Edit inbound rules.
- On the Edit inbound rules page:
  - Click Add Rule.
  - In the Type column, select Redshift from the drop-down.
  - In the Source column, select Custom from the drop-down and enter Hevo's IP addresses for your region. Repeat steps 1-3 to allowlist all the IP addresses.
  - Click Save rules.
Create a Database User and Grant Privileges
1. Create a database user (optional)
- Connect to your Amazon Redshift database as a superuser or a user with the CREATE privilege, using any SQL client tool, such as DBeaver or Postico.
- Enter the following command:
  CREATE USER <database_username> WITH PASSWORD '<password>';
  Note: Replace the placeholder values in the command above with your own. For example, <database_username> with hevo.
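  For example, to create a user named hevo (the password below is only a placeholder; substitute a strong password of your own):
  CREATE USER hevo WITH PASSWORD 'Str0ngPassw0rd1';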
2. Grant privileges to the user
- Log in to your Amazon Redshift database as a superuser.
- Enter the following commands:
  - Grant the CREATE privilege to the database user for an existing database:
    GRANT CREATE ON DATABASE <database_name> TO <database_username>;
    GRANT CREATE ON SCHEMA <schema_name> TO <database_username>;
    GRANT USAGE ON SCHEMA <schema_name> TO <database_username>;
  - Grant the SELECT privilege on tables with one of the following commands:
    - For all tables in a schema:
      GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <database_username>;
    - For a specific table:
      GRANT SELECT ON TABLE <schema_name>.<table_name> TO <database_username>;
  Note: Replace the placeholder values in the commands above with your own. For example, <database_username> with hevo.
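For example, assuming the test_analytics database created above, the default public schema, and a database user named hevo, the commands would look like this:
  GRANT CREATE ON DATABASE test_analytics TO hevo;
  GRANT CREATE ON SCHEMA public TO hevo;
  GRANT USAGE ON SCHEMA public TO hevo;
  GRANT SELECT ON ALL TABLES IN SCHEMA public TO hevo;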
Retrieve the Database Hostname and Port Number (Optional)
- Log in to the Amazon Redshift dashboard.
- Click the hamburger menu at the top left corner of the dashboard.
- In the left navigation pane, click Clusters.
- In the In my account tab, Clusters section, click the cluster you want to connect to Hevo.
- On the <Cluster Name> page, do the following:
  - Click the copy icon corresponding to the Endpoint to copy it. Remove the :<portnumber>/<databasename> part from the endpoint to obtain the database cluster identifier, and use it while configuring your Destination in Hevo. For example, if examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev is the endpoint, use examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com as the identifier.
  - Navigate to the Properties tab and copy the Port.
Configure Amazon Redshift as a Destination
Perform the following steps to configure Amazon Redshift as a Destination in Hevo:
- Click DESTINATIONS in the Navigation Bar.
- Click the Edge tab in the Destinations List View and click + CREATE EDGE DESTINATION.
- On the Create Destination page, click Amazon Redshift.
- On the screen that appears, specify the following:
  - Destination Name: A unique name for your Destination, not exceeding 255 characters. For example, Amazon Redshift Destination.
  - In the Connect to your Redshift section:
    - Database Cluster Identifier: The Amazon Redshift host's IP address or DNS name. This is the endpoint that you retrieved in the section above.
    - Database Name: The name of an existing database where the data is to be loaded. This can be the database you created above or an existing one.
    - Schema Name: The name of the schema in the database to which the user you created for Hevo has access.
    - Database Port: The port on which your Amazon Redshift server listens for connections. Default value: 5439.
  - In the Authentication section:
    - Database User: A user with a non-administrative role created in the Redshift database. This can be the database user that you created above or an existing one.
    - Database Password: The password for the database user specified above.
- Click TEST & SAVE to test the connection to your Amazon Redshift database.
Once the test is successful, Hevo creates your Amazon Redshift Edge Destination. You can use this Destination while creating your Edge Pipeline.
Modifying Amazon Redshift Destination Configuration
After creating an Amazon Redshift Destination, you can modify the database user and password.
Note: Any configuration changes will affect all the Pipelines configured to use this Destination.
To modify the configuration of your Amazon Redshift Destination:
- In the detailed view of your Destination, click EDIT.
- On the <Your Destination Name> page, you can do the following:
  - Enter your new Database User.
  - Click Change and enter your new Database Password.
- Click TEST & SAVE to check the connection to your Amazon Redshift Destination and then save the modified configuration.
Data Type Mapping
Hevo maps a Source data type internally to a unified data type, referred to as the Hevo Data Type in the table below. This data type is used to represent the Source data from all supported data types in a lossless manner. The Hevo data types are then mapped to the corresponding data types that are supported in each Destination.
| Hevo Data Type | Redshift Data Type |
| --- | --- |
| ARRAY, JSON, STRUCT | SUPER |
| BOOLEAN | BOOLEAN |
| BYTEARRAY | VARBYTE |
| BYTE, SHORT | SMALLINT |
| DATE | DATE |
| DATETIME, TIMESTAMP | TIMESTAMP |
| DECIMAL | DECIMAL |
| DOUBLE | DOUBLE PRECISION |
| FLOAT | REAL |
| INTEGER | INTEGER |
| LONG | BIGINT |
| TIME | TIME |
| TIMESTAMPTZ, ZONEDDATETIME | TIMESTAMPTZ |
| TIMETZ | TIMETZ |
| VARCHAR | VARCHAR |
Destination Considerations
- The Amazon Redshift SUPER data type is case-insensitive by default. This means that if your JSON field names are in mixed case, uppercase, or lowercase, such as Product, PRODUCT, or product, Amazon Redshift does not differentiate between them within the same object. Therefore, to enable Amazon Redshift to differentiate between such JSON field names, you must set the warehouse parameter enable_case_sensitive_super_attribute to TRUE. This allows you to query your data in different cases for field names within the same object. Read SUPER configurations.
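  A minimal sketch of this behavior for a session, assuming a hypothetical table product_catalog with a SUPER column named attrs:
  -- Make SUPER attribute names case sensitive for this session
  SET enable_case_sensitive_super_attribute TO TRUE;
  -- "Product" and "PRODUCT" are now treated as distinct field names
  SELECT attrs."Product", attrs."PRODUCT" FROM product_catalog;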
- Hevo converts the table and column names to lowercase during loading. Therefore, if you want to query your Destination irrespective of the case, ensure that the warehouse parameter enable_case_sensitive_identifier remains set to its default value, FALSE. If this parameter is set to TRUE, you need to use the table and column names in lowercase when querying the Destination.
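  For example, with the parameter at its default value of FALSE, both of these hypothetical queries resolve to the same lowercased table and column names that Hevo loaded:
  SELECT Department FROM Employees;
  SELECT department FROM employees;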
- Hevo stages the ingested data in an Amazon S3 bucket, from where it is loaded to the Destination tables using the COPY command, as sketched below. Hence, if you have enabled enhanced VPC routing, ensure that your VPC is configured correctly. Enhanced VPC routing affects the way your Amazon Redshift cluster accesses other resources in your AWS network, such as the S3 bucket, specifically for the COPY and UNLOAD commands. Read Enhanced VPC Routing in Amazon Redshift.
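  A minimal sketch of a COPY command of this kind; the table name, bucket path, IAM role, and file format below are hypothetical placeholders, not Hevo's actual configuration:
  COPY my_schema.employees
  FROM 's3://example-staging-bucket/hevo/employees/'
  IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-s3-role'
  FORMAT AS CSV;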
- Hevo uses the Amazon Redshift COPY command to load data into the Destination tables; however, the command has the following limitations:
  - If a Source object has a row size exceeding 4 MB, Hevo cannot load the object, as the COPY command supports a maximum row size of 4 MB. For example, an object having a VARBYTE row with 6 MB of data cannot be loaded, even though the VARBYTE data type supports up to 16 MB. To avoid such a scenario, ensure that each row in your Source objects contains less than 4 MB of data.
  - The command does not support Not a Number (NaN) values in FLOAT and DOUBLE columns, so Hevo loads NaN values in these columns as NULL.
Limitations
- Hevo replicates a maximum of 1600 columns to each Amazon Redshift table, of which five are Hevo-reserved metadata columns used during data replication. Therefore, your Pipeline can replicate up to 1595 (1600 - 5) columns for each table.
- Hevo does not support writing to tables that have IDENTITY columns.
  Suppose you create a table with a default IDENTITY column and manually map a Source table to it. When the Pipeline runs, Hevo issues INSERT queries to write all the values from the Source table to this table. However, the writes fail, as Amazon Redshift does not permit writing values to the IDENTITY column.
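  For example, writes to a hypothetical table like this one would fail, because Hevo's INSERT statements include a value for the order_id IDENTITY column:
  CREATE TABLE orders (
    order_id BIGINT IDENTITY(1, 1),
    amount DECIMAL(10, 2)
  );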
- Hevo supports mapping of only JSON fields to the SUPER data type, which Amazon Redshift uses to support JSON columns. Read SUPER type.