Amazon Redshift

Last updated on Dec 04, 2023

Amazon Redshift is a fully managed, reliable data warehouse service in the cloud that offers large-scale storage and analysis of data sets and performs large-scale database migrations. It is a part of the larger cloud-computing platform Amazon Web Services (AWS).

Hevo can load data from any of your Pipelines into an Amazon Redshift data warehouse. You can set up the Redshift Destination on the fly, as part of the Pipeline creation process, or independently. The ingested data is first staged in Hevo’s S3 bucket before it is batched and loaded to the Amazon Redshift Destination.
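
Note: Hevo manages the load from the S3 bucket internally. The snippet below is only an illustrative sketch of the Redshift COPY command that moves a staged file from Amazon S3 into a Destination table; the bucket path, table name, and IAM role shown here are hypothetical placeholders, not the values Hevo actually uses.

    COPY my_schema.orders
    FROM 's3://hevo-staging-bucket/orders/batch-0001.gz'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS JSON 'auto'
    GZIP;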

If you are new to AWS and Redshift, you can follow the steps listed below to create an AWS account and then create an Amazon Redshift database to which the Hevo Pipeline will load the data. Alternatively, you can create users and assign them the required permissions to set up and manage databases within Amazon Redshift. Read AWS Identity and Access Management for more details.
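
For example, the following statements are a minimal sketch of how you could create a database user for Hevo and grant it privileges on a schema. The user, database, and schema names are placeholders; refer to Hevo's documented setup steps for the exact set of privileges that Hevo requires.

    CREATE USER hevo_user WITH PASSWORD '<strong_password>';
    GRANT CREATE, TEMPORARY ON DATABASE my_database TO hevo_user;
    GRANT USAGE, CREATE ON SCHEMA my_schema TO hevo_user;
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA my_schema TO hevo_user;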

The following image illustrates the key steps that you need to complete to configure Amazon Redshift as a Destination in Hevo:

Redshift Destination Setup


Handling Source Data with Different Data Types

For teams created in or after Hevo Release 1.60, Hevo automatically modifies the data type of an Amazon Redshift Destination table column to accommodate Source data with a different data type. Read Handling Different Data Types in Source Data.

Note: Your Hevo release version is mentioned at the bottom of the Navigation Bar.


Handling Source Data with JSON Fields

For Pipelines created in or after Hevo Release 1.74, Hevo uses Replicate JSON fields to JSON columns as the default parsing strategy to load the Source data to the Amazon Redshift Destination.

With the changed strategy, you can query your JSON data directly, eliminating the need to parse it as JSON strings. This change does not affect your existing Pipelines, which continue to use their original parsing strategy. Therefore, if you want to apply the changed parsing strategy to an existing Pipeline, you need to recreate it.
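
For example, once the Source JSON is loaded into a SUPER column, you can navigate into it directly with Redshift's dot and bracket notation. The table and column names below are hypothetical.

    SELECT order_id,
           payload.customer.email,   -- navigate into a nested object
           payload.items[0].sku      -- index into a nested array
    FROM my_schema.orders;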

In addition, the replication strategies, Flatten structs and split arrays to new Events and Replicate JSON fields as JSON strings and array fields as strings, have been deprecated for newer Pipelines and are no longer visible in the UI.

Read Parsing Nested JSON Fields in Events.


Destination Considerations

  • Amazon Redshift is case-insensitive to names of database objects, including tables and columns. For example, if your JSON field names are in mixed or upper case, such as Product or ITEMS, Amazon Redshift does not recognize these field names and hence cannot fetch data from them. Therefore, to enable Amazon Redshift to identify such JSON field names, you must set the session parameter enable_case_sensitive_identifier to TRUE, as shown in the example after this list. Read SUPER configurations.

  • The SUPER data type only supports up to 1 MB of data for an individual SUPER field or object. Read SUPER type - Limitations.

  • Hevo stages the ingested data in an Amazon S3 bucket, from where it is loaded to the Destination tables using the COPY command. Hence, if you have enabled enhanced VPC routing, ensure that your VPC is configured correctly. Enhanced VPC routing affects the way your Amazon Redshift cluster accesses other resources in your AWS network, such as the S3 bucket, specifically for the COPY and UNLOAD commands. Read Enhanced VPC Routing in Amazon Redshift.
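
As referenced in the case-sensitivity consideration above, the following statements are a minimal sketch of enabling the session parameter and then querying mixed-case JSON fields such as Product or ITEMS by their exact case. The table and column names are hypothetical.

    SET enable_case_sensitive_identifier TO TRUE;

    SELECT payload."Product",
           payload."ITEMS"
    FROM my_schema.orders;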


Limitations

  • Hevo replicates a maximum of 4096 columns to each Amazon Redshift table. Read Limits on the Number of Destination Columns.

  • Hevo does not support writing to tables that have identity columns.

    For example, suppose you create a table with an identity column and manually map a Source table to it. When the Pipeline runs, Hevo issues insert queries to write all the values from the Source table to this table. However, the writes fail, as Amazon Redshift does not permit writing values to an identity column. This scenario is sketched in the example after this list.

  • Hevo supports mapping only JSON fields to the SUPER data type, which Amazon Redshift uses to support JSON columns.

    Read SUPER type.
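
As referenced in the identity column limitation above, the following statements are a hypothetical sketch of that scenario; the table and values are placeholders, and the final insert fails because Amazon Redshift does not allow explicit values to be written to an identity column.

    CREATE TABLE my_schema.orders (
        id BIGINT IDENTITY(1,1),
        order_ref VARCHAR(64)
    );

    -- Fails: an identity column cannot be assigned an explicit value
    INSERT INTO my_schema.orders (id, order_ref) VALUES (10, 'A-100');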



See Also


Revision History

Refer to the following table for the list of key updates made to this page:

Date         Release  Description of Change
Sep-04-2023  NA       Updated the page contents to reflect the latest Amazon Redshift user interface (UI).
Aug-11-2023  NA       Fixed broken links.
Apr-25-2023  2.12     Updated section, Configure Amazon Redshift as a Destination to add information that you must specify all fields to create a Pipeline.
Feb-20-2023  2.08     Updated section, Configure Amazon Redshift as a Destination to add steps for using the connection string to automatically fetch the database credentials.
Oct-10-2022  NA       Added sections:
                      - Set up an Amazon Redshift Instance
                      - Create a Database
Sep-21-2022  NA       Added a note in section, Configure Amazon Redshift as a Destination.
Mar-07-2022  NA       Updated the section, Destination Considerations for actions to be taken when Enhanced VPC Routing is enabled.
Feb-07-2022  1.81     Updated section, Whitelist Hevo’s IP Address to remove details about Outbound rules as they are not required.
Nov-09-2021  NA       Updated section, Step 2. Create a Database User and Grant Privileges, with the list of commands to be run for granting privileges to the user.
Oct-25-2021  1.74     Added sections:
                      - Handling Source Data with JSON Fields.
                      - Destination Considerations.
                      Updated sections:
                      - Limitations to add the limitation about Hevo mapping only JSON fields.
                      - See Also.
Apr-06-2021  1.60     Added section, Handling Source Data with Different Data Types.
Feb-22-2021  NA       - Added the limitation that Hevo does not support writing to tables that have identity columns.
                      - Updated the page overview to state that the Pipeline stages the ingested data in Hevo’s S3 bucket, from where it is finally loaded to the Destination.
                      - Revised the procedural sections to include detailed steps for configuring the Amazon Redshift Destination.
