Amazon Redshift
On This Page
- Prerequisites
- Set up an Amazon Redshift Instance (Optional)
- Create a Database (Optional)
- Retrieve the Hostname and Port Number (Optional)
- Whitelist Hevo’s IP Addresses
- Create a User and Grant Privileges
- Configure Amazon Redshift as a Destination
- Handling Source Data with Different Data Types
- Handling Source Data with JSON Fields
- Destination Considerations
- Limitations
- Revision History
Amazon Redshift is a fully managed, reliable data warehouse service in the cloud that offers large-scale storage and analysis of data sets and performs large-scale database migrations. It is a part of the larger cloud-computing platform Amazon Web Services (AWS).
Hevo can load data from any of your Pipelines into an Amazon Redshift data warehouse. You can set up the Redshift Destination on the fly, as part of the Pipeline creation process, or independently. The ingested data is first staged in Hevo’s S3 bucket before it is batched and loaded to the Amazon Redshift Destination.
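Once staged, data is typically moved into Redshift with the COPY command. The following is only a generic sketch of a COPY from S3, with a hypothetical schema, table, bucket path, and IAM role; it is not the exact command Hevo runs internally.

```sql
-- Generic illustration of loading staged files from S3 into a Redshift table.
-- The schema, table, bucket path, and IAM role below are placeholders.
COPY my_schema.my_table
FROM 's3://example-staging-bucket/path/to/batch/'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole'
FORMAT AS CSV
GZIP;
```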
If you are new to AWS and Redshift, you can follow the steps listed below to create an AWS account and then create the Amazon Redshift database to which the Hevo Pipeline will load your data. Alternatively, you can create users and assign them the required permissions to set up and manage databases within Amazon Redshift. Read AWS Identity and Access Management for more details.
The following image illustrates the key steps that you need to complete to configure Amazon Redshift as a Destination in Hevo:
Prerequisites
- The database hostname and port number of the Amazon Redshift instance are available.
- You are assigned the Team Collaborator or any administrator role except the Billing Administrator role in Hevo to create the Destination.
Set up an Amazon Redshift Instance (Optional)
Note: The following steps must be performed by an Admin user or a user with permissions to create an instance on AWS. Permissions and roles are managed through the IAM page in AWS.
1. Log in to your AWS instance
Do one of the following:
- Create an AWS account

  Note: You can also use the AWS Free Tier, which offers a two-month free trial of Amazon Redshift.

  - Go to aws.amazon.com and click Create an AWS account.

  - On the Sign up for AWS page, specify the following:

    - Root user email address: A valid email address that can be used to manage the AWS account. The root user serves as the account admin.

    - AWS account name: A unique name for your AWS account. You can change this later.

  - Click Verify email address and check for a verification email in your inbox.

- Sign in to an existing AWS instance

  - Go to https://aws.amazon.com/console and click Sign in to the Console.

  - Select the Root user option to log in as the account admin (root user), or the IAM user option to log in as a user with IAM role-based permissions. The account admin may create roles based on AWS IAM policies to define permissions on a database, and assign these to users.

  - Click Next.

  - Provide the root user password.

  - Click Sign In.
2. Connect to Amazon Redshift
- In the AWS Console Home, click Services in the top left.

- Select Analytics, Amazon Redshift.

- On the Complete Sign-up page, click Complete your AWS Registration.

- Complete the required steps to create your AWS account.
Create a Database (Optional)
The core component of the Amazon Redshift data warehouse is a Cluster. Your databases are created within a cluster. Read Structure of Data in the Amazon Redshift Data Warehouse.
1. Create a Cluster
- In the left navigation bar of your Amazon Redshift dashboard, click Clusters, Create Cluster.

- In the Create cluster, Cluster configuration section, specify the following:

  - Cluster identifier: A unique name for your cluster.

  - AZ configuration: The deployment setting for your cluster across different availability zones (AZ).
    - Select Single-AZ if you want to deploy your cluster in a single availability zone.
    - Select Multi-AZ if you want to deploy your cluster in two availability zones.

  - Node type: The primary data processing purpose for which the node is to be used.
    - Select RA3 if storage is the priority.
    - Select Dense Compute 2 (DC2) if compute capability is the priority.
    - Select RA3 for the best of both storage and compute capabilities.

    Each node type has a different cost depending on the storage and compute resources, as described in the Configuration summary.

  - Number of nodes: The number of nodes, based on the amount of data or computational processing you require. Select 1 to start with. You can increase this value later.

    Note: For the RA3 node type, you must select at least two nodes.

- In the Database Configurations section, specify the following:

  - Admin user name: A unique name for the admin of the database you plan to create.

  - Admin user password: The password for the admin user.

- In the Additional configurations section, specify any additional settings you need, or enable the Use defaults option to use the default settings provided by AWS.

- Click Create cluster. You can view the new cluster in the Amazon Redshift, Clusters page. Once the cluster configurations are complete, the status changes to Available.

- (Optional) Click on the cluster to view its details.
2. Make your Redshift cluster publicly accessible
This is required to be able to connect to your Redshift cluster and create a database.
- In your AWS Console, go to Amazon Redshift, Clusters, and click on the cluster name.

- On the cluster details page, click the Actions menu and select Modify publicly accessible.

- On the Edit publicly accessible page, select the Turn on Publicly accessible check box and click Save changes.
3. Connect to the Redshift Warehouse from an external tool
You can use any tool that can connect to Amazon Redshift, for example, Postico or pgAdmin, to create your Redshift database. Alternatively, use the Amazon Redshift Query Editor V2.
Note: The steps in this section have been performed using Postico.
- From the Redshift cluster details page, copy the Endpoint.

- Access the external tool (Postico) and specify the following details:

  - Nickname: A unique, short name for this connection.

  - Host: The Endpoint obtained from your Redshift cluster details page.

    Note: Do not include the port information in the host name.

  - User: The admin user for the database. Enter root.

  - Password: The password for the root user.

  - Database: The default database Postico opens in. Enter dev.

- Click Connect.
4. Create the Redshift database
- Do one of the following:

  - Create a database from the UI:

    - In Postico, in the breadcrumbs on the top, click on the connection name.

    - Click + Database at the bottom and provide a name for the new database. For example, data_analytics in the image below.

  - Create the database using an SQL query:

    - Double-click on the database name.

    - Double-click the SQL Query block to open the query editor.

    - Enter the following SQL query to create the database:

        create database <database_name>

      For example, create database data_analytics.

    - Click Execute Statement.

- Click the new database to access it.

  Note: You may see some additional, unfamiliar databases. These are internal databases created by Redshift.

- Click + Table at the bottom of the console to create tables in the new database.

- Specify the table name and click + Row to add a row and + Column to add columns to the row. For example, in the image below, four columns are created for the table, users, with the id column defined as the primary key. The rows and columns subsequently define the schema of the table. (An equivalent SQL statement is sketched after these steps.)

- Click Save Changes to create the table.

- Similarly, create all the tables that you require.
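If you prefer SQL over clicking + Row and + Column, the users table described above can also be created with a CREATE TABLE statement. This is a minimal sketch only; the column names other than id are hypothetical and should be replaced with the columns your data actually needs.

```sql
-- Hypothetical example of the four-column users table with id as the primary key.
CREATE TABLE users (
    id         BIGINT PRIMARY KEY,
    first_name VARCHAR(255),
    last_name  VARCHAR(255),
    email      VARCHAR(255)
);
```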
5. View the database in your Amazon Redshift cluster
- Access your Redshift cluster, and click the Databases tab.

- Click Connect to database.

- On the Connect to database page, specify the following:

  - Connection: Select Create a new connection.

  - Authentication: Select Temporary credentials if you want to generate your access credentials based on your assigned IAM role. Temporary credentials need to be generated each time you connect to the database. To use a previously saved key (password), select AWS Secrets Manager.

  - Database name: The database you are connecting to. For example, data_analytics.

  - Database user: The user you want to connect as. For example, root.

- Click Connect.

You can access the different tabs of the Database Objects section to view the objects, tables, and views within the database.
Retrieve the Hostname and Port Number (Optional)
- Log in to the Amazon Redshift dashboard.

- In the left navigation pane, click Clusters.

- Click the Cluster that you want to connect to Hevo.

- On the <Cluster_name> page, do the following:

  - Click the Copy icon to copy the JDBC URL.

  - Navigate to the Properties tab and copy the Port.

  Use this JDBC URL (without the jdbc:redshift:// part) as the database host and the Port as the database port in Hevo while creating your Pipeline.

  For example, in the JDBC URL jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev, the database host is examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com, and the database port is 5439.
Whitelist Hevo’s IP Addresses
You need to whitelist the Hevo IP address(es) for your region to enable Hevo to connect to your Amazon Redshift database.
To do this:
- Log in to the Amazon Redshift dashboard.

- In the left navigation pane, click Clusters.

- Click the Cluster that you want to connect to Hevo.

- In the Configuration tab, click the link text under Cluster Properties, VPC security groups to open the Security Groups panel.

- In the Security Groups panel, click Inbound rules, and then click Edit Inbound rules.

- In the Edit inbound rules dialog box:

  - Click Add Rule.

  - In the Type column, select Redshift from the drop-down.

  - In the Port Range column, enter the port of your Amazon Redshift cluster. Default port: 5439.

  - In the Source column, select Custom from the drop-down and enter Hevo's IP addresses for your region. Repeat this step to whitelist all the IP addresses.

  - Click Save.
Create a User and Grant Privileges
1. Create a user (optional)

- Log in to your Amazon Redshift database as a superuser or a user with CREATE privilege.

- Enter the following command:

    CREATE USER hevo WITH PASSWORD '<password>';

2. Grant privileges to the user

- Log in to your Amazon Redshift database as a superuser.

- Enter the following commands:

  - Grant CREATE privilege to the database user for an existing database:

      GRANT CREATE ON DATABASE <database_name> TO hevo;
      GRANT CREATE ON SCHEMA <schema_name> TO hevo;
      GRANT USAGE ON SCHEMA <schema_name> TO hevo;

  - Grant SELECT privilege to all tables or specific tables:

      GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO hevo; -- all tables
      GRANT SELECT ON TABLE <schema_name>.<table_name> TO hevo;   -- a specific table
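Optionally, you can confirm that the grants took effect before configuring the Destination. The following query is a minimal sketch that assumes the user hevo and the public schema; replace the names to match your setup.

```sql
-- Quick check of the privileges granted above (run as a superuser).
SELECT has_database_privilege('hevo', current_database(), 'create') AS can_create_in_db,
       has_schema_privilege('hevo', 'public', 'create')             AS can_create_in_schema,
       has_schema_privilege('hevo', 'public', 'usage')              AS can_use_schema;
```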
Configure Amazon Redshift as a Destination
Perform the following steps to configure Amazon Redshift as a Destination in Hevo:
- Click DESTINATIONS in the Navigation Bar.

- Click + CREATE DESTINATION in the Destinations List View.

- On the Add Destination page, select Amazon Redshift.

- On the Configure your Amazon Redshift Destination page, specify the following:

  - Destination Name: A unique name for your Destination, not exceeding 255 characters.

  - General Connection Settings:

    - Paste Connection String:

      - Connection String: The unique identifier for connecting to an Amazon Redshift database. The connection string automatically fetches details such as the database hostname, database port, and database name.

        Note: The connection string is obtained from the AWS Console and is the same as your hostname URL.

      - Database User: A user with a non-administrative role in the Redshift database.

      - Database Password: The password of the database user.

      - Schema Name: The name of the Destination database schema. Default value: public.

    - Enter Connection Fields Manually:

      - Database Cluster Identifier: Amazon Redshift host's IP address or DNS name.

        Note: For URL-based hostnames, exclude the initial jdbc:redshift:// part. For example, if the hostname URL is jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev, enter examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com.

      - Database Port: The port on which your Amazon Redshift server listens for connections. Default value: 5439.

      - Database User: A user with a non-administrative role in the Redshift database.

      - Database Password: The password of the database user.

      - Database Name: The name of an existing database where the data is to be loaded.

      - Schema Name (Optional): The name of the database schema in Amazon Redshift. Default value: public.

  - Additional Settings:

    - Connect through SSH: Enable this option to connect to Hevo using an SSH tunnel, instead of directly connecting your Amazon Redshift database host to Hevo. This provides an additional level of security to your database by not exposing your Redshift setup to the public. Read Connecting Through SSH.

      If this option is disabled, you must whitelist Hevo's IP addresses.

    - Sanitize Table/Column Names?: Enable this option to remove all non-alphanumeric characters and spaces in a table or column name, and replace them with an underscore (_). Read Name Sanitization.

  - Advanced Settings:

    - Populate Loaded Timestamp: Enable this option to append the __hevo_loaded_at column to the Destination table to indicate the time when the Event was loaded to the Destination. Read Loading Data to a Data Warehouse.

- Click TEST CONNECTION. This button is enabled once all the mandatory fields are specified.

- Click SAVE & CONTINUE. This button is enabled once all the mandatory fields are specified.
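After the first few Events are loaded, you can optionally spot-check the Destination table. The query below is only a sketch; it assumes a hypothetical table named users in the public schema and that the Populate Loaded Timestamp option is enabled.

```sql
-- Hypothetical spot-check: count the loaded rows and find the latest load time.
SELECT COUNT(*)              AS loaded_rows,
       MAX(__hevo_loaded_at) AS last_loaded_at
FROM public.users;
```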
Additional Information
Read the detailed Hevo documentation for the following related topics:
- Handling Source Data with Different Data Types
- Handling Source Data with JSON Fields
- Destination Considerations
- Limitations
Handling Source Data with Different Data Types
For teams created in or after Hevo Release 1.60, Hevo automatically modifies the data type of an Amazon Redshift Destination table column to accommodate Source data with a different data type. Read Handling Different Data Types in Source Data.
Note: Your Hevo release version is mentioned at the bottom of the Navigation Bar.
Handling Source Data with JSON Fields
For Pipelines created in or after Hevo Release 1.74, Hevo uses Replicate JSON fields to JSON columns as the default parsing strategy to load the Source data to the Amazon Redshift Destination.
With this strategy, you can query your JSON data directly, eliminating the need to parse it as JSON strings. The change does not affect the functionality of your existing Pipelines; if you want to apply the new parsing strategy to an existing Pipeline, you need to recreate it.
In addition, the replication strategies, Flatten structs and split arrays to new Events and Replicate JSON fields as JSON strings and array fields as strings have been deprecated for newer Pipelines, and are no longer visible in the UI.
Read Parsing Nested JSON Fields in Events.
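For example, once a JSON field is loaded into a SUPER column, its nested attributes can be queried directly with dot notation. The table and column names below are hypothetical.

```sql
-- Hypothetical orders table where shipping_address is a SUPER column
-- produced by the Replicate JSON fields to JSON columns strategy.
SELECT id,
       shipping_address.city,
       shipping_address.zip_code
FROM orders
WHERE shipping_address.country = 'US';
```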
Destination Considerations
- Amazon Redshift is case insensitive to names of database objects, including tables and columns. For example, if your JSON field names are in mixed or uppercase, such as Product or ITEMS, Amazon Redshift does not recognize these field names and is, hence, unable to fetch data from them. Therefore, to enable Amazon Redshift to identify such JSON field names, you must set the session parameter enable_case_sensitive_identifier to TRUE (a sample statement is shown after this list). Read SUPER configurations.

- The SUPER data type only supports up to 1 MB of data for an individual SUPER field or object. Read SUPER type - Limitations.

- Hevo stages the ingested data in an Amazon S3 bucket, from where it is loaded to the Destination tables using the COPY command. Hence, if you have enabled enhanced VPC routing, ensure that your VPC is configured correctly. Enhanced VPC routing affects the way your Amazon Redshift cluster accesses other resources in your AWS network, such as the S3 bucket, specifically for the COPY and UNLOAD commands. Read Enhanced VPC Routing in Amazon Redshift.
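For reference, the session parameter mentioned in the first consideration above can be set as follows. This is the standard Redshift SET syntax; it applies to your own query sessions and is not a Hevo configuration step.

```sql
-- Allow mixed-case or uppercase JSON field names (such as Product or ITEMS)
-- to be referenced in queries against SUPER columns in this session.
SET enable_case_sensitive_identifier TO true;
```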
Limitations
- Hevo replicates a maximum of 4096 columns to each Amazon Redshift table. Read Limits on the Number of Destination Columns.

- Hevo does not support writing to tables that have IDENTITY columns.

  Let us suppose you create a table with a default IDENTITY column and manually map a Source table to it. When the Pipeline runs, Hevo issues insert queries to write all the values from the Source table to this table. However, the writes would fail, as Amazon Redshift does not permit writing values to the IDENTITY column. (A sample table definition illustrating this is shown after this list.)

- Hevo supports mapping of only JSON fields to the SUPER data type that Amazon Redshift uses to support JSON columns. Read SUPER type.
See Also
- Destination FAQs
- Enhanced VPC Routing in Amazon Redshift
- Loading Data to an Amazon Redshift Data Warehouse
- SUPER type
- SUPER configurations
- SUPER type - Limitations
Revision History
Refer to the following table for the list of key updates made to this page:
Date | Release | Description of Change |
---|---|---|
Sep-02-2024 | NA | Updated section, Create a Database (Optional) as per the latest Amazon Redshift UI. |
Sep-04-2023 | NA | Updated the page contents to reflect the latest Amazon Redshift user interface (UI). |
Aug-11-2023 | NA | Fixed broken links. |
Apr-25-2023 | 2.12 | Updated section, Configure Amazon Redshift as a Destination to add information that you must specify all fields to create a Pipeline. |
Feb-20-2023 | 2.08 | Updated section, Configure Amazon Redshift as a Destination to add steps for using the connection string to automatically fetch the database credentials. |
Oct-10-2022 | NA | Added sections: - Set up an Amazon Redshift Instance - Create a Database |
Sep-21-2022 | NA | Added a note in section, Configure Amazon Redshift as a Destination. |
Mar-07-2022 | NA | Updated the section, Destination Considerations for actions to be taken when Enhanced VPC Routing is enabled. |
Feb-07-2022 | 1.81 | Updated section, Whitelist Hevo’s IP Address to remove details about Outbound rules as they are not required. |
Nov-09-2021 | NA | Updated section, Step 2. Create a Database User and Grant Privileges, with the list of commands to be run for granting privileges to the user. |
Oct-25-2021 | 1.74 | Added sections: - Handling Source Data with JSON Fields. - Destination Considerations. Updated sections: - Limitations to add the limitation about Hevo mapping only JSON fields. - See Also. |
Apr-06-2021 | 1.60 | - Added section, Handling Source Data with Different Data Types. |
Feb-22-2021 | NA | - Added the limitation that Hevo does not support writing to tables that have identity columns. - Updated the page overview to state that the Pipeline stages the ingested data in Hevo’s S3 bucket, from where it is finally loaded to the Destination. - Revised the procedural sections to include detailed steps for configuring the Amazon Redshift Destination. |