Amazon S3
This Destination is currently available for Early Access. Please contact your Hevo account executive or the Support team to enable it for your team. Alternatively, request early access to try out one or more such features.
Amazon Simple Storage Service (S3) is a durable, efficient, secure, and scalable cloud storage service provided by Amazon Web Services (AWS) that can be accessed from anywhere. S3 uses the concept of buckets to store data in multiple formats, such as images, videos, and documents, organize that data, and retrieve it at any time from the cloud. It also provides access control, versioning, and integration with other AWS services.
Hevo can ingest data from any of your Pipelines and load it in near real-time to your S3 bucket using the Append Rows on Update mode. The ingested data is loaded as Parquet or JSONL files into your S3 bucket.
Note: As the data is stored in file format in the S3 bucket, you cannot view the Destination schema through the Schema Mapper or query the loaded data using the Workbench.
Hevo allows storing data in a compressed or uncompressed form in the S3 bucket. Refer to the table below for the supported compression algorithms:
| File Format | Compression Support |
|---|---|
| Parquet | Uncompressed, Snappy |
| JSONL | Uncompressed, Gzip |
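For reference, the short Python sketch below writes one sample record in each of the two supported formats with their respective compression options. It is illustrative only; the record fields and file names are hypothetical and are not what Hevo generates.

```python
# Illustrative only: writes one sample record in each of the two file formats
# from the table above, using their supported compression algorithms.
import gzip
import json

import pyarrow as pa
import pyarrow.parquet as pq

records = [{"id": 1, "name": "Jane", "created_at": "2024-12-02T10:15:00Z"}]

# Parquet with Snappy compression (column-oriented storage).
table = pa.Table.from_pylist(records)
pq.write_table(table, "sample.snappy.parquet", compression="snappy")

# JSONL with Gzip compression (one JSON object per line).
with gzip.open("sample.jsonl.gz", "wt", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```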
If you are new to AWS or do not have an AWS account, follow the steps listed in the Create an AWS account section and then set up an Amazon S3 bucket. You can then configure the S3 bucket as a Destination in Hevo.
Configuring Amazon S3 as a Destination in Hevo involves these key steps: creating an Amazon S3 bucket (optional), creating an IAM policy for the bucket, obtaining the connection settings, and configuring S3 as a Destination in Hevo.
Prerequisites
- You have an active AWS account and an IAM user in the account with permission to:
  - Create an IAM role (to generate the IAM role-based credentials) or create an IAM user (to generate the access credentials).
- An Amazon S3 bucket in one of the supported AWS regions is available. Refer to the Create an Amazon S3 bucket section for the steps if you do not have one.
- The IAM role-based credentials or access credentials are available to enable Hevo to connect to your S3 bucket.
- You are assigned the Team Collaborator or any administrator role except the Billing Administrator role in Hevo to create the Destination.
Create an Amazon S3 Bucket (Optional)
Note: The following steps must be performed by a Root user or a user with administrative access. In AWS, permissions and roles are managed through the IAM page.
1. Log in to your AWS account and open the Amazon S3 console. Read Log in to your AWS instance for the steps to create an AWS account if you do not have one.

2. At the top right corner of the page, click your current AWS region, and from the drop-down list, select the region in which you want to create your S3 bucket. For example, Singapore. It is recommended that you co-locate your S3 bucket in the same region as your Hevo account for faster access.

3. In the left navigation pane of your Amazon S3 dashboard, click Buckets.

4. In the right pane, under the General purpose buckets tab, click Create bucket.

5. On the Create bucket page, in the General configuration section, do the following:

   - Ensure that the AWS Region in which you want to create your S3 bucket is the same as the one selected in Step 2.

     Note: This field is non-editable.

   - Specify the following:

     - Bucket name: A unique name for your S3 bucket, not less than 3 characters and not exceeding 63 characters. Read Bucket naming rules for the conventions to follow while naming a bucket.

     - Copy settings from existing bucket - optional: Click Choose bucket to select an existing bucket and copy its settings to your bucket.

6. In the Object Ownership section, specify who can access the objects in your S3 bucket. Select one of the following:

   - ACLs disabled (recommended): All the objects in the S3 bucket are owned by the AWS account that created it.

     - Bucket owner enforced (Default): As the bucket owner, you have full control over the objects created in the S3 bucket. You can grant other users access to the bucket and its objects through IAM user policies and S3 bucket policies.

   - ACLs enabled: The objects in the S3 bucket can be owned by other AWS accounts, and ownership is controlled through access control lists (ACLs). Based on how you want to enforce ownership, select one of the following:

     - Bucket owner preferred: As the bucket owner, you have full control over new objects uploaded to the bucket with the bucket-owner-full-control canned ACL specified. The object writer, or the AWS account, remains the owner of new objects uploaded without this ACL. The ownership of existing objects is not affected by this setting. Read Access control list for information on the ACLs supported by Amazon S3.

     - Object writer: The AWS account that uploads objects to the bucket remains the owner of those objects. With this option, as the bucket owner, you cannot grant access through bucket policies to the objects owned by other AWS accounts.

7. In the Block Public Access settings for this bucket section, do one of the following:

   - Select the Block all public access check box if you do not want the bucket and its objects to be publicly accessible. Default selection: Enabled.

   - Deselect the Block all public access check box to grant public access to the bucket and the objects within it, or to selectively block access to them. Read Blocking public access to your Amazon S3 storage to understand the individual options.

     Note: If you turn off this setting, you must acknowledge the warning by selecting the I acknowledge that… check box.

8. (Optional) In the Bucket Versioning section, based on your requirements, specify one of the following:

   - Disable: Your bucket does not maintain multiple versions of an object, or is unversioned. This is the default selection.

   - Enable: Your bucket stores every version of an object, allowing you to recover objects in case they are accidentally deleted or overwritten.

     Note: Once you enable versioning on a bucket, you cannot revert it; versioning can only be suspended. Read Using versioning in S3 buckets to understand the feature.

9. (Optional) In the Tags section, specify a key-value pair to categorize the data stored in your bucket by its purpose. For example, to consolidate all your billing data in the bucket, specify the key as Billing and its value as True.

10. In the Default encryption section, specify the following:

    - Encryption type: The type of encryption you want Amazon S3 to apply to objects before storing them in the bucket. Server-side encryption is automatically applied to protect your stored data. Select one of the following types:

      - Server-side encryption with Amazon S3 managed keys (SSE-S3) (Default): In this option, Amazon S3 manages the encryption and decryption process.

      - Server-side encryption with AWS Key Management Service keys (SSE-KMS): In this option, encryption is managed by AWS KMS. You can specify the default AWS managed key (aws/s3), select one of the existing KMS keys, or create one at this time. Read Creating keys for the steps to add a new AWS KMS key.

      - Dual-layer server-side encryption with AWS Key Management Service keys (DSSE-KMS): In this option, two layers of encryption are applied to the objects by AWS KMS, which manages the encryption.

        Note: At this time, Hevo supports only SSE-S3.

    - Bucket Key: A short-lived data key generated by AWS from AWS KMS and kept in S3. Using a bucket key helps lower the encryption costs for SSE-KMS by reducing the traffic between S3 and AWS KMS. A bucket key is not required for SSE-S3 and is not supported by DSSE-KMS. For these encryption types, you must disable bucket keys. Default selection: Enable.

11. (Optional) In the Advanced settings, Object Lock section, specify one of the following:

    - Disable (Default): Objects uploaded to the bucket are not locked and can be deleted or overwritten.

    - Enable: Objects uploaded to the bucket are stored using the write-once-read-many (WORM) model, which prevents the objects from being deleted or overwritten. You must acknowledge the warning to enable object lock for your bucket.

      Note: Object lock works only in versioned buckets. Thus, selecting this option automatically enables bucket versioning. Read Using S3 Object Lock to understand this feature.

12. Click Create bucket to create your Amazon S3 bucket. You can specify this bucket while configuring Amazon S3 as a Destination in Hevo.
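If you prefer to script the same setup instead of using the console, the following boto3 sketch creates a bucket with an equivalent configuration: SSE-S3 default encryption, all public access blocked, and (optionally) versioning. The bucket name and region are placeholders; run it with credentials that have administrative access.

```python
# A minimal boto3 sketch of the bucket setup described above.
import boto3

BUCKET = "s3-destination1"   # hypothetical bucket name
REGION = "ap-southeast-1"    # Singapore, matching the example region

s3 = boto3.client("s3", region_name=REGION)

# Create the bucket in the chosen region.
s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Block all public access (the recommended default in the console).
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# SSE-S3 (AES256) default encryption, the only type Hevo supports at this time.
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)

# Optional: enable versioning (cannot be reverted, only suspended).
s3.put_bucket_versioning(
    Bucket=BUCKET, VersioningConfiguration={"Status": "Enabled"}
)
```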
Create an IAM Policy for the S3 Bucket
To allow Hevo to access your S3 bucket and load data into it, you must create an IAM policy with the following permissions:
| Permission Name | Allows Hevo to |
|---|---|
| s3:ListBucket | Check that the S3 bucket exists and can be accessed, and list the objects in it. |
| s3:GetObject | Read the objects in the S3 bucket. |
| s3:PutObject | Write objects, such as files, to the S3 bucket. |
| s3:DeleteObject | Delete objects from the S3 bucket. Hevo requires this permission to delete the file it creates in your S3 bucket while testing the connection. |
Perform the following steps to create the IAM policy:
1. Log in to the AWS IAM Console.

2. In the left navigation pane, under Access management, click Policies.

3. On the Policies page, click Create policy.

4. On the Specify permissions page, click JSON.

5. In the Policy editor section, paste the following JSON statements:

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "VisualEditor0",
         "Effect": "Allow",
         "Action": [
           "s3:ListBucket",
           "s3:GetObject",
           "s3:PutObject",
           "s3:DeleteObject"
         ],
         "Resource": [
           "arn:aws:s3:::<your_bucket_name>",
           "arn:aws:s3:::<your_bucket_name>/*"
         ]
       }
     ]
   }
   ```

   Note: Replace the placeholder values in the JSON statements above with your own. For example, <your_bucket_name> with s3-destination1.

   These JSON statements allow Hevo to access the bucket that you specify while configuring S3 as a Destination and load data into it.

6. At the bottom of the page, click Next.

7. On the Review and create page, specify the Policy name, and at the bottom of the page, click Create policy.
You must assign this policy to the IAM role or the IAM user that you create for Hevo to access your S3 bucket.
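If you manage IAM programmatically, the same policy can be created with boto3 instead of the console. This is only a sketch; the policy name and bucket name are placeholders, and the call must run with credentials allowed to create IAM policies.

```python
# Create the IAM policy for Hevo's S3 access programmatically.
import json

import boto3

BUCKET = "s3-destination1"   # hypothetical bucket name

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
            ],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        }
    ],
}

iam = boto3.client("iam")
response = iam.create_policy(
    PolicyName="hevo-s3-destination-policy",      # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)
print(response["Policy"]["Arn"])  # attach this policy to the IAM role or user
```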
Obtain the Amazon S3 Connection Settings
Hevo connects to your Amazon S3 bucket in one of the following ways:
Connection method: Using IAM role
To connect using an IAM role, you need to generate IAM role-based credentials. For this, you need to add an IAM role for Hevo and assign the IAM policy created in Step 2 above to it. You require the Amazon Resource Name (ARN) and external ID from this role to grant Hevo access to your S3 bucket.
1. Create an IAM role and assign the IAM policy
1. Log in to the AWS IAM Console.

2. In the left navigation pane, under Access management, click Roles.

3. On the Roles page, click Create role.

4. In the Select trusted entity section, select AWS account.

5. In the An AWS account section, select Another AWS account, and in the Account ID field, specify Hevo’s Account ID, 393309748692.

6. In the Options section, select the Require external ID check box, specify an External ID of your choice, and click Next.

7. On the Add Permissions page, search and select the policy that you created in Step 2 above, and at the bottom of the page, click Next.

8. On the Name, review, and create page, specify a Role name and a Description, and at the bottom of the page, click Create role.

   You are redirected to the Roles page.
2. Obtain the ARN and external ID
1. On the Roles page of your IAM console, click the role that you created above.

2. On the <Role name> page, in the Summary section, click the copy icon below the ARN field and save the ARN securely like any other password.

3. In the Trust relationships tab, copy the external ID corresponding to the sts:ExternalId field. For example, hevo-s3-dest-external-id.
You can use the ARN and the external ID while configuring S3 as a Destination in Hevo.
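For context, the sketch below shows the generic mechanism behind this connection method: a third party presents the role ARN and external ID to AWS STS to obtain temporary credentials. This is not Hevo's implementation, and the assume-role call only succeeds from the AWS account named as the trusted entity (Hevo's account), so it is shown purely to illustrate how the ARN and external ID are used. The ARN, external ID, and bucket name are placeholders.

```python
# Generic illustration of cross-account access with an external ID.
import boto3

ROLE_ARN = "arn:aws:iam::102345678922:role/MyRole"   # example ARN
EXTERNAL_ID = "hevo-s3-dest-external-id"             # example external ID

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn=ROLE_ARN,
    RoleSessionName="s3-destination-check",
    ExternalId=EXTERNAL_ID,
)["Credentials"]

# Temporary credentials scoped to the permissions of the attached IAM policy.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_objects_v2(Bucket="s3-destination1", MaxKeys=1))  # hypothetical bucket
```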
Connection method: Using access credentials
To connect using access credentials, you need to add an IAM user for Hevo and assign the policy created in Step 2 above to it. You require the access key and the secret access key generated for this user to grant Hevo access to your S3 bucket.
Note: The secret key is associated with an access key and is visible only once. Therefore, you must save it or download the key file for later use.
1. Create an IAM user and assign the IAM policy
1. Log in to the AWS IAM Console.

2. In the left navigation pane, under Access management, click Users.

3. On the Users page, click Create user.

4. On the Specify user details page, specify the User name, and click Next.

5. On the Set permissions page, in the Permissions options section, click Attach policies directly.

6. In the Search bar of the Permissions policies section, type the name of the policy you created in Step 2 above.

7. Select the check box next to the policy to associate it with the user, and at the bottom of the page, click Next.

8. At the bottom of the Review and create page, click Create user.
2. Generate the access keys
1. On the Users page of your IAM console, click the user that you created above.

2. On the <User name> page, click the Security credentials tab.

3. In the Access keys section, click Create access key.

4. On the Access key best practices & alternatives page, select Command Line Interface (CLI).

5. At the bottom of the page, select the I understand the above… check box and click Next.

6. (Optional) Specify a description tag for the access key to help you identify it.

7. Click Create access key.

8. On the Retrieve access keys page, in the Access key section, click the copy icon in the Access key and Secret access key fields and save the keys securely like any other password. Optionally, click Download .csv file to save the keys on your local machine.

   Note: Once you leave this page, you cannot view the secret access key again.
You can use these access keys while configuring S3 as a Destination in Hevo.
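Before configuring the Destination, you can optionally confirm that the keys and the attached policy grant the four permissions Hevo needs. This quick sanity check is not part of Hevo's flow; the key values, bucket name, and test object key below are placeholders.

```python
# Verify the IAM user's access keys against the required S3 permissions.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIAIOSFODNN7EAAMMBB",                          # example only
    aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYAABBCCDDEE",  # example only
    region_name="ap-southeast-1",
)

BUCKET = "s3-destination1"
KEY = "hevo-connection-check.txt"   # hypothetical test object

s3.list_objects_v2(Bucket=BUCKET, MaxKeys=1)                   # s3:ListBucket
s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"connection ok")   # s3:PutObject
s3.get_object(Bucket=BUCKET, Key=KEY)                          # s3:GetObject
s3.delete_object(Bucket=BUCKET, Key=KEY)                       # s3:DeleteObject
print("All four permissions verified.")
```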
Configure Amazon S3 as a Destination
Perform the following steps to configure Amazon S3 as a Destination in Hevo:
1. Click DESTINATIONS in the Navigation Bar.

2. Click + CREATE DESTINATION in the Destinations List View.

3. On the Add Destination page, select S3.

4. On the Configure your S3 Destination page, specify the following:

   - Destination Name: A unique name for your Destination, not exceeding 255 characters.

   - Select one of the following methods to connect to your Amazon S3 bucket:

     - Connect Using an IAM Role: Connect using the role that you added in the Create an IAM role and assign the IAM policy section above.

       - IAM Role ARN: The globally unique identifier assigned by AWS to the IAM role you created in the section above. For example, arn:aws:iam::102345678922:role/MyRole.

       - External ID: The unique identifier you assigned to the IAM role when you created it in the section above. For example, hevo-s3-dest-external-id.

       - Bucket Name: The name of the bucket where data is to be loaded. For example, s3-destination1.

       - Bucket Region: The AWS region where the S3 bucket is located. For example, Asia Pacific (Singapore).

       - Prefix (Optional): A string added at the beginning of the directory path to help you organize your data files in the S3 bucket. Refer to Configuring the Pipeline Settings for information on the directory path.

     - Connect Using Access Credentials: Connect using the IAM user that you added in the Create an IAM user and assign the IAM policy section above.

       - Access Key ID: The publicly shareable unique identifier associated with the access key pair created for your IAM user in the section above. For example, AKIAIOSFODNN7EAAMMBB.

       - Secret Access Key: The cryptographic key associated with the access key ID generated for your IAM user in the section above. For example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYAABBCCDDEE.

       - Bucket Name: The name of the bucket where data is to be loaded. For example, s3-destination1.

       - Bucket Region: The AWS region where the S3 bucket is located. For example, Asia Pacific (Singapore).

       - Prefix (Optional): A string added at the beginning of the directory path to help you organize your data files in the S3 bucket. Refer to Configuring the Pipeline Settings for information on the directory path.

   - File Format: The format in which you want to store your data files. Select one of the following:

     - Parquet: In this format, Hevo writes the ingested data to files in a columnar manner, that is, data is stored column-wise. Read Apache Parquet to understand this format. Specify the following:

       - Compression Support: The algorithm to apply for compressing data before storing it in the S3 bucket. Hevo supports storing data either in an uncompressed form or in a compressed form using the SNAPPY algorithm. Default value: UNCOMPRESSED.

     - JSONL: In this format, Hevo writes the ingested data to files as JSON objects, one per line. Read JSON Lines to understand this format. Specify the following:

       - Compression Support: The algorithm to apply for compressing data before storing it in the S3 bucket. Hevo supports storing data either in an uncompressed form or in a compressed form using GZIP. Default value: UNCOMPRESSED.

5. Click TEST CONNECTION. This button is enabled once all the mandatory fields are specified.

6. Click SAVE & CONTINUE. This button is enabled once all the mandatory fields are specified.
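Because the loaded data cannot be viewed through the Schema Mapper or queried using the Workbench, you read the files directly from the bucket. Below is a minimal sketch, assuming one Parquet (Snappy) object and one Gzip-compressed JSONL object; the object keys are hypothetical and only mirror the default directory path pattern described in the next section.

```python
# Read back Parquet and JSONL files loaded into the S3 Destination bucket.
import gzip
import io
import json

import boto3
import pyarrow.parquet as pq

s3 = boto3.client("s3")
BUCKET = "s3-destination1"   # hypothetical bucket name

# Parquet file: pyarrow handles Snappy decompression transparently.
key = "my_pipeline/orders/2024-12-02/123/part-0.parquet"   # hypothetical key
body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
table = pq.read_table(io.BytesIO(body))
print(table.schema)
print(table.slice(0, 5).to_pylist())

# JSONL file compressed with Gzip: one JSON object per line.
key = "my_pipeline/orders/2024-12-02/123/part-0.jsonl.gz"  # hypothetical key
body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
for line in gzip.decompress(body).splitlines():
    print(json.loads(line))
```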
Configuring the Pipeline Settings
When you create a Pipeline with an S3 Destination, you must specify the directory path, or folder structure. Hevo loads the data files into your S3 bucket at the specified location.
This is the default directory path:
${PIPELINE_NAME}/${OBJECT_NAME}/${DATE}/${JOB_ID}
Hevo creates the data files in this path by replacing these parameters as follows:
- ${PIPELINE_NAME}: The name of your Pipeline that uses the configured S3 bucket as a Destination.
- ${OBJECT_NAME}: The name of the Source object from which data was ingested.
- ${DATE}: The date when the data was loaded to your S3 bucket.
- ${JOB_ID}: The ID of the job in which the data ingestion task ran.
If you specify a prefix while configuring your S3 Destination, it is added at the beginning of the directory path, and your data files are created in that location.
Note: ${PIPELINE_NAME} and ${OBJECT_NAME} are mandatory parameters and your directory path must contain these two.
You can also specify a directory path to organize your data files into folders created using time-based parameters. For this, append one or more of the following parameters after ${PIPELINE_NAME}/${OBJECT_NAME}:
- ${YEAR}: The year when the data load task ran.
- ${MONTH}: The month when the data load task ran.
- ${DAY}: The day when the data load task ran.
- ${HOUR}: The hour of the day when the data load task ran.
For example, if you want to organize your Source data in the S3 bucket based on the day and hour, you should specify the path as ${PIPELINE_NAME}/${OBJECT_NAME}/${DAY}/${HOUR}.
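The sketch below illustrates how the default directory path might resolve and how to list the files loaded under it. The parameter values (pipeline name, object name, job ID) and the bucket name are hypothetical examples, not values generated by Hevo.

```python
# Resolve the default directory path template and list objects under it.
import boto3

template = "${PIPELINE_NAME}/${OBJECT_NAME}/${DATE}/${JOB_ID}"
params = {
    "PIPELINE_NAME": "my_pipeline",
    "OBJECT_NAME": "orders",
    "DATE": "2024-12-02",
    "JOB_ID": "123",
}

prefix = template
for name, value in params.items():
    prefix = prefix.replace("${" + name + "}", value)
print(prefix)   # my_pipeline/orders/2024-12-02/123

# List the data files loaded under this prefix.
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="s3-destination1", Prefix=prefix + "/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])
```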
Limitations
- Your S3 bucket must be created in one of the AWS regions supported by Hevo.
- At this time, Hevo supports loading data only in the Append Rows on Update mode.
Revision History
Refer to the following table for the list of key updates made to this page:
| Date | Release | Description of Change |
|---|---|---|
| Dec-02-2024 | NA | Updated sections, Create an Amazon S3 Bucket (Optional), Create an IAM Policy for the S3 Bucket, and Obtain the Amazon S3 Connection Settings as per the latest Amazon S3 UI. |
| Jul-08-2024 | NA | Updated section, Configuring the Pipeline Settings, to revise the definitions of the directory path parameters. |
| Feb-05-2024 | 2.20 | New document. |