Amazon S3 (Edge)

Last updated on Feb 20, 2025

Edge Pipeline is currently available under Early Access. You can request access to evaluate and test its features.

Amazon Simple Storage Service (S3) is a durable, efficient, secure, and scalable cloud storage service provided by Amazon Web Services (AWS) that can be accessed from anywhere. S3 stores data in buckets, which can hold objects in multiple formats, such as images, videos, and documents, and lets you organize that data and retrieve it at any time. It also provides access control, versioning, and integration with other AWS services.

You can configure Amazon S3 as an Edge Destination in your Pipeline to ingest data from your Source and load it using the Append mode into your S3 bucket. The data is stored in your S3 bucket as files compressed using one of the supported compression algorithms.

Note: Currently, Hevo Edge loads data into your S3 bucket as CSV files, which are compressed using Gzip.
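
As a rough illustration of what consuming such a file might look like, the following Python sketch reads a Gzip-compressed CSV file that you have already downloaded from your bucket. The function name read_gzip_csv and the UTF-8 encoding are assumptions made for this example; they are not taken from Hevo's documentation.

```python
import csv
import gzip


def read_gzip_csv(path):
    """Return the rows of a Gzip-compressed CSV file as a list of lists."""
    # "rt" opens the compressed stream in text mode. newline="" is what the
    # csv module expects so that line breaks inside quoted fields survive.
    # UTF-8 is an assumption; adjust it if your files use another encoding.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))
```

Calling read_gzip_csv with the local path of a downloaded file returns one list per CSV row, which you can then pass to your own processing code.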


Prerequisites


Step 1
Create an Amazon S3 Bucket (Optional)

Step 2
Create an IAM Policy for the S3 Bucket

To allow Hevo to access your S3 bucket and load data into it, you must create an IAM policy with the following permissions:

Permission Name     Allows Hevo to
s3:ListBucket       Check if the S3 bucket exists, and whether it can be accessed and the objects in it listed.
s3:GetObject        Read the objects in the S3 bucket.
s3:PutObject        Write objects, such as files, to the S3 bucket.
s3:DeleteObject     Delete objects from the S3 bucket. Hevo requires this permission to delete the file it creates in your S3 bucket while testing the connection.

Perform the following steps to create the IAM policy:

  1. Log in to the AWS IAM Console.

  2. In the left navigation pane, under Access management, click Policies.

    Navigation Pane - Policies

  3. On the Policies page, click Create policy.

    Create Policy-1

  4. On the Specify permissions page, click JSON.

    Select JSON Editor

  5. Paste the following JSON statements in the Policy editor:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject"
                ],
                "Resource": [
                    "arn:aws:s3:::<your_bucket_name>",
                    "arn:aws:s3:::<your_bucket_name>/*"
                ]
            }
        ]
    }
    

    Note: Replace the placeholder values in the JSON statements above with your own. For example, replace <your_bucket_name> with s3-docs-20.

    The JSON statements allow Hevo to access the bucket that you specify while configuring S3 as an Edge Destination and load data into it.

  6. At the bottom of the page, click Next.

    Click Next

  7. On the Review and create page, specify the Policy name and then click Create policy at the bottom of the page.

    Review IAM Policy

You must assign this policy to the IAM role or user that you create so that Hevo can access your S3 bucket.
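
If you prefer to generate the policy document rather than edit the placeholders by hand, a minimal Python sketch along the following lines could fill in the bucket name for you. The helper name build_s3_policy is hypothetical; the structure and permissions mirror the JSON statements shown in this step.

```python
import json


def build_s3_policy(bucket_name):
    """Return the IAM policy document for the given bucket as a dict."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                ],
                # The bucket ARN is needed for s3:ListBucket; the /* ARN
                # covers the object-level actions (Get, Put, Delete).
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


# Print the policy for the example bucket name used in this document.
print(json.dumps(build_s3_policy("s3-docs-20"), indent=4))
```

The printed JSON can be pasted into the Policy editor in place of the template with placeholders.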


Step 3
Obtain the External ID for your S3 Destinations

Hevo Edge automatically assigns your team an external ID for the S3 Destination type, and this ID does not change. You need to obtain its value from the S3 Destination configuration screen.

  1. Log in to your Hevo account and select DESTINATIONS in the Navigation Bar.

  2. Click the Edge tab in the Destinations List View and click + CREATE EDGE DESTINATION.

  3. On the Create Destination page, click S3.

  4. In the Connect to your S3 section of the displayed screen, select Identity and Access Management (IAM) from the Access type drop-down.

  5. Click the copy icon next to the value in the External ID field and save it securely.

    Obtain External ID

You must add this external ID to the trust policy of the IAM role that you create for Hevo.


Step 4
Obtain the S3 Bucket Connection Settings

Hevo connects to your S3 bucket in one of the following ways:

Connect using the IAM role

To connect using an IAM role, you must generate IAM role-based credentials. For this, you need to add an IAM role for Hevo and assign to it the IAM policy created in Step 2. You also require the Amazon Resource Name (ARN) of this role and the external ID obtained in Step 3 to grant Hevo access to your S3 bucket.

1. Create an IAM role and assign the IAM policy

  1. Log in to the AWS IAM Console.

  2. In the left navigation pane, under Access management, click Roles.

    Navigation Pane - Roles

  3. On the Roles page, click Create role.

    CreateRole-1

  4. On the Select trusted entity page, do the following:

    1. In the Trusted entity type section, select Custom trust policy.

      Select Custom Trust Policy

    2. In the Custom trust policy section:

      Custom Trust Policy Editor

      1. Copy the following JSON statements and paste them into the editor window:

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "arn:aws:iam::Hevo's AWS account ID:role/customer-aws_integration"
                    },
                    "Action": "sts:AssumeRole",
                    "Condition": {
                        "StringEquals": {
                            "sts:ExternalID": "External ID assigned by Hevo"
                        }
                    }
                }
            ]
        }
        
      2. Replace Hevo’s AWS account ID in line 7 with 393309748692 and the External ID assigned by Hevo in line 12 with the value that you obtained in Step 3.

      3. At the bottom of the page, click Next.

  5. On the Add Permissions page, in the Permissions policies section, select the policy you created in Step 2 and click Next at the bottom of the page.

    Create Role - Click Next

  6. On the Name, review, and create page, specify a Role name and a Description and then click Create role at the bottom of the page.

    Create Role

Once the role is created, you are redirected to the Roles page.
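
The substitution described in the custom trust policy step can also be sketched in Python. The helper name build_trust_policy and the sample external ID below are hypothetical; the account ID and the policy structure are the ones given in the steps above.

```python
import json

# Hevo's AWS account ID, as stated in the step above.
HEVO_AWS_ACCOUNT_ID = "393309748692"


def build_trust_policy(external_id, account_id=HEVO_AWS_ACCOUNT_ID):
    """Return the custom trust policy with both placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:role/customer-aws_integration"
                },
                "Action": "sts:AssumeRole",
                # Only callers that present this external ID can assume the role.
                "Condition": {"StringEquals": {"sts:ExternalID": external_id}},
            }
        ],
    }


# "example-external-id" is a made-up value; use the ID copied in Step 3.
print(json.dumps(build_trust_policy("example-external-id"), indent=4))
```

Generating the document this way avoids leaving either placeholder unreplaced when you paste the policy into the editor.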

2. Obtain the ARN

  1. On the Roles page of your IAM console, search for and click the role that you created above.

  2. On the <Role name> page, in the Summary section, click the copy icon below the ARN field and save it securely like any other password.

    Copy the ARN

You can specify this ARN while configuring S3 as a Destination in Edge.

Connect using access credentials

To connect using access credentials, you need to add an IAM user for Hevo and assign the policy created in Step 2 to it. You require the access key and the secret access key generated for this user to grant Hevo access to your S3 bucket.

Note: The secret key is associated with an access key and is visible only once. Therefore, you must save it or download the key file for later use.

1. Create an IAM user and assign the IAM policy

  1. Log in to the AWS IAM Console.

  2. In the left navigation pane, under Access management, click Users.

    Navigation Pane - Users

  3. On the Users page, click Create user.

    CreateUser-1

  4. On the Specify user details page, specify the User name and click Next.

    Specify Username

  5. On the Set permissions page, in the Permissions options section, click Attach policies directly.

    Attach Policies

  6. In the Search bar of the Permissions policies section, type the name of the policy you created in Step 2.

  7. Select the check box next to the policy and then click Next at the bottom of the page.

    Create User - Click Next

  8. At the bottom of the Review and create page, click Create user.

    Create User

2. Generate the access keys

  1. On the Users page of your IAM console, click the user that you created above.

    Search User

  2. On the <User name> page, in the Summary section, click Create access key.

    Click Create Access Key

  3. On the Access key best practices & alternatives page, select Command Line Interface (CLI).

    Select Command Line Interface

  4. At the bottom of the page, select the I understand the above… check box and click Next.

    Click Acknowledgment

  5. (Optional) Specify a description tag for the access key to help you identify it.

  6. Click Create access key.

    Create the Access Key

  7. On the Retrieve access keys page, in the Access key section, click the copy icon in the Access key and Secret access key fields and save the keys securely like any other password. Optionally, click Download .csv file to save the keys on your local machine.

    Retrieve the Access Keys

  8. Click Done.

You can use these access keys when configuring S3 as a Destination in Edge.


Step 5
Configure Amazon S3 as a Destination in Edge

Perform the following steps to configure Amazon S3 as a Destination in Edge:

  1. Select DESTINATIONS in the Navigation Bar.

  2. Click the Edge tab in the Destinations List View and click + CREATE EDGE DESTINATION.

  3. On the Create Destination page, click S3.

  4. In the screen that appears, specify the following:

    Configure S3 Destination

    • Destination Name: A unique name for your Destination, not exceeding 255 characters.

    • In the Connect to your S3 section:

      • From the Access type drop-down, select one of the following connection methods:

        • Identity and Access Management (IAM): Connect to your S3 bucket using the IAM role that you created for Hevo.

          • IAM Role ARN: The globally unique identifier assigned by AWS to the IAM role you created for Hevo. For example, arn:aws:iam::393309748692:role/Role-for-Hevo-Edge.

          • External ID: The unique identifier auto-assigned by Hevo for the S3 Destination type. You must add the displayed value to the trust policy of your IAM role.

            Note: This field is non-editable.

          • Bucket Name: The name of the bucket where data is to be loaded. For example, s3-docs-20.

          • Path Prefix: A string added at the beginning of the directory path to help you organize your data files in the S3 bucket. Refer to Configuring Edge Pipeline Settings for S3 Destination for information on the directory path.

          • File Format: The format in which you want to store your data files. Currently, Edge supports only the CSV format.

          • Region: The AWS region where your S3 bucket is located. For example, Asia Pacific (Singapore).

        • Key Based: Connect using the access credentials of the IAM user created for Hevo.

          Key-based Authentication

          • Access Key ID: The publicly shareable unique identifier associated with the access key pair created for your IAM user in the section above. For example, AKIAIOSFODNN7EAAMMBB.

          • Secret Access Key: The cryptographic key associated with the access key ID generated for your IAM user in the section above.

          • Bucket Name: The name of the bucket where data is to be loaded. For example, s3-docs-20.

          • Path Prefix: A string added at the beginning of the directory path to help you organize your data files in the S3 bucket. Refer to Configuring Edge Pipeline Settings for S3 Destination for information on the directory path.

          • File Format: The format in which you want to store your data files. Currently, Edge supports only the CSV format.

          • Region: The AWS region where your S3 bucket is located. For example, Asia Pacific (Singapore).

  5. Click TEST & SAVE to test the connection to your S3 bucket.

Once the test is successful, Hevo creates your S3 Edge Destination. You can use this Destination while creating your Edge Pipeline.


Configuring Edge Pipeline Settings for S3 Destination

When you create an Edge Pipeline with your S3 Destination, you need to specify a Destination Partition Key.

Destination Partition Key

The default partition key is:

${YEAR}/${MONTH}/${DAY}/${JOB_ID}

The parameters are replaced as follows:

  • ${YEAR}: The year when the data load task ran.

  • ${MONTH}: The month when the data load task ran.

  • ${DAY}: The day when the data load task ran.

  • ${JOB_ID}: The alphanumeric ID of the sync job that ran to ingest and load data.

Hevo Edge provides a few additional time-based parameters that you can specify in the Destination partition key. These parameters are:

  • ${DATE}: The date when the data was loaded to your S3 bucket.

  • ${HOUR}: The hour of the day when the data load task ran.

You must specify one or more of the above parameters to create a folder structure in your S3 bucket. For example, the Destination partition key ${DATE}/${JOB_ID} organizes the data loaded to your S3 bucket based on the date and job ID.
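
To make the substitution concrete, here is a minimal Python sketch of how a partition key template might be expanded. The name=value output for ${YEAR}, ${MONTH}, ${DAY}, and ${JOB_ID} follows the example directory path shown later on this page. The output formats for ${DATE} and ${HOUR}, the zero-padding, and the function name render_partition_key are assumptions made for illustration, not documented Hevo behavior.

```python
import re
from datetime import datetime

# Matches parameters written as ${NAME}.
_PLACEHOLDER = re.compile(r"\$\{([A-Z_]+)\}")


def render_partition_key(template, run_time, job_id):
    """Expand the ${...} parameters in a Destination partition key."""
    values = {
        "YEAR": f"year={run_time:%Y}",
        "MONTH": f"month={run_time:%m}",
        "DAY": f"day={run_time:%d}",
        "JOB_ID": f"job_id={job_id}",
        # Assumed formats: the source does not show how these are rendered.
        "DATE": f"date={run_time:%Y-%m-%d}",
        "HOUR": f"hour={run_time:%H}",
    }

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise ValueError(f"Unsupported parameter: ${{{name}}}")
        return values[name]

    return _PLACEHOLDER.sub(substitute, template)


print(render_partition_key(
    "${YEAR}/${MONTH}/${DAY}/${JOB_ID}",
    datetime(2024, 11, 27, 9, 30),
    "d12d74f4-d647-4929-a5a6-d329afd916f4",
))
```

With the default partition key and the sample run time above, the sketch produces year=2024/month=11/day=27/job_id=d12d74f4-d647-4929-a5a6-d329afd916f4.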

Creating the Directory Path in your S3 Bucket

Hevo organizes your data files in a directory path or folder structure in the S3 bucket configured as your Edge Destination. The directory path for a Pipeline configured with any database Source is created using the following inputs:

  • Path Prefix: The string provided while configuring your S3 Destination.

  • Destination Prefix: The string provided while configuring your Edge Pipeline with the S3 Destination.

  • Database Name: The name of the database specified in the Source configuration.

  • Schema Name: The name of the schema specified in the Source configuration, if applicable.

  • Object Name: The name of the Source object from which data was ingested.

  • Destination Partition Key: The parameters that you provided while configuring your Edge Pipeline with the S3 Destination.

The directory path created is: <path_prefix>/<destination_prefix>_<database_name>_<schema_name>_<object_name>/<destination_partition_key>/

Your data is stored as Gzip-compressed CSV files in the folder structure created by the directory path.

Example

Suppose you created an Edge Pipeline with the following configuration:

  • Path Prefix: s3-dest

  • Destination Prefix: s3_1

  • Database Name: db1

  • Schema Name: public

  • Object Name: table_1

  • Destination Partition Key: ${YEAR}/${MONTH}/${DAY}/${JOB_ID}

The directory path created based on the above inputs is:

s3-dest/s3_1_db1_public_table_1/year=2024/month=11/day=27/job_id=d12d74f4-d647-4929-a5a6-d329afd916f4/

Your data is stored in the folder structure created by the above directory path.
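
The way these inputs combine can be sketched in a few lines of Python. The function name build_directory_path is hypothetical, and dropping the schema segment when no schema applies is an assumption; the source only states that the schema name is included if applicable.

```python
def build_directory_path(path_prefix, destination_prefix, database_name,
                         object_name, partition_path, schema_name=None):
    """Join the inputs into the directory path described above."""
    # Assumption: when there is no schema, its segment is simply left out.
    name_parts = [destination_prefix, database_name, schema_name, object_name]
    folder = "_".join(part for part in name_parts if part)
    return f"{path_prefix}/{folder}/{partition_path}/"


# Values taken from the example configuration above.
print(build_directory_path(
    path_prefix="s3-dest",
    destination_prefix="s3_1",
    database_name="db1",
    schema_name="public",
    object_name="table_1",
    partition_path=(
        "year=2024/month=11/day=27/"
        "job_id=d12d74f4-d647-4929-a5a6-d329afd916f4"
    ),
))
```

For the example configuration, the sketch reproduces the directory path shown above.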

