Amazon S3

You can load data from files in an S3 bucket into your Destination database or data warehouse using Hevo Pipelines.

Hevo automatically decompresses gzipped files during ingestion. If a file is updated, Hevo re-ingests the entire file, as it is not possible to identify the individual changes within it.
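To preview locally what a gzipped file contains before you place it in the bucket, a minimal Python sketch such as the following may help. It only illustrates gzip detection and decompression using the standard library; it is not Hevo's implementation, and the function name read_maybe_gzipped is a hypothetical helper.

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def read_maybe_gzipped(data: bytes) -> bytes:
    """Return the decompressed bytes if data is gzip-compressed, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


if __name__ == "__main__":
    original = b"CLAY COUNTY,32003,11973623\n"
    compressed = gzip.compress(original)
    # Both the compressed and the plain input yield the same CSV bytes.
    print(read_maybe_gzipped(compressed) == original)
    print(read_maybe_gzipped(original) == original)
```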

As of Release 1.66, __hevo_source_modified_at is uploaded to the Destination as a metadata field. For existing Pipelines that have this field:

  • If this field is displayed in the Schema Mapper, you must ignore it and not try to map it to a Destination table column, else the Pipeline displays an error.

  • Hevo automatically loads this information in the __hevo_source_modified_at column, which is already present in the Destination table.

You can, however, continue to use __hevo_source_modified_at to create transformations using the function event.getSourceModifiedAt(). Read Metadata Column __hevo_source_modified_at.

Existing Pipelines that do not have this field are not impacted.
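The following is a minimal sketch of a transformation that copies the value returned by event.getSourceModifiedAt() into a regular field. The transform(event) entry point and the getProperties() method are assumed here to match Hevo's Python transformation interface, so verify them against the Transformations documentation. The StubEvent class and the file_modified_at field name are hypothetical and exist only so that the sketch can run outside Hevo.

```python
class StubEvent:
    """Local stand-in for Hevo's Event object (hypothetical, for testing outside Hevo)."""

    def __init__(self, properties, source_modified_at):
        self._properties = properties
        self._source_modified_at = source_modified_at

    def getProperties(self):
        return self._properties

    def getSourceModifiedAt(self):
        return self._source_modified_at


def transform(event):
    # Assumed entry point of a transformation script.
    properties = event.getProperties()
    # Copy the file's modification metadata into a hypothetical field.
    properties["file_modified_at"] = event.getSourceModifiedAt()
    return event


if __name__ == "__main__":
    sample = StubEvent({"county": "CLAY COUNTY"}, 1624838400000)
    print(transform(sample).getProperties())
```

Inside Hevo, only the transform function would be needed; the stub is not part of the script.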


Prerequisites

  • An active AWS account with root user access or an IAM user with the required permissions to be able to obtain the credentials for configuring Amazon S3 as a Source in Hevo.

  • The ListBucket and GetObject permissions on the S3 bucket, granted from your AWS account, so that Hevo can access the data in the bucket.


Obtaining Amazon S3 Credentials

You must either obtain the access credentials or generate the IAM role-based credentials to allow Hevo to connect to your Amazon S3 account and ingest data from it. Both methods allow Hevo to authenticate and replicate your Amazon S3 data into your desired Destination.

Obtain the access credentials

You need the Access Key ID and Secret Access Key from your AWS account to allow Hevo to access your Amazon S3 data. The Secret Access Key is associated with the Access Key ID and is displayed only once. Therefore, make sure that you copy the details or download the key file for later use.

Perform the following steps to obtain your AWS Access Key ID and Secret Access Key:

  1. Log in to the AWS Console.

  2. Click the drop-down next to your profile name in the top right corner of the AWS user interface, and click Security Credentials.

    Security Credentials on console

  3. In the Security Credentials page, expand Access Keys (Access Key ID and Secret Access Key).

  4. Click Create New Access Key.

    Access Key tab

  5. Click Show Access Key to display the generated Access Key ID and Secret Access Key.

    Show Access Key

  6. Copy the Access Key ID and Secret Access Key and save them in a secure location. Alternatively, click Download Key File to download the key file for later use.

    Download Access Key
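Before you enter the keys in Hevo, you may want to confirm that they can list and read objects in the bucket. The following sketch exercises both permissions through an S3 client that you pass in. It assumes the boto3 library, whose list_objects_v2 and head_object calls are used here (a HEAD request on an object requires the GetObject permission). The function names check_bucket_access and make_client are hypothetical helpers, not part of Hevo or AWS.

```python
def check_bucket_access(s3_client, bucket, prefix=""):
    """Exercise the ListBucket and GetObject permissions on one bucket.

    boto3 raises an exception (botocore ClientError) if either call is denied.
    """
    # Requires s3:ListBucket on the bucket.
    listing = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    contents = listing.get("Contents", [])
    if not contents:
        # Listing worked, but there is no object under this prefix to read.
        return {"list": True, "get": None}
    key = contents[0]["Key"]
    # Requires s3:GetObject on the object; fetches metadata only.
    s3_client.head_object(Bucket=bucket, Key=key)
    return {"list": True, "get": True, "key": key}


def make_client(access_key_id, secret_access_key, region):
    """Build a boto3 S3 client from the keys obtained above."""
    import boto3  # third-party library, imported lazily

    return boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        region_name=region,
    )
```

For example, check_bucket_access(make_client(key_id, secret, "us-east-1"), "my-example-bucket") would run the check against a bucket named my-example-bucket, where the bucket name and region are placeholders.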


Generate the IAM role-based credentials

You need to create an IAM policy with the required permissions, create a Hevo-specific IAM role, and then assign the policy to this role to define what Hevo can access in your S3 account. The Amazon Resource Name (ARN) and the external ID of this role are required to configure the Amazon S3 Source in Hevo.

Perform the following steps to create an IAM policy and a role:

Step 1. Create an IAM policy

The IAM policy is required for Hevo to access the data in your specified Amazon S3 bucket, and ingest and load it to your desired Destination. This policy is then used to create an IAM role for Hevo.

Perform the following steps to create an IAM policy:

  1. Log in to the AWS Console, and select the IAM service.

  2. In the left navigation pane, under Access Management, click Policies.

    AWS nav bar

  3. In the Policies page, click Create policy.

    Create policy

  4. In the Create Policy page, under the Visual editor section, do the following:

    1. Click Choose a service corresponding to the Service drop-down.

      Select service

    2. In the drop-down list, search for and select S3.

    3. In the Actions drop-down, under the Access level section, expand the List and Read drop-downs, and select the ListBucket and GetObject check boxes, respectively.

      Permissions

    4. In the Resources drop-down, do the following:

      1. Click Add ARN corresponding to the bucket resource.

        Add ARN bucket

      2. In the Add ARN(s) pop-up window, do the following:

        1. Specify the Bucket name for which you want to grant access.
          Alternatively, select the Any check box to grant access to all buckets in your Amazon S3 account.

        2. Click Add.

          Bucket ARN details

      3. Click Add ARN corresponding to the object resource.

        Add ARN object

      4. In the Add ARN(s) pop-up window, do the following:

        1. Specify the Bucket name for which you want to grant access.
          Alternatively, select the Any check box to grant access to all buckets in your Amazon S3 account.

        2. In the Object name field, specify the path to the objects for which you want to grant access.
          Alternatively, select the Any check box to grant access to all objects in the Amazon S3 bucket.

        3. Click Add.

          Object ARN details

      5. (Optional) In the Request conditions drop-down, select the Source IP check box, and in the IP range field, enter Hevo’s IP address for your region.

        Whitelist IP

  5. At the bottom of the page, click Next: Tags.

  6. At the bottom of the Create policy page, click Next: Review.

  7. In the Review policy page, specify a Name and Description for your policy, and click Create policy.

    Policy Description

You will be redirected to the Policies page, where you can find the policy that you created.
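The visual editor steps above produce a JSON policy document. The following sketch builds an equivalent document in Python so that you can compare it with the JSON tab of your policy. The bucket name my-example-bucket and the IP address 203.0.113.10/32 are placeholders, and the function name build_hevo_read_policy is a hypothetical helper; substitute your own bucket and Hevo's IP address for your region.

```python
import json


def build_hevo_read_policy(bucket, object_path="*", source_ips=None):
    """Return an IAM policy granting ListBucket on the bucket and GetObject on its objects."""
    statements = [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            # ListBucket applies to the bucket resource itself.
            "Resource": [f"arn:aws:s3:::{bucket}"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            # GetObject applies to object resources inside the bucket.
            "Resource": [f"arn:aws:s3:::{bucket}/{object_path}"],
        },
    ]
    if source_ips:
        # Optional request condition: allow only the listed source IP ranges.
        for statement in statements:
            statement["Condition"] = {"IpAddress": {"aws:SourceIp": list(source_ips)}}
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = build_hevo_read_policy("my-example-bucket", source_ips=["203.0.113.10/32"])
    print(json.dumps(policy, indent=2))
```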

Step 2. Create an IAM role and obtain the IAM role ARN and external ID

After you define the IAM Policy with the required permissions to access your data, you need to create a role for Hevo and assign that policy to it. You must use this role to obtain the ARN and external ID that are required for configuring Amazon S3 as a Source in Hevo. You can see the external ID only once, while creating the role. At that time, you must copy and save it in a secure location for later use.

Perform the following steps to create an IAM role:

  1. Log in to the AWS Console, and select the IAM service.

  2. In the left navigation pane, under Access Management, click Roles.

    Role Nav bar

  3. In the Roles page, click Create role.

    Create role

  4. Under the Trusted entity type section, select AWS account.

    AWS account

  5. Under the An AWS account section, do the following:

    1. Select the Another AWS account option, and specify Hevo’s Account ID (393309748692). This allows you to create a role for Hevo to access and ingest data from your S3 bucket and replicate it to your desired Destination.

      Account ID

    2. Under the Options section, select the Require external ID check box, and specify an External ID of your choice. For example, hevo-role-external-id.

      Note: You must save this external ID in a secure location like any other password. This will be required while setting up a Pipeline in Hevo.

  6. Click Next.

  7. In the Permissions policies section, select the policy that you created in Step 1 above.

    Select Policy

  8. At the bottom of the page, click Next.

  9. Under the Role details section, specify the Role name and Description.

    Role Description

  10. At the bottom of the page, click Create role.

  11. In the Roles page, select the role that you created above.

    Select Role

  12. Under the Summary section of your role, copy the ARN. This will be required while setting up a Pipeline in Hevo.

    Copy ARN
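Selecting another AWS account and requiring an external ID results in a trust policy on the role. As a sketch, the following Python code builds a trust policy of that shape so that you can compare it with the Trust relationships tab of your role. The account ID is the one given in the steps above, the external ID is the example value hevo-role-external-id, and the function name build_trust_policy is a hypothetical helper.

```python
import json

HEVO_ACCOUNT_ID = "393309748692"  # Hevo's Account ID from the steps above


def build_trust_policy(account_id, external_id):
    """Return a trust policy that lets the given account assume the role with an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must present the same external ID when assuming the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    trust = build_trust_policy(HEVO_ACCOUNT_ID, "hevo-role-external-id")
    print(json.dumps(trust, indent=2))
```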


Configuring Amazon S3 as a Source

Perform the following steps to configure S3 as the Source in your Pipeline:

  1. Click PIPELINES in the Asset Palette.

  2. Click + CREATE in the Pipeline List View.

  3. In the Select Source Type page, select S3.

  4. In the Configure your S3 Source page, specify the following:

    S3 settings

    • Pipeline Name: A unique name for the Pipeline.

    • Source Setup: The credentials needed to allow Hevo to access your data.
      Perform the following steps to connect to your Amazon S3 account:

      1. Do one of the following:

        • Connect using Access Credentials:

          • Access Key ID: The AWS Access Key ID that you retrieved in the Obtain the access credentials section above.

          • Secret Access Key: The AWS Secret Access Key for the Access Key ID that you retrieved in the Obtain the access credentials section above.

          • Bucket Name: The name of the bucket from which you want to ingest data.

          • Bucket Region: The AWS region where the bucket is located.

        • Connect using IAM Role:

          • IAM Role ARN: The Amazon Resource Name (ARN) of the IAM role that you copied in Step 2 above.

          • External ID: The external ID that you specified in Step 2 above.

          • Bucket Name: The name of the bucket from which you want to ingest data.

          • Bucket Region: The AWS region where the bucket is located.

      2. Click TEST & CONTINUE.

    • Data Root: The path for the directory which contains your data. By default, the files are listed from the root of the directory.

      Perform the following steps to select the folder(s) and the data format which you want to ingest using Hevo:

      Select folders to be ingested

      1. Select Folders: The folders which contain the data to be ingested.

      2. Select Type of File: The format of the data file in the Source. Hevo currently supports AVRO, CSV, TSV, JSON, and XML formats.

        Note: You can select only one file format at a time. If your Source data is in a different format, you can export the data to any of the supported formats, and then ingest the files.

        Based on the format you select, you must specify some additional settings:

        • CSV:

          • Specify the Field Delimiter. This is the character that separates the fields in each line. For example, \t (tab) or , (comma).

          • Disable the Treat First Row As Column Headers option if the Source data file does not contain column headers. Hevo then automatically creates the headers during ingestion. Default setting: Enabled. Refer to the section, Example: Automatic Column Header Creation for CSV Tables.

          • Enable the Create Event Types from folders option if the path prefix has subdirectories containing files in different formats. Hevo reads each subdirectory as a separate Event Type.

        • TSV:

          • Disable the Treat First Row As Column Headers option if the Source data file does not contain column headers. Hevo automatically creates the headers during ingestion. Default setting: Enabled.

          • Enable the Create Event Types from folders option if the path prefix has subdirectories containing files in different formats. Hevo reads each subdirectory as a separate Event Type.

        • JSON: Enable the Create Event Types from folders option if the path prefix has subdirectories containing files in different formats. Hevo reads each of the subdirectories as a separate Event Type.

        • XML: Enable the Create Events from child nodes option to load each node under the root node in the XML file as a separate Event.

      3. Click CONFIGURE SOURCE.

  5. Proceed to configuring the data ingestion and setting up the Destination.
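To illustrate what the Create Events from child nodes option means for XML files, the following sketch turns each node under the root node into a separate record using Python's standard library. It is only an illustration under stated assumptions: the XML layout, the field names, and the function name events_from_child_nodes are hypothetical, and the way Hevo names the resulting Event Types may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file: two child nodes under one root node.
SAMPLE_XML = """
<records>
  <record><county>CLAY COUNTY</county><zip>32003</zip></record>
  <record><county>NASSAU COUNTY</county><zip>32025</zip></record>
</records>
"""


def events_from_child_nodes(xml_text):
    """Return one dict per direct child of the root node."""
    root = ET.fromstring(xml_text.strip())
    events = []
    for child in root:
        # Each sub-element of the child becomes one field of the record.
        fields = {field.tag: field.text for field in child}
        events.append({"node": child.tag, "properties": fields})
    return events


if __name__ == "__main__":
    for event in events_from_child_nodes(SAMPLE_XML):
        print(event)
```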


Example: Automatic Column Header Creation for CSV Tables

Consider the following data in CSV format, which has no column headers.

  CLAY COUNTY,32003,11973623
  CLAY COUNTY,32003,46448094
  CLAY COUNTY,32003,55206893
  CLAY COUNTY,32003,15333743
  SUWANNEE COUNTY,32060,85751490
  SUWANNEE COUNTY,32062,50972562
  ST JOHNS COUNTY,846636,32033,
  NASSAU COUNTY,32025,88310177
  NASSAU COUNTY,32041,34865452

If you disable the Treat First Row As Column Headers option, Hevo auto-generates the column headers, as seen in the Schema Mapper here:

Column headers generated by Hevo for CSV data

The record in the Destination appears as follows:

Destination record with auto-generated column headers
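The idea of auto-generated headers can be sketched in Python as follows. The header names column_1, column_2, and so on are hypothetical and chosen only for illustration; the names that Hevo actually generates are the ones shown in the Schema Mapper. The function name rows_with_generated_headers is likewise a hypothetical helper.

```python
import csv
import io

# A few rows from the example above, with no header row.
SAMPLE_CSV = (
    "CLAY COUNTY,32003,11973623\n"
    "SUWANNEE COUNTY,32060,85751490\n"
    "NASSAU COUNTY,32025,88310177\n"
)


def rows_with_generated_headers(csv_text, delimiter=","):
    """Parse header-less CSV text and attach positional column names to each row."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    width = max((len(row) for row in rows), default=0)
    # Hypothetical naming scheme: column_1, column_2, ...
    headers = [f"column_{i}" for i in range(1, width + 1)]
    records = []
    for row in rows:
        # Pad short rows with None so that every record has the same keys.
        padded = row + [None] * (width - len(row))
        records.append(dict(zip(headers, padded)))
    return records


if __name__ == "__main__":
    for record in rows_with_generated_headers(SAMPLE_CSV):
        print(record)
```

Passing delimiter="\t" parses tab-separated data in the same way, which corresponds to the Field Delimiter setting.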





Revision History

Refer to the following table for the list of key updates made to this page:

Date         Release  Description of Change
Sep-21-2022  1.98     - Added sections, Obtaining Amazon S3 Credentials and Generate the IAM role based credentials.
                      - Renamed section, (Optional) Obtain your Access Key ID and Secret Access Key to Obtain the access credentials.
                      - Updated section, Configuring Amazon S3 as a Source to add information about connecting to Amazon S3 using IAM role.
Sep-07-2022  1.97     Updated section, Configuring Amazon S3 as a Source to reflect the latest UI.
Apr-18-2022  NA       Added section, (Optional) Obtain your Access Key ID and Secret Access Key.
Apr-11-2022  1.86     Updated section, Configuring Amazon S3 as a Source to reflect support for TSV file format.
Mar-21-2022  1.85     Removed section, Limitations as Hevo now supports UTF-16 encoding format for CSV files.
Jun-28-2021  1.66     Updated the page overview with information about __hevo_source_modified_at being uploaded as a metadata field from Release 1.66 onwards.
Feb-22-2021  NA       Added the limitation about Hevo not supporting UTF-16 encoding format for CSV data.
Last updated on 21 Sep 2022
