Amazon S3 (Edge)

Last updated on Jan 27, 2025

Amazon Simple Storage Service (S3) is a durable, efficient, secure, and scalable cloud storage service provided by Amazon Web Services (AWS) that can be accessed from anywhere. S3 uses buckets to store data in multiple formats, such as images, videos, and documents, to organize that data, and to retrieve it at any time from the cloud. It also provides access control, versioning, and integration with other AWS services.

You can configure Amazon S3 as an Edge Destination in your Pipeline to ingest data from your Source and load it using the Append mode into your S3 bucket. The data is stored in your S3 bucket as files compressed using one of the supported compression algorithms.

Note: Currently, Hevo Edge loads data into your S3 bucket as CSV files, which are compressed using Gzip.
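Because the files are Gzip-compressed CSV, they can be read with standard tooling once downloaded. The following is a minimal sketch in Python using only the standard library. It assumes the file is UTF-8 encoded and starts with a header row, neither of which is stated on this page; the function name read_gzipped_csv and the sample data are hypothetical.

```python
import csv
import gzip
import io


def read_gzipped_csv(data: bytes) -> list[dict[str, str]]:
    """Decompress Gzip bytes and parse them as CSV.

    Assumption: the file is UTF-8 encoded and its first row is a header.
    """
    with gzip.open(io.BytesIO(data), mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical sample standing in for the bytes of a file downloaded from the bucket.
sample = gzip.compress(b"id,name\n1,alice\n2,bob\n")
rows = read_gzipped_csv(sample)
```

In practice, the bytes would come from whichever S3 client you use to download the object; that step is left out here to keep the sketch self-contained.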


Configuring Edge Pipeline Settings for S3 Destination

When you create an Edge Pipeline with your S3 Destination, you need to specify a Destination Partition Key.

Destination Partition Key

The default partition key is:

${YEAR}/${MONTH}/${DAY}/${JOB_ID}

The parameters are replaced as follows:

  • ${YEAR}: The year when the data load task ran.

  • ${MONTH}: The month when the data load task ran.

  • ${DAY}: The day when the data load task ran.

  • ${JOB_ID}: The alphanumeric ID of the sync job that ran to ingest and load data.

Hevo Edge also provides the following time-based parameters that you can specify in the Destination Partition Key:

  • ${DATE}: The date when the data was loaded to your S3 bucket.

  • ${HOUR}: The hour of the day when the data load task ran.

You must specify one or more of the above parameters to create a folder structure in your S3 bucket. For example, the Destination Partition Key ${DATE}/${JOB_ID} organizes the data loaded to your S3 bucket by date and then by job ID.
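Before entering a custom partition key, it may help to check it locally against the rules described above. The sketch below is a hypothetical client-side check, not Hevo's own validation: it extracts the ${...} parameters from a key, rejects any that are not among the six listed on this page, and requires at least one. The function and constant names are illustrative.

```python
import re

# Parameters documented on this page for the Destination Partition Key.
SUPPORTED_PARAMETERS = {"YEAR", "MONTH", "DAY", "JOB_ID", "DATE", "HOUR"}

# Matches ${NAME} and captures NAME.
_PARAMETER_PATTERN = re.compile(r"\$\{([^}]*)\}")


def partition_key_parameters(partition_key: str) -> list[str]:
    """Return the parameter names used in a partition key, in order.

    Raises ValueError if the key has no parameters or uses an unknown one.
    """
    names = _PARAMETER_PATTERN.findall(partition_key)
    if not names:
        raise ValueError("partition key must contain at least one parameter")
    unknown = [name for name in names if name not in SUPPORTED_PARAMETERS]
    if unknown:
        raise ValueError(f"unsupported parameters: {unknown}")
    return names


default_names = partition_key_parameters("${YEAR}/${MONTH}/${DAY}/${JOB_ID}")
custom_names = partition_key_parameters("${DATE}/${JOB_ID}")
```

A key such as ${WEEK}/${JOB_ID} would raise ValueError in this sketch, since WEEK is not one of the parameters listed above.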

Creating the Directory Path in your S3 Bucket

Hevo organizes your data files in a directory path or folder structure in the S3 bucket configured as your Edge Destination. The directory path for a Pipeline configured with any database Source is created using the following inputs:

  • Path Prefix: The string provided while configuring your S3 Destination. For example,

  • Destination Prefix: The string provided while configuring your Edge Pipeline with the S3 Destination.

  • Database Name: The name of the database specified in the Source configuration.

  • Schema Name: The name of the schema specified in the Source configuration, if applicable.

  • Object Name: The name of the Source object from which data was ingested.

  • Destination Partition Key: The parameters that you provided while configuring your Edge Pipeline with the S3 Destination.

The directory path created is:

<path_prefix>/<destination_prefix>_<database_name>_<schema_name>_<object_name>/<destination_partition_key>/

Your data is stored as Gzip-compressed CSV files in the folder structure created by the directory path.

Example

Suppose you created an Edge Pipeline with the following configuration:

  • Path Prefix: s3-dest

  • Destination Prefix: s3_1

  • Database Name: db1

  • Schema Name: public

  • Object Name: table_1

  • Destination Partition Key: ${YEAR}/${MONTH}/${DAY}/${JOB_ID}

The directory path created based on the above inputs is:

s3-dest/s3_1_db1_public_table_1/year=2024/month=11/day=27/job_id=d12d74f4-d647-4929-a5a6-d329afd916f4/

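As the path shows, each parameter in the Destination Partition Key becomes a folder named <parameter>=<value>, such as year=2024. Your data is stored in the folder structure created by the above directory path.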
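To locate files programmatically, the same path can be reconstructed from the inputs. The following Python sketch reproduces the example above under stated assumptions; it is not Hevo's implementation. The name=value rendering and two-digit month and day follow the example path. The formats used for ${DATE} (YYYY-MM-DD) and ${HOUR} (two-digit, 24-hour) are assumptions, as this page does not show them, and the case of a Source without a schema is not covered. All function names are hypothetical.

```python
from datetime import datetime


def expand_partition_key(partition_key: str, run_time: datetime, job_id: str) -> str:
    """Replace each ${PARAMETER} with a name=value folder segment."""
    rendered = {
        "YEAR": f"year={run_time:%Y}",
        "MONTH": f"month={run_time:%m}",
        "DAY": f"day={run_time:%d}",
        "JOB_ID": f"job_id={job_id}",
        "DATE": f"date={run_time:%Y-%m-%d}",  # assumed format, not shown on this page
        "HOUR": f"hour={run_time:%H}",  # assumed format, not shown on this page
    }
    for name, segment in rendered.items():
        partition_key = partition_key.replace("${" + name + "}", segment)
    return partition_key


def build_directory_path(
    path_prefix: str,
    destination_prefix: str,
    database_name: str,
    schema_name: str,
    object_name: str,
    partition_key: str,
    run_time: datetime,
    job_id: str,
) -> str:
    """Assemble <path_prefix>/<prefix>_<db>_<schema>_<object>/<partition_key>/."""
    folder = "_".join([destination_prefix, database_name, schema_name, object_name])
    partitions = expand_partition_key(partition_key, run_time, job_id)
    return f"{path_prefix}/{folder}/{partitions}/"


# Inputs taken from the example configuration above.
example_path = build_directory_path(
    path_prefix="s3-dest",
    destination_prefix="s3_1",
    database_name="db1",
    schema_name="public",
    object_name="table_1",
    partition_key="${YEAR}/${MONTH}/${DAY}/${JOB_ID}",
    run_time=datetime(2024, 11, 27),
    job_id="d12d74f4-d647-4929-a5a6-d329afd916f4",
)
```

With these inputs, example_path matches the directory path shown in the example. The resulting string can be used as a key prefix when listing objects in the bucket.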

