Creating File Partitions for S3 Destination through Schema Mapper

Unlike traditional data warehouses such as Redshift and Snowflake, an S3 destination has no schema. Instead, big data systems that use S3 as storage typically ask users to partition their data based on the fields it contains.

Data partitioning helps big data systems such as Hive scan only the data relevant to a query. Partitioning data simply means creating sub-folders based on the values of its fields. For example, partitioning all users' data by their year and month of joining creates folders such as s3://my-bucket/users/date_joined=2015-03/, or more generically, s3://my-bucket/users/date_joined=YYYY-MM/.
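The folder layout above follows the Hive-style key=value convention. The sketch below is a minimal illustration of that convention, not Hevo's implementation: it builds such a path and shows how a query engine can skip folders that cannot match a filter (partition pruning). The function names partition_path and prune are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def partition_path(bucket: str, prefix: str, parts: Sequence[Tuple[str, str]]) -> str:
    """Build a Hive-style path: s3://<bucket>/<prefix>/<key>=<value>/..."""
    segments = [f"{key}={value}" for key, value in parts]
    return "/".join([f"s3://{bucket}", prefix, *segments]) + "/"


def prune(paths: Iterable[str], key: str, value: str) -> List[str]:
    """Keep only the folders whose `key` segment equals `value`.

    This mimics partition pruning: folders that cannot match the
    filter are never opened, so their files are never scanned.
    """
    wanted = f"{key}={value}"
    return [p for p in paths if wanted in p.rstrip("/").split("/")]


folders = [
    partition_path("my-bucket", "users", [("date_joined", "2015-03")]),
    partition_path("my-bucket", "users", [("date_joined", "2015-04")]),
]
print(folders[0])  # s3://my-bucket/users/date_joined=2015-03/
print(prune(folders, "date_joined", "2015-04"))
# ['s3://my-bucket/users/date_joined=2015-04/']
```

Because the filter value is encoded in the folder name, the engine can decide which folders to read from the path alone, without opening any file.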

Hevo allows you to create data partitions for any file storage-based destination on the Schema Mapper screen. To get started, select the Event Type; this opens a screen with an option to create the data partition.


Partition Preview: A preview of the location where your files will be saved.

Prefix: The name of the prefix folder in S3.

Partition Keys: The event fields you want to partition the data on. If a field is of type date, time, or timestamp, you can choose a format for it.
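For date and timestamp fields, the chosen format decides how much of the value becomes the folder name. The sketch below assumes a hypothetical mapping from UI-style format labels such as YYYY-MM to Python strftime codes; the exact list of formats offered in the Schema Mapper may differ, and format_partition_value is an illustrative name only.

```python
from datetime import date, datetime
from typing import Union

# Hypothetical mapping from UI-style format labels to strftime codes.
# The formats actually offered in the Schema Mapper may differ.
FORMATS = {
    "YYYY": "%Y",
    "YYYY-MM": "%Y-%m",
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM-DD-HH": "%Y-%m-%d-%H",
}


def format_partition_value(value: Union[date, datetime], fmt: str) -> str:
    """Render a date or timestamp field as a partition folder value."""
    pattern = FORMATS.get(fmt)
    if pattern is None:
        raise ValueError(f"unsupported partition format: {fmt!r}")
    return value.strftime(pattern)


ts = datetime(2015, 3, 17, 9, 30)
print(format_partition_value(ts, "YYYY-MM"))     # 2015-03
print(format_partition_value(ts, "YYYY-MM-DD"))  # 2015-03-17
```

A coarser format such as YYYY-MM produces fewer, larger folders; a finer one such as YYYY-MM-DD produces more, smaller folders and lets queries on a single day skip more data.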

Once you have set up the partition keys, click Create Mapping, and the data will be saved to the corresponding location.
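Since each event lands under the folder derived from its own partition-key values, a single batch of events can fan out to several folders. The sketch below groups a batch by computed path to illustrate this; the event dictionaries, the country field, and the group_by_partition helper are hypothetical, and actual file writing is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_partition(
    events: Sequence[dict], bucket: str, prefix: str, keys: Sequence[str]
) -> Dict[str, List[dict]]:
    """Group events by the Hive-style folder their partition keys map to."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        # Raises KeyError if an event lacks a partition key.
        segments = [f"{key}={event[key]}" for key in keys]
        path = "/".join([f"s3://{bucket}", prefix, *segments]) + "/"
        groups[path].append(event)
    return dict(groups)


events = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "IN"},
    {"id": 3, "country": "US"},
]
for path, batch in group_by_partition(events, "my-bucket", "users", ["country"]).items():
    print(path, [e["id"] for e in batch])
# s3://my-bucket/users/country=US/ [1, 3]
# s3://my-bucket/users/country=IN/ [2]
```

In this sketch an event missing a partition key raises KeyError; how Hevo handles such events is not covered here.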

Let’s take the example of your app users. Suppose you have your users' name, date_of_birth, gender, and location attributes available and want to write the data into the s3://my-bucket/app_users/date_of_birth=YYYY-MM/location=<value>/ location. We first set the prefix to app_users. Next, under Partition Keys, we select date_of_birth and choose YYYY-MM as its format. Then we click Add Partition Key and select location as the second partition key. Finally, we click Create Mapping at the top of the screen to save the selection.
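The walkthrough above can be traced in code. The sketch below applies the same configuration (prefix app_users, date_of_birth formatted as YYYY-MM, then location) to one user record. The record values and the helper name app_user_path are made up for illustration, and the code only computes the path; it does not call S3 or Hevo.

```python
from datetime import date


def app_user_path(user: dict, bucket: str = "my-bucket") -> str:
    """Compute the folder for one user under the example mapping.

    Prefix: app_users; keys: date_of_birth (YYYY-MM), then location.
    """
    dob = user["date_of_birth"].strftime("%Y-%m")  # YYYY-MM format
    return f"s3://{bucket}/app_users/date_of_birth={dob}/location={user['location']}/"


user = {
    "name": "Asha",
    "date_of_birth": date(1990, 7, 4),
    "gender": "F",
    "location": "Bangalore",
}
print(app_user_path(user))
# s3://my-bucket/app_users/date_of_birth=1990-07/location=Bangalore/
```

The order in which the partition keys are added sets the nesting: date_of_birth comes first, so each month folder contains one sub-folder per location.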

Last updated on 30 Jun 2020