Creating File Partitions for S3 Destination through Schema Mapper

Unlike traditional data warehouses such as Redshift and Snowflake, the S3 Destination has no schema. Instead, big data systems that rely on S3 for storage expect users to partition their data based on the fields it contains.

Data partitioning helps big data systems such as Hive scan only the relevant data when a query is run. Partitioning data simply means creating sub-folders based on the values of selected fields. For example, partitioning all users' data by their year and month of joining creates a folder such as s3://my-bucket/users/date_joined=2015-03/ or, more generically, s3://my-bucket/users/date_joined=YYYY-MM/.
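To make the benefit concrete, here is a minimal Python sketch of how a query engine can skip irrelevant data when files are laid out this way. The function names and the sample file keys are hypothetical and chosen only for illustration; this is not how Hive or Hevo is implemented.

```python
from datetime import date


def partition_folder(base: str, field: str, value: date) -> str:
    """Return the folder for a date field formatted as YYYY-MM."""
    return f"{base.rstrip('/')}/{field}={value.strftime('%Y-%m')}/"


def prune(keys: list[str], folder: str) -> list[str]:
    """Keep only the files stored under the given partition folder."""
    return [key for key in keys if key.startswith(folder)]


# Hypothetical file listing for the users data set.
keys = [
    "s3://my-bucket/users/date_joined=2015-02/part-0001.json",
    "s3://my-bucket/users/date_joined=2015-03/part-0001.json",
    "s3://my-bucket/users/date_joined=2015-03/part-0002.json",
]

folder = partition_folder("s3://my-bucket/users", "date_joined", date(2015, 3, 14))
relevant = prune(keys, folder)
```

A query for users who joined in March 2015 then needs to read only the two files in `relevant`, not the whole data set.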

Hevo allows you to create data partitions for all file storage-based Destinations on the Schema Mapper page. To get started, select the Event Type and, on the page that appears, use the option to create the data partition.


Partition Preview: A preview of the path where your files will be saved.

Prefix: The name of the prefix folder in S3.

Partition Keys: The Event fields you want to partition the data on. If a field is of type Date, Time, or Timestamp, you can also choose the appropriate format for it.

Once you have set up the partition keys, click Create Mapping. The data is then saved to the specified location.

Let’s take the example of your app users. Suppose you have the users’ name, date_of_birth, gender, and location attributes available and want to write the data to s3://my-bucket/app_users/date_of_birth=YYYY-MM/location=<value>/. We will first set the prefix to app_users. Next, in the partition keys, we will select date_of_birth and choose YYYY-MM as the format. Then we will click Add Partition Key and select location as the second partition key. Finally, we will click Create Mapping at the top of the screen to save the selection.
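The resulting folder layout for this example can be sketched in Python as follows. This is only an illustration of the path structure described above: the function build_partition_path, the FORMATS table, and the sample user record are assumptions made for the sketch and are not part of Hevo.

```python
from datetime import date

# Assumed mapping from the format labels in the text to Python strftime patterns.
FORMATS = {"YYYY-MM": "%Y-%m"}


def build_partition_path(bucket, prefix, event, partition_keys):
    """Build the folder path for one event.

    partition_keys is an ordered list of (field, format) pairs;
    format is None for fields that are used as-is.
    """
    parts = [f"s3://{bucket}", prefix]
    for field, fmt in partition_keys:
        value = event[field]
        if fmt is not None:
            value = value.strftime(FORMATS[fmt])
        parts.append(f"{field}={value}")
    return "/".join(parts) + "/"


# Made-up sample record with the attributes named in the example.
user = {
    "name": "Asha",
    "date_of_birth": date(1990, 7, 21),
    "gender": "F",
    "location": "Pune",
}

path = build_partition_path(
    "my-bucket",
    "app_users",
    user,
    [("date_of_birth", "YYYY-MM"), ("location", None)],
)
print(path)
```

The order of the partition keys determines the nesting of the folders, so date_of_birth comes before location in the path.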

Last updated on 09 Nov 2020