Data Transformation

Last updated on Jun 16, 2023

Data Transformation refers to the process of converting or transforming the data ingested from your Source as per your requirement before loading it in the Destination. Using Transformations, you can structure, organize, or enrich the data. Transformations may also be applied by Hevo automatically or based on the Source and Destination configurations you specify to prepare your data for replication. These are called In-flight Transformations. Any further Transformation or extraction of data post-loading to the Destination can be done through SQL query-based Models.

In-flight Transformations

Hevo automatically transforms the Source data during ingestion or loading to offer you a consistent and predictable format of data in the Destination. Some of these in-flight Transformations are also based on the options you select while creating the Pipeline. In-flight Transformations include:

  • Table and column name sanitization. Hevo uses a simple, consistent, and readable vocabulary for naming the tables and columns, while mapping the Source Event Types to the Destination. To do this, Hevo removes all non-alphanumeric characters and spaces in a table or column name and replaces them with underscores. You can choose to not enable this feature. Read Name Sanitization.

  • Table and column name compression (automatic). Hevo uses name compression strategies to derive unique, yet relevant names for the tables and columns in order to meet the character limits imposed by different Destinations. Read Table and Column Name Compression

  • JSON parsing in Destination. As part of configuring your Destination, you can select how Hevo must parse any nested fields, objects, and arrays so that these are read correctly. The options provided on the UI are determined by the Destination type. If there is only one option possible, Hevo automatically uses that. Read Parsing Nested JSON Fields in Events.

  • Conversion of dates to timestamps in UTC format. Dates are formatted as yyyy-mm-dd and timestamps are recorded in EPOCH milliseconds during ingestion and loaded to the Destination. You can create Transformations to change the timezone, if needed. Hevo loads the timestamp data, with or without timezone info, as timestamp without timezone type, and all the values are maintained in UTC timezone. The only exception to this is the PostgreSQL Source-Redshift Destination combination, where timestamp without timezone is loaded as timestamp without timezone and timestamp with timezone is loaded as timestamp with timezone.

  • Addition of meta fields. For some ad-based Sources such as Google Ads, Hevo adds certain meta fields for completeness of data.

  • Dropping of fields. Hevo does not load fields with the following values: "NA", "N/A", "NULL", "NONE", "-". If that column is not already present in the Destination, it is not created; if it is already present, then, the values in those column will be null.

User-driven Transformations

Some common scenarios where you may find Transformations to be helpful include:

  • Cleansing: Your data may contain unclean values or values in inconsistent formats. For example, you may want to convert all booleans to _0_s and _1_s or trim special characters or spaces from values. All these operations can be done using Transformations.

  • Data enrichment: You can derive new data from existing field values. For example, you can convert the IP addresses to names of cities or zip codes and coordinates to geo-hashes. You can generate new fields based on the values of existing fields. For example, if the value of the price field is greater than $15, add a new field named discount with value = 0.15*price.

  • Re-expression: You can re-express your data in different units or formats. For example, you can convert all currency values to USD or change all dates to a particular time zone.

  • Filtering: You can filter Events that you do not want to replicate to your Destination based on simple or complex logic. For example, you can drop all Events where author_name = "shakespeare"

  • Normalization: You can break nested Events into normalized Events to remove repetitive data and organize the data logically. Let us suppose, you ingest an object called person, which contains the person’s name, address, and a list of their bank accounts. Here, the bank _accounts field may contain multiple values and you want to normalize this data for business purposes. Then, you can separate the name, address, and bank accounts and load them as three different Events.

  • Successful ingestion of failed Events: You can write Transformations to modify the properties of failed Events so that they are successfully ingested. This helps you reduce and resolve Pipeline failures.

Hevo provides you two ways to transform the data before loading it to the Destination:

  • Python Code-based Transformations. You can create Python-based Transformations or use the sample scripts provided by Hevo to modify the properties and metadata of Events, manage data fields and transform data types of the Source data.

  • Drag and Drop Transformations. You can use the UI-based Transformation blocks provided by Hevo to manage some key Transformations of the Source data.

Read Transformations.

Once you test and deploy a Transformation, it is applied to the ingested data and the updated Events are loaded into your Destination. Read Data Loading.

Note: If you plan to apply Transformations on your data before loading it to the Destination, we recommend that you pause your Pipeline immediately upon creating it. Once you have deployed the Transformation, you can resume your Pipeline. This will ensure that no raw, non-transformed data gets loaded to the Destination.

Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Apr-20-2022 NA Added section, In-flight Transformations.
Mar-21-2022 NA New document.

Tell us what went wrong