How do I set a field as a primary key to avoid duplication?

Last updated on Nov 03, 2022

You can set a field as a primary key in the Destination table through either Transformations or Schema Mapper. The Transformation method works only for warehouse Destinations and not for database and Firebolt Destinations. In the Schema Mapper, you need to drop the existing Destination table and create a new table with the desired primary keys.

To set a field as a primary key through Transformations:

  1. Click Transformations to go to the Python-based Transformation interface.

    Python-based Transformation interface

  2. Write the following code in the CODE panel:

     from io.hevo.api import Event
     def transform(event):
       # Set a field (or a list of fields) as the primary key
       event.setPrimaryKeyFields(['journey_id', 'key'])
       properties = event.getProperties()
       properties['new_pk_fields'] = event.getPrimaryKeyFields()
       return event

    Here, the fields journey_id and key are set as the primary key. Replace these with the fields you need.

    NOTE: Nullable columns cannot be marked as primary key in Redshift Destination.

  3. Click DEPLOY to apply the Transformation.

    Deploy the Transformation

  4. Click Reset Schema for the Event Type from the Schema Mapper page. This reloads the schema in the Destination table as per the newly defined primary key fields.

    Click Reset Schema option

The data is ingested as per the Pipeline schedule. To apply the Transformation on existing data, click Restart Object from the Pipeline Overview page. The re-ingested data counts towards your Events quota consumption and is billed.

Restart Object to apply Transformation on existing data

NOTE: You may get duplicate records in the Destination table; you need to remove them manually.

To set a field as a primary key through the Schema Mapper:

  1. Disable Auto Mapping for the Event Type.

    Disable Auto Mapping

  2. Click the Kebab menu icon next to the Destination Table name and click Drop Table.

    Drop present Destination Table

  3. Click CREATE TABLE & MAP to create a new Destination table.

    Create a new Destination Table

  4. To set the field as a primary key, select the check box under the Primary Key column.

    Select Primary Key fields

    NOTE: Nullable columns cannot be marked as primary key in Redshift Destinations.

  5. Specify the Destination Table Name and click CREATE TABLE & MAP.

A new Destination table is created with the specified primary keys. The data is replicated into this table as per the Pipeline schedule.

Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Nov-07-2022 NA Created as a new document.

Tell us what went wrong