How do I set a field as a primary key to avoid duplication?

Last updated on May 30, 2023

You can set a field as a primary key in the Destination table through either Transformations or Schema Mapper. The Transformation method works only for warehouse Destinations and not for database and Firebolt Destinations. In the Schema Mapper, you need to drop the existing Destination table and create a new table with the desired primary keys.

  • Using Python Transformations. This method works only for warehouse Destinations and not for database and Firebolt Destinations.

  • Using Schema Mapper. This method works for all Destinations. In this method, you can set a primary key for your Destination table, or Event Type, directly from the Mapping Summary page. In case you want to change an existing primary key, you need to drop the existing Destination table and create a new table with the desired primary keys. Read How can I make sure that each record is loaded only once? for the steps to do this.

Define a primary key through Python Transformations

  1. Click Transformations to go to the Python-based Transformation interface.

    Python-based Transformation interface

  2. Write the following code in the CODE panel:

     from io.hevo.api import Event
    
     def transform(event):
    
       # Set a field (or a list of fields) as the primary key
       event.setPrimaryKeyFields(['journey_id', 'key'])
    
       properties = event.getProperties()
       properties['new_pk_fields'] = event.getPrimaryKeyFields()
    
       return event
    
    

    Here, the fields journey_id and key are set as the primary key. Replace these with the fields you need.

    NOTE: Nullable columns cannot be marked as primary key in Redshift Destination.

  3. Click DEPLOY to apply the Transformation.

    Deploy the Transformation

  4. Click Reset Schema for the Event Type from the Schema Mapper page. This reloads the schema in the Destination table as per the newly defined primary key fields.

    Click Reset Schema option

The data is ingested as per the Pipeline schedule. To apply the Transformation on existing data, click Restart Object from the Pipeline Overview page. The re-ingested data counts towards your Events quota consumption and is billed.

Restart Object to apply Transformation on existing data

NOTE: You may get duplicate records in the Destination table; you need to remove them manually.

Define a primary key through Schema Mapper

  1. Navigate to the Schema Mapper page of your Pipeline.

  2. Click on the Event Type for which you want to set the primary key.

    Schema Mapper

  3. In the Mapping Summary section, click SET PRIMARY KEY.

    Set PK

  4. In the SELECT FIELDS FOR PRIMARY KEY pop-up window, select the field that you want to set as the primary key.
    You can also add multiple fields to create a composite key by clicking ADD MORE+.

    Add Primary key window

  5. Click ADD PRIMARY KEY.

    Add PK

    You can view the updated primary key in the Mapping Summary page of the Pipeline.

    Updated PK


Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Apr-25-2023 NA Updated the document to add information about setting the primary key via Schema Mapper.
Nov-07-2022 NA Created as a new document.

Tell us what went wrong