How do I set a field as a primary key to avoid duplication?

Last updated on Jun 27, 2023

You can set a field as a primary key in the Destination table through either Transformations or Schema Mapper. The Transformation method works only for warehouse Destinations and not for database and Firebolt Destinations. In the Schema Mapper, you need to drop the existing Destination table and create a new table with the desired primary keys.

  • Using Python Transformations. This method works only for warehouse Destinations and not for database and Firebolt Destinations.

  • Using Schema Mapper. This method works for all Destinations. In this method, you can set a primary key for your Destination table, or Event Type, directly from the Mapping Summary page. In case you want to change an existing primary key, you need to drop the existing Destination table and create a new table with the desired primary keys. Read How can I make sure that each record is loaded only once? for the steps to do this.

Define a primary key through Python Transformations

  1. Navigate to the Pipeline for which you want to define the primary key.

  2. In the left navigation pane, click Transformations to go to the Python-based Transformation interface.

    Python-based Transformation interface

  3. Write the following script in the CODE panel:

     from io.hevo.api import Event
    
     def transform(event):
        
     eventName = event.getEventName()
        
         if eventName == "<country>":
             event.setPrimaryKeyFields(['<name>', '<continent>'])
             properties = event.getProperties()
             properties['new_pk_fields'] = event.getPrimaryKeyFields()
    
         return event
    

    Here, the fields name and continent are set as the primary key for the country Event Type.

    NOTE:

    • Replace the placeholder values in the script with the fields you need.

    • Nullable columns cannot be marked as primary key in Redshift Destination.

  4. Click DEPLOY to apply the Transformation.

    Deploy the Transformation

  5. In the Schema Mapper page of your Pipeline, click the Event Type for which you deployed the Transformation.

    Select Event Type

  6. In the Event Types section, click the More ( More icon ) icon corresponding to the Event Type for which you added the primary key and disable the Auto Mapping option.

    Disable Auto Mapping

  7. In the Mapping Summary section, click the drop-down next to the Destination table name and click Drop Table.

    Drop table

  8. In the Event Types section, enable the Auto Mapping option for the Event Type.

    Enable Auto Mapping

  9. Navigate to the Pipeline Overview page, and click the More ( More icon ) icon corresponding to the Event Type for which you dropped the Destination table, and click Restart Object.

    Restart Object

    NOTE:

    • The re-ingested data counts towards your Events quota consumption and is billed.

    • You may get duplicate records in the Destination table; you need to remove them manually.

Define a primary key through Schema Mapper

  1. Navigate to the Schema Mapper page of your Pipeline.

  2. Click on the Event Type for which you want to set the primary key.

    Schema Mapper

  3. In the Mapping Summary section, click SET PRIMARY KEY.

    Set PK

  4. In the SELECT FIELDS FOR PRIMARY KEY pop-up window, select the field that you want to set as the primary key.
    You can also add multiple fields to create a composite key by clicking ADD MORE+.

    Add Primary key window

  5. Click ADD PRIMARY KEY.

    Add PK

    You can view the updated primary key in the Mapping Summary page of the Pipeline.

    Updated PK


Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Jun-26-2023 NA Updated the page to add information about dropping the table after setting the primary key.
Apr-25-2023 NA Updated the document to add information about setting the primary key via Schema Mapper.
Nov-07-2022 NA Created as a new document.

Tell us what went wrong