Deduplicating Data in a Data Warehouse Destination

Last updated on Jan 09, 2024

When data is loaded from your Source system to the Destination warehouse, it is deduplicated so that only unique records are replicated and duplicates are dropped.

Duplicate records may occur in the Destination table in the following scenarios:

  • When the Destination table has no primary key against which records can be deduplicated.
    To avoid duplicates in such cases, you must set up a primary key and re-run the ingestion for that object from the Pipeline Overview tab of the Pipeline Detailed View.

  • When the Append Rows on Update setting is enabled for the table.
    Read the section below for more information on enabling or disabling this option.
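
The two scenarios above can be illustrated with a minimal Python sketch. It is only a conceptual model, not how the Destination actually performs the load: the function name load_events, the in-memory list used as a table, and the sample id and status fields are all assumptions made for illustration.

```python
from typing import Any, Optional

Row = dict[str, Any]


def load_events(table: list[Row], events: list[Row],
                primary_key: Optional[str]) -> list[Row]:
    """Hypothetical model of a load: return the table after loading events.

    With a primary key, an incoming event replaces any existing row that has
    the same key value. Without one, every event is appended as a new row.
    """
    if primary_key is None:
        # Nothing to deduplicate on, so every event becomes a new row.
        return table + events
    # Index existing rows by key, then let later events overwrite earlier ones.
    merged = {row[primary_key]: row for row in table}
    for event in events:
        merged[event[primary_key]] = event
    return list(merged.values())


# Sample Events (assumed data): the record with id 1 arrives twice.
events = [
    {"id": 1, "status": "new"},
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "new"},
]

with_key = load_events([], events, primary_key="id")      # 2 rows, one per id
without_key = load_events([], events, primary_key=None)   # 3 rows, id 1 twice
```

In this model, the load with a primary key keeps only the latest version of the record with id 1, while the load without one keeps both versions.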


Enabling or Disabling Append Rows on Update Setting

The Append Rows on Update option for a Destination table controls whether ingested Events are appended directly as new rows without deduplication or are checked for duplicates first. You can specify this setting for each table.

Note: This feature is available only for Amazon Redshift, Google BigQuery, and Snowflake data warehouse Destinations. For RDBMS Destinations such as Aurora MySQL, MySQL, PostgreSQL, and SQL Server, deduplication is always done.

In the Destination Detailed View page:

  1. Click the More icon next to the table name in the Destination Tables List.
  2. Update the Append Rows on Update option, as required:

    Option setting    Description
    Enabled           Events are appended as new rows without deduplication.
    Disabled          Events are deduplicated.
  3. Click OK, GOT IT in the confirmation dialog to apply this setting.

Note: If you disable this feature after having previously enabled it, uniqueness is ensured only for future records uploaded to the Destination. Therefore, both old and new versions of the same record may exist in your Destination tables.
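
To check whether a table still holds older duplicate versions after the setting is disabled, one option is to count rows per primary key value. The Python sketch below is a minimal illustration under assumptions: the helper find_duplicate_keys, the id column, and the sample rows are hypothetical, and the rows are assumed to have already been fetched from the Destination table into memory.

```python
from collections import Counter
from typing import Any


def find_duplicate_keys(rows: list[dict[str, Any]], primary_key: str) -> list[Any]:
    """Return the primary key values that appear in more than one row."""
    counts = Counter(row[primary_key] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Assumed sample rows: id 1 was appended twice while the setting was enabled.
rows = [
    {"id": 1, "status": "new"},
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "new"},
]

duplicate_keys = find_duplicate_keys(rows, "id")
print(duplicate_keys)  # [1]
```

Any key value returned here identifies a record for which both old and new versions are present in the table.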


Revision History

Refer to the following table for the list of key updates made to this page:

Date           Release    Description of Change
Jan-09-2024    NA         Created as a new document.
