Table and Column Name Compression

Some of the popular Destination systems impose the following limits on the length of table and column names:

Destination      Table Name Character Limit    Column Name Character Limit
MySQL            64                            64
PostgreSQL       63                            63
Redshift         127                           127
BigQuery         1024                          128
Snowflake        255                           255
MSSQL Server     128                           128
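These limits can be expressed as a simple lookup. The sketch below is illustrative only; the `DESTINATION_LIMITS` mapping and the `exceeds_limit` helper are not part of Hevo, and the limit values are taken from the table above:

```python
# Illustrative mapping of (table_limit, column_limit) in characters,
# based on the limits listed in the table above. Not a Hevo API.
DESTINATION_LIMITS = {
    "mysql": (64, 64),
    "postgresql": (63, 63),
    "redshift": (127, 127),
    "bigquery": (1024, 128),
    "snowflake": (255, 255),
    "mssql": (128, 128),
}

def exceeds_limit(name: str, destination: str, kind: str = "table") -> bool:
    """Return True if `name` is longer than the Destination's identifier limit."""
    table_limit, column_limit = DESTINATION_LIMITS[destination]
    limit = table_limit if kind == "table" else column_limit
    return len(name) > limit

# A 74-character name fits within Redshift's 127-character limit,
# but not within PostgreSQL's 63-character limit.
long_name = "organization_vertical_geography_zone_country_department_contact_home_phone"
```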

This means that if a field or object name exceeds the limit set by the Destination, the extra characters are truncated during replication. This can lead to duplicate table and column names being created in the Destination, ultimately resulting in a replication error.

For example, consider the following nested JSON objects ingested from a MongoDB Source with the parsing method set to Flatten structs and split arrays to new events:

  • organization_vertical_geography_zone_country_department_contact_home_phone (74 characters)

  • organization_vertical_geography_zone_country_department_contact_home_address (76 characters)

For a Postgres Destination, the tables created for these objects would end up having the same name due to the 63-character limit:

  • organization_vertical_geography_zone_country_department_contact, and
  • organization_vertical_geography_zone_country_department_contact.
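The collision can be reproduced with a naive truncation. This is only a sketch of the failure mode described above, not Hevo's internal logic:

```python
PG_LIMIT = 63  # PostgreSQL identifier limit, in characters

names = [
    "organization_vertical_geography_zone_country_department_contact_home_phone",
    "organization_vertical_geography_zone_country_department_contact_home_address",
]

# Naive truncation keeps only the first 63 characters of each name,
# so both names collapse to the same identifier.
truncated = {name[:PG_LIMIT] for name in names}
# The set holds a single element: both truncated names are identical.
```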

To prevent such situations, Hevo uses name compression strategies to derive unique names for the tables and columns.

Name Compression Strategies

The following strategies take into account the different ways in which the Source fields and objects may have been named and try to retain the most relevant information within the specified character limit.

For the sample values below, a limit of 40 characters is assumed to derive the compressed names.

The strategies, in order of priority, are:

1. Keep the last n characters
   Original name:   this_table_name_is_too_long_andTheLastWordIsLongerThanTheLimitByItself
   Compressed name: dTheLastWordIsLongerThanTheLimitByItself
   Notes: Useful when words are not separated and the name is a continuous string. This method counts only characters and does not recognize words.

2. Abbreviate words using their first 2 letters, only as far as necessary
   Original name:   this_table_name_has_so_many_words_that_abbreviating_might_help_shorten_it
   Compressed name: th_ta_na_ha_so_ma_wo_th_ab_mi_he_sh_it
   Notes: Useful when words are separated by an _ or another delimiter.

3. Remove words from the beginning
   Original name:   this_table_name_is_too_long_and_has_too_many_words_that_abbreviating_would_still_not_ensure_it_is_within_the_limit
   Compressed name: still_not_ensure_it_is_within_the_limit
   Notes: Whole words are removed from the beginning until the name fits within the limit.
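The three strategies can be sketched as follows. This is an illustrative implementation applied with the 40-character limit from the examples, not Hevo's actual code; in particular, the abbreviation sketch shortens every word, whereas the real strategy abbreviates only as many words as necessary:

```python
def keep_last_n(name: str, limit: int) -> str:
    """Strategy 1: keep only the last `limit` characters of the name."""
    return name[-limit:]

def abbreviate_words(name: str, limit: int, delimiter: str = "_") -> str:
    """Strategy 2: shorten each delimited word to its first 2 letters.

    Simplified sketch: `limit` is kept for a uniform interface; the actual
    strategy abbreviates words only until the name fits within the limit.
    """
    words = name.split(delimiter)
    return delimiter.join(w[:2] for w in words)

def remove_leading_words(name: str, limit: int, delimiter: str = "_") -> str:
    """Strategy 3: drop whole words from the beginning until the name fits."""
    words = name.split(delimiter)
    while words and len(delimiter.join(words)) > limit:
        words.pop(0)
    return delimiter.join(words)
```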

Scenarios that Trigger Name Compression

  • The Destination prefix is long. Hevo imposes a limit of 32 characters on the prefix length. As the Destination prefix is not compressed while creating the Destination table, it can lead to a truncation strategy being applied to the table name.

  • You have not enabled the Merge Tables option. As a result, the Event Type name becomes database_name.table_name, which may exceed the character limit.

  • The Source has JSON data, and the strategy selected for parsing JSON Events generates long table and column names.

  • The Source supports longer table and column names than the Destination. For example, the Source is Amazon Redshift (127 characters) and the Destination is Postgres (63 characters).

Note: Your selection of the Name Sanitization feature may affect the compression strategy applied to the table name.

Example: Destination Schema with Table Compression

Let us consider:

  • Source Type: Amazon Redshift
  • Source Database Name: table_name_compression
  • Destination Prefix: south_west_zone_retail_sales_div
  • Source schema:
    Redshift Source tables

  • Destination Type: Postgres
  • Destination Database Name: table_name_compression_dest
  • Sanitize Tables: True

In this scenario, the Destination schema created after table name compression is as follows:

Destination schema with compressed table names

Last updated on 16 Oct 2020