Name Sanitization

In order to provide consistent user experience across many warehouses, Hevo uses a Name Sanitizer system. It applies to the names of the table and columns created in the Hevo ecosystem. It does this by encouraging the use of a simple, consistent, readable vocabulary when naming the tables and columns. To achieve that, Hevo’s Name Sanitizer removes all non-alphanumeric characters, spaces in between the names and replaces them with rather a suitable character, underscore. We’ll go deep into how it works with each warehouse in particular. Before that let us understand when it is applied to these names.

When are Names Sanitized?

  • Names are sanitized while mapping Source events to a table in the warehouse via Hevo AutoMapper or when a user tries to create a table manually using Hevo UI, the name is validated before actually trying to create the table in the warehouse. If the validation fails, a proper message is shown on the Hevo UI.

  • There is a provision to switch off name sanitization in the case it is not required. You can switch it off while creating a Destination on Hevo UI. However, when it is switched off and auto-mapping is enabled, force sanitization comes into effect(only applicable to AWS Redshift).

  • You’ll find examples below with specific warehouse behavior.

How are Names Sanitized?

In this section, we’ll look into how Hevo’s Name Sanitizer performs with each warehouse.

AWS Redshift

In the case where Hevo AutoMapper is trying to create a table in AWS Redshift, the name sanitizer converts the table name into lowercase, replaces all non-alpha numeric characters with an underscore, and removes the trailing underscores. For example, if the Source Event Type name is _Table$namE_05_, it is converted to _table_name_05, replacing the special character $ with an underscore and removing trailing underscore.

However, when a user tries to create the table manually, name sanitization does not apply and it is converted to table$name_05_, making it to lowercase. When a Redshift Destination is configured with sanitization being switched off, it doesn’t do anything apart from making it in lowercase. The table, in this case, is created with the name table$name_05_. This is important to Redshift because a table can’t be created with upper case characters by default. You can check their Developer Guide to know more about naming database objects. The same behavior is followed for columns.

Google BigQuery

In the case where Hevo AutoMapper is trying to create a table in Google BigQuery, the sanitizer will convert the table name into lowercase, replace all non-alpha numeric characters with an underscore and remove the trailing underscores. For example, if the Source Event Type name is Table$namE_05_ it is sanitized to table_name_05, replacing the special character $ with an underscore and removing trailing underscore. According to the Google BigQuery reference guide for naming tables, it is not possible to create a table name with special characters. Hevo prompts with an error message when a table with the name, Table$namE_05_ is tried manually on Hevo UI. The same behavior is followed for columns.

Snowflake

Hevo’s Name Sanitizer for Snowflake is designed in such a fashion that the Hevo AutoMapper creates all the tables and columns in uppercase, replacing special characters with an underscore and removing trailing underscores. For example, if the Source Event Type name is Table$namE_05_ it is sanitized to TABLE_NAME_05, replacing the special character $ with an underscore and removing trailing underscore. However, when it is tried manually a table/column will be created in uppercase.

MySQL/ PostgreSQL/ MS-SQL/ SQL Server/ AWS Aurora

When name sanitization is switched ON, Hevo’s Name Sanitizer replaces all non-alphanumeric characters with underscores, removes the trailing underscores, and converts it to lower case. For example, if the Source Event Type name is Table$namE_05_ it is converted to table_name_05, replacing the non-alphanumeric character $ with an underscore and removing trailing underscore.

When name sanitization is switched OFF for the Destination, Hevo’s Name Sanitizer doesn’t affect naming and it doesn’t replace the non-alphanumeric character or the trailing underscores, and a table with name, Table__$namE_05_ is created. However, when it is tried manually, a table with name, table$name_05_ will be created.

What Happens to Field Separators and Delimiters?

If a . is used as a field separator, for example, first.list, this becomes first_list post sanitization, and is read as two words by Hevo, first and list. This is an important consideration for deciding the table name compression strategy. If name sanitization is disabled, first.list is read as one word since the compression strategy does not recognize . as a separator.

Treatment of delimiter

Read Table and Column Name Compression.

Last updated on 16 Oct 2020