Pipeline Schema Management

Last updated on Sep 27, 2024

Edge Pipeline is currently available under Early Access. You can request access to evaluate and test its features.

Once you have configured the Pipeline, Hevo maps the Source schema to the Destination tables and columns. Read Schema Evolution Management to understand how Hevo handles changes in the Source schema and replicates them to the Destination. As part of schema management, Hevo automatically does the following:

  • Compresses long names of Source objects and columns: Hevo creates meaningful, shorter names to adhere to the character limit set for the table and column names in the Destination.

  • Sanitizes table and column names: Hevo performs automatic name sanitization to ensure that the objects follow the naming conventions supported by the Destination. A simplified sketch of this name handling follows this list.
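
The exact character limits and sanitization rules depend on the Destination. The sketch below is only a simplified illustration of the idea, using a hypothetical 63-character limit and lowercase-and-underscore rules; it is not Hevo's actual implementation.

```python
import hashlib
import re

# Hypothetical limit; real limits vary by Destination (table and column name lengths differ).
MAX_NAME_LENGTH = 63

def sanitize_name(name: str) -> str:
    """Normalize a Source object name into a form most Destinations accept (illustrative rules)."""
    name = name.strip().lower()
    name = re.sub(r"[^a-z0-9_]+", "_", name)  # replace runs of unsupported characters with underscores
    if name and name[0].isdigit():
        name = "_" + name                     # avoid names that start with a digit
    return name

def compress_name(name: str, limit: int = MAX_NAME_LENGTH) -> str:
    """Shorten an over-long name while keeping it meaningful and collision-resistant."""
    if len(name) <= limit:
        return name
    suffix = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    return f"{name[:limit - len(suffix) - 1]}_{suffix}"

# Example: a long, mixed-case Source column name mapped to a Destination-safe name.
print(compress_name(sanitize_name("Customer Orders With Extended European Region Transaction History Details 2024")))
```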

Schema Evolution Management

Hevo stores the metadata of the Source schema in a metastore and uses it to track changes to the schema over time. This includes information such as the structure of tables, data types, and other metadata essential for understanding and querying the data. Once Hevo receives the schema from the Source, it monitors the evolution of the schema and replicates changes to the Destination in accordance with the Schema Evolution Policy that you select during Pipeline creation.

Data Type Handling

Hevo detects the column data types from the metadata of your Source system. It then replicates them to the Destination based on consistent data mappings maintained internally for each Source and Destination. Hevo has a standardized data system that defines unified internal data types, which are referred to as Hevo data types. Hevo internally maps the Source data types to these Hevo data types, which in turn are mapped to data types supported by the Destination. This two-step mapping ensures that Source data in a specific format can be converted to meet the data type expectations of the Destination. With this standardized approach, data conversion becomes predictable, enabling smooth replication regardless of the Source or Destination type.

Refer to the table below for the list of Hevo data types:

| Hevo Data Type | Definition |
| --- | --- |
| BOOLEAN | Represents truth conditions, that is, True or False. |
| BYTE | Represents 8-bit signed integer values. |
| SHORT | Represents 16-bit signed integer values. |
| INT | Represents 32-bit signed integer values. |
| LONG | Represents 64-bit signed integer values. |
| FLOAT | Represents single-precision 32-bit IEEE 754 floating-point numbers. |
| DOUBLE | Represents double-precision 64-bit IEEE 754 floating-point numbers. |
| DECIMAL(p,s) | Represents decimal values with optional precision and scale measures. |
| VARCHAR(N) | Represents a string holding Unicode and ASCII characters, with optional length N. |
| BYTE_ARRAY(N) | Represents an array of bytes used to store binary data, with optional length N. |
| JSON | Represents key-value pairs as per the standard JSON format. |
| XML | Represents an XML document in string format. |
| ARRAY | Represents a sequence of elements of the same Hevo data type. |
| DATE | Represents date values without timezone information. |
| TIME | Represents time values without timezone information. |
| TIME_TZ | Represents time values with timezone information. |
| DATE_TIME | Represents date and time values without timezone information. |
| ZONED_DATE_TIME | Represents date and time values with timezone information. |

For a Source field value to be loaded to the Destination, its data type must be supported by Hevo. Refer to your respective Source or Destination document to know about the supported data types and the internal mapping for each data type.
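
As a rough illustration of this two-step mapping, the sketch below maps a handful of Source types to Hevo data types and then on to Destination types. Both dictionaries are hypothetical subsets shown for illustration only; the actual mappings for each connector are listed in the respective Source and Destination documents.

```python
# Illustrative only: a Source type is first mapped to a Hevo data type,
# and the Hevo data type is then mapped to a Destination type.
SOURCE_TO_HEVO = {            # hypothetical subset for a PostgreSQL-like Source
    "smallint": "SHORT",
    "integer": "INT",
    "bigint": "LONG",
    "numeric": "DECIMAL",
    "text": "VARCHAR",
    "timestamptz": "ZONED_DATE_TIME",
}

HEVO_TO_DESTINATION = {       # hypothetical subset for a Snowflake-like Destination
    "SHORT": "SMALLINT",
    "INT": "INTEGER",
    "LONG": "BIGINT",
    "DECIMAL": "NUMBER",
    "VARCHAR": "VARCHAR",
    "ZONED_DATE_TIME": "TIMESTAMP_TZ",
}

def destination_type(source_type: str) -> str:
    hevo_type = SOURCE_TO_HEVO[source_type]   # step 1: Source type -> Hevo data type
    return HEVO_TO_DESTINATION[hevo_type]     # step 2: Hevo data type -> Destination type

print(destination_type("bigint"))  # BIGINT
```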

Handling of data type changes at the Source depends on the compatibility of the changes with the existing Destination data types and the data type hierarchy (see image below) defined by Hevo.

  • When the Source data type change is incompatible with the existing Destination data type:

    In this case, Hevo stops loading data for that column. You must restart the object if you want to ingest and load data from it again. Suppose the data type of a Source column changes from string to int. This is an incompatible change, and Hevo stops replicating data for that column.

  • When the data type change at the Source is compatible with the Destination and is a widening change:

    In this case, Hevo promotes the data type of the Destination column. For instance, if the Source data type changes from int to long, Hevo promotes the Destination column’s data type from int to long.
    Hevo promotes data types only to a wider category, ensuring that all values are accommodated without loss.

  • When the data type change at the Source is compatible with the Destination and is a narrowing change:

    In this case, Hevo does not change the Destination data type and continues loading data to the column. For example, if the Source data type changes from long to int, Hevo retains the Destination column data type as long.

The following image illustrates the data type hierarchy that Hevo uses while promoting the data types of your Destination columns:

Data Type Hierarchy
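
The decision rules above can be summarized in a short sketch. The hierarchy used here is a simplified, hypothetical ordering that covers only one numeric branch; refer to the data type hierarchy image for the actual promotion paths.

```python
# Simplified, hypothetical widening hierarchy for a single numeric branch (illustration only).
NUMERIC_HIERARCHY = ["BYTE", "SHORT", "INT", "LONG", "DECIMAL"]

def handle_type_change(destination_type: str, new_source_type: str) -> str:
    """Return the action taken when a Source column's data type changes."""
    if destination_type not in NUMERIC_HIERARCHY or new_source_type not in NUMERIC_HIERARCHY:
        # Types from different branches (for example, string -> int) are incompatible.
        return "incompatible: stop loading data for this column"
    old_rank = NUMERIC_HIERARCHY.index(destination_type)
    new_rank = NUMERIC_HIERARCHY.index(new_source_type)
    if new_rank > old_rank:
        return f"widening: promote the Destination column to {new_source_type}"
    return f"narrowing or unchanged: keep the Destination column as {destination_type}"

print(handle_type_change("INT", "LONG"))     # widening: promote the Destination column to LONG
print(handle_type_change("LONG", "INT"))     # narrowing or unchanged: keep the Destination column as LONG
print(handle_type_change("VARCHAR", "INT"))  # incompatible: stop loading data for this column
```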

Schema Evolution Policy

The Schema Evolution Policy defines how Hevo handles any changes that occur in the Source data after the Pipeline is created, such as changes to the Source columns, schemas, or tables. This ensures that data is replicated seamlessly between the Source and the Destination, as per your requirements.

You can choose from the following options:


  • Allow all changes: Hevo syncs any compatible changes made to the Source schema after Pipeline creation with the Destination as soon as they are detected. For example, if a table is added or deleted in the Source, Hevo evolves the Destination schema accordingly. When new tables are added, the historical load is triggered first, followed by the incremental load.

    Note: If no data is present in the newly added columns or tables at the Source, Hevo does not automatically detect these changes. This is because Hevo reads the schema changes from the Source database logs, and they do not capture this information. You need to manually refresh the schema to view such tables and columns in the Object Configuration tab.

  • Block all changes: Hevo does not sync any changes in the Source columns, schemas, and tables with the Destination.

Depending on your selection, Hevo syncs the necessary changes with the Destination. The following table explains the Pipeline behavior corresponding to the various scenarios that may occur in the Source:

| Change in the Source | Allow all changes | Block all changes |
| --- | --- | --- |
| Column Added | Hevo starts syncing the new column as soon as it is detected and inserts null values for the existing rows. | Hevo does not sync the new column with the Destination. However, the column will be visible on the Fields Selection page of the object and can be included for ingestion and loading. |
| Schema Added | Hevo starts syncing the tables in the new schema as soon as they are detected. | Hevo does not sync the new schema with the Destination. However, the tables in the new schema will be visible in the Object Configuration tab and can be included for ingestion and loading. |
| Table Added | Hevo starts syncing the new table as soon as it is detected. | Hevo does not sync the new table with the Destination. However, this table will be visible in the Object Configuration tab and can be included for ingestion and loading. |
| Column Renamed | Hevo creates the new (renamed) column in the Destination table, inserts null values for the existing rows, and starts loading data to the new column from the next Pipeline run. The existing Destination column is updated with null values in all future Pipeline runs. | Hevo does not sync the renamed column with the Destination, and the existing column remains as is. |
| Table Renamed | Hevo creates the new (renamed) table in the Destination and starts loading data to it from the next Pipeline run. The existing table in the Destination remains unchanged. | Hevo does not sync the renamed table with the Destination. The existing table in the Destination remains unchanged. |
| Column Deleted | Hevo populates the column in the Destination with null values from the next Pipeline run. | The existing column in the Destination remains unchanged, and Hevo does not delete it. |
| Schema Deleted | The existing schema in the Destination remains unchanged, and Hevo does not delete it. | The existing schema in the Destination remains unchanged, and Hevo does not delete it. |
| Table Deleted | The existing table in the Destination remains unchanged, and Hevo does not delete it. | The existing table in the Destination remains unchanged, and Hevo does not delete it. |
| Column Reordered | Hevo does not make any changes to the column order in the Destination. | Hevo does not make any changes to the column order in the Destination. |
| Column Data Type Updated | Hevo updates the data type of the column in the Destination if it is a widening change. Read Data Type Handling to know how Hevo assigns and handles data types while replicating data. | Hevo does not make any changes to the column in the Destination. |
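
The table above can be read as a lookup from a detected change and the selected policy to the resulting behavior. The sketch below encodes a few of the rows for illustration; the change names and returned strings are simplifications of the behaviors described above, not an API exposed by Hevo.

```python
# Illustrative only: a few rows of the table above, keyed by (change, policy).
BEHAVIOR = {
    ("column_added", "allow_all"): "create the column in the Destination and backfill existing rows with null",
    ("column_added", "block_all"): "do not sync; the column is selectable on the Fields Selection page",
    ("table_added", "allow_all"): "create the table and start syncing (historical load, then incremental load)",
    ("table_added", "block_all"): "do not sync; the table is selectable in the Object Configuration tab",
    ("column_deleted", "allow_all"): "load null values into the Destination column from the next Pipeline run",
    ("column_deleted", "block_all"): "leave the Destination column unchanged",
}

def pipeline_behavior(change: str, policy: str) -> str:
    """Return the behavior for a schema change under the selected Schema Evolution Policy."""
    return BEHAVIOR.get((change, policy), "no change to the Destination")

print(pipeline_behavior("table_added", "allow_all"))
```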
