Parsing Nested JSON Fields in Events

To load JSON formatted Source data to a Destination, you must select how the nested fields, objects, and arrays must be parsed in order to be read correctly. The parsing method depends upon the Destination type and is independent of the Pipeline Mode or the data Source.

The table below lists the parsing methods that Hevo supports for each Destination, the first option being the default one in each case.

Destination Parsing Method
Amazon Redshift, S3, MS SQL Server - Flatten structs and split arrays to new events
- Replicate JSON fields as JSON strings
Aurora MySQL, MySQL, Postgres, Snowflake - Replicate JSON fields to JSON columns
- Flatten structs and split arrays to new events
Google BigQuery - Replicate JSON structure as such while collapsing arrays into strings
- Replicate JSON fields as JSON strings

The following example explains each parsing method.

Example

Consider a sample orgs object that contains multiple fields, child objects, and arrays in different nesting orders. Its format in different Sources is illustrated below. The example explores how this data is replicated based on the parsing method that is selected.

Sample Data Structure 1

The following is the orgs object as it appears in MongoDB and most of the SaaS and webhook Sources:

Sample JSON data from MongoDB

Sample Data Structure 2

The following is the orgs object as it appears in Sources such as MySQL and PostGres:

Sample JSON data from MySQL

Let us see how the replicated data appears in the Destination based on the different parsing methods.

Method: Replicate JSON fields as JSON strings

In this method, nested JSON fields are collapsed and serialized as a JSON string of type varchar.

In the current example, the address object, which contains several fields and an array, is collapsed into a single JSON string in the Destination table. Same behaviour can be observed for the contact array, where the name and phone_no fields are collapsed into a single JSON string.

Sample Destination Record

String output

Method: Replicate JSON structure as such while collapsing arrays into strings

In this method, the JSON structure of the Source data is maintained except for arrays, which are collapsed into JSON strings.

This option is used only in case of Google BigQuery, as it offers specialized support for JSON and also enforces the need for a well-defined schema for the fields within a record.

Note: If the Source data has a nested array of arrays or a mix of data types in the array items, such arrays are collapsed into a JSON string. Read Working with Arrays.

Sample Destination Record

As seen in the image below, the Source JSON format is retained for all the fields and well-defined arrays.

Arays collapsed into strings in BigQuery

Now, suppose the following record is added to the Source data:

Source data with mixed arrays

Here, the contact array within the address object has differing counts of phone numbers. Such non-uniform arrays are collapsed into a JSON string, as shown below.

Sample Destination Record

Non-uniform arrays collapsed into JSON string

Method: Flatten structs and split arrays to new events

Each JSON object/array is ingested as a separate Event Type, and a new table is created for it in the Destination. The child Event carries the property __hevo_ref_id, representing the identifier of the parent object. In addition, the parent object name is prefixed to the Event name. In this structure, each key:value pair holds only one value.

Note: In case the Source table does not have a primary key, then this strategy is not used but the parsing defaults to “Collapse” strategy. This is done because there is no primary key to reference in the new Event Types.

The following image illustrates the tables that are created in the Destination for the orgs Event Type and its child arrays:

Different tables created for each object in the Source

  • In the flat_orgs table:

    • Each field of the orgs object is parsed as a separate Event in a flat JSON format.

    • The fields within the address object are converted to separate Events, and the parent field name is prefixed to each Event’s name, for example, address.street. Here, the _id field, which is the same as the primary key of the parent field, maps the child to the parent.

    • The coord array within the address object, and the contact array are not seen here, as these are converted into new Event Types.

  • In the flat_orgs_address_coord table created for the coord array:

    • The __hevo_ref_id is mapped to the primary key (__hevo_ref_id field) of the contact object in the Source.

    • The sequence of the array items is maintained using the __hevo_array_index.

    • The combination of __hevo_array_index and _id becomes the primary key of the Event.

  • Similarly, a separate table flat_orgs_contact is created for the contact array within the address object.

Sample Destination Record

The following image displays a sample of the replicated data for the flat_orgs object:

Flat table with only fields

The following image displays a sample of the replicated data for the flat_orgs_contact object:

Flattened array as new table in Destination

Method: Replicate JSON fields to JSON columns

While this format appears similar to the Replicate JSON Fields as JSON Strings method, additional metadata is maintained by the system to differentiate the JSON and String data. This metadata is useful while writing the data to appropriate columns (optimized) in the tables of the Destination warehouse where both JSON and String data types are supported.

Sample Destination Record

As seen in the image below, the address and contact objects are replicated as separate JSONs.
JSON fields as JSON columns

Last updated on 11 Sep 2020