Parsing Nested JSON Fields in Events
ON THIS PAGE
To load JSON-formatted Source data to the Destination, you must select how Hevo must parse the nested fields, objects, and arrays so that these are read correctly. The parsing strategy depends upon the Destination type and is independent of the Pipeline mode or the data Source.
You can select the parsing strategy as the last step of creating your Pipeline. If only one strategy is available, it is applied by default and no selection is required, as in the case of Aurora MySQL, MySQL, Postgres, and Snowflake Destinations.
The table below lists the parsing strategies that Hevo supports for each Destination, the first option being the default one in each case.
Destinations | Parsing Strategies |
---|---|
Amazon Redshift, S3, MS SQL Server | - Flatten structs and split arrays to new events - Replicate JSON fields as JSON strings |
Aurora MySQL, MySQL, Postgres, Snowflake | - Replicate JSON fields to JSON columns |
Google BigQuery | - Replicate JSON structure as such while collapsing arrays into strings - Replicate JSON fields as JSON strings |
The following example explains each strategy.
Example
Consider a sample orgs
object that contains multiple fields, child objects, and arrays in different nesting orders. Its format in different Sources is illustrated below. The example explores how this data is replicated based on the parsing strategy that is selected.
Sample Data Structure 1
The following is the orgs
object as it appears in MongoDB:
Sample Data Structure 2
The following is the orgs
object as it appears in Sources such as MySQL and PostgreSQL:
Let us see how the replicated data appears in the Destination based on the different parsing strategies.
Strategy: Replicate JSON fields as JSON strings
In this strategy, nested JSON fields are collapsed and serialized as a JSON string of type varchar
.
In the current example, the address
object, which contains several fields and an array, is collapsed into a single JSON string in the Destination table. Same behaviour can be observed for the contact
array, where the name
and phone_no
fields are collapsed into a single JSON string.
Sample Destination Record
Strategy: Replicate JSON structure as such while collapsing nested arrays into strings
In this parsing strategy, the JSON structure of the Source data is maintained except for nested arrays, which are collapsed into JSON strings.
This option is used only in case of Google BigQuery, as it offers specialized support for JSON and also enforces the need for a well-defined schema for the fields within a record.
Note: If the Source data has a nested array of arrays or a mix of data types in the array items, such arrays are collapsed into a JSON string. Read Working with Arrays.
Sample Destination Record
As seen in the image below, the Source JSON format is retained for all the fields and well-defined arrays.
Now, suppose the following record is added to the Source data:
Here, the contact
array within the address
object has differing counts of phone numbers. Such non-uniform arrays are collapsed into a JSON string, as shown below.
Sample Destination Record
Strategy: Flatten structs and split arrays to new Events
Each JSON object/array is ingested as a separate Event Type, and a new table is created for it in the Destination. The child Event carries the property __hevo_ref_id
, representing the identifier of the parent object. In addition, the parent object name is prefixed to the Event name. In this structure, each key:value
pair holds only one value.
Note: In case the Source table does not have a primary key, then this strategy is not used but the parsing defaults to “Collapse” strategy. This is done because there is no primary key to reference in the new Event Types.
The following image illustrates the tables that are created in the Destination for the orgs
Event Type and its child arrays:
-
In the
flat_orgs
table:-
Each field of the
orgs
object is parsed as a separate Event in a flat JSON format. -
The fields within the
address
object are converted to separate Events, and the parent field name is prefixed to each Event’s name, for example,address.street
. Here, the_id
field, which is the same as the primary key of the parent field, maps the child to the parent. -
The
coord
array within theaddress
object, and thecontact
array are not seen here, as these are converted into new Event Types.
-
-
In the
flat_orgs_address_coord
table created for thecoord
array:-
The
__hevo_ref_id
is mapped to the primary key (__hevo_ref_id
field) of thecontact
object in the Source. -
The sequence of the array items is maintained using the
__hevo_array_index
. -
The combination of
__hevo_array_index
and_id
becomes the primary key of the Event.
-
-
Similarly, a separate table
flat_orgs_contact
is created for thecontact
array within theaddress
object.
Sample Destination Record
The following image displays a sample of the replicated data for the flat_orgs
object:
The following image displays a sample of the replicated data for the flat_orgs_contact
object:
Strategy: Replicate JSON fields to JSON columns
While this format appears similar to the Replicate JSON Fields as JSON Strings parsing strategy, additional metadata is maintained by the system to differentiate the JSON and String data. This metadata is useful while writing the data to appropriate columns (optimized) in the tables of the Destination warehouse where both JSON and String data types are supported.
Sample Destination Record
As seen in the image below, the address
and contact
objects are replicated as separate JSONs.
Revision History
Refer to the following table for the list of key updates made to the page:
Date | Release | Description of Change |
---|---|---|
09-Mar-2021 | 1.58 | - Updated the page overview and the parsing strategies table to indicate that only native parsing strategy is supported for the Aurora MySQL, MySQL, Postgres, and Snowflake Destinations. - Included the screenshot where you can specify the parsing strategy. |