Parsing Nested JSON Fields in Events
ON THIS PAGE
To load JSON formatted Source data to a Destination, you must select how the nested fields, objects, and arrays must be parsed in order to be read correctly. The parsing method depends upon the Destination type and is independent of the Pipeline Mode or the data Source.
The table below lists the parsing methods that Hevo supports for each Destination, the first option being the default one in each case.
Destinations | Parsing Methods |
---|---|
Amazon Redshift, S3, MS SQL Server | - Flatten structs and split arrays to new events - Replicate JSON fields as JSON strings |
Aurora MySQL, MySQL, Postgres, Snowflake | - Replicate JSON fields to JSON columns - Flatten structs and split arrays to new events |
Google BigQuery | - Replicate JSON structure as such while collapsing arrays into strings - Replicate JSON fields as JSON strings |
The following example explains each parsing method.
Example
Consider a sample orgs
object that contains multiple fields, child objects, and arrays in different nesting orders. Its format in different Sources is illustrated below. The example explores how this data is replicated based on the parsing method that is selected.
Sample Data Structure 1
The following is the orgs
object as it appears in MongoDB and most of the SaaS and webhook Sources:
Sample Data Structure 2
The following is the orgs
object as it appears in Sources such as MySQL and PostGres:
Let us see how the replicated data appears in the Destination based on the different parsing methods.
Method: Replicate JSON fields as JSON strings
In this method, nested JSON fields are collapsed and serialized as a JSON string of type varchar
.
In the current example, the address
object, which contains several fields and an array, is collapsed into a single JSON string in the Destination table. Same behaviour can be observed for the contact
array, where the name
and phone_no
fields are collapsed into a single JSON string.
Sample Destination Record
Method: Replicate JSON structure as such while collapsing arrays into strings
In this method, the JSON structure of the Source data is maintained except for arrays, which are collapsed into JSON strings.
This option is used only in case of Google BigQuery, as it offers specialized support for JSON and also enforces the need for a well-defined schema for the fields within a record.
Note: If the Source data has a nested array of arrays or a mix of data types in the array items, such arrays are collapsed into a JSON string. Read Working with Arrays.
Sample Destination Record
As seen in the image below, the Source JSON format is retained for all the fields and well-defined arrays.
Now, suppose the following record is added to the Source data:
Here, the contact
array within the address
object has differing counts of phone numbers. Such non-uniform arrays are collapsed into a JSON string, as shown below.
Sample Destination Record
Method: Flatten structs and split arrays to new events
Each JSON object/array is ingested as a separate Event Type, and a new table is created for it in the Destination. The child Event carries the property __hevo_ref_id
, representing the identifier of the parent object. In addition, the parent object name is prefixed to the Event name. In this structure, each key:value
pair holds only one value.
Note: In case the Source table does not have a primary key, then this strategy is not used but the parsing defaults to “Collapse” strategy. This is done because there is no primary key to reference in the new Event Types.
The following image illustrates the tables that are created in the Destination for the orgs
Event Type and its child arrays:
-
In the
flat_orgs
table:-
Each field of the
orgs
object is parsed as a separate Event in a flat JSON format. -
The fields within the
address
object are converted to separate Events, and the parent field name is prefixed to each Event’s name, for example,address.street
. Here, the_id
field, which is the same as the primary key of the parent field, maps the child to the parent. -
The
coord
array within theaddress
object, and thecontact
array are not seen here, as these are converted into new Event Types.
-
-
In the
flat_orgs_address_coord
table created for thecoord
array:-
The
__hevo_ref_id
is mapped to the primary key (__hevo_ref_id
field) of thecontact
object in the Source. -
The sequence of the array items is maintained using the
__hevo_array_index
. -
The combination of
__hevo_array_index
and_id
becomes the primary key of the Event.
-
-
Similarly, a separate table
flat_orgs_contact
is created for thecontact
array within theaddress
object.
Sample Destination Record
The following image displays a sample of the replicated data for the flat_orgs
object:
The following image displays a sample of the replicated data for the flat_orgs_contact
object:
Method: Replicate JSON fields to JSON columns
While this format appears similar to the Replicate JSON Fields as JSON Strings method, additional metadata is maintained by the system to differentiate the JSON and String data. This metadata is useful while writing the data to appropriate columns (optimized) in the tables of the Destination warehouse where both JSON and String data types are supported.
Sample Destination Record
As seen in the image below, the address
and contact
objects are replicated as separate JSONs.