Splitting Nested Events into Multiple Events

Last updated on May 30, 2023
On This Page

Many modern databases allow storing nested or JSON documents which you may want to normalize or split into different Events before loading into your data warehouse.

A prominent example of such databases is MongoDB. In this article, we will see how to split nested documents being ingested from a MongoDB collection into multiple Events.

Let’s take the following input Event ingested from a MongoDB collection named restaurants as an example:

{
    "_id": "5a9909384309cf90b5739342",
    "name": "Mangal Kebab Turkish Restaurant",
    "restaurant_id": "41009112",
    "borough": "Queens",
    "cuisine": "Turkish",
    "address": {
        "building": "4620",
        "coord": {
            "0": -73.9180155,
            "1": 40.7427742
        },
        "street": "Queens Boulevard",
        "zipcode": "11104"
    },
    "grades": {
        "0": {
            "date": 1414540800000,
            "grade": "A",
            "score": 12
        },
        "1": {
            "date": 1397692800000,
            "grade": "A",
            "score": 10
        },
        "2": {
            "date": 1381276800000,
            "grade": "A",
            "score": 12
        }
    }
}

The following Transformations code snippet will split the above Event into three Events restaurants, restaurant\_address, and restaurant\_grades:

def transform(event):
    events = [event]

    # Get event name from the event #
    eventName = event.getEventName()

    # Get properties from the event #
    properties = event.getProperties()

    # Split events

    # Create address
    address = properties.get('address')
    address['ref_id'] = properties.get('restaurant_id')
    events.append(Event('restaurant_address', address))
    del properties['address']

    # Create grades
    properties['grades'] = eval(properties['grades'])
    grades_list = properties.get('grades')
    for i in range(len(grades_list)):
        grades = grades_list[i]
        grades['ref_id'] = properties.get('restaurant_id')
        grades['index'] = i
        events.append(Event('restaurant_grades', grades))
    del properties['grades']

    return events

The output from the above snippet will be:

[
	{
		"event_name": "demo.restaurants",
	    	"properties": {
		    "restaurant_id": "41009112",
			"name": "Mangal Kebab Turkish Restaurant",
			"cuisine": "Turkish",
			"_id": "5a9909384309cf90b5739342",
			"borough": "Queens"
		}
	},
	{
		"event_name": "restaurant_address",
		"properties": {
			"zipcode": "11104",
			"ref_id": "41009112",
			"coord": {
				"0": -73.9180155,
				"1": 40.7427742
			},
			"street": "Queens Boulevard",
			"building": "4620"
		}
	},
	{
		"event_name": "restaurant_grades",
		"properties": {
			"date": 1414540800000,
			"ref_id": "41009112",
			"score": 12,
			"grade": "A",
			"index": "0"
		}
	},
	{
		"event_name": "restaurant_grades",
		"properties": {
			"date": 1397692800000,
			"ref_id": "41009112",
			"score": 10,
			"grade": "A",
			"index": "1"
		}
	},
	{
		"event_name": "restaurant_grades",
		"properties": {
			"date": 1381276800000,
			"ref_id": "41009112",
			"score": 12,
			"grade": "A",
			"index": "2"
		}
	}
]

As you can see above, one restaurant\_address Event was created from the parent Event and three restaurant\_grades Events were created as the parent Event had an array of three such nested objects.



Revision History

Refer to the following table for the list of key updates made to this page:

Date Release Description of Change
Oct-25-2022 NA Moved and renamed page from Example - Splitting Nested Events into Multiple Events to Splitting Nested Events into Multiple Events.

Tell us what went wrong