Example - Merge Collections Feature

Last updated on Aug 22, 2022

The Merge Collections toggle option is available in the Advanced Settings section of the Configure Your Source page during Pipeline creation. You can use this feature to merge collections from different databases of your MongoDB Source into a single Destination table. This example illustrates how the feature behaves.
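The grouping that this toggle implies can be sketched in a few lines. This is a minimal illustration, not Hevo's implementation. It assumes that collections are matched by name across databases, as in the example below. It also assumes that, with the option disabled, each collection gets its own table named with a database prefix. The function `group_namespaces` and the `database_collection` naming scheme are hypothetical.

```python
from collections import defaultdict


def group_namespaces(namespaces: list[str], merge: bool) -> dict[str, list[str]]:
    """Map each (hypothetical) Destination table name to the Source namespaces it receives.

    Namespaces are MongoDB "database.collection" strings. Database names cannot
    contain dots, so splitting once at the first dot is safe even when the
    collection name itself contains dots.
    """
    tables: dict[str, list[str]] = defaultdict(list)
    for ns in namespaces:
        database, collection = ns.split(".", 1)
        if merge:
            # Assumption: collections with the same name share one table.
            table = collection
        else:
            # Hypothetical naming: one table per collection, prefixed with its database.
            table = f"{database}_{collection}"
        tables[table].append(ns)
    return dict(tables)


namespaces = ["android.users", "movie.users", "web.users"]
print(group_namespaces(namespaces, merge=True))
# {'users': ['android.users', 'movie.users', 'web.users']}
print(group_namespaces(namespaces, merge=False))
# {'android_users': ['android.users'], 'movie_users': ['movie.users'], 'web_users': ['web.users']}
```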

Let us consider three databases in the MongoDB Source: android, movie, and web. Each of these databases contains a collection named users. All three collections share the fields Age, First_Name, Last_Name, Email, and Phone. In addition, the web.users collection has an extra field, DOB. The images below provide a snapshot of each collection:

Collection: android.users

Collection: movie.users

Collection: web.users, with the additional DOB field

If the Merge Collections Option is Enabled

The Schema Mapper creates a single Destination table for the three collections in the Source:

Schema mapping for the three collections into a single table

The columns in the table are the union of all the fields in the three Source collections. Therefore, the DOB field, which is present only in the web.users collection, is also added as a column in the Destination table:

Mapping summary for the merged collections
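The union of fields described above can be sketched as follows. This is a minimal illustration that uses the field names listed in this example. The function `merged_columns` is hypothetical, and the actual column names, order, and any additional columns that Hevo creates in the Destination may differ.

```python
# Field names of each Source collection, as listed in this example.
collections = {
    "android.users": ["Age", "First_Name", "Last_Name", "Email", "Phone"],
    "movie.users": ["Age", "First_Name", "Last_Name", "Email", "Phone"],
    "web.users": ["Age", "First_Name", "Last_Name", "Email", "Phone", "DOB"],
}


def merged_columns(field_lists) -> list[str]:
    """Return the union of all field names, keeping the order in which they are first seen."""
    columns: dict[str, None] = {}  # dict keys act as an insertion-ordered set
    for fields in field_lists:
        for field in fields:
            columns.setdefault(field, None)
    return list(columns)


print(merged_columns(collections.values()))
# ['Age', 'First_Name', 'Last_Name', 'Email', 'Phone', 'DOB']
```

A dict is used instead of a set so that the resulting column order is deterministic.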

When the Pipeline runs, the data is ingested from each of these collections into the common Destination table:

Jobs to load data from each collection
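The following sketch shows one way documents from several collections can land in a single table. It assumes that a field missing from a document, such as DOB in android.users and movie.users, is stored as a null (`None`) value in that row. The function `merge_documents` and the sample documents are hypothetical, and the documents are trimmed to a few fields for brevity.

```python
from typing import Any


def merge_documents(
    sources: dict[str, list[dict[str, Any]]],
) -> tuple[list[str], list[dict[str, Any]]]:
    """Combine documents from several collections into rows of one table.

    Columns are the union of all fields; a field absent from a document
    is filled with None in that document's row.
    """
    columns: dict[str, None] = {}
    for docs in sources.values():
        for doc in docs:
            for field in doc:
                columns.setdefault(field, None)
    rows = [
        {col: doc.get(col) for col in columns}
        for docs in sources.values()
        for doc in docs
    ]
    return list(columns), rows


# Hypothetical documents, trimmed to a few fields.
sources = {
    "android.users": [{"First_Name": "Asha", "Email": "asha@example.com"}],
    "movie.users": [{"First_Name": "Ben", "Email": "ben@example.com"}],
    "web.users": [{"First_Name": "Lee", "Email": "lee@example.com", "DOB": "1990-05-01"}],
}
columns, rows = merge_documents(sources)
print(columns)  # ['First_Name', 'Email', 'DOB']
print(rows[0])  # {'First_Name': 'Asha', 'Email': 'asha@example.com', 'DOB': None}
```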

The Destination table, therefore, contains records from all three collections:

Destination table containing records from all three collections

The following is a snapshot of the table data merged from all the collections:

Snapshot of data in the Destination table for merged collections

If the Merge Collections Option is Not Enabled

The Schema Mapper maps each collection to a separate Destination table:

Schema map with a separate table for each collection

The schema map for each collection is derived from the fields in the respective collection. For instance, the following is the mapping summary of the web.users collection:

Mapping summary of the web.users collection
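Without merging, each schema is derived only from its own collection, so DOB appears only in the web.users schema. The following minimal sketch illustrates this. The function `separate_schemas` and the sample documents are hypothetical, and the documents are trimmed to a few fields.

```python
from typing import Any


def separate_schemas(sources: dict[str, list[dict[str, Any]]]) -> dict[str, list[str]]:
    """Derive one column list per collection from that collection's documents only."""
    schemas: dict[str, list[str]] = {}
    for namespace, docs in sources.items():
        columns: dict[str, None] = {}  # dict keys keep first-seen order
        for doc in docs:
            for field in doc:
                columns.setdefault(field, None)
        schemas[namespace] = list(columns)
    return schemas


# Hypothetical documents, trimmed to a few fields.
sources = {
    "android.users": [{"First_Name": "Asha", "Email": "asha@example.com"}],
    "movie.users": [{"First_Name": "Ben", "Email": "ben@example.com"}],
    "web.users": [{"First_Name": "Lee", "Email": "lee@example.com", "DOB": "1990-05-01"}],
}
schemas = separate_schemas(sources)
print(schemas["android.users"])  # ['First_Name', 'Email']
print(schemas["web.users"])      # ['First_Name', 'Email', 'DOB']
```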

When the Pipeline runs, three separate tables are created in the Destination for the three collections. For example, the following table is created for the android.users collection:

Destination table for the android.users collection without Merge Collections enabled

This concludes the illustration of how you can use the Merge Collections feature to collate data from multiple MongoDB databases.

Tell us what went wrong