MongoDB

MongoDB is a document-oriented NoSQL database that supports high-volume data storage. MongoDB uses collections and documents instead of tables and rows to organize data. Collections, the equivalent of tables in an RDBMS, hold sets of documents. Documents, or objects, are similar to the records of an RDBMS. Insert, update, and delete operations can be performed on a collection within MongoDB.

Hevo supports two ways of configuring MongoDB:

  • MongoDB Atlas - This option is applicable if your MongoDB database is hosted on MongoDB Atlas.

  • Generic MongoDB - This option is applicable for all deployments of MongoDB other than MongoDB Atlas.

Hevo connects to your MongoDB database using password authentication.

The following table lists the differences between Generic MongoDB and MongoDB Atlas:

Generic MongoDB | MongoDB Atlas
Managed by the user or a third party other than MongoDB Atlas. | Fully managed database service from MongoDB.

MongoDB Configurations

MongoDB supports three configurations:

  • Standalone without replicas: Includes a single instance of MongoDB.

    Note: Hevo currently does not support this configuration.

  • Standalone with replicas: Includes one primary instance of MongoDB. Secondary instances follow the primary one by replicating data from it. While configuring MongoDB as a Source, you must provide a comma-separated list of all the instances in the Database Host field.

  • Clustered: Includes three components: a shard router (mongos), a config server (mongod), and a data shard (mongod). Each data shard can have its own replicas for redundancy. While configuring MongoDB as a Source, you must provide a comma-separated list of all instances of the MongoDB Routers in the Database Host field.
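As a rough illustration of the comma-separated host list described above, the following sketch joins instance addresses into a single string. It also builds a standard MongoDB connection URI from the same list, which you might use to test connectivity yourself. The helper names, the example host names, and the replica set name are hypothetical, and whether each entry carries a port depends on your setup; Hevo only asks for the list in the Database Host field.

```javascript
// Hypothetical helper: join instance addresses into the comma-separated
// form described for the Database Host field.
function buildHostList(hosts) {
  return hosts
    .map((host) => host.trim())
    .filter((host) => host.length > 0)
    .join(",");
}

// Hypothetical helper: build a standard mongodb:// URI from the same hosts.
// The replicaSet query parameter is optional and only added when provided.
function buildMongoUri({ user, password, hosts, database, replicaSet }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  const query = replicaSet ? `?replicaSet=${encodeURIComponent(replicaSet)}` : "";
  return `mongodb://${credentials}@${buildHostList(hosts)}/${database}${query}`;
}

// Example values only; replace them with your own instances or routers.
const exampleHosts = [
  "mongo-1.example.com:27017",
  "mongo-2.example.com:27017",
  "mongo-3.example.com:27017",
];
console.log(buildHostList(exampleHosts));
```

For a clustered deployment, the same helper would be given the addresses of the shard routers (mongos) rather than the data shards.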


Prerequisites

  • MongoDB version 4.0 or above, if the Pipeline mode is Change Streams. OpLog is compatible with all versions of MongoDB.

    Use the following command to find out the MongoDB version on Ubuntu.

    ~$ mongod --version

  • A MongoDB user with read access to the database that is to be replicated and to the local database.

  • A retention period of at least 72 hours in the OpLog. Read OpLog Retention Period.
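The first two prerequisites can be sketched as follows. This is a minimal illustration, not Hevo's own tooling: the function names, the user name, and the database name are hypothetical, and it assumes the user is created in the admin database with MongoDB's built-in read role.

```javascript
// Returns true when a version string such as "4.2.8" or
// "db version v4.4.6" reports MongoDB 4.0 or above.
function supportsChangeStreams(versionText) {
  const match = /(\d+)\.(\d+)/.exec(versionText);
  if (match === null) return false; // no version number found
  return Number(match[1]) >= 4;
}

// Builds a user definition with read access to the replicated database
// and to the local database (which holds the OpLog).
function buildReadOnlyUser(username, password, databaseName) {
  return {
    user: username,
    pwd: password,
    roles: [
      { role: "read", db: databaseName },
      { role: "read", db: "local" },
    ],
  };
}

// In mongosh, assuming you create users in the admin database:
//   db.getSiblingDB("admin").createUser(
//     buildReadOnlyUser("hevo_user", "<password>", "sales")
//   );
console.log(supportsChangeStreams("db version v4.4.6"));
```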


Perform the following steps to configure MongoDB as a Source in Hevo:

Select the Source Type

To select MongoDB as the Source:

  1. Click PIPELINES in the Asset Palette.

  2. Click + CREATE in the Pipelines List View.

  3. On the Select Source Type page, select the MongoDB variant.


Select the Pipeline Mode

Select how you want Hevo to read your data from the MongoDB Source:

  • OpLog: Data is polled using MongoDB’s OpLog. The OpLog is a collection of individual, transaction-level details which help replicas sync data from the primary instance.

    Note: OpLogs are present on the primary instances and replicas of both data shards and standalone deployments.

  • Change Streams: MongoDB’s Change Streams enable applications to stream real-time data changes for a single collection, a database, or an entire deployment, without the complexity and risk of tailing the OpLog. Change Streams are supported for all MongoDB configurations. However, for the clustered configuration, Change Streams work only if set up against a shard router (mongos).
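To give a feel for what reading a Change Stream involves, here is a minimal sketch written for mongosh. It is not how Hevo reads your data internally; the function names and the collection name are hypothetical, and `db` is assumed to be a mongosh database handle connected to a replica set or, for a cluster, to a shard router.

```javascript
// Opens a change stream on one collection. An empty pipeline streams every
// change; "updateLookup" asks MongoDB to include the current full document
// for update events.
function openChangeStream(db, collectionName) {
  return db
    .getCollection(collectionName)
    .watch([], { fullDocument: "updateLookup" });
}

// Collects up to `limit` change events that are already available,
// without blocking while waiting for new ones.
function drainAvailable(cursor, limit) {
  const events = [];
  while (events.length < limit) {
    const event = cursor.tryNext(); // null when no event is ready yet
    if (event == null) break;
    events.push(event);
  }
  return events;
}

// In mongosh:
//   const cursor = openChangeStream(db, "orders");
//   drainAvailable(cursor, 100).forEach((e) => printjson(e));
```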


Select the MongoDB Variant

Select the MongoDB service provider that you use to manage your MongoDB databases:

  • Generic Mongo Database: Database management is done at your end, or by a service provider other than MongoDB Atlas.

  • MongoDB Atlas: The managed database service from MongoDB.


Specify MongoDB Connection Settings

Refer to the following sections based on your MongoDB deployment:


OpLog Retention Period

The OpLog Retention Period is the duration for which Events are held in the OpLog. If an Event is not read within that period, then it is lost.

This may happen if:

  • The OpLog is full, and the database has started discarding the older Event entries to write the newer ones.

  • The timestamp of the Event is older than the OpLog retention time.

The OpLog Retention Period directly impacts the OpLog size that you must maintain to hold the entries.

To enable Hevo to read the Events without losing them, you need to maintain an adequate OpLog size and retention period. If either of these is insufficient, Hevo alerts you. For more information, read OpLog Alerts.
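The relationship between retention period and OpLog size can be estimated with simple arithmetic. The sketch below assumes the write rate stays roughly what it was over the window you observed; the function name and the sample numbers are hypothetical. In mongosh, `db.getReplicationInfo()` is one place to look for the used size and the current window.

```javascript
// Estimates the OpLog size (in MB) needed to hold `targetHours` of entries,
// given how much space the OpLog used over an observed window.
function estimateOplogSizeMB(usedMB, observedWindowHours, targetHours) {
  if (observedWindowHours <= 0) {
    throw new RangeError("observedWindowHours must be positive");
  }
  const mbPerHour = usedMB / observedWindowHours;
  return Math.ceil(mbPerHour * targetHours);
}

// Example: 2048 MB covers 16 hours, so the rate is 128 MB per hour,
// and 72 hours would need about 128 * 72 = 9216 MB.
console.log(estimateOplogSizeMB(2048, 16, 72));
```

Treat the result as a lower bound; a sudden burst of writes shortens the window that a given size can hold.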


OpLog Alerts

If the OpLog retention period is set to less than 24 hours, and if the OpLog is almost full, Hevo displays a warning as shown below:

[Image: OpLog Alert]

The OpLog window is calculated as the difference between the timestamps of the first and last entries in the OpLog. To reduce the likelihood of Events being lost before they are read from the OpLog, you must set the OpLog window to at least 24 hours.

We recommend having enough space to retain the OpLog for 72 hours. This is to avoid disruption in replication due to a sudden influx of Events from the OpLog.
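The window calculation and the two thresholds mentioned above (24 hours as the minimum, 72 hours as the recommendation) can be sketched as follows. The function names and the labels returned are hypothetical and are not the messages Hevo displays.

```javascript
const MS_PER_HOUR = 60 * 60 * 1000;

// OpLog window in hours: the difference between the timestamps of the
// first and last entries in the OpLog.
function oplogWindowHours(firstEntryTime, lastEntryTime) {
  return (lastEntryTime.getTime() - firstEntryTime.getTime()) / MS_PER_HOUR;
}

// Classifies a window against the 24-hour minimum and 72-hour recommendation.
function classifyOplogWindow(hours) {
  if (hours < 24) return "below-minimum";
  if (hours < 72) return "meets-minimum";
  return "meets-recommendation";
}

const first = new Date("2021-11-01T00:00:00Z");
const last = new Date("2021-11-01T18:00:00Z");
console.log(classifyOplogWindow(oplogWindowHours(first, last)));
```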

If the OpLog expires, Hevo displays the error message shown below:

[Image: OpLog Expiry error]

To resolve this error, see the section Resolving Pipeline Failure due to OpLog Expiry below.


Resolving Pipeline Failure due to OpLog Expiry

In case of Pipeline failures due to OpLog expiry, Hevo displays the following error in the Pipelines Detailed View:

[Image: FIX NOW]

To resolve the error:

  1. In the error banner, click FIX NOW and do one of the following:

    • Enable the Run Historical Load option, select the objects for which you want to restart the historical load, and read the OpLog from the latest available log files. This allows you to recover all the Events lost due to the expired OpLog, as Hevo ingests them as historical data. Once all the historical data is read, Hevo starts reading the latest available log files.

    • Disable the Run Historical Load option and read the OpLog from the latest available log files. Choosing this option means all the Events lost due to the expired log files are not recovered. Hevo skips those Events and starts reading from the latest available log files.

      [Image: Fix Change Stream]

  2. Click OK, FIX IT.

Hevo recommends extending the OpLog retention period to avoid such failures in the future.


Extending the OpLog Retention Period

Perform the following steps to extend the OpLog retention period. We recommend a retention period of 72 hours.

Note: To extend the OpLog Retention Period, you must have access to the admin database.
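For a Generic MongoDB replica set, the resize itself is typically done with MongoDB's `replSetResizeOplog` admin command. The sketch below only builds the command document; the helper name and the example size are hypothetical. It assumes the size is given in megabytes with MongoDB's documented minimum of 990 MB, and that your MongoDB version accepts `minRetentionHours` (MongoDB 4.4 and above).

```javascript
// Minimum OpLog size, in MB, accepted by replSetResizeOplog.
const MIN_OPLOG_SIZE_MB = 990;

// Builds the command document for resizing the OpLog and, optionally,
// setting a minimum retention period in hours.
function buildResizeOplogCommand(sizeMB, minRetentionHours) {
  if (sizeMB < MIN_OPLOG_SIZE_MB) {
    throw new RangeError(`size must be at least ${MIN_OPLOG_SIZE_MB} MB`);
  }
  const command = { replSetResizeOplog: 1, size: sizeMB };
  if (minRetentionHours !== undefined) {
    command.minRetentionHours = minRetentionHours;
  }
  return command;
}

// In mongosh, connected to a replica set member with admin access:
//   db.adminCommand(buildResizeOplogCommand(16000, 72));
// Depending on your shell version, size may need to be wrapped as a double.
console.log(JSON.stringify(buildResizeOplogCommand(16000, 72)));
```

The command applies to the member it is run on, so it generally needs to be repeated on each member of the replica set.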

  1. Do one of the following:

  2. In the Pipeline Detailed View, click the kebab menu in the Summary Bar.

  3. Click Change Position.

    [Image: Change Position for OpLog]

  4. Update the position to the current date and time.

    [Image: Current timestamp for OpLog]

  5. Click UPDATE.

  6. In the Objects list, click the kebab menu, and then click Restart Historical Load for the required objects.

    [Image: Restart Historical Load]

    Alternatively, select all the objects and restart the historical load for them.

    [Image: Restart all historical loads]

    Once the historical load completes, all the data up to the position (date and time) that you set in Step 4 above should become available.

Schema Mapper and Primary Keys

Hevo uses the following primary keys to upload the records in the Destination:


Source Considerations

  • MongoDB does not support null values for the _id field.

    The _id field in a MongoDB document serves as its primary key. Therefore, commands that use _id as a parameter, such as commands to fetch, sort, or update data, do not run successfully if you provide a null value in the _id field.

    For example, when you run the following command in MongoDB to group documents by the type of their _id values and find the minimum _id in each group, you get a Null Pointer exception while fetching the document if the _id field does not hold a value:

     db.collection.aggregate([ { $group : { _id : { $type : "$_id" }, type : { $min : "$_id" } } } ])
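Before creating the Pipeline, you may want to check whether a collection holds a document with a null _id. The following mongosh sketch is one way to do that; the function name and the collection name are hypothetical, and `db` is assumed to be a mongosh database handle.

```javascript
// Counts documents whose _id is null in the given collection.
function countNullIds(db, collectionName) {
  return db.getCollection(collectionName).countDocuments({ _id: null });
}

// In mongosh:
//   countNullIds(db, "orders");   // a non-zero result means a null _id exists
```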
    

See Also


Revision History

Refer to the following table for the list of key updates made to this page:

Date | Release | Description of Change
Nov-09-2021 | 1.75 | - Added the section, Source Considerations, to explain that MongoDB does not support null values for the _id field.
Sep-09-2021 | 1.71 | - Renamed and updated the section, Resolving Pipeline failure due to OpLog Expiry. - Updated the section, OpLog Alerts, with a new error message.
Aug-06-2021 | 1.69 | - Added sections, OpLog Retention Period and Resolving OpLog Expiry Failures. - Updated section, OpLog Alerts, with the new warning message. - Added a note in section, Extending OpLog Retention Period, that access to the admin database is needed to extend the OpLog retention period.
Jun-28-2021 | 1.66 | - Added section, Schema Mapper and Primary Keys.
May-19-2021 | NA | - Added section, Extending the OpLog Retention Period. - Updated section, OpLog Alerts.
Last updated on 19 Nov 2021