MongoDB

MongoDB is a document-oriented NoSQL database that supports high-volume data storage. It uses collections and documents instead of tables and rows to organize data. Collections, the equivalent of tables in an RDBMS, hold sets of documents. Documents, or objects, are similar to records in an RDBMS. You can perform insert, update, and delete operations on a collection within MongoDB.

Hevo supports two ways of configuring MongoDB:

  • MongoDB Atlas - This option is applicable if your MongoDB database is hosted on MongoDB Atlas.

  • Generic MongoDB - This option is applicable for all deployments of MongoDB other than MongoDB Atlas.

Hevo connects to your MongoDB database using password authentication.

The following table lists the differences between the two:

Generic MongoDB | MongoDB Atlas
Managed by the user or a third party other than MongoDB Atlas. | Fully managed database service from MongoDB.

MongoDB Configurations

MongoDB supports three configurations:

  • Standalone without replicas: Includes a single instance of MongoDB.

    Note: Hevo currently does not support this configuration.

  • Standalone with replicas: Includes one primary instance of MongoDB. Secondary instances follow the primary one by replicating data from it. While configuring MongoDB as a Source, you must provide a comma-separated list of all the instances in the Database Host field.

  • Clustered: Includes three components: a shard router (mongos), a config server (mongod), and a data shard (mongod). Each data shard can have its own replicas for redundancy. While configuring MongoDB as a Source, you must provide a comma-separated list of all instances of the MongoDB Routers in the Database Host field.
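To illustrate how a comma-separated host list maps onto a connection, the following is a minimal sketch that assembles a standard `mongodb://` URI from such a list. The function name `build_mongo_uri`, the host names, the credentials, and the replica set name are hypothetical; Hevo's own Database Host field may process the list differently.

```python
from urllib.parse import quote_plus


def build_mongo_uri(hosts, user, password, database, replica_set=None):
    """Build a standard mongodb:// URI from a comma-separated host list."""
    # Split the comma-separated list and drop stray whitespace or empty entries.
    host_list = [h.strip() for h in hosts.split(",") if h.strip()]
    if not host_list:
        raise ValueError("at least one host is required")
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    uri = f"mongodb://{credentials}@{','.join(host_list)}/{database}"
    if replica_set:
        uri += f"?replicaSet={replica_set}"
    return uri


# Hypothetical hosts and credentials, for illustration only.
print(build_mongo_uri("db1.example.com:27017, db2.example.com:27017",
                      "hevo_user", "p@ss:word", "sales", replica_set="rs0"))
```

For a clustered deployment, the same list would hold the mongos routers rather than the replica set members.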


Prerequisites

  • MongoDB version 4.0 or above, if the Pipeline mode is Change Streams. OpLog is compatible with all versions of MongoDB.

    Use the following command to find out the MongoDB version on Ubuntu.

    ~$ mongod --version

  • A MongoDB user with read access to the database that is to be replicated and to the local database.

  • (Recommended) A retention period of at least 72 hours in the OpLog, to ensure that OpLog entries are not purged before Hevo can read them. Read OpLog Alerts.
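The version prerequisite can be checked programmatically. The sketch below parses the `db version vX.Y.Z` line that `mongod --version` prints and compares it against the 4.0 minimum for Change Streams. The helper names and the sample output string are assumptions for illustration.

```python
import re


def parse_mongod_version(output):
    """Extract (major, minor, patch) from `mongod --version` output."""
    match = re.search(r"db version v(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("could not find a 'db version' line")
    return tuple(int(part) for part in match.groups())


def supports_change_streams(version):
    """Change Streams mode needs MongoDB 4.0 or above, per the prerequisite."""
    return version >= (4, 0, 0)


# Assumed sample output; real output contains additional build details.
sample = "db version v4.4.6\nBuild Info: {...}"
version = parse_mongod_version(sample)
print(version, supports_change_streams(version))
```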


Perform the following steps to configure MongoDB as a Source in Hevo:

Select the Source Type

To select MongoDB as the Source:

  1. Click PIPELINES in the Asset Palette.

  2. Click + CREATE in the Pipelines List View.

  3. In the Select Source Type page, select the MongoDB variant.


Select the Pipeline Mode

Select how you want Hevo to read your data from the MongoDB Source:

  • OpLog: Data is polled using MongoDB’s OpLog. The OpLog is a collection of individual, transaction-level details which help replicas sync data from the primary instance.

    Note: OpLogs are present in data/standalone primary instances and replicas.

  • Change Streams: MongoDB’s Change Streams enable applications to stream real-time data changes for a single collection, a database, or an entire deployment, without the complexity and risk of tailing the OpLog. Change Streams are supported for all MongoDB configurations. However, for the clustered configuration, Change Streams work only if set up against a shard router (mongos).

Read OpLog Alerts.
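To give a sense of what a Change Streams consumer works with, here is a minimal sketch that applies change events to an in-memory copy of a collection. The event fields used (`operationType`, `documentKey`, `fullDocument`, `updateDescription`) follow MongoDB's change event format; the function `apply_change_event` and the sample events are hypothetical, only top-level fields are handled, and this is not a description of how Hevo processes events internally.

```python
def apply_change_event(store, event):
    """Apply one change event to `store`, a dict keyed by document _id."""
    doc_id = event["documentKey"]["_id"]
    op = event["operationType"]
    if op in ("insert", "replace"):
        # Inserts and replaces carry the complete document.
        store[doc_id] = dict(event["fullDocument"])
    elif op == "update":
        # Updates carry only the changed and removed top-level fields.
        doc = store.setdefault(doc_id, {"_id": doc_id})
        changes = event["updateDescription"]
        doc.update(changes.get("updatedFields", {}))
        for field in changes.get("removedFields", []):
            doc.pop(field, None)
    elif op == "delete":
        store.pop(doc_id, None)
    return store


# Hypothetical events, shaped like the documents a change stream yields.
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "Ada", "tmp": True}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"name": "Ada L."},
                           "removedFields": ["tmp"]}},
]
store = {}
for event in events:
    apply_change_event(store, event)
print(store)
```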


Select the MongoDB Variant

Select the MongoDB service provider that you use to manage your MongoDB databases:

  • Generic Mongo Database: Database management is done at your end, or by a service provider other than MongoDB Atlas.

  • MongoDB Atlas: The managed database service from MongoDB.


Specify MongoDB Connection Settings

Refer to the following sections based on your MongoDB deployment:


OpLog Alerts

The OpLog Retention Period defines the duration for which details of an Event are held in the OpLog. If the Event is not read within that time for any reason, it is lost. The retention period directly impacts the OpLog size you must maintain to hold the entries. Once the OpLog is full, MongoDB starts discarding the oldest Event entries to accommodate newer ones.

We recommend having enough space to retain the OpLog for 72 hours to avoid disruption in replication due to spikes. If the OpLog retention period is set to less than 24 hours, a one-time warning is displayed in a snack-bar in your MongoDB Pipeline, as shown below.

OpLog Alert
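The relationship between retention period and OpLog size can be sketched with simple arithmetic. The example below estimates the current retention window from the timestamps of the oldest and newest OpLog entries, classifies it against the 24-hour and 72-hour thresholds mentioned above, and sizes the OpLog for a given write rate. The function names and the sample figures are assumptions; a constant write rate is assumed for the size estimate.

```python
from datetime import datetime

RECOMMENDED_HOURS = 72  # retention recommended in the text
WARNING_HOURS = 24      # below this, the one-time warning applies


def oplog_window_hours(oldest_entry, newest_entry):
    """Hours covered between the oldest and newest OpLog entries."""
    return (newest_entry - oldest_entry).total_seconds() / 3600


def retention_status(hours):
    """Classify a retention window against the two thresholds."""
    if hours < WARNING_HOURS:
        return "warning"
    if hours < RECOMMENDED_HOURS:
        return "below recommended"
    return "ok"


def required_oplog_gb(gb_written_per_hour, hours=RECOMMENDED_HOURS):
    """Rough OpLog size needed to hold `hours` of writes at a steady rate."""
    return gb_written_per_hour * hours


# Hypothetical timestamps and write rate, for illustration only.
window = oplog_window_hours(datetime(2021, 6, 1, 0, 0), datetime(2021, 6, 2, 12, 0))
print(window, retention_status(window), required_oplog_gb(0.5))
```

Because write spikes shorten the effective window, sizing against the peak write rate rather than the average gives a safer estimate.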


Extending the OpLog Retention Period

You can extend the OpLog retention period to 72 hours (recommended) using the following steps. Once you modify the retention period, you need to reload the historical data from the timestamp of the failed period up to the current timestamp. This loads all the data that was missed between the log expiry and the current position.

To extend the OpLog retention period:

  1. Do one of the following:

  2. In the Pipeline Detailed View, click the kebab menu in the Summary Bar.

  3. Click Change Position.

    Change Position for OpLog

  4. Update the position to the current date and time.

    Current timestamp for OpLog

  5. Click UPDATE.

  6. In the Objects list, click the kebab menu, and then click Restart Historical Load for the required objects.

    Restart Historical Load

    Alternatively, select all the objects and restart the historical load for them.

    Restart all historical loads

    Once the historical load completes, all the data until the current position date and time that you updated in Step 4 above should become available.

Schema Mapper and Primary Keys

Hevo uses the following primary keys to upload the records in the Destination:


Revision History

Refer to the following table for the list of key updates made to this page:

Date | Release | Description of Change
Jun-28-2021 | 1.66 | Added section, Schema Mapper and Primary Keys.
May-19-2021 | NA | Added section, Extending the OpLog Retention Period; updated section, OpLog Alerts.
Last updated on 28 Jun 2021