Hevo can replicate collections from your MongoDB database. It can replicate data either by reading the OpLog or by querying the collections directly.
1. Create a new Pipeline
Click PIPELINES in the left navigation bar, and then click Create New Pipeline.
2. Select Source Type
Select MongoDB from the list on the Select Source Type screen.
3. Provide Connection Settings
Provide your MongoDB connection details on the MongoDB Connection Settings page. The connection details block has the following options:
- Source Name - A unique name for this source
- Mongo DB Host - The IP address or DNS name of the Mongo DB host. To connect to a replica set, provide a comma-separated list of the IP addresses or DNS names of all its members. Hevo will always connect to a secondary.
- Mongo DB Port - The port on which your Mongo server is listening for connections (default is 27017)
- Mongo DB User - A read-only user that can read the collections in your database
- Mongo DB Password - Password for the read-only user
- Database Name - The database that you wish to replicate
- Auth DB Name - The authentication database if applicable
- Connect through SSH - If you want Hevo to connect through an SSH tunnel, check How to Connect through SSH. Otherwise, you will have to whitelist Hevo's IP addresses as listed here.
- Use SSL - Select this option if you want Hevo to connect to the MongoDB instance using SSL.
Alternatively, you can copy the details from an existing source of MongoDB type. Please note that this creates an independent copy of the selected source.
Click TEST CONNECTION to test the credentials, and click CONTINUE once the test succeeds.
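To check the same settings outside Hevo before clicking TEST CONNECTION, the fields above can be assembled into a standard MongoDB connection string. The following is a minimal sketch, not Hevo's own code: the function name build_mongo_uri and its parameters are hypothetical, and it assumes hostnames or IPv4 addresses (IPv6 literals are not handled).

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(hosts, port=27017, user=None, password=None,
                    database="", auth_db=None, use_ssl=False):
    """Assemble a mongodb:// connection string from the settings fields.

    `hosts` may be one host or a comma-separated list (replica set),
    mirroring the "Mongo DB Host" field. Hypothetical helper.
    """
    # Attach the port to every host that does not already carry one.
    host_list = [h.strip() for h in hosts.split(",") if h.strip()]
    seeds = ",".join(h if ":" in h else f"{h}:{port}" for h in host_list)

    # Percent-encode credentials so characters such as '@' or ':' are safe.
    credentials = ""
    if user:
        credentials = f"{quote_plus(user)}:{quote_plus(password or '')}@"

    # Optional settings become URI query options.
    options = {}
    if auth_db:
        options["authSource"] = auth_db  # "Auth DB Name"
    if use_ssl:
        options["tls"] = "true"          # "Use SSL"

    query = f"?{urlencode(options)}" if options else ""
    return f"mongodb://{credentials}{seeds}/{database}{query}"
```

The resulting string can be passed to any MongoDB client, for example the mongosh shell or pymongo's MongoClient, to confirm that the host, port, and credentials work from a whitelisted machine.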
4. Select Ingestion Mode
On this page, you select the ingestion mode, which defines how Hevo reads data from your MongoDB source.
- OpLog: Data is polled using MongoDB's OpLog. This mode is useful when you want to replicate the complete database as it is. It replicates very efficiently but gives you less control and manageability over data ingestion. Note that the Mongo DB user must have read access to both the database that is to be replicated and the local database. Read about the instructions to set up replication in MongoDB here.
- Change Streams: Data is polled using MongoDB's Change Streams. Like OpLog, this mode is useful when you want to replicate the complete database as it is. It replicates very efficiently but gives you less control and manageability over data ingestion. Note that the Mongo DB user must have read access to both the database that is to be replicated and the local database. Read about the instructions to set up replication in MongoDB here. If you face any issues during setup, follow this document to troubleshoot.
Select the Ingestion Mode and click CONTINUE.
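The read access mentioned above can be granted with MongoDB's createUser command. The sketch below builds that command as a plain dictionary and, optionally, runs it with the pymongo driver. The function names and the choice of admin as the database holding the user are assumptions made for illustration; adapt them to your deployment.

```python
def build_create_user_command(username, password, database):
    """Return a createUser command granting read on `database` and `local`.

    The command name must be the first key; Python dicts keep insertion order.
    """
    return {
        "createUser": username,
        "pwd": password,
        "roles": [
            {"role": "read", "db": database},  # the database to replicate
            {"role": "read", "db": "local"},   # where the OpLog is stored
        ],
    }


def create_read_only_user(admin_uri, username, password, database,
                          auth_db="admin"):
    """Run the command on a live server (requires the pymongo package).

    `admin_uri` must belong to a user allowed to create users.
    """
    # Imported lazily so the builder above works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(admin_uri)
    try:
        return client[auth_db].command(
            build_create_user_command(username, password, database)
        )
    finally:
        client.close()
```

If the user is created in a database such as admin, enter that database in the Auth DB Name field of the connection settings.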
4. Provide Job Settings
You will land on this screen if you selected Collections mode on the previous screen.
Here, you will be presented with the list of collections in your Mongo database. You can deselect the collections that you do not want to replicate. For every collection, you can specify an incrementing field (usually _id) and a timestamp field (which holds the last modified timestamp of a document).
If you specify the timestamp field, Hevo can replicate updates made to the documents in the collection. Otherwise, Hevo replicates only newly inserted documents, which it detects through the incrementing field.
It is also highly advisable to have an index on the timestamp field. Without an index, the job may time out. Read more about indexes in MongoDB here.
After entering the details, click CONTINUE.
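To illustrate why the timestamp field matters, the sketch below shows how an incremental poll of a collection could be expressed as a MongoDB query filter, and how the recommended index could be created with pymongo. This is an assumption about how such polling can work in general, not a description of Hevo's internal implementation; the function names and the field name updated_at used in the usage note are hypothetical.

```python
def build_poll_filter(increment_field="_id", last_increment=None,
                      timestamp_field=None, last_timestamp=None):
    """Build a find() filter selecting documents not yet replicated."""
    if timestamp_field and last_timestamp is not None:
        # With a timestamp field, both new and updated documents match.
        return {timestamp_field: {"$gt": last_timestamp}}
    if last_increment is not None:
        # Without it, only documents with a larger incrementing value
        # (that is, new inserts) match; updates go unnoticed.
        return {increment_field: {"$gt": last_increment}}
    # First run: nothing has been replicated yet, so read everything.
    return {}


def ensure_timestamp_index(collection, timestamp_field):
    """Create an ascending index on the timestamp field.

    `collection` is expected to be a pymongo Collection; 1 means ascending.
    """
    return collection.create_index([(timestamp_field, 1)])
```

For example, build_poll_filter(timestamp_field="updated_at", last_timestamp=ts) produces a range query on updated_at. Without an index on that field, MongoDB has to scan the whole collection on every poll, which is what makes timeouts likely on large collections.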
5. Select the Destination
Select the Destination where you want to replicate your MongoDB data, or click NEW DESTINATION to create a new Destination. Check out the How to add Destination tutorial for a detailed walkthrough of the steps needed to add a new Destination.
6. Pipeline Created
Your Pipeline is created when you reach this page, and you will have the option to see Sample Data and Map Schema.
While Hevo tries to load your schemas, you can select CONTINUE IN BACKGROUND if it is taking too much time. Click CREATE SCHEMA MAPPING to map the Source and Destination schemas; check out Introduction to Schema Mapper to learn about the Schema Mapper. Alternatively, select DO IT LATER to go directly to the Pipeline page. You can map schemas later on the Schema Mapper page in your Pipeline.
Hevo supports only the Collections and Change Streams modes for sharded clusters. For a sharded cluster, provide the host and port of the mongos server, instead of those of a mongod server, while setting up the connection above.
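If you are unsure whether a given host is a mongos router or a mongod server, the reply to MongoDB's hello command can tell them apart: a mongos includes the field msg with the value isdbgrid. The following sketch relies on that documented behaviour; the function names are hypothetical, and servers older than MongoDB 4.4.2 may need the legacy isMaster command instead of hello.

```python
def is_mongos(hello_response):
    """Return True if a `hello` reply came from a mongos router."""
    return hello_response.get("msg") == "isdbgrid"


def check_host_is_mongos(uri):
    """Ask a live server whether it is a mongos (requires pymongo)."""
    # Imported lazily so is_mongos() can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        return is_mongos(client.admin.command("hello"))
    finally:
        client.close()
```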
Please note that your data will not start replicating to the Destination tables until you map the Source and Destination schemas.