Support for Multiple Data Types for the _id Field

Each document in a MongoDB collection includes an _id field that serves as its primary key. MongoDB allows nearly any BSON data type for this field (arrays are among the few exceptions).

To support this, as of Release 1.61, Hevo has enhanced the queries run on the MongoDB Source so that _ids of all types are retrieved. By itself, however, this change can cause documents to be missed during ingestion or overwritten during loading.

For example:

  • During ingestion, the results of a relational (comparison) operation on the _id field would include only documents whose _id is of the same type as the comparison value. Consider a collection containing documents with both string and numeric _id values. The documents retrieved for the operation _id > 100, after sorting on _id, would be only those with a numeric _id. All documents with a string _id would be excluded.

  • During loading to the Destination, documents would get overwritten if the value of the _id field was the same, even if the data type was different. Consider two documents, one with _id as numeric 1 and the other with _id as the string “1”. Whichever was loaded first would be overwritten by the other.
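The two failure modes above can be reproduced with a small in-memory simulation. This is only an illustrative sketch, not Hevo's or MongoDB's actual code: the helper names (`greater_than`, `load_by_value`) are hypothetical, and the Destination is assumed to key rows on the stringified _id value.

```python
# Toy stand-in for a collection whose _id values have mixed types.
docs = [
    {"_id": 1, "name": "a"},
    {"_id": 150, "name": "b"},
    {"_id": "1", "name": "c"},
    {"_id": "abc", "name": "d"},
]


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def greater_than(documents, threshold):
    """Mimic a type-restricted `_id > threshold` filter, sorted on _id.

    Only _id values of the same kind as the threshold are compared;
    values of other types never match.
    """
    if is_number(threshold):
        matches = [d for d in documents
                   if is_number(d["_id"]) and d["_id"] > threshold]
    else:
        matches = [d for d in documents
                   if isinstance(d["_id"], str) and d["_id"] > threshold]
    return sorted(matches, key=lambda d: d["_id"])


def load_by_value(documents):
    """Mimic a Destination keyed on the _id value alone (assumption)."""
    table = {}
    for d in documents:
        # The type is lost here, so 1 and "1" share the key "1".
        table[str(d["_id"])] = d
    return table


ingested = greater_than(docs, 100)   # string _ids are never returned
loaded = load_by_value(docs)         # document "a" is overwritten by "c"
```

In this sketch, `ingested` contains only the document with _id 150, and `loaded` holds three rows instead of four because the documents with _id 1 and “1” collide.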

Therefore, the enhanced queries are supported by the following changes:

  • To ensure documents are not missed during ingestion, Hevo stores the last read _id for each data type, instead of the single _id value stored previously.

  • To prevent documents from being overwritten during loading, Hevo uses the __hevo_id field to identify the documents to be loaded. The __hevo_id is a string value generated from the hash of the data type of the _id field and its value. For example, suppose the _id value is numeric 1. Then, __hevo_id is the SHA256 hash of “Integer-1”. The documents can be sorted as per their __hevo_id. The _id field remains the primary key for the table.
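The two changes above can be sketched as follows. This is a minimal illustration under stated assumptions, not Hevo's implementation: the source confirms only that __hevo_id is the SHA256 hash of a string such as “Integer-1”. The type labels other than "Integer", the UTF-8 encoding, the hexadecimal output, and the function names `type_label`, `hevo_id`, and `update_checkpoints` are all hypothetical.

```python
import hashlib


def type_label(value):
    # Hypothetical labels; only "Integer" appears in the source text.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Double"
    if isinstance(value, str):
        return "String"
    return type(value).__name__


def hevo_id(value):
    """Hash "<type>-<value>" so equal values of different types differ."""
    payload = f"{type_label(value)}-{value}"
    # Assumption: UTF-8 input and a hex-encoded digest.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def update_checkpoints(checkpoints, value):
    """Track the largest _id read so far, separately for each type."""
    label = type_label(value)
    last = checkpoints.get(label)
    if last is None or value > last:
        checkpoints[label] = value
    return checkpoints


# Numeric 1 and the string "1" now map to different identifiers.
numeric_key = hevo_id(1)
string_key = hevo_id("1")

# One last-read _id is kept per type rather than a single value overall.
checkpoints = {}
for seen_id in [5, "b", 9, "a", 2]:
    update_checkpoints(checkpoints, seen_id)
```

Because the type is part of the hashed string, the two documents from the earlier example no longer share a key, and the per-type checkpoints let ingestion resume independently for string and numeric _id values.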



Revision History

Refer to the following table for the list of key updates made to this page:

Date          Release   Description of Change
Jun-28-2021   1.61      New document.
Last updated on 12 Oct 2021