Support for Multiple Data Types for the _id Field
Each document in a MongoDB collection includes an _id field that serves as its primary key. MongoDB supports almost all BSON data types for this field (arrays, for example, are not allowed).
To support this, as of Release 1.61, Hevo has enhanced the queries run on the MongoDB Source so that documents with an _id of any type are retrieved. However, by itself, this change can cause documents to be missed during ingestion or overwritten during loading.
During ingestion, the query results for a relational operation on the _id field include only documents whose _id is of the same type as the value being compared. Consider a collection that has documents with _id values of type String as well as Numeric. The documents retrieved for an operation such as _id > 100, after sorting on _id, would include only those with a numeric _id. All the documents with a string _id would be excluded.
During loading to the Destination, documents would get overwritten if the value of the _id field was the same, even if the data type was different. Consider two documents, one with _id as the numeric 1 and the other with _id as the string “1”. Whichever one was loaded first would be overwritten by the other.
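The two problems described above can be reproduced with a small in-memory simulation. This is only an illustrative sketch: it does not connect to MongoDB or Hevo, and the helper names (type_family, find_id_greater_than, load_by_value) are hypothetical. It assumes that a range comparison matches only _id values of the same type family as the compared value, and that a naive load keys rows on the _id value alone.

```python
from numbers import Number


def type_family(value):
    """Return a coarse type label for an _id value (hypothetical helper)."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, Number):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def find_id_greater_than(docs, bound):
    """Emulate `_id > bound` sorted on _id: only same-family _ids can match."""
    family = type_family(bound)
    matches = [
        d for d in docs
        if type_family(d["_id"]) == family and d["_id"] > bound
    ]
    return sorted(matches, key=lambda d: d["_id"])


def load_by_value(docs):
    """Naive load keyed on the _id value alone, ignoring its data type."""
    table = {}
    for d in docs:
        table[str(d["_id"])] = d  # numeric 1 and string "1" share one key
    return table


docs = [
    {"_id": 50, "name": "a"},
    {"_id": 150, "name": "b"},
    {"_id": "150", "name": "c"},
    {"_id": "abc", "name": "d"},
]

# Ingestion: the documents with string _ids never appear in the result.
ingested = find_id_greater_than(docs, 100)

# Loading: the second document replaces the first one.
loaded = load_by_value([{"_id": 1, "v": "numeric"}, {"_id": "1", "v": "string"}])
```

Here, ingested contains only the document with the numeric _id 150, and loaded holds a single row because the string “1” document replaced the numeric one.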
Therefore, the enhanced queries are supported by the following changes:
To ensure documents are not missed during ingestion, Hevo stores the last read _id for each data type, in contrast to the single _id value that was stored previously.
To pre-empt these loading issues, Hevo uses the __hevo_id field to identify the documents to be loaded. The __hevo_id is a string value generated from the hash of the data type of the _id field and its value. For example, suppose the _id value is numeric 1. Then, __hevo_id is the SHA256 hash of “Integer-1”. The documents can be sorted as per their __hevo_id. The _id field still remains the primary key for the table.
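The two changes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hevo's actual implementation: only the “Integer-1” input format comes from the text above, while the other type labels, the UTF-8 encoding, the hexadecimal output, and the names hevo_id and LastReadIds are hypothetical.

```python
import hashlib

# Hypothetical type labels; only "Integer" is confirmed by the text above.
TYPE_LABELS = {int: "Integer", str: "String", float: "Double"}


def hevo_id(doc_id):
    """Hash "<type label>-<value>" with SHA256 (encoding and hex form assumed)."""
    label = TYPE_LABELS[type(doc_id)]
    return hashlib.sha256(f"{label}-{doc_id}".encode("utf-8")).hexdigest()


class LastReadIds:
    """Keep one last-read _id per data type instead of a single value."""

    def __init__(self):
        self.by_type = {}

    def record(self, doc_id):
        label = TYPE_LABELS[type(doc_id)]
        current = self.by_type.get(label)
        # Values are only ever compared with others of the same type.
        if current is None or doc_id > current:
            self.by_type[label] = doc_id


offsets = LastReadIds()
for doc_id in [1, "1", 7, "abc"]:
    offsets.record(doc_id)

# The numeric 1 and the string "1" hash to different identifiers.
numeric_key = hevo_id(1)
string_key = hevo_id("1")
```

Because the type label is part of the hashed input, numeric_key and string_key differ, so neither document replaces the other. Each type also keeps its own last-read position in offsets.by_type, so a later read can resume separately for numeric and string _id values.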
Refer to the following table for the list of key updates made to this page:
|Date|Release|Description of Change|