Amazon DynamoDB

Amazon DynamoDB is a fully managed, multi-master, multi-region non-relational database that offers built-in in-memory caching to deliver reliable performance at any scale.

Hevo uses DynamoDB’s data streams to support change data capture (CDC). A data stream is a time-ordered sequence of item-level changes made to a DynamoDB table. Data in a DynamoDB stream has a 24-hour lifetime and is removed automatically after that time. We suggest that you set the ingestion frequency so that each Pipeline run reads the data within this 24-hour window.
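The effect of the 24-hour lifetime can be illustrated with a minimal Python sketch. It is a simplified model and not Hevo's implementation: the record layout, the `created_at` field, and the `readable_changes` helper are hypothetical names introduced only for this example.

```python
from datetime import datetime, timedelta, timezone

# Lifetime of data in a DynamoDB stream, as stated above.
STREAM_RETENTION = timedelta(hours=24)


def readable_changes(changes, now):
    """Return the changes that are still inside the retention window at `now`."""
    cutoff = now - STREAM_RETENTION
    return [change for change in changes if change["created_at"] > cutoff]


# Hypothetical item-level changes, in time order.
now = datetime(2022, 11, 15, 12, 0, tzinfo=timezone.utc)
changes = [
    {"key": "order-1", "created_at": now - timedelta(hours=30)},
    {"key": "order-2", "created_at": now - timedelta(hours=23)},
    {"key": "order-3", "created_at": now - timedelta(minutes=5)},
]

# order-1 is older than 24 hours, so a run at `now` can no longer read it.
print([change["key"] for change in readable_changes(changes, now)])
```

In this model, a change that is not read within 24 hours of being written is no longer available to any later run.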

To facilitate incremental data loads to a Destination, Hevo needs to keep track of the data that has been read so far from the data stream. Hevo supports two replication methods for managing this ingestion information:

  • Kinesis Data Streams

  • DynamoDB Streams

The following table lists the differences between the two methods.

| Kinesis Data Streams | DynamoDB Streams |
| --- | --- |
| Recommended method. | Default method, if the DynamoDB user does not have dynamodb:CreateTable permissions. |
| User permissions needed on the DynamoDB Source: Read-only and dynamodb:CreateTable. | User permissions needed on the DynamoDB Source: Read-only. |
| Uses the Kinesis Client Library (KCL) to ingest the changed data from the database. | Uses the DynamoDB library. |
| Guarantees real-time data ingestion. | Data might be ingested with a delay. |
| The Kinesis driver maintains the context. KCL creates an additional table (with prefix hevo_kcl) per table in the Source system, to store the last processed state for a table. | Hevo keeps the entire context of data replication as metadata, including positions to indicate the last record ingested. |
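Both methods rely on storing the position of the last record read. The following Python sketch shows the general idea under simplifying assumptions. It is not Hevo's or KCL's implementation: `CheckpointStore`, `ingest_new`, and the integer `seq` field are hypothetical, and real streams are divided into shards, which this sketch ignores.

```python
class CheckpointStore:
    """Hypothetical in-memory stand-in for the stored replication position."""

    def __init__(self):
        self._last_seq = {}  # table name -> last processed sequence number

    def last_processed(self, table):
        return self._last_seq.get(table)

    def commit(self, table, seq):
        self._last_seq[table] = seq


def ingest_new(table, records, store):
    """Return the records newer than the checkpoint, then advance the checkpoint."""
    last = store.last_processed(table)
    fresh = [r for r in records if last is None or r["seq"] > last]
    if fresh:
        store.commit(table, max(r["seq"] for r in fresh))
    return fresh


store = CheckpointStore()

# First run: nothing has been processed yet, so every record is new.
first = ingest_new("orders", [{"seq": 1}, {"seq": 2}], store)

# Second run: the stream still holds records 1 and 2, but only record 3 is new.
second = ingest_new("orders", [{"seq": 1}, {"seq": 2}, {"seq": 3}], store)

print(len(first), len(second), store.last_processed("orders"))
```

With the Kinesis Data Streams method, the role of `CheckpointStore` is played by the additional hevo_kcl table. With the DynamoDB Streams method, the position is part of the metadata that Hevo keeps.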

Schema and Type Mapping

Hevo replicates the schema of the tables from the Source DynamoDB as-is to your Destination database or data warehouse. In rare cases, Hevo skips columns that have an unsupported Source data type during transformation and mapping.

The following table shows how DynamoDB data types are mapped to warehouse data types.

| DynamoDB Data Type | Warehouse Data Type |
| --- | --- |
| String | VARCHAR |
| Binary | Bytes |
| Number | Decimal/Long |
| STRINGSET | JSON |
| NUMBERSET | JSON |
| BINARYSET | JSON |
| Map | JSON |
| List | JSON |
| Boolean | Boolean |
| NULL | - |
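The mapping above can be expressed as a small lookup. The Python sketch below is illustrative only and is not Hevo's code. It assumes attribute values arrive in DynamoDB's JSON form, keyed by the type descriptors S, N, B, SS, NS, BS, M, L, BOOL, and NULL. The rule for choosing between Long and Decimal is an assumption, as the table does not state how the choice is made. The `warehouse_type` function and the sample item are hypothetical.

```python
from decimal import Decimal

# Descriptors whose warehouse type does not depend on the value (per the table above).
FIXED_TYPES = {
    "S": "VARCHAR",
    "B": "Bytes",
    "SS": "JSON",
    "NS": "JSON",
    "BS": "JSON",
    "M": "JSON",
    "L": "JSON",
    "BOOL": "Boolean",
}


def warehouse_type(attribute_value):
    """Map one DynamoDB attribute value, such as {"S": "abc"}, to a warehouse type."""
    descriptor, value = next(iter(attribute_value.items()))
    if descriptor == "NULL":
        return None  # the table lists no warehouse type for NULL
    if descriptor == "N":
        # Assumption: whole numbers become Long, all other numbers become Decimal.
        return "Long" if Decimal(value) % 1 == 0 else "Decimal"
    if descriptor not in FIXED_TYPES:
        raise ValueError(f"unsupported DynamoDB type descriptor: {descriptor}")
    return FIXED_TYPES[descriptor]


# Hypothetical item in DynamoDB JSON form.
item = {
    "id": {"S": "u-1"},
    "age": {"N": "42"},
    "score": {"N": "9.75"},
    "tags": {"SS": ["a", "b"]},
    "active": {"BOOL": True},
    "deleted_at": {"NULL": True},
}

for name, attribute_value in item.items():
    print(name, warehouse_type(attribute_value))
```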

Limitations

  • On every Pipeline run, Hevo ingests all the data present in the DynamoDB data streams that you are using for ingestion. Data streams can store data for a maximum of 24 hours. Therefore, depending on your Pipeline frequency, some Events might get re-ingested in the Pipeline and consume your Events quota. Read Pipeline Frequency to know how it affects your Events quota consumption.
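To get a rough sense of how the Pipeline frequency affects re-ingestion, the following Python sketch counts how many runs would see a single Event. It is a simplified model and not a statement of how Hevo computes quota usage. It assumes that runs happen at fixed intervals starting at time 0, and that every run reads everything still present in the stream. The `times_ingested` function and its parameters are hypothetical.

```python
def times_ingested(created_min, frequency_min, retention_min=24 * 60):
    """Count the runs (at 0, f, 2f, ...) that see an Event created at `created_min`.

    A run at time t sees the Event if created_min <= t < created_min + retention_min.
    All values are in minutes.
    """
    first_run = -(-created_min // frequency_min)  # ceiling division
    last_run = -(-(created_min + retention_min) // frequency_min) - 1
    return max(0, last_run - first_run + 1)


# An Event written 30 minutes after the first run, for several frequencies.
for hours in (1, 6, 24, 48):
    print(f"every {hours} h:", times_ingested(30, hours * 60))
```

Under these assumptions, the Event is read 24 times with an hourly frequency, 4 times with a 6-hour frequency, and once with a 24-hour frequency. With a 48-hour frequency it is not read at all, because it expires before the next run.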

Revision History

Refer to the following table for the list of key updates made to this page:

| Date | Release | Description of Change |
| --- | --- | --- |
| Nov-08-2022 | NA | Added section, Limitations. |
| Oct-17-2022 | 1.99 | Updated section, Configure Amazon DynamoDB Connection Settings to add information about deferment of data ingestion if required permissions are not granted. |
| Oct-04-2021 | 1.73 | Updated the section, Prerequisites to inform users about setting the value of the StreamViewType parameter to NEW_AND_OLD_IMAGES. Updated the section, Enable Streams to reflect the latest changes in the DynamoDB console. |
| Aug-08-2021 | NA | Added a note in the Source Considerations section about Hevo deferring data ingestion in Pipelines created with this Source. |
| Jul-12-2021 | 1.67 | Added the field Include New Tables in the Pipeline under Source configuration settings. |
| Feb-22-2021 | 1.57 | Added sections: Create the AWS Access Key and the AWS Secret Key; Retrieve the AWS Region. |
Last updated on 15 Nov 2022
