Amazon DynamoDB

Amazon DynamoDB is a fully managed, multi-master, multi-region non-relational database that offers built-in in-memory caching to deliver reliable performance at any scale.

Hevo uses DynamoDB’s data streams to support change data capture (CDC). Data streams are time-ordered sequences of item-level changes in DynamoDB tables. All data in DynamoDB streams has a 24-hour lifetime and is automatically removed after that time. Set the ingestion frequency so that Hevo reads each change before it expires; changes removed from the stream before ingestion are lost.
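As a rough check on the ingestion frequency, consider the worst case: a change written just after a run has read past its position waits a full interval, plus however long the next run takes to reach it, before it is read. The Python sketch below encodes that reasoning. The helper name and the max_run_duration parameter are hypothetical; only the 24-hour retention comes from the text above.

```python
from datetime import timedelta

# From the text above: stream records are removed after 24 hours.
STREAM_RETENTION = timedelta(hours=24)


def is_ingestion_interval_safe(
    interval: timedelta,
    max_run_duration: timedelta = timedelta(0),
    retention: timedelta = STREAM_RETENTION,
) -> bool:
    """Return True if a change is always read before it expires.

    Worst case: a change is written just after a run has read past its
    position, so it waits one full interval, plus however long the next
    run takes to reach it.
    """
    worst_case_age = interval + max_run_duration
    return worst_case_age < retention
```

For example, an hourly frequency leaves ample headroom, while a 23-hour frequency with runs that can take 2 hours does not.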

To facilitate incremental data loads to a Destination, Hevo needs to keep track of the data that has been read so far from the data stream. Hevo supports two ways of replicating data and managing this ingestion information: Kinesis Data Streams and DynamoDB Streams.

The following table summarizes the differences between the two methods.

| | Kinesis Data Streams | DynamoDB Streams |
|---|---|---|
| Recommendation | Recommended method. | Default method if the DynamoDB user does not have the dynamodb:CreateTable permission. |
| User permissions needed on the DynamoDB Source | Read-only, dynamodb:CreateTable | Read-only |
| Ingestion library | Uses the Kinesis Client Library (KCL) to ingest the changed data from the database. | Uses the DynamoDB library. |
| Ingestion latency | Guarantees real-time data ingestion. | Data might be ingested with a delay. |
| Replication context | The Kinesis driver maintains the context. KCL creates an additional table (with the prefix hevo_kcl) per table in the Source system to store the last processed state for that table. | Hevo keeps the entire context of data replication as metadata, including positions that indicate the last record ingested. |
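Both methods rest on the same idea: store the position of the last record loaded for each shard, and on the next run skip everything at or before it. The Python sketch below illustrates that bookkeeping in memory. CheckpointStore and ingest are hypothetical names, not Hevo or KCL APIs, and the sketch assumes sequence numbers are numeric strings that increase within a shard.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# A stream record as (shard_id, sequence_number, payload).
Record = Tuple[str, str, dict]


@dataclass
class CheckpointStore:
    """Hypothetical in-memory stand-in for a hevo_kcl table or Hevo's metadata."""

    last_seq: Dict[str, int] = field(default_factory=dict)

    def is_new(self, shard_id: str, seq: str) -> bool:
        # Assumption: sequence numbers are numeric and increase within a shard.
        return int(seq) > self.last_seq.get(shard_id, -1)

    def commit(self, shard_id: str, seq: str) -> None:
        self.last_seq[shard_id] = int(seq)


def ingest(records: Iterable[Record], store: CheckpointStore) -> List[dict]:
    """Load only records past each shard's checkpoint, advancing it as we go."""
    loaded = []
    for shard_id, seq, payload in records:
        if not store.is_new(shard_id, seq):
            continue  # already loaded in an earlier run
        loaded.append(payload)  # stand-in for writing to the Destination
        store.commit(shard_id, seq)  # checkpoint only after the write
    return loaded
```

Committing the checkpoint only after a record is written means a failure between the two steps causes that record to be read again rather than skipped, that is, at-least-once delivery.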

Prerequisites

  • To ingest data using Kinesis Data Streams, the DynamoDB user must have the dynamodb:CreateTable permission on the DynamoDB database. Read about setting permissions here.

    Note: Hevo does not modify any data in the Source tables. The KCL uses this permission solely to create the tables that store the last processed state for each Source table.
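The choice between the two methods can be pictured as a check on the user's permissions, following the rule in the comparison table. This Python sketch is hypothetical and is not Hevo's actual logic; it assumes you already have a flat list of the IAM actions the user is allowed, and it ignores Deny statements and resource scoping.

```python
from fnmatch import fnmatchcase
from typing import Iterable

TARGET_ACTION = "dynamodb:createtable"  # IAM action names are case-insensitive


def choose_replication_method(allowed_actions: Iterable[str]) -> str:
    """Return the replication method implied by the user's allowed IAM actions.

    Assumption: allowed_actions lists allowed action patterns only, such as
    "dynamodb:GetRecords" or "dynamodb:Create*"; Deny statements and
    resource scoping are not considered.
    """
    for pattern in allowed_actions:
        # IAM action names may use * and ? wildcards; fnmatch uses the same symbols.
        if fnmatchcase(TARGET_ACTION, pattern.lower()):
            return "Kinesis Data Streams"
    return "DynamoDB Streams"
```

Lowercasing before matching keeps the comparison case-insensitive, the way IAM treats action names.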

Pre-configuration Requirements

You need to enable Streams on all DynamoDB tables you want to sync through Hevo. Follow the steps below to enable Streams for the tables.

  1. In your AWS console, select DynamoDB, then select Tables in the left pane.

  2. Select a table. The Stream details section displays the current stream settings of the table.

  3. Click Manage Stream.

  4. Select New and old images (both the new and the old images of the item), and click Enable.

  5. Do this for all the tables you want to sync.
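If you have many tables, the same setting can be applied programmatically. The Python sketch below assumes a boto3 DynamoDB client; describe_table and update_table with a StreamSpecification are part of the DynamoDB API, while enable_new_and_old_images is a hypothetical helper. It skips tables whose stream is already enabled and reports, without changing, any stream that uses a different view type.

```python
from typing import Dict, Iterable

VIEW_TYPE = "NEW_AND_OLD_IMAGES"


def enable_new_and_old_images(client, table_names: Iterable[str]) -> Dict[str, str]:
    """Enable a NEW_AND_OLD_IMAGES stream on each table; return a status per table.

    `client` is expected to behave like boto3.client("dynamodb").
    """
    status = {}
    for name in table_names:
        table = client.describe_table(TableName=name)["Table"]
        spec = table.get("StreamSpecification", {})
        if spec.get("StreamEnabled"):
            # Leave existing streams alone; flag a different view type for manual review.
            view = spec.get("StreamViewType", "unknown view type")
            status[name] = "already enabled" if view == VIEW_TYPE else "enabled with " + view
            continue
        client.update_table(
            TableName=name,
            StreamSpecification={"StreamEnabled": True, "StreamViewType": VIEW_TYPE},
        )
        status[name] = "enabled"
    return status
```

With boto3 installed and credentials configured, you could call it as enable_new_and_old_images(boto3.client("dynamodb", region_name="us-east-1"), ["orders"]); the region and table name are placeholders.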

Setup Instructions

Go to your Hevo dashboard and follow the steps given below to add DynamoDB as a Source in a Pipeline.

  1. Select Pipelines in the Asset Palette and click CREATE PIPELINE.

  2. Select DynamoDB as the Source type to continue.

  3. Specify the AWS Access Key, AWS Secret Key, and AWS Region that Hevo uses to connect to DynamoDB. Click Continue to set up the job settings.

  4. A list of the tables available for replication is displayed. Hevo can ingest data only from tables that have DynamoDB Streams enabled. Deselect the tables you do not want to replicate, and click Continue to configure the Destination.

  5. Select the Destination where you want to replicate the DynamoDB tables, or click ADD DESTINATION to create a new Destination. Read Destinations for more information.
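Because Hevo can ingest data only from tables with DynamoDB Streams enabled, it can help to check the table list before step 4. This Python sketch assumes a boto3-style DynamoDB client created with the same access key, secret key, and region you enter in step 3; tables_with_stream_status is a hypothetical helper that pages through ListTables and reads each table's StreamSpecification.

```python
from typing import Dict, List, Optional


def tables_with_stream_status(client) -> Dict[str, bool]:
    """Map every table name in the region to whether its stream is enabled.

    `client` is expected to behave like boto3.client("dynamodb"). ListTables
    returns names page by page, so follow LastEvaluatedTableName until it is absent.
    """
    names: List[str] = []
    start: Optional[str] = None
    while True:
        kwargs = {"ExclusiveStartTableName": start} if start else {}
        page = client.list_tables(**kwargs)
        names.extend(page.get("TableNames", []))
        start = page.get("LastEvaluatedTableName")
        if not start:
            break
    # A table that never had a stream has no StreamSpecification at all.
    return {
        name: client.describe_table(TableName=name)["Table"]
        .get("StreamSpecification", {})
        .get("StreamEnabled", False)
        for name in names
    }
```

Tables that map to False would appear in the list in step 4 but would not produce any data until Streams is enabled on them.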

Schema and Type Mapping

Hevo replicates the exact schema of the tables from the Source DynamoDB database to your data warehouse. In rare cases, columns with an unsupported Source data type are skipped during transformation and mapping.

The following table shows how DynamoDB data types are mapped to warehouse data types.

| DynamoDB Data Type | Warehouse Data Type |
|---|---|
| String | VARCHAR |
| Binary | Bytes |
| Number | Decimal/Long |
| STRINGSET | JSON |
| NUMBERSET | JSON |
| BINARYSET | JSON |
| Map | JSON |
| List | JSON |
| Boolean | Boolean |
| NULL | - |
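To make the mapping concrete, the Python sketch below assigns a warehouse type to a single attribute in DynamoDB's low-level JSON format (for example, {"S": "abc"}). It is hypothetical: in particular, the rule for choosing between Long and Decimal (integral values within the signed 64-bit range become Long) is an assumption, not documented Hevo behavior. A None result stands for the "-" entry, that is, no warehouse type.

```python
from decimal import Decimal
from typing import Optional

# DynamoDB type descriptors, as used in the low-level JSON format.
SIMPLE_TYPES = {"S": "VARCHAR", "B": "Bytes", "BOOL": "Boolean"}
JSON_TYPES = {"SS", "NS", "BS", "M", "L"}  # string/number/binary sets, Map, List


def warehouse_type(attribute: dict) -> Optional[str]:
    """Return the warehouse type for one DynamoDB attribute value, or None if unmapped."""
    descriptor, value = next(iter(attribute.items()))
    if descriptor in SIMPLE_TYPES:
        return SIMPLE_TYPES[descriptor]
    if descriptor == "N":
        # Assumption: integral values that fit in a signed 64-bit integer become Long.
        num = Decimal(value)
        if num == num.to_integral_value() and -(2**63) <= num < 2**63:
            return "Long"
        return "Decimal"
    if descriptor in JSON_TYPES:
        return "JSON"
    return None  # NULL (or an unknown descriptor) has no warehouse type
```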
Last updated on 08 Sep 2020