Amazon DynamoDB

Amazon DynamoDB is a fully managed, multi-region, multi-master non-relational database that offers built-in in-memory caching to deliver reliable performance at any scale.

Hevo uses DynamoDB’s data streams to support change data capture (CDC). A data stream is a time-ordered sequence of item-level changes in a DynamoDB table. Every record in a DynamoDB stream has a 24-hour lifetime and is automatically removed after that time. Set the ingestion frequency so that Hevo reads the changes well within this 24-hour window.
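
As a rough illustration of the 24-hour constraint, the following Python sketch checks whether an ingestion interval leaves some headroom before stream records expire. The function name and the safety margin (with its one-hour default) are assumptions made for this example. They are not Hevo settings.

```python
from datetime import timedelta

# DynamoDB Streams keeps each record for 24 hours.
STREAM_RETENTION = timedelta(hours=24)


def frequency_is_safe(ingestion_interval: timedelta,
                      safety_margin: timedelta = timedelta(hours=1)) -> bool:
    """Return True if the interval plus a buffer fits inside the retention window.

    safety_margin is a hypothetical allowance for slow or retried runs.
    """
    return ingestion_interval + safety_margin <= STREAM_RETENTION
```

For example, a 12-hour interval passes this check, while a 24-hour interval does not, because a record written just after one run could expire before the next run reads it.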

To facilitate incremental data loads to a Destination, Hevo needs to keep track of the data that has been read so far from the data stream. Hevo supports two ways of replicating data to manage the ingestion information: Kinesis Data Streams and DynamoDB Streams.

The table below lists the differences between the two methods.

| Kinesis Data Streams | DynamoDB Streams |
|---|---|
| Recommended method. | Default method, if the DynamoDB user does not have dynamodb:CreateTable permissions. |
| User permissions needed on the DynamoDB Source: Read-only, dynamodb:CreateTable | User permissions needed on the DynamoDB Source: Read-only |
| Uses the Kinesis Client Library (KCL) to ingest the changed data from the database. | Uses the DynamoDB library. |
| Guarantees real-time data ingestion. | Data might be ingested with a delay. |
| The Kinesis driver maintains the context. KCL creates an additional table (with prefix hevo_kcl) per table in the Source system, to store the last processed state for a table. | Hevo keeps the entire context of data replication as metadata, including positions to indicate the last record ingested. |
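
To make the idea of "last processed state" concrete, here is a minimal in-memory sketch of position tracking. It is not how the KCL or Hevo stores its state (the KCL uses a DynamoDB table and Hevo uses its own metadata). The class and method names are hypothetical. The only detail taken from DynamoDB Streams is that each stream record carries a numeric SequenceNumber string under its "dynamodb" key.

```python
from typing import Dict, Iterable, List


class StreamCheckpoint:
    """Remembers the highest sequence number processed for each shard."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def filter_new(self, shard_id: str, records: Iterable[dict]) -> List[dict]:
        """Return records newer than the stored position, then advance it."""
        last = self._last_seen.get(shard_id, -1)
        fresh = [
            r for r in records
            if int(r["dynamodb"]["SequenceNumber"]) > last
        ]
        if fresh:
            # Move the checkpoint to the newest record just handed out.
            self._last_seen[shard_id] = max(
                int(r["dynamodb"]["SequenceNumber"]) for r in fresh
            )
        return fresh
```

Passing the same batch to filter_new twice returns the records the first time and an empty list the second time, which is the behaviour an incremental load relies on.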

Prerequisites

  • An active AWS account is available.
  • Streams are enabled on the DynamoDB tables to be replicated.
  • An AWS IAM Policy is created with the required permissions for the DynamoDB user to ingest data from the DynamoDB database (if using Amazon Kinesis Data Streams).

    Note: Hevo does not modify any data in the Source tables. The permissions are used solely to store the last processed state for a table by the KCL.

Configuring Amazon DynamoDB as a Source in Hevo

Go to your Hevo dashboard and follow the steps given below to add DynamoDB as a Source in a Pipeline.

  1. Select Pipelines in the Asset Palette and click CREATE PIPELINE.

  2. Select DynamoDB as a Source type to continue.

  3. Specify the AWS Access Key, AWS Secret Key, and AWS Region that Hevo must use to connect to DynamoDB. Click Continue to set up the job settings.

  4. You’ll get a list of the tables available to replicate. Note that Hevo will only be able to ingest data from the tables for which DynamoDB Streams is enabled. Deselect the tables you don’t want to replicate. Click Continue to configure the Destination.

  5. Select the Destination where you want to replicate DynamoDB tables or click on ADD DESTINATION to create a new Destination. Read Destinations for more information.
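
Because only tables with Streams enabled can be ingested, it can help to check this before creating the Pipeline. The sketch below assumes a boto3 DynamoDB client is passed in, for example one created with boto3.client("dynamodb"). The function name is hypothetical. list_tables and describe_table are the standard DynamoDB API calls.

```python
from typing import List


def tables_with_streams(dynamodb_client) -> List[str]:
    """Return the names of tables whose stream is currently enabled."""
    enabled: List[str] = []
    kwargs: dict = {}
    while True:
        page = dynamodb_client.list_tables(**kwargs)
        for name in page.get("TableNames", []):
            table = dynamodb_client.describe_table(TableName=name)["Table"]
            # StreamSpecification is absent when a stream was never configured.
            if table.get("StreamSpecification", {}).get("StreamEnabled"):
                enabled.append(name)
        last = page.get("LastEvaluatedTableName")
        if not last:
            return enabled
        # Continue from where the previous page stopped.
        kwargs = {"ExclusiveStartTableName": last}
```

The client needs the dynamodb:ListTables and dynamodb:DescribeTable permissions for this check to work.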

Enabling Streams

You need to enable Streams on all DynamoDB tables you want to sync through Hevo. To do this:

  1. Log in to the AWS Management Console.

  2. Select DynamoDB, then select Tables in the left pane.

  3. Select a table. The Stream details section shows the current stream settings of the table.

  4. Click Manage Stream.

  5. Select New and old images - both the new and the old images of the item and click Enable.

  6. Do this for all the tables you want to sync.
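
The console steps above can also be scripted. The following is a sketch, assuming a boto3 DynamoDB client with the dynamodb:UpdateTable permission is passed in. The function name is hypothetical. The StreamSpecification argument of update_table and the NEW_AND_OLD_IMAGES view type are part of the DynamoDB API.

```python
from typing import Iterable, List


def enable_streams(dynamodb_client, table_names: Iterable[str]) -> List[str]:
    """Enable a new-and-old-images stream on each table; return the names handled."""
    enabled: List[str] = []
    for name in table_names:
        dynamodb_client.update_table(
            TableName=name,
            StreamSpecification={
                "StreamEnabled": True,
                # Capture the item both before and after each change.
                "StreamViewType": "NEW_AND_OLD_IMAGES",
            },
        )
        enabled.append(name)
    return enabled
```

DynamoDB rejects the request for a table that already has a stream enabled, so in practice you may want to skip such tables first.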

Creating an IAM Policy

Note: An IAM policy is needed only if you use the KCL (Kinesis Data Streams) method.

A policy is an object in AWS that, when associated with an identity or resource, defines its permissions. When Hevo makes a request to access the data in your DynamoDB account, AWS evaluates the permissions in the policy to determine whether the request is allowed or denied. Most policies are stored in AWS as JSON documents.

Perform the following steps to create the IAM policy:

  1. Log in to the Amazon IAM Console.

  2. Click Policies in the left navigation bar.

  3. Click Create policy in the right pane.

  4. Click the JSON tab and paste the following policy into the editor. The JSON statements list the permissions the policy would assign to Hevo.

       {
         "Version": "2012-10-17",
         "Statement": [
           {
             "Effect": "Allow",
             "Action": [
               "dynamodb:DescribeStream",
               "dynamodb:DescribeTable",
               "dynamodb:GetItem",
               "dynamodb:GetRecords",
               "dynamodb:GetShardIterator",
               "dynamodb:ListStreams",
               "dynamodb:ListTables",
               "dynamodb:Scan",
               "dynamodb:CreateTable",
               "dynamodb:PutItem",
               "dynamodb:UpdateItem",
               "dynamodb:DeleteItem"
             ],
             "Resource": [
               "*"
             ]
           }
         ]
       }
    

    Note: Hevo does not modify any data in the Source tables. The permissions are used solely to store the last processed state for a table by the KCL.

  5. Click Review policy.

  6. On the Review policy page, provide a Name for the policy. For example, Hevo-access.

  7. (Optional) Provide a Description.

  8. Click Create policy. You can view the new policy in the list.
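
If you prefer to create the policy programmatically, the following sketch builds the same JSON document and submits it through the IAM create_policy API. It assumes a boto3 IAM client, for example boto3.client("iam"), is passed in. The function names and the HEVO_ACTIONS constant are hypothetical. The action list and the example name Hevo-access come from this page.

```python
import json
from typing import Sequence

# Actions listed in the policy on this page.
HEVO_ACTIONS = [
    "dynamodb:DescribeStream", "dynamodb:DescribeTable", "dynamodb:GetItem",
    "dynamodb:GetRecords", "dynamodb:GetShardIterator", "dynamodb:ListStreams",
    "dynamodb:ListTables", "dynamodb:Scan", "dynamodb:CreateTable",
    "dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:DeleteItem",
]


def build_policy_document(actions: Sequence[str] = HEVO_ACTIONS,
                          resources: Sequence[str] = ("*",)) -> str:
    """Serialize an Allow statement for the given actions and resources."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": list(actions),
            "Resource": list(resources),
        }],
    }, indent=2)


def create_hevo_policy(iam_client, name: str = "Hevo-access",
                       description: str = "") -> dict:
    """Create the managed policy and return the IAM response."""
    kwargs = {"PolicyName": name, "PolicyDocument": build_policy_document()}
    if description:
        kwargs["Description"] = description  # optional, as in step 7
    return iam_client.create_policy(**kwargs)
```

The resources parameter is included so that you can narrow the policy from "*" to specific table ARNs if your security guidelines require it.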

Schema and Type Mapping

Hevo replicates the schema of the tables from the Source DynamoDB as-is to your Destination database or data warehouse. In rare cases, Hevo skips columns with an unsupported Source data type while transforming and mapping.

The following table shows how DynamoDB data types are mapped to warehouse data types.

| DynamoDB Data Type | Warehouse Data Type |
|---|---|
| String | VARCHAR |
| Binary | Bytes |
| Number | Decimal/Long |
| STRINGSET | JSON |
| NUMBERSET | JSON |
| BINARYSET | JSON |
| Map | JSON |
| List | JSON |
| Boolean | Boolean |
| NULL | - |
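
The mapping above can be expressed as a small lookup keyed by the type descriptors that DynamoDB uses in its JSON format (S, N, B, SS, NS, BS, M, L, BOOL, NULL). This is an illustrative sketch and not Hevo's implementation. The names WAREHOUSE_TYPE and describe_item are hypothetical, and None stands in for the "-" entry of the NULL row.

```python
from typing import Dict, Optional

# DynamoDB type descriptor -> warehouse data type, following the table above.
WAREHOUSE_TYPE: Dict[str, Optional[str]] = {
    "S": "VARCHAR",
    "B": "Bytes",
    "N": "Decimal/Long",
    "SS": "JSON",
    "NS": "JSON",
    "BS": "JSON",
    "M": "JSON",
    "L": "JSON",
    "BOOL": "Boolean",
    "NULL": None,  # the table lists no warehouse type for NULL
}


def describe_item(item: Dict[str, dict]) -> Dict[str, Optional[str]]:
    """Map each attribute of a DynamoDB-JSON item to its warehouse type."""
    # Each attribute value is a one-key dict such as {"S": "abc"}.
    return {
        column: WAREHOUSE_TYPE[next(iter(value))]
        for column, value in item.items()
    }
```

For an item such as {"id": {"S": "u1"}, "age": {"N": "30"}}, describe_item returns {"id": "VARCHAR", "age": "Decimal/Long"}.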

Last updated on 28 Oct 2020