DynamoDb Incremental Backups – Part Two

In this post, the second in the series, we will delve into the details of our DynamoDb incremental backup solution.

If you missed the first post, check it out: Part One

I am not going to go into DynamoDb itself in much depth. If you are reading this post, I assume you already know about DynamoDb and are either looking to use it or already using it.

DynamoDb Streams

Let’s start with DynamoDb Streams. A stream captures mutations to the data within a table. In other words, it records item changes at the point in time when they occurred.

DynamoDB Streams – High Level

This feature enables a plethora of possibilities, such as data analysis, replication, triggers, and backups. It is very simple to enable (as simple as flipping a switch), and it provides an ordered sequence of item-level changes (records for a given item appear in the order the item was modified) that is retained for 24 hours.

When you enable the stream, you choose one of four view types:

  • Keys only (KEYS_ONLY) – only the key attributes of the modified item.
  • New image (NEW_IMAGE) – the entire item, as it appears after it was modified.
  • Old image (OLD_IMAGE) – the entire item, as it appeared before it was modified.
  • New and old images (NEW_AND_OLD_IMAGES) – both the new and the old images of the item.
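The view type is set through the table's StreamSpecification. As a minimal sketch, assuming the AWS SDK for JavaScript (v2) and a hypothetical table named my-table, this builds the UpdateTable parameters that switch on a stream carrying both images. The resulting object would then be passed to new AWS.DynamoDB().updateTable(params, callback).

```javascript
// Builds the UpdateTable parameters that turn on a stream for a table.
// Assumption: the result is passed to an AWS SDK for JavaScript (v2) client,
// e.g. new AWS.DynamoDB().updateTable(params, callback). 'my-table' is hypothetical.
var VIEW_TYPES = ['KEYS_ONLY', 'NEW_IMAGE', 'OLD_IMAGE', 'NEW_AND_OLD_IMAGES'];

function enableStreamParams(tableName, viewType) {
  // Default to the view type an image-based backup needs.
  var type = viewType || 'NEW_AND_OLD_IMAGES';
  if (VIEW_TYPES.indexOf(type) === -1) {
    throw new Error('Unknown StreamViewType: ' + type);
  }
  return {
    TableName: tableName,
    StreamSpecification: {
      StreamEnabled: true,
      StreamViewType: type
    }
  };
}

var params = enableStreamParams('my-table');
console.log(JSON.stringify(params));
// {"TableName":"my-table","StreamSpecification":{"StreamEnabled":true,"StreamViewType":"NEW_AND_OLD_IMAGES"}}
```

Validating the view type locally is optional; it just surfaces a typo before the request is sent.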

For our use, we need the New and old images option. Let’s walk through an example of the kind of data you will see, using this sequence of operations:

1. INSERT record
2. UPDATE record
3. DELETE record

The DynamoDb Stream will contain these events:

https://github.com/PageUpPeopleOrg/dynamodb-replicator/blob/master/test/fixtures/events/insert-modify-delete.json

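As you can see, for an INSERT event we rely on the NEW image (there is no old image).

For a DELETE, which appears in the stream with the eventName REMOVE, we rely on the OLD image (there is no new image). An UPDATE appears as a MODIFY event and carries both images.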
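DynamoDB stream records carry an eventName (INSERT, MODIFY or REMOVE) and, under dynamodb, the Keys plus whichever images exist. Here is a minimal sketch, assuming the NEW_AND_OLD_IMAGES view type, of choosing the image a backup should act on. The put/delete labels are my own, and the sample records are hypothetical, trimmed to the fields used, and not copied from the fixture.

```javascript
// Picks the item image a backup should act on for one stream record,
// assuming the stream uses the NEW_AND_OLD_IMAGES view type.
function imageFor(record) {
  var data = record.dynamodb;
  switch (record.eventName) {
    case 'INSERT': // only a new image exists
    case 'MODIFY': // both exist; the new image is the current state
      return { action: 'put', image: data.NewImage };
    case 'REMOVE': // only an old image exists
      return { action: 'delete', image: data.OldImage };
    default:
      throw new Error('Unexpected eventName: ' + record.eventName);
  }
}

// Hypothetical records for an insert, an update and a delete of one item.
var sampleEvent = {
  Records: [
    { eventName: 'INSERT', dynamodb: { Keys: { id: { S: '1' } },
      NewImage: { id: { S: '1' }, name: { S: 'a' } } } },
    { eventName: 'MODIFY', dynamodb: { Keys: { id: { S: '1' } },
      OldImage: { id: { S: '1' }, name: { S: 'a' } },
      NewImage: { id: { S: '1' }, name: { S: 'b' } } } },
    { eventName: 'REMOVE', dynamodb: { Keys: { id: { S: '1' } },
      OldImage: { id: { S: '1' }, name: { S: 'b' } } } }
  ]
};

var actions = sampleEvent.Records.map(imageFor);
actions.forEach(function (a) {
  console.log(a.action, JSON.stringify(a.image));
});
// put {"id":{"S":"1"},"name":{"S":"a"}}
// put {"id":{"S":"1"},"name":{"S":"b"}}
// delete {"id":{"S":"1"},"name":{"S":"b"}}
```

For a delete, the old image is kept so the last known state of the item is still available for the backup.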

Keep in mind that these events are only guaranteed to be available in the stream for 24 hours. After that, they can be removed at any time.
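Because of that window, a consumer that falls behind can silently lose changes. Here is a rough sketch of a lag check, assuming records expose dynamodb.ApproximateCreationDateTime in epoch seconds (as in the JSON that Lambda receives). The two-hour safety margin is an arbitrary choice for illustration.

```javascript
// Flags stream records whose age is approaching the 24-hour retention window.
// Assumption: ApproximateCreationDateTime is in epoch seconds; margin is arbitrary.
var RETENTION_MS = 24 * 60 * 60 * 1000;

function recordAgeMs(record, nowMs) {
  return nowMs - record.dynamodb.ApproximateCreationDateTime * 1000;
}

function isNearExpiry(record, nowMs, marginMs) {
  return recordAgeMs(record, nowMs) >= RETENTION_MS - marginMs;
}

var now = Date.UTC(2016, 5, 2, 12, 0, 0);
var margin = 2 * 60 * 60 * 1000;
var fresh = { dynamodb: { ApproximateCreationDateTime: (now - 1 * 60 * 60 * 1000) / 1000 } };
var stale = { dynamodb: { ApproximateCreationDateTime: (now - 23 * 60 * 60 * 1000) / 1000 } };

console.log(isNearExpiry(fresh, now, margin)); // false (1 hour old)
console.log(isNearExpiry(stale, now, margin)); // true (23 hours old)
```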

To access the stream directly, there is a separate DynamoDB Streams API. Under the covers, it is closely modelled on the Kinesis Streams API. We’re not going to delve into it here as it is quite involved, but may revisit it in a later blog post. If you are interested, feel free to check out: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html#Streams.Processing.
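For a rough idea of what is involved: shard IDs come from DescribeStream, and each shard is then read with GetShardIterator followed by repeated GetRecords calls, as with Kinesis. The sketch below assumes a client shaped like the AWS SDK for JavaScript (v2) AWS.DynamoDBStreams client, whose methods return requests with .promise(). An in-memory stand-in client is included so it runs without AWS, and the ARN and shard ID are placeholders.

```javascript
// Reads the records currently reachable in one shard of a DynamoDB stream:
// GetShardIterator once, then GetRecords with each NextShardIterator.
// Assumption: `client` behaves like an AWS SDK v2 AWS.DynamoDBStreams client.
function readShard(client, streamArn, shardId) {
  return client.getShardIterator({
    StreamArn: streamArn,
    ShardId: shardId,
    ShardIteratorType: 'TRIM_HORIZON' // start at the oldest retained record
  }).promise().then(function (res) {
    var collected = [];
    function poll(iterator) {
      if (!iterator) return Promise.resolve(collected); // shard is closed
      return client.getRecords({ ShardIterator: iterator }).promise()
        .then(function (page) {
          // Simplification: stop at the first empty page. A real consumer keeps
          // polling, since empty pages can occur before later records.
          if (page.Records.length === 0) return collected;
          collected = collected.concat(page.Records);
          return poll(page.NextShardIterator);
        });
    }
    return poll(res.ShardIterator);
  });
}

// In-memory stand-in client: each "page" of records is served in turn.
function fakeClient(pages) {
  return {
    getShardIterator: function () {
      return { promise: function () { return Promise.resolve({ ShardIterator: '0' }); } };
    },
    getRecords: function (params) {
      var i = Number(params.ShardIterator);
      var next = i + 1 < pages.length ? String(i + 1) : null;
      return { promise: function () {
        return Promise.resolve({ Records: pages[i] || [], NextShardIterator: next });
      } };
    }
  };
}

var result = readShard(
  fakeClient([[{ eventName: 'INSERT' }], [{ eventName: 'REMOVE' }]]),
  'arn:example', 'shard-0');
result.then(function (records) {
  console.log(records.map(function (r) { return r.eventName; })); // [ 'INSERT', 'REMOVE' ]
});
```

This is exactly the bookkeeping (iterators, shards, empty pages) that Lambda takes off your hands, which is why the rest of this post uses Lambda rather than the Streams API directly.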

Lambda

AWS Lambda is an exciting new service that promises to change the way the cloud is perceived, through the evolution of serverless architecture.

AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can use AWS Lambda to extend other AWS services with custom logic, or create your own back-end services that operate at AWS scale, performance, and security. AWS Lambda can automatically run code in response to multiple events, such as modifications to objects in Amazon S3 buckets or table updates in Amazon DynamoDB.

The interesting thing Lambda enables is in that last sentence: the ability to turn on “distributed database triggers” for your DynamoDb tables. I shuddered when I realised what this could do, and was scared of the pain it could unleash on the world… but with great power comes great responsibility.

Essentially, a “Lambda function” is code that we provide to Lambda, which can then be triggered by table updates in DynamoDB.

Tying this back to DynamoDb Streams, we can associate our Lambda function with a DynamoDb table (under the covers, Lambda simply polls the table’s DynamoDb Stream).
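That association is a Lambda event source mapping pointing at the table's stream ARN. Here is a minimal sketch, assuming the AWS SDK for JavaScript (v2), where the object would be passed to new AWS.Lambda().createEventSourceMapping(params, callback). The ARN, the function name and the fallback batch size of 100 are hypothetical placeholders.

```javascript
// Builds parameters for Lambda's CreateEventSourceMapping operation, which
// makes Lambda poll a table's stream and invoke a function with batches.
// Assumption: AWS SDK v2; ARN, function name and batch size are placeholders.
function streamTriggerParams(streamArn, functionName, batchSize) {
  return {
    EventSourceArn: streamArn,        // the table's stream ARN, not the table ARN
    FunctionName: functionName,
    StartingPosition: 'TRIM_HORIZON', // begin with the oldest retained record
    BatchSize: batchSize || 100,      // maximum records per invocation
    Enabled: true
  };
}

var mapping = streamTriggerParams(
  'arn:aws:dynamodb:us-east-1:123456789012:table/my-table/stream/2016-01-01T00:00:00.000',
  'backup-function');
console.log(mapping.StartingPosition, mapping.BatchSize); // TRIM_HORIZON 100
```

Starting from TRIM_HORIZON rather than LATEST suits a backup, since it processes everything still in the 24-hour window instead of only changes made after the mapping is created.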

Lambda currently only supports Python, Node.js and Java, with more languages on the horizon.

Code example

When an event is available, Lambda passes it to your function’s handler, which in Node.js has the following signature:

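exports.myHandler = function(event, context, callback) {
   // ... your processing logic goes here ...

   // Use callback() to return information to the caller (a null first argument signals success).
   callback(null, 'Success');
};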

In the syntax, note the following:

  • event – AWS Lambda uses this parameter to pass in event data to the handler.
  • context – AWS Lambda uses this parameter to provide your handler the runtime information of the Lambda function that is executing. For more information, see The Context Object (Node.js).
  • callback – You can use the optional callback to return information to the caller; otherwise, the return value is null. For more information, see Using the Callback Parameter.

    Note

    The callback is supported only in the Node.js runtime v4.3. If you are using the earlier runtime v0.10.42, you need to use the context methods (done, succeed, and fail) to properly terminate the Lambda function. For information about terminating Lambda functions written for earlier runtime versions, see Using the Earlier Node.js Runtime v0.10.42.

In the next post, we’ll delve into what we can do with Lambda and S3.
