The Five Whys

I’ve recently finished re-reading The Lean Startup, and one chapter that was great to revisit is the one on the Five Whys.

It sounds fairly straightforward: a technique that allows you to perform a root cause analysis. Ask “Why did that happen?” five times to get to the root cause of the problem.

Asking the five whys sounds simple, but in practice, it is very difficult to find that root problem consistently.

Having been through the process a few times, I’ve noticed:

  • It is very easy to go off on tangents.
  • Teams often don’t have all the information at hand.
  • People tend to jump to actions as soon as possible in an attempt to “solve the problem” and, worse, to appease Management / Executives.

In most cases, I’ve noticed those actions address symptoms, and the problem shows up in other forms down the track. Some key reminders I use during the process:

1. A very important part of performing the five whys is holding the session with everyone involved in the room. No proxies: everyone involved with the issue, client-facing staff, technical staff, everyone. This is critical. If perspectives and roles are missing from the analysis, it can miss the root problem or fall short.

2. Ensure someone is identified as running the meeting. They are responsible for cutting through the noise, keeping the discussion on track, and drilling down.

3. The goal is not to identify actions but to identify the root cause. When actions are presented after the analysis, Managers / Executives should check: do the actions address the root cause?

4. Actions can make it feel like you are solving the problem, so typically I prefer to address actions at the end of the session.

A great example outlined in the book was:

  1. A new release broke a key feature for customers. Why? Because a particular server failed.
  2. Why did the server fail? Because an obscure subsystem was used in the wrong way.
  3. Why was it used in the wrong way? The engineer who used it didn’t know how to use it properly.
  4. Why didn’t he know? Because he was never trained.
  5. Why wasn’t he trained? Because his manager doesn’t believe in training new engineers, because they are “too busy.”

It nicely illustrates the “human problem” behind every technical problem.

Outputs v Outcomes

We know the software development industry is focused on solving problems. Customer problems. User problems.

Better, faster, cheaper.

There is a great article in the Harvard Business Review on Outputs v Outcomes:

Outputs are features.

Outcomes are the true value delivered.

We aim for our outputs to deliver outcomes, but that link is usually treated as an abstraction. The problem, in most cases, is that development teams treat outputs as the end goal and build release plans and sprints around getting that “feature” out.

We don’t talk enough about the problem we’re solving or the outcome we’re aiming to achieve and, even more so, we don’t measure whether we achieved that outcome.

I am not a fan of measures such as “Increased usage”. That’s not an outcome.

A great example I’ve seen while building recruitment software is Time to Fill (TTF). TTF is the time it takes to fill a vacant seat from when the request is made. TTF translates directly to Cost to Fill, which has a measurable impact on the company.

For a recruitment team, Time to Fill is THE KPI they are measured on.

So rather than building a feature or product we “think” will improve TTF, measure it, try different ideas, and keep trying until you find something that achieves your outcome.
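Measuring the outcome can start very simply. The sketch below assumes you can export a requested date and a filled date for each vacancy; the function names and the data are hypothetical. It uses the median rather than the mean so that one unusually hard-to-fill role doesn’t dominate the number.

```python
from __future__ import annotations

from datetime import date
from statistics import median


def time_to_fill_days(requested: date, filled: date) -> int:
    """Days from when a vacancy is requested until the seat is filled."""
    if filled < requested:
        raise ValueError("filled date is before the request date")
    return (filled - requested).days


def median_ttf(vacancies: list[tuple[date, date]]) -> float:
    """Median TTF over (requested, filled) pairs."""
    return median(time_to_fill_days(r, f) for r, f in vacancies)


# Hypothetical vacancies before and after shipping an experiment.
before = [
    (date(2017, 1, 2), date(2017, 2, 13)),
    (date(2017, 1, 9), date(2017, 2, 6)),
    (date(2017, 1, 16), date(2017, 3, 1)),
]
after = [
    (date(2017, 4, 3), date(2017, 4, 28)),
    (date(2017, 4, 10), date(2017, 5, 15)),
    (date(2017, 4, 17), date(2017, 5, 12)),
]
print(median_ttf(before), median_ttf(after))  # 42 25
```

Comparing the two numbers after each idea is what tells you whether the output actually moved the outcome.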

None of this is groundbreaking; all we’re doing is ensuring we connect our problem solvers (product teams / developers) to our users. It is fundamentally Agile.

Contextual Craftsmanship

For many years, I’ve spoken to friends, colleagues, mentors/mentees about craftsmanship and how important it is in Software Development.

As software developers, it is very easy to agree with. Quality is not negotiable, write your tests first, blah blah blah. We understand how important design patterns and readable, maintainable code are, especially when building sustainable software.

One aspect that I find isn’t spoken about explicitly enough is Contextual Craftsmanship. We all know there is no silver bullet, and I find craftsmanship is very similar: there is no single correct way to develop. It depends on what you are doing and the business context you are in.

A startup context is very different to an enterprise. Code built in a hackathon is very different to a system in which a bug can literally result in life or death (e.g. software for planes or medical equipment).

I am lucky enough to have experienced many of those contexts, and I had assumed developers were aware of this and would change their approaches accordingly.

But as I mentor developers and work with large development teams, I recognise the line is not always clear for everyone. Discussing the context explicitly is very important.

Is this a Spike / Proof of Concept that will be thrown away?

Is this an MVP to gain insights and validated learnings?

Are you building a business critical service for a Fortune 500 company which requires 99.95% uptime?
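An uptime target like this is easier to discuss with the team and stakeholders once it is translated into a downtime budget. A quick back-of-the-envelope calculation, assuming a 30-day month and a 365-day year:

```python
def downtime_budget_minutes(uptime_percent: float, period_days: float) -> float:
    """Maximum downtime allowed in a period for a given uptime target."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - uptime_percent) / 100


print(f"{downtime_budget_minutes(99.95, 30):.1f} min per 30-day month")  # 21.6 min
print(f"{downtime_budget_minutes(99.95, 365) / 60:.2f} h per year")  # 4.38 h
```

Roughly 20 minutes a month is a very different engineering conversation from a throwaway spike.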

Craftsmanship depends on the context.

Have the team agree on the approach: the level of quality, reliability, maintainability, DR, etc.

It is fundamental for the team to be on the same page, but it is just as important to ensure your stakeholders align to this.



DynamoDb Incremental Backups – Part Four

Before we start: if you missed the previous three posts, please check them out here:

Part One
Part Two
Part Three

At this stage, I’m going to assume you are comfortable with DynamoDb Incremental Backups, and the format they are stored in.

In this post, we will walk through the restore step, and I’ll be the first to admit this can be taken much further than I have. I haven’t had the time or the need to take it as far as I would have liked, but don’t let that stop you! I would love to hear from you if you have done something interesting with this, e.g. automating your DR / backup restore testing.

For our DynamoDb incremental backups solution, the incremental backups are stored in S3. The data is kept in the native DynamoDb format, which is very handy: it allows us to write it back to DynamoDb with no transformation.

Each version of a key (or file) stored in the S3 versioned bucket is a snapshot of a row at a point in time. This allows us to be selective in what we restore. It also provides a human-readable audit log!

S3 has an API that allows us to list the backups that are available:

Get Bucket Object Versions

Leveraging this, we can build a list of the data we would like to restore. This could range from a single row at a point in time to an entire table at a point in time.
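Here is a minimal sketch of building that list for a point-in-time restore. It assumes one S3 key per row (so the newest version of each key at or before the restore point is the row’s state then) and that a delete marker means the row had been removed. The function names are hypothetical; the boto3 `list_object_versions` paginator and its `Versions` / `DeleteMarkers` fields are the real S3 API, but that helper needs AWS credentials and is not exercised here.

```python
from datetime import datetime, timezone


def plan_point_in_time_restore(versions, delete_markers, point_in_time):
    """Pick, for each S3 key, the newest backup at or before point_in_time.

    versions / delete_markers: dicts shaped like the "Versions" and
    "DeleteMarkers" entries returned by S3 ListObjectVersions
    (Key, VersionId, LastModified). Returns {key: version_id}, with None
    when the newest event for that key was a delete. Ties are resolved
    arbitrarily in this sketch.
    """
    latest = {}  # key -> (last_modified, version_id or None)
    entries = [(v, False) for v in versions] + [(d, True) for d in delete_markers]
    for entry, is_delete in entries:
        ts = entry["LastModified"]
        if ts > point_in_time:
            continue  # written after the restore point, so ignore it
        current = latest.get(entry["Key"])
        if current is None or ts > current[0]:
            latest[entry["Key"]] = (ts, None if is_delete else entry["VersionId"])
    return {key: vid for key, (_, vid) in latest.items()}


def list_all_versions(bucket, prefix):
    """Collect every version and delete marker under a prefix (needs AWS access)."""
    import boto3  # imported lazily so the planning logic runs without boto3

    paginator = boto3.client("s3").get_paginator("list_object_versions")
    versions, markers = [], []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        versions.extend(page.get("Versions", []))
        markers.extend(page.get("DeleteMarkers", []))
    return versions, markers
```

Restoring a single row is then just filtering the plan to one key, and restoring a table is using the table’s prefix.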

There are a range of tools that allow us to restore these backups directly from S3:

  1. Dynamo Incremental Restore
    The first option allows you to specify a point in time for a given prefix (folder location) in S3. The workflow is:

    1. Scans all the data available using the Version List in S3.
    2. Builds a list of the data that needs to be updated.
    3. Downloads the file(s) identified in #2 and pushes them to DynamoDb.
  2. Dynamo Migrator
  3. DynamoDb Replicator
    A snapshot script that scans an S3 folder where incremental backups have been made, and writes the aggregate to a file on S3, providing a snapshot of the backup’s state.
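The last step of the Dynamo Incremental Restore workflow (download each file and push it to DynamoDb) can be sketched as below. This is not that tool’s code: `restore_versions` is a hypothetical helper, and it assumes each S3 object body is a single item already in DynamoDb attribute-value JSON (e.g. `{"id": {"S": "1"}}`). `get_object` with `VersionId` and `put_item` are real boto3 client calls; pass `boto3.client("s3")` and `boto3.client("dynamodb")` in practice.

```python
import json


def restore_versions(plan, bucket, table_name, s3, dynamodb):
    """Push each planned S3 object version into a DynamoDb table.

    plan: {s3_key: version_id or None}. Keys whose newest event was a
    delete (None) are skipped, which is correct when restoring into an
    empty table. Returns the number of items written.
    """
    restored = 0
    for key, version_id in sorted(plan.items()):
        if version_id is None:
            continue  # row did not exist at the restore point
        obj = s3.get_object(Bucket=bucket, Key=key, VersionId=version_id)
        item = json.loads(obj["Body"].read())  # native DynamoDb format
        dynamodb.put_item(TableName=table_name, Item=item)
        restored += 1
    return restored
```

Because the backups are already in the native format, no transformation sits between the download and the `put_item` call. For large tables, batching writes would be the obvious next step.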

We haven’t had any issues with our incremental backups, but the next step would be to automate the DR restore at a regular interval to ensure it provides the protection you are looking for.
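One piece of an automated DR test is checking that a restored table matches what the backups say it should contain. A minimal sketch, assuming both sides are lists of items in DynamoDb attribute-value format and that you know the table’s key attribute names; `diff_restore` and `scan_all` are hypothetical helpers, while the boto3 `scan` paginator is a real API that needs AWS access.

```python
import json


def _item_key(item, key_attrs):
    # Serialise the key attributes so they can be used as a dict key.
    return tuple(json.dumps(item[a], sort_keys=True) for a in key_attrs)


def diff_restore(expected, restored, key_attrs):
    """Report items missing from, unexpected in, or changed in the restore."""
    want = {_item_key(i, key_attrs): i for i in expected}
    got = {_item_key(i, key_attrs): i for i in restored}
    return {
        "missing": sorted(k for k in want if k not in got),
        "unexpected": sorted(k for k in got if k not in want),
        "changed": sorted(k for k in want if k in got and want[k] != got[k]),
    }


def scan_all(table_name):
    """Read every item from a table (needs AWS access)."""
    import boto3  # lazy import so the comparison logic runs without boto3

    pages = boto3.client("dynamodb").get_paginator("scan").paginate(TableName=table_name)
    return [item for page in pages for item in page["Items"]]
```

A scheduled job could restore into a scratch table, call `diff_restore`, and alert if any list is non-empty.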

DynamoDb Incremental Backups – Part Two

In this next post in the series, we will delve into the details of our DynamoDb incremental backup solution.

If you missed the first post, check it out: Part One

I am not going to delve into DynamoDb too much. If you are reading this blog post, I will assume you already know about DynamoDb, are looking to use it, or are already using it.

DynamoDb Streams

Let’s delve into DynamoDb Streams. DynamoDb Streams allow you to capture mutations on the data within a table; in other words, they capture item changes at the point in time when they occur.

DynamoDB Streams – High Level

This feature enables a plethora of possibilities such as data analysis, replication, triggers, and backups. It is very simple to enable (as simple as a switch), and it provides an ordered log of table events retained for a 24-hour window.
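To make the idea concrete, here is a minimal sketch of a Lambda function that turns stream records into incremental backups in an S3 versioned bucket. It is an illustration, not our production code: the S3 key layout, the `BACKUP_BUCKET` environment variable and the helper names are hypothetical. It assumes the stream is enabled with a view type that includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`). The record fields (`eventName`, `eventSourceARN`, `dynamodb.Keys`, `dynamodb.NewImage`) follow the DynamoDb Streams event format.

```python
import json
import os


def s3_key_for(table_name, keys):
    """Hypothetical layout: one S3 object per item, in a folder per table."""
    # Sort the key attributes so the same item always maps to the same S3 key.
    parts = [f"{name}={list(value.values())[0]}" for name, value in sorted(keys.items())]
    return f"{table_name}/" + "/".join(parts)


def handle_stream_records(event, s3, bucket):
    """Write each stream record to a versioned S3 bucket.

    INSERT/MODIFY store the item's NewImage as a new object version;
    REMOVE deletes the object, which in a versioned bucket adds a delete
    marker and keeps the earlier versions.
    """
    for record in event["Records"]:
        # ARN looks like arn:aws:dynamodb:region:account:table/Name/stream/...
        table_name = record["eventSourceARN"].split("/")[1]
        key = s3_key_for(table_name, record["dynamodb"]["Keys"])
        if record["eventName"] == "REMOVE":
            s3.delete_object(Bucket=bucket, Key=key)
        else:
            body = json.dumps(record["dynamodb"]["NewImage"])
            s3.put_object(Bucket=bucket, Key=key, Body=body)


def lambda_handler(event, context):
    import boto3  # available in the Lambda runtime

    handle_stream_records(event, boto3.client("s3"), os.environ["BACKUP_BUCKET"])
```

Because each write becomes a new object version, S3 versioning does the heavy lifting of keeping the history.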


DynamoDb Incremental Backups – Part One

DynamoDb is a fully managed AWS NoSQL service, which provides a fast and predictable data store. We’ve been using it for several microservices over the past 18 months, and one feature we sorely miss is incremental backups.

AWS provides an option to take snapshots of your table using a service called AWS Data Pipeline. At a high level, it:
1. Creates an EMR (Elastic MapReduce) cluster
2. Performs a parallel full scan of the table in question (consuming read capacity units) into JSON data
3. Writes this JSON data to S3 or similar

DynamoDb to S3 Template in Data Pipeline Architect

The issue I have with this is that the backup is not a “point in time” snapshot: it scans the table (which can take hours) while the table is still live.

Our requirement for RPO (Recovery Point Objective) is 30 minutes. This means that if “shit hits the fan”, we can lose at most 30 minutes of data in the worst case. This is our contractual agreement with our clients.

Given this, we have been investigating ways to solve this problem, which led us to create incremental backups for DynamoDb, stored in an S3 versioned bucket.

DynamoDb Incremental Backups to S3


In the next post, I’ll walk through the details of our implementation, and provide the source code of the Lambda Function.

Pragmatism and Business Acumen

The other week, I was having lunch with our CTO at PageUp, Tal Rotbart, and we were discussing various issues in the industry when he posed a question that got me thinking: “Isn’t pragmatism just business acumen?”

I’ve been pondering the question for some time now… Let me start by defining the two.

Pragmatic: dealing with things sensibly and realistically in a way that is based on practical rather than theoretical considerations.

Business acumen: keenness and quickness in understanding and dealing with a business situation in a manner that is likely to lead to a good outcome.

Given those definitions, both seem to describe similar traits in software development, but the problem is that both are very relative. Business acumen also seems to be a higher-level concept, which can encapsulate pragmatism.

Pragmatism without business acumen can be just as deadly to a company as not having pragmatic approaches to start with.

This drove me to start thinking about seniority levels within a development team.


Microservice Scars

I have the pleasure of presenting at the next Alt.Net meetup (in Melbourne) with Joshua Toth. We will be discussing the lessons we have learnt from our first Microservice at PageUp.

It has been in production for over 9 months (158 zero-downtime deployments), and we think it is worth sharing our experiences and thoughts.

If you have any areas you would like us to discuss, feel free to drop me a line, tweet or just leave a comment below.
