Introduction
Hello, fellow developers! 🚀
To introduce the Walking Skeleton practice, let me describe an example. A new project starts, and we work on the features for months. Everything works locally, and the system is complex enough to include a database, an admin dashboard, a queue system for async processing, and a Redis instance for caching. Then we need to build the architecture to deploy to a test environment: we have to work out how to set all of that up in a cost-effective way, and after some tests we will probably need to fix something in both the code and the architecture. Finally, we need to move to production: typically, our choices change here because “it’s production”, and we have to rediscover and fix issues, old and new, all together. We were basically done with development, but the architecture and release work for test and production took weeks, mostly because we had to face all those issues at once.
If this situation sounds familiar, you will easily understand the benefits the Walking Skeleton practice can bring: let's get started!
Start small, also in the architecture
Alistair Cockburn defined the Walking Skeleton practice as:
a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel.
A similar concept called “Tracer Bullets” was introduced in The Pragmatic Programmer.
In simple words, when you start a project with a walking skeleton, you first build a minimal functionality (as simple as a “Hello world” page or API) and then release it to the test and production environments.
By doing this, you start building all the pieces that together will make up your system, each in its simplest and smallest possible version; for example, every automation and tool you need should be in place, including:
- the automated pipeline to build the app, execute tests, trigger the deployment, etc.
- the logging, monitoring, and alerting systems, to allow instant alerts on errors and easy investigation
And, of course, you will also need the servers to deploy the code, whether it is in the cloud or not: you will have the simplest version of your system and architecture, while the features are still zero.
This practice aims to extend the baby steps and YAGNI principles to the entire system, not only the codebase. By doing this, we reduce the chance of issues and mistakes by spreading the complexity of building our architecture over time instead of facing it all at once just before releasing. We also make sure that everything we add to our system is there because we need it, when we need it.
Moving on with the example, we will build only what is required, which is the minimum we should all have in production:
- the automated pipeline
- a logging system
- a monitoring system
- an alerting system
- a server for the test environment
- a server for the production environment
Since we want to follow YAGNI and baby steps, we also don’t want to build the “ultimate” version of any of those. For example, the pipeline will be minimal (build, execute tests, deploy), and we will postpone decisions such as which static analysis tool to use; the alerting system might start as an email and evolve to Slack or any other channel in the future; the monitoring system might show only the minimal data required to check whether there are issues; the servers will likely have very little memory and CPU power; and so on.
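As a rough illustration of how small that first pipeline can be, here is a Python sketch of the three stages (build, execute tests, deploy) run in order, stopping at the first failure. The `run_pipeline` helper and the step functions are hypothetical placeholders written for this example, not the API of any CI tool; in a real project these stages would live in your CI system's own configuration.

```python
from typing import Callable, List, Tuple

# A step is a name plus a function returning True on success.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break  # a failed build or test must never reach deploy
        completed.append(name)
    return completed


# Hypothetical placeholder steps: each would shell out to real tooling.
def build() -> bool:
    return True


def run_tests() -> bool:
    return True


def deploy() -> bool:
    return True


MINIMAL_PIPELINE: List[Step] = [
    ("build", build),
    ("test", run_tests),
    ("deploy", deploy),
]
```

Anything beyond these three stages (static analysis, security scans, and so on) can be appended to the list later, when the need appears.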
Here is a sample list of what we will not have at first, even if we will likely need them in the future:
- a database, or any other persistence layer
- a way to handle environment variables
- a way to handle secrets
- a Redis cache instance
Once the Walking Skeleton is done, we will start working on the first feature - and as soon as the need for one of these pieces shows up, we will add it to our infrastructure.
An underestimated practice
Every time we want to learn a practice, we should strive to understand why that practice is important from a technical and business perspective, and the Walking Skeleton is no exception.
As already mentioned, the two main principles behind it are:
- Baby steps: Working with short-lived branches is easier than using long-living feature branches, and using TDD to get feedback on our code every minute makes it very fast to fix any mistake we make. In the same way, adding pieces to the system one by one drastically reduces the complexity compared to putting all the pieces together at once.
- YAGNI: “You Aren’t Going to Need It” - a very well-known principle in software development, typically used to refer to the idea of not adding pieces of code to generalize or expand a feature just because “you never know, we might need it in the future”; instead, stick to what you need today, and add new pieces when the need appears. Here, we want to apply the same principle to the entire system: for example, why add a database when you don’t have any tables yet? When the need to persist some data comes up, you might even discover that you need a NoSQL solution for that specific use case. The point is not only to spread complexity along the way, but also to make each decision at the last possible moment, so that it is the best decision possible.
In addition to these principles, the Walking Skeleton shares a goal with every other agile practice: reducing risk. According to Hofstadter’s Law:
It always takes longer than you expect, even when you take into account Hofstadter’s Law.
Making changes to an architecture becomes more difficult and expensive over time, as it ages and grows in size. As in many other agile practices, we want to identify errors as soon as possible. This technique gives us a short feedback loop, allowing us to adjust and work iteratively as needed to fulfill the business's needs. Assumptions about the architecture are validated very early, and errors are discovered early in the implementation process, which makes the architecture easier to evolve.
In the end, the Walking Skeleton is a practice to validate the architecture and get early feedback so that it can be improved. It’s fundamental that you don’t write the first acceptance test until the walking skeleton is deployed to production, possibly behind a feature flag or just hidden from the outside world if required, because you want to exercise your deployment and build scripts from the beginning to discover as many potential problems as you can as early as possible.
I know what you are thinking: but Dan, this means we have test and production environments from day 1 that we have to pay for!
Short answer: yes, and you will not regret it.
Long answer: the additional cost of having test and production environments from the start should be very small, considering how simple the system will be at the beginning; in most cases, there will probably be a free or very cheap option on whatever hosting service you are using. And this period should not last long anyway, because you are releasing the first version, which should be an MVP. You don’t expect to work 6 months before reaching the first release, right? 😉
Let me know if you already use this practice and what you think about it!
Until next time, happy coding! 🤓👩‍💻👨‍💻
Dan’s take 🙋🏻‍♂️
The reason why Agile principles, methodologies, and practices changed my career is that they brought answers to questions that had tormented me since day 1 as a software developer:
- How is it possible that there is no way to ensure I don’t break my previous code? Automated tests
- How is it possible that there is no way to make coding easier and safer? TDD
- How is it possible that there is no way to notice errors in production? Monitoring & Alerting
The Walking Skeleton practice had the same impact: how is it possible that there is no way to avoid releasing everything at once at the end? The Walking Skeleton (and the evolutionary architecture principle) is the answer to this question at the architecture/entire-system level, in the same way that Continuous Integration/Trunk-Based Development is the answer at the code level.
The first time I worked on a project where the senior developer leading the team brought this technique up, it was amazing to see the impact. Setting up pipelines, monitoring, alerting, logging, and everything else all at once when the features are done and ready for testing - and then repeating everything all at once to set up production just before the release - has always been a nightmare.
The very long TODO list in the backlog, combined with the stress of the approaching release, is a diabolical mix that makes issues very likely. By creating a Walking Skeleton, we create the simplest possible version of each of those elements and then evolve them together with the code and requirements. In the same way that CI avoids big merge conflicts and time-consuming async Pull Requests on big changes, the Walking Skeleton avoids accumulating all the work related to setting up the system and architecture: it spreads that work along the way, which makes the (small) changes easier to handle.
Once you start working in baby steps, in an iterative way, you find it useful in any circumstance, even in your personal life, so being able to approach the architecture and system the same way I approach coding and development is great!
Typically, I consider the Walking Skeleton of a new service done once I have:
- Local environment setup running under Docker + Docker Compose, including both features and tests (I most often build APIs, so I typically start with a GET /hello-world API and its test)
- Setup of the logging system (depending on the language and framework) to print logs in a standard format (typically JSON) via standard output
- Makefile to collect the catalog of useful repetitive commands to be used locally during development
- Readme file with basic information about the project, prerequisites, and how to set up and run it locally
- Pipelines (typically via GitHub Actions) to build and test the service and then release it to test at every push to the master branch, and to production at every tag
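To make the first two items of this checklist concrete, here is a minimal Python sketch of a GET /hello-world API, its test, and JSON logs printed to standard output. It uses only the standard library; the names `route`, `JsonFormatter`, `build_logger`, `serve`, the response body, and the port are assumptions made for this illustration, not a prescribed layout, and your language or framework will have its own idioms.

```python
import json
import logging
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name: str = "skeleton") -> logging.Logger:
    """Create a logger that writes JSON lines to standard output."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger()


def route(method: str, path: str):
    """Pure routing function: returns (status code, response body)."""
    if method == "GET" and path == "/hello-world":
        return 200, {"message": "Hello world"}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route("GET", self.path)
        logger.info("GET %s -> %d", self.path, status)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence the default stderr access log; we log JSON instead


def serve(port: int = 8000) -> None:
    """Start the HTTP server (blocks until interrupted)."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()


def test_hello_world():
    assert route("GET", "/hello-world") == (200, {"message": "Hello world"})
    assert route("GET", "/missing")[0] == 404
```

Calling `serve()` starts the server, and `test_hello_world()` can be run directly or picked up by a test runner. Keeping `route` as a pure function is one possible design choice: it lets the test run without opening a socket, which keeps the pipeline's test stage fast.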
Test and production environments are born identical (and should always remain so):
- A server somewhere to release the Docker image of the application to
- A logging system that collects the standard output and makes it easy to search and read logs
- A monitoring system that notices any error happening (basic example: 500 HTTP responses from APIs)
- An alerting system that emails the team when an error happens (at first, it can even email me immediately at the first error)
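The last two items can start out very small. The following Python sketch assumes a hypothetical `ErrorMonitor` that is fed each response's status code and notifies on every 5xx, with the notification channel injected as a plain callable. The class, the function names, and the example.com addresses are all illustrative assumptions; in practice a hosted monitoring tool would usually play this role.

```python
from email.message import EmailMessage
from typing import Callable


def build_alert_email(status: int, path: str, to_addr: str) -> EmailMessage:
    """Build the alert email for a failed request."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] HTTP {status} on {path}"
    msg["From"] = "alerts@example.com"  # placeholder sender address
    msg["To"] = to_addr
    msg.set_content(f"The service answered {status} for {path}.")
    return msg


class ErrorMonitor:
    """Counts server errors and notifies on each one, starting with the first."""

    def __init__(self, notify: Callable[[EmailMessage], None], to_addr: str):
        self.notify = notify    # injected channel: email today, Slack tomorrow
        self.to_addr = to_addr
        self.error_count = 0

    def observe(self, status: int, path: str) -> None:
        """Record one HTTP response; alert if it is a 5xx."""
        if 500 <= status <= 599:
            self.error_count += 1
            self.notify(build_alert_email(status, path, self.to_addr))
```

A usage example, with a list standing in for the real mail sender:

```python
sent = []
monitor = ErrorMonitor(notify=sent.append, to_addr="team@example.com")
monitor.observe(200, "/hello-world")  # healthy response, no alert
monitor.observe(500, "/hello-world")  # first error, alert sent immediately
```

Because the channel is just a callable, swapping email for another channel later does not require touching the monitoring logic.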
One last tip from my experience: when I am in a microservice context and my team creates new services very often, I typically suggest creating a template repository with the walking skeleton, so that when you need a new service you can start from there and speed up the code part of the skeleton, focusing only on the architecture part - if you add infrastructure as code to this template, you can basically automate a good part of the walking skeleton.
A Walking Skeleton is a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel.
[Alistair Cockburn]
Go Deeper 🔎
📚 Books
- Growing Object-Oriented Software, Guided by Tests - This book talks about a lot of great practices, and chapter 10 is dedicated to the Walking Skeleton itself.
- The Pragmatic Programmer - In this book, the Walking Skeleton practice is referred to as “Tracer Bullets”, and the principles are mostly the same; in addition, the book is full of useful tips for being a good and professional developer.
📩 Newsletter issues
- How to start a new project ? What is walking skeleton ? [Tony’s Substack by Tonydev]
- Creating the walking skeleton - Part 1 and Part 2 [Sivu Writes Software by Sivu Makhonco]
🎙️ Podcast episodes
- Iteration 0: Walking Skeleton [The Competitive Advantage podcast]
- Creating a Walking Skeleton [InfoQ podcast]