February 21, 2019
As we previously discussed in Aimar's post, DevOps aims to improve quality and efficiency while reducing the risk of failure within software development teams. There are a number of ways to achieve this, including breaking down team silos, automating processes and continuously quantifying success. To view all this from a slightly more practical angle, this time we'll focus on 'three ways DevOps efforts help save time'.
1. Enabling the Developer
A well-known challenge for software engineers is to continuously sharpen their skills while also delivering the most reliable code possible. Developers, like everyone on agile teams, are constrained by time: missing deadlines when delivering features directly impacts customer satisfaction and sales figures. In addition, the speed at which our industry is evolving does not allow anyone to get by for long relying solely on comfortable, well-known technologies. Keeping up with new trends is essential.
Fear of Experimentation
Engineers often can't afford to invest their time in trying out new tools because of the constant risk of missing deadlines. Some are also, unfortunately, bound by a work culture where delays like this are a punishable offence (see Fear Driven Development). It is easy to see why development environments that can be set up and adapted swiftly are valuable to any team.
A for Automation (CALMS Framework for DevOps)
The above situation is one where automation can really shine. If our software environment is able to set itself up automatically, we can quickly "spin up" new instances of it (and all associated resources) identical to the one used by our end-users. This includes anything from containerised systems orchestration and virtual machines configured as servers to databases — the whole stack. Developing against a production-like environment is crucial for development teams. As a result, every aspect of the live, deployed software is better understood, and the number of assumptions is reduced.
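As a toy illustration of the idea (not any specific tool's API — all names below are hypothetical), an environment defined entirely by a "blueprint" is reproducible by construction: every instance spun up from the same blueprint is identical, so the developer's copy matches production.

```python
# Minimal sketch of environment-as-code: the environment is a pure
# function of its blueprint, so every "spin up" yields an identical copy.
# All names here are hypothetical, not any specific tool's API.

def spin_up(blueprint: dict) -> dict:
    """Build an environment description deterministically from a blueprint."""
    return {
        "app_server": f"{blueprint['runtime']}:{blueprint['runtime_version']}",
        "database": f"{blueprint['db']}:{blueprint['db_version']}",
        "env_vars": dict(sorted(blueprint["env_vars"].items())),
    }

blueprint = {
    "runtime": "python",
    "runtime_version": "3.12",
    "db": "postgres",
    "db_version": "16",
    "env_vars": {"LOG_LEVEL": "info", "FEATURE_X": "on"},
}

dev_env = spin_up(blueprint)   # developer's copy
prod_env = spin_up(blueprint)  # production copy
assert dev_env == prod_env     # production-like by construction
```

The point is not the code itself but the property it demonstrates: because nothing outside the blueprint influences the result, "spin up another one" is always safe and always faithful.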
2. Shifting Left? What's right?
The concept of Shift-Left is slowly overtaking traditional software development lifecycle models. The idea is based on the hard-learned fact that it's extremely costly to recover from software failures if we discover them "too late". Instead of finding them right before or after delivery, Shift-Left encourages the majority of testing to start happening much earlier. Why "left", you ask?
[Figure: Shift-Left model vs. traditional lifecycle model]
Consider the following scenario: a team is about to release a feature that a major client is eagerly awaiting. All the code that the developers have worked on has been merged and tested by quality assurance engineers. The code has been pushed all the way through this thing called a "Continuous Integration / Continuous Delivery pipeline" — developer-speak for a series of automated tests, scans and the actual code compilation carried out by a set of well-mannered computers. These are the steps a team finds necessary to guarantee the application's quality. So, let's say the team is about 95% sure that the code is good for delivery. Then, release-to-important-customer minus one day: a bug is found — a particular edge case that hadn't been considered at the design stage. It won't affect everyone, but it will affect enough users that you'd be a bit ashamed of your upcoming release.
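Conceptually, such a pipeline is just an ordered list of gates that every change must pass, stopping at the first failure. A minimal sketch (the stage names and checks are illustrative, not any real CI system's configuration):

```python
# Toy CI/CD pipeline: an ordered series of automated quality gates.
# A change is releasable only if every stage passes; the run stops
# at the first failure. Stage names are hypothetical.

def run_pipeline(change, stages):
    """Run each stage in order; return (passed, name_of_failed_stage)."""
    for name, check in stages:
        if not check(change):
            return False, name
    return True, None

stages = [
    ("compile", lambda c: "syntax_error" not in c),
    ("unit tests", lambda c: "broken_unit" not in c),
    ("security scan", lambda c: "hardcoded_secret" not in c),
    ("integration tests", lambda c: "broken_integration" not in c),
]

ok, failed_at = run_pipeline({"feature"}, stages)
assert ok and failed_at is None

ok, failed_at = run_pipeline({"feature", "broken_unit"}, stages)
assert not ok and failed_at == "unit tests"
```

The fail-fast ordering matters: cheap checks run first, so a broken change is rejected before the expensive stages ever start.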
Take the long road
It's likely that nobody is going to take the responsibility (or should be allowed) to fix this bug and deploy the application directly to the production environment. Bypassing the hordes of automated integration, user, performance and security tests is a risk nobody will be asking for. You're going to have to take the long road and put your fixed code through the same process it went through the first time.
This generally means correcting the problem and having a fellow engineer review your work. The new, hopefully bug-free, version is then ready to travel all the way through the above-mentioned CI/CD pipeline. This, depending on the maturity of the process and the level of automation in place, can easily take days. Being the standard guarantee of quality for the application, the process leaves no room for bargaining: you can't realistically expect to safely skip any part of it.
If only we'd caught it when it would have cost just a single person a day's work!
The above problem is one we can mitigate much more easily (and without the painful stress, blame games and forced overtime) if we allow testing to begin at a considerably earlier stage. If we behaved as though there were a live deployment the following day, and ran our whole suite of tests each time a developer was about to merge their work with the rest of the application code, we'd risk losing significantly less time.
Shifting-Left, therefore, really doesn't have anything to do with the quantity of tests; it's about when that testing takes place.
3. Learning to Handle Failure
Catastrophes happen. There's no denying it, and while there's obviously a great deal you can do to prevent them, you'll never be completely sure that you're 100% covered. Learning how to deal with them is the only way to be confident you can effectively minimise the impact on your business.
It's probably starting to sound like a broken record by now, but yet again, automation is key here.
Picture a crisis and how best to handle it. You probably imagine some sort of well-designed action plan that people can follow, with everything carefully planned out in advance to avoid relying on individuals' judgement in the moment.
"Studies show that the average cost of a production outage is roughly $5,000 per minute, while the average cost of a critical application failure per hour is $500,000 to $1 million."
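Taking the quoted figures at face value, the per-minute number alone already implies roughly $300,000 per hour for an average outage, the same order of magnitude as the quoted range for a critical failure:

```python
# Back-of-the-envelope check on the quoted outage figures.
cost_per_minute = 5_000             # average production outage, per the quote
cost_per_hour = cost_per_minute * 60
assert cost_per_hour == 300_000     # ~$0.3M per hour for an average outage

# A critical failure left unresolved for one 8-hour working day,
# at the low end of the quoted $500k-$1M/hour range:
critical_per_hour = 500_000
one_day_cost = critical_per_hour * 8
assert one_day_cost == 4_000_000
```

Numbers like these are why shaving even minutes off recovery time pays for a great deal of automation.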
But what if, instead of just writing up a good plan, we automated the complete recovery from a full system-failure?
The idea of Server Immutability follows the same logic as allowing your application to reach customers exclusively through the same delivery pipeline: predictability and reproducibility boost stability. The only way you change a system component (e.g. a server) is by recreating it with your adjustments "baked in". This means no manual fiddling around with a deployed server, because that is exactly how you introduce mistakes and create what we call Configuration Drift.
Phoenix > Snowflake
(That’s right: the deeper we go, the more poetic the metaphors become.)
When a server goes down or needs to be modified in any way, we don't do it on the server itself. Instead, we ruthlessly kill it and change the blueprints we used to create it. It dies and is then born anew — much like a phoenix! The blueprints themselves are stored in a shared repository where everyone can access and study them.
In contrast to the above, a "Snowflake server" is one that has been altered in some way since its creation. People who may or may not still work for your company needed, for whatever reason, to change its settings. The result is an unverifiable, completely unique configuration that nobody really understands.
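The phoenix-versus-snowflake contrast can be sketched in a few lines. This is a toy model, not a real provisioning tool; all names are hypothetical:

```python
# Phoenix vs. snowflake servers, as a toy model.
# A phoenix server is never edited in place: to change it, you change
# the blueprint and recreate the server, so configuration never drifts.

BLUEPRINT = {"os": "linux", "nginx_version": "1.24", "workers": 4}

def create_server(blueprint: dict) -> dict:
    """A server is just a realisation of its blueprint."""
    return dict(blueprint)

# Phoenix path: update the blueprint, then kill and recreate the server.
BLUEPRINT["nginx_version"] = "1.26"
phoenix = create_server(BLUEPRINT)

# Snowflake path: someone edits the live server directly...
snowflake = create_server(BLUEPRINT)
snowflake["workers"] = 16  # manual tweak, recorded nowhere

# The phoenix still matches its blueprint; the snowflake has drifted.
assert phoenix == BLUEPRINT
assert snowflake != BLUEPRINT  # configuration drift
```

Because the blueprint lives in a shared repository, the phoenix's exact configuration is always reviewable and reproducible; the snowflake's change exists only on the machine itself.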
We discussed a few of the many scenarios in which investing in DevOps across teams helps save a huge amount of time. The particular focus of this article was the added agility and efficiency brought on by automated processes. The time and cost benefits of having computers predictably perform tasks instead of humans are truly remarkable. These principles, although mostly associated with DevOps efforts, should certainly be observed by everyone else involved in software delivery as well. Our goals are the same, after all!