DevOps
DevOps is one of those buzzwords that get thrown around all the time, without anyone seeming to know exactly what it means. I wasn’t quite sure myself, so here’s my impression summed up in one article.
EtymologyPermalink
It’s not that hard to see that DevOps is comprised of two words, Dev and Ops, respectively development and operations. What’s hard is to deduct whatever that means, to combine development and operations, as those have been quite different domains. Let’s start with taking a look at them individually.
DevelopmentPermalink
From a developer point of view, it’s pretty self-explanatory what one does. Developers develop applications and services, and make them as feature rich, pretty and stable as possible. They know the requirements for the applications, such as which other services they depend upon, as well as an approximate resource usage for the machine on which it is hosted. The main focus is still to produce the features required by the company and it’s customers.
OperationsPermalink
Operations, however, are traditionally tasked with maintaining and updating servers, configuring networks, scaling computational capabilities and other infrastructural problems necessary to keep the developed applications running. And, most importantly, installing and updating the applications that development produces. In other words, keeping the services operational.
Dev + Ops = ?Permalink
Now, combining continuously expanding features and keeping the services stable is challenging, as they are two opposing mindsets. On one hand, there is the goal of responding to the competition, changing rapidly, and the other to provide a reliable system, keeping the services available to the users.
Traditionally, developers made applications and shipped them to operations, and did not have any further responsibilities. Operations received an application and deployed it on the servers, and if something didn’t work, they would report back and say it didn’t work. Whether this was because of an operating system update, a bug in the application or a misconfiguration of variables was not easy to figure out.
Both parties did their job, but still ended up in the situation where the service was unavailable or degraded until someone noticed - hopefully not the customer. Both can argue that they didn’t change anything of importance, and both still have to double check their work until the cause is found, and both have to use valuable time on something that possibly could have been prevented.
The technology value stream. The participants are strictly separated, similarly to the waterfall methodology, and nothing comes back upstream.
The conflicting goals of dev and ops is a conflict that will continue as long as the company exists, and the longer they are in conflict, the worse the problem will probably get. However, both of these groups exist in the same company, and should, in theory, have the same aligned goals, ensuring to fulfill both of the goals at the same time. Behold, DevOps.
DevOpsPermalink
The main goal of DevOps is to make deployments safer, mitigating the problems described above. The foundation, however, is to enhance cooperation and align the goals of the participants of the technology value stream, from feature request to user adoption.
The Fluffy PartsPermalink
The book The DevOps Handbook divides the improvement on this topic in two parts: Flow and feedback.
The main concept in flow is the flow of tasks, left to right, in the technology value stream. It consists of dividing projects or features into smaller tasks, much in alignment with the agile methodology. Smaller tasks will normally cause less change, which again usually leads to lower risk deployments. The tasks should also be visible, so that potential bottlenecks can be discovered and improved upon.
The technology value stream, where all participants have the same goals and cooperate accordingly.
The feedback topic deals with the waterfall concept. Feedback enables information flow from right to left in the technology value stream. This information should be used for analysis, reflection and education, to implement countermeasures that prevent the same error from happening again.
Concrete PracticesPermalink
There are many technical parts that enable the fluffy parts, but neither one alone can be called DevOps per se. Some of the practices are:
- Deployment pipeline
- Automated testing
- Low-risk releases
- Telemetry
Each contributes to the quality of the value stream of the company in their respective ways.
The deployment pipeline automates shipment of releases into the different environments. It makes the process effortless, consistent and repeatable. Deployments can use different strategies such as blue-green deployments, A/B-testing or gradual rollout.
Automated testing ensures quality continuously, by checking the implemented tests regularly. They should be run at a per commit or pull request basis. Integration tests involving multiple services should also be run frequently to discover potential incompatibilities.
Low-risk releases involves multiple aspects of software development. Firstly, fewer changes in the release yields lower risk for that particular deployment. Secondly, the software and services should be designed for such incremental changes, using concepts as modularity or feature flags. The releases should be deployed to production shortly after creation, although not strictly coupled.
Telemetry enables insight in the environment and opens the opportunity to learn about all aspects of the services and their interactions. Metrics is what I personally consider the most important factor, and what I would implement or improve first. The information can reveal the impact of a release in any number of terms: Transactions per minute, average number of items in the shopping cart, application memory usage, response code distribution and response times. The metrics should be aligned with the business objectives, and depends on the values of the company. Metrics can also vary between services.
Covering all the DevOps practices is ambitious, but I hope these serve to get the point across of what DevOps can manifest as. What DevOps does not manifest as is an engineer - it would not help the business or value stream to employ a single person to implement a culture. However, many of these practices demands expertise that is not directly in scope of development or operations. The closest thing to a “DevOps Engineer” would be the SRE.
Site Reliability EngineerPermalink
The Site Reliability Engineer (SRE) is mainly responsible for the technical implementations of DevOps. The role requires competence in both development and operations, and enhances collaboration between the silos by being a translator of sorts. SRE’s also have expertise in the tools that combine development and operations. The thousand feet view of the SRE tasks is a software engineering approach to operations and follows up on the status of the different environments.
DevOps can be implemented without an SRE, but some maintenance responsibility can be delegated to the SRE. However DevOps is implemented, it should help the business.
DevOps ImpactPermalink
DevOps can yield pretty wild results, if these sources are to be taken literally. It’s probably an exaggeration of what’s reasonable, but there is no doubt that it will affect the business, or at least the software value stream, positively.
Puppet reports “60X fewer failures and recover from failure 168X faster” and “Deploy 30X more frequently with 200X shorter lead times”. More quotes such as “106x faster lead time” and “7x lower change failure rate” can be cited by New Relic.
Now go be aware of your DevOps implementation and check your business for improvement potential at https://devopschecklist.com.
ResourcesPermalink
- What Is DevOps? - New Relic
- DevOps is a culture, not a role! Medium
- The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis
- The Phoenix Project by Gene Kim, Kevin Behr and George Spafford
- DevOps Roadmap - roadmap.sh
- Site Reliability Engineering - sre.google
- The DevOps Checklist - devopschecklist.com