DevOps Transformation using Theory of Constraints


For several years, IT companies have been exposed to more and more blogs and conferences talking about DevOps, arguably the hottest topic since Agile. Those who have followed the DevOps movement for some time may also have noticed that many presentations and blog posts repeatedly mention the Theory of Constraints (ToC) by Dr. Eliyahu M. Goldratt. In fact, Gene Kim has stated that he wrote “The Phoenix Project,” a book many consider a must-read for any DevOps manager, using the exact same structure as Dr. Goldratt’s best-selling novel “The Goal.”

The DevOps movement is full of ideas on how companies should improve. These ideas stand on the shoulders of several giant thinkers and doers who, in turn, transformed whole industries with their own ideas. One such giant is Dr. Goldratt; others include Dr. W. Edwards Deming, Taiichi Ohno, and even Henry Ford, all of whom made major contributions to management in manufacturing and many other industries.

Every time I attend a conference or watch a video where a speaker mentions ToC or Dr. Goldratt, I am really pleased to see that we are learning the lessons of the past. At the same time, I wonder: “How many people in the audience have actually read any of Dr. Goldratt’s or Dr. Deming’s books?” There is probably no survey that can definitively answer this question. I believe that either the talks coupling the Theory of Constraints with DevOps form a vast echo chamber, or, much more likely, a large portion of the audience has simply not read the books or acquired the knowledge needed to understand the connection between DevOps and ToC. Those who do know ToC and Lean are easy to spot: in many cases, they don’t stop talking about it (in a good way).

While watching yet another video of such a presentation, “Beyond the Phoenix Project” from the DOES16 conference, I decided to re-read and re-listen to the ToC material and provide my own narration to tie ToC and DevOps more tightly together. I would like to quote Dr. Goldratt and shine a wider spotlight on the IT industry, using my interpretation of what he might have thought:

Disclaimer: I have re-read the books, watched the videos, and listened to the recordings more than once over the last decade. I try to keep the explanations as close as possible to what these great men and women explain, but any and all mistakes in understanding and interpretation are mine and mine alone.

Let us start with a quote:

“Technology can bring benefits if and only if it diminishes a limitation.”
— Dr. Eliyahu M. Goldratt

Dr. Goldratt opens his discussion of the role of technology in his book “Necessary but Not Sufficient” with this quote. It is also the first topic of his audiobook “Beyond the Goal,” where it is presented as the basis for all the logic he derives to prove his point. He repeated this basic assumption many times in other lectures and presentations.

According to Dr. Goldratt, to bring about a technology-based change that will have an effect, one must answer four questions. All four must be answered, or else the assumed benefit of implementing the technology cannot be gained.

  1. What is the value of this new technology?
  2. What limitation does this technology diminish?
  3. What were the policies and rules that governed before trying to adopt this technology?
  4. What must the new policies be when this new technology is used?

Let us take “DevOps” as an example of a “technology” in the broad sense of the word, and try to answer these four questions.

Q 1: What is the benefit DevOps brings?

The buzzword “DevOps” has become so overloaded that two people rarely agree on what it means anymore, so the answer is not obvious. First, I will try to define the term DevOps using the same notions employed by those who coined the word, rather than relying on the interpretations of those who guessed its meaning after the fact.

“[DevOps] … has a tremendous impact on the business. The technical team starts pulling together as one. An ‘all hands on deck’ mentality emerges, with all technical people feeling empowered and capable of helping in all areas.”
“… the battle of developers vs. sysadmins begins to transform into a cross-disciplinary approach to universally maximize reliability.”
“… has a positive effect on the bottom line — better reliability and availability, happier clients, faster time to market, more time to focus the team energy on core business.”
— Stephen Nelson-Smith http://www.jedi.be/blog/2010/02/12/what-is-this-devops-thing-anyway/

According to Stephen, the main benefit DevOps brings is reduced friction between engineers in an organization. This produces more reliable systems that are easier to use, maintain, and improve over time, as well as happier clients, all of which translates into real bottom-line results.

Notice that there is no practical explanation of how this is achieved; reading articles about DevOps, we rarely learn what we should do differently the next day at work. The effects are described at length, leaving the reader to guess how to achieve them.

Q 2: What limitation is diminished by DevOps?

According to Dr. Goldratt, in some cases a limitation is recognized and apparent. In other cases it goes completely unrecognized: we simply accept a given reality as the way things are, and do not suspect that it should or could change. To cope with existing limitations, our people invent rules and policies that allow us to keep achieving success in spite of the limitation.

The same blog article about DevOps clearly describes a reality currently prevalent in many IT organizations.

“In the software IT industry, there’s a tacit assumption that projects will run late, and when they’re delivered (if they’re ever delivered), they will underperform, and not deliver well against investment.”
“Once an application is in production, the business tends to gain a tremendous fear of change. A constant suspicion that the software and the platform upon which it sits is somewhat brittle and vulnerable. Bureaucratic change-management systems are put in place, making it painfully long to introduce new features or fix problems with the application.”
— Stephen Nelson-Smith http://www.jedi.be/blog/2010/02/12/what-is-this-devops-thing-anyway/

The quote above points to the force that shapes the policies of a company trying to deliver a much better experience, service, and products to its customers: fear of change. Although the root cause is not spelled out, it does exist. If we are unafraid to peel back several hidden layers of logical reasoning, we find a conflict underneath: we can either fear change and deploy infrequently, or deliver products to customers faster by changing more frequently. Both cannot hold at the same time.

Engineers in organizations learn quickly from experience, as do their managers. Learning from experience also creates a culture where intuition guides people towards the solutions that will have the best results. These engineers and managers are smart people and achieve amazing feats in spite of the limitation. Why? Because creating IT systems is a complicated endeavor. Even without sufficient data and information, engineers and managers still have to make decisions; they might not know the best solution, but they must decide, and this is where intuition is born.

The intuition which makes IT achieve its goals states that IT systems are brittle, not very reliable, and trying to create cross-organizational communication and collaboration is pretty much fruitless. And yet there are dozens of extremely talented engineers who are getting by and achieving their goals, creating significant bottom-line results for businesses, governments, and non-profit organizations.

What happens when we introduce a new technology like DevOps, or Docker containers, or a public cloud? Is the limitation diminished by the fact that we started using “DevOps tools” or practices? Yes, probably. But did we get the full potential benefit from using these tools and practices while leaving our intuitions and policies unchanged?

Q 3: What policies exist before getting into DevOps?

The third question is of the utmost importance, and it has no easy answer. Even when current policies are well known, there are often so many of them, causing such a multitude of problems, that it is hard to find and address them all. Yet it is imperative to address them, or else we cannot gain the full benefits of lifting the limitation.

For example, if the promise of DevOps is that we can respond to any customer’s whim or request about our platform in less than a day, that is a significant competitive advantage. Can we actually do this if deployments are, by policy, done once a month? No; that policy must be abolished.
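To make the arithmetic behind this example concrete, here is a minimal Python sketch. It assumes a deliberately simple, hypothetical model: a finished change becomes ready at a uniformly random moment between fixed release windows and faces no other delays. Under that model a change waits, on average, half the release interval, and in the worst case the whole interval.

```python
def release_wait_days(interval_days: float) -> tuple[float, float]:
    """Return (average, worst-case) days a finished change waits for the next release.

    Hypothetical model: changes finish at a uniformly random time within the
    release interval, and the release window is the only source of delay.
    """
    average = interval_days / 2  # mean of a uniform wait on [0, interval]
    worst = interval_days        # change finished just after the previous release
    return average, worst


if __name__ == "__main__":
    cadences = {"monthly": 30.0, "weekly": 7.0, "daily": 1.0, "hourly": 1 / 24}
    for name, interval in cadences.items():
        average, worst = release_wait_days(interval)
        print(f"{name:>8}: average {average:6.2f} days, worst case {worst:6.2f} days")
```

Under these assumptions, a monthly release window alone keeps a finished change about 15 days away from customers on average, and only release intervals shorter than a day can reliably keep a less-than-a-day promise.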

Dr. Goldratt warns that it is critical to first find the root of the complexity: the one thing that, if we changed it, would make everything else fall into place. The ToC method for mapping all the undesirable effects and exposing this core conflict is called the Logical Thinking Processes, which logically lay out all the cause-and-effect relationships. It is similar to the Five Whys method from Lean, but much more rigorous. There is not enough space here to explain all of the Logical Thinking Processes, but Lisa J. Scheinkopf wrote an excellent book that elaborates on them in depth, called “Thinking for a Change.”
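The Thinking Processes are far richer than any program, but one core idea, tracing many undesirable effects (UDEs) back to the few causes that produce them, can be sketched as a graph search. The Python sketch below is a drastically simplified, hypothetical illustration, not Goldratt’s or Scheinkopf’s actual procedure: the node names are invented from the policies discussed in this post, each edge reads “cause leads to effect,” and the script picks the entry point (a cause nothing else in the graph explains) that leads to the most UDEs. A real analysis would go further and surface the core conflict behind that cause, which this sketch does not model.

```python
from collections import deque

# Hypothetical cause -> effect edges ("key leads to each listed effect").
EDGES: dict[str, list[str]] = {
    "fear of change": ["infrequent deployments", "heavy change management"],
    "infrequent deployments": ["large risky releases", "slow response to customers"],
    "large risky releases": ["weekend deployments", "frequent outages"],
    "heavy change management": ["slow response to customers"],
    "features over reliability": ["frequent outages"],
}

# Undesirable effects observed today (hypothetical).
UDES = {"weekend deployments", "frequent outages", "slow response to customers"}


def reachable(start: str) -> set[str]:
    """Breadth-first search: every node reachable from `start` via cause -> effect edges."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for effect in EDGES.get(node, []):
            if effect not in seen:
                seen.add(effect)
                queue.append(effect)
    return seen


def entry_points() -> set[str]:
    """Causes that no other node in the graph explains (no incoming edges)."""
    effects = {effect for targets in EDGES.values() for effect in targets}
    return set(EDGES) - effects


def core_cause() -> tuple[str, set[str]]:
    """Return the entry point that leads to the most UDEs, with the UDEs it explains."""
    best = max(sorted(entry_points()), key=lambda cause: len(reachable(cause) & UDES))
    return best, reachable(best) & UDES


if __name__ == "__main__":
    cause, explained = core_cause()
    print(f"Candidate core cause: {cause} (explains {len(explained)} of {len(UDES)} UDEs)")
```

With this made-up graph the search singles out “fear of change.” With real data, most of the value lies in the discussion needed to agree on each cause-and-effect edge, not in the search itself.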

Allow me to try to describe some of the policies that enabled success for companies in the “pre-DevOps” era, when competitors had more or less the same limitations as we did.

  • There must be a separate operations team (with its own manager) responsible for the applications in the production environment, because developers don’t want to, or don’t know how to, operate systems in production.
  • The ops team is put in charge of deployments, since they can cope with failure faster. This is because deploying changes carries substantial risk: failure rates are often high, and restoring a service after a failure often takes a long time.
  • A new role called SQA/QA/QC (a whole team) is introduced to ensure the high quality of all changes, since managers often do not expect high-quality code to be written, and in practice that code quite often fails in spectacular ways.
  • Managers enforce deadlines by adding yet another role called a Release Manager/Engineer, who manages the schedule and the deliverables deployed to production. This is because developers take too long writing and testing code and need to be rushed, and releases must happen in a more orderly fashion than simply ad hoc.
  • Deployments once a week (or once a month, or once a quarter) are mandated as a policy. They are even scheduled for the first day of the week, but the Dev-QA-RelEng-Ops handovers constantly push them to the last day of the week. Operators often act as heroes, deploying for the duration of the weekend. The reason for this policy is a reality in which high-velocity competitors dictate rapid change, so each company must deliver features faster.
  • Developing new features takes priority, drawing time and budget away from reliability and quality improvements and investing all hands in more new features instead of more reliability. The most likely reason is managers’ intuition, which leads the company to believe that better products and more paying customers must stem from new features.

Several years ago, I personally attended an urgent meeting at a company to solve the problem of weekend deployments. My suggestion was to try to deploy even more often, say every hour, or perhaps every day. The proposal was met with strange looks and a definite “NO!” Management explained that deployments were too risky and that the business could not suffer downtime every hour or every day, which is why deployments still happen once every couple of weekends to this day. The managers were interested in hearing how to improve, but any suggestion to lower the priority of writing more features and invest more in making deployments reliable was rejected: a classic conflict.

Why would managers prioritize new features over the reliability of their service? Here we get a bit closer to the root cause of why organizations rarely get any benefit from DevOps practices. The way managers make their day-to-day decisions feels right to their intuition, and it has always brought success in the past.

The thing is that this way of making decisions is far better than making arbitrary ones. Given the shortage of information, a manager can rarely make a better decision and has to settle for a local optimum. A decision based on a local optimum will generally not be as bad as a random one. At the same time, it is far worse than prioritizing based on systems thinking, a holistic measurement of a decision’s effect on the bottom line. New competitors are doing just that: measuring the effect of decisions on their bottom line and reaping all the benefits of this holistic approach. A great example comes from Etsy, who explain their approach in “Design for Continuous Experimentation.”

As a consultant, I often confess that “Great high-velocity organizations rarely call up experts for help.” Having visited many tech companies over the last five years, I can say definitively that the policies and rules above are still in effect in the majority of organizations I have been in contact with. Talking to other consultants in the field reveals that this is true globally. Even companies that do have a “DevOps Team” or “DevOps Engineer(s),” or that jump on the latest and greatest “DevOps Tools,” in most cases keep their old rules and policies and receive almost no benefit from their adoption of DevOps. The decision making of managers in these organizations has not changed, and none of the advantages promised by the DevOps movement are to be found anywhere. A company can adopt Docker, Kubernetes, the cloud, and serverless, yet still have a total disconnect between developers and customer needs, with deployments occurring about as frequently as a solar eclipse.

In examples narrated in his audio series “Beyond the Goal,” Dr. Goldratt explains that early adopters of a technology are those who understand its benefit and reap the rewards. When the industry takes note, everyone wants a piece of the pie, but they rarely know how to act on it. In the world of DevOps, these early adopters would be companies like Netflix, Flickr, Etsy, and Amazon. Once these companies become very successful thanks to their new technologies, other companies want to replicate what they have done in order to compete.

Companies trying to become the next Netflix or Etsy replicate the technology, the tools, and the buzzwords. But neglecting to change existing policies and rules leaves them bewildered, not understanding why nothing improved. Now they have a big new “DevOps Team” full of “DevOps Engineers” using the latest and greatest “DevOps Tools,” yet they are still unable to deliver software any faster than before. The obvious conclusion? “DevOps does not work,” “DevOps does not bring any benefit.” Managers see it as just additional labor and politicking with no real impact on the bottom line, not realizing that they are simply doing the wrong thing better.

A multitude of policies hinders companies from adapting to an extremely rapidly changing world. Managers think that simply taking the DevOps concept and plugging it in will immediately make the company move much faster, more “Agile.” In most cases this just means doing the wrong thing faster, without understanding that running faster is not the same as moving forward.

This conflict of “Wrong DevOps,” inflicted by managers on their employees, causes all kinds of problems for both employees and organizations. I have personally witnessed developers, ops, and “DevOps engineers” living in constant frustration. Stories of burnout and worse abound; businesses go bankrupt; services crash and burn. Companies that have a blockbuster legacy product with huge margins can keep the fire burning for a while, even as their employees invest in new initiatives while accomplishing nothing, or working hard to negative effect.

Keeping old policies while going forward with a “DevOps Team,” “DevOps Engineers,” and “DevOps Tools” will rarely change anything in terms of business value. Today everyone is a “DevOps Engineer,” and every company has a “DevOps Team.” The role was even ranked second, based on salary and satisfaction, in Glassdoor’s “50 Best Jobs in America” for 2017! Notice that it did not even appear on the 2016 list.

Do these companies enjoy the advantages of stability, quality, and moving forward faster than their competitors? Most of these organizations are still stuck with the same policies and rules they had before the transformation. It takes some smart management to come up with organizational changes that will enable the benefits. Talented employees might contribute overall improvements by doing things the right way and setting a personal example, but ultimately the power to change policies and rules, and to set new standards of work, lies in the hands of management.

We can do better than this; it only takes some thought and the courage to change entrenched behaviors and adopt more suitable ones.

Q 4: What new policies must we adopt for a DevOps transformation?

According to Dr. Goldratt, this is the hardest question of all, and also the most important.

Dr. Goldratt and Dr. Deming teach us to look at an organization as a whole: if the goal of the organization is to improve bottom-line results, then decisions across the entire organization need to align holistically with that goal. Continuing to march forward without looking at the data, and without measuring how current practices serve the system, leaves managers in their own bubble of locally optimal rules. The companies mentioned earlier, such as Netflix, Etsy, and Amazon, have many measurements and a lot of collected data to help the organization choose better decisions and practices that actually improve the bottom line. So before we hire our first “DevOps Engineer,” start a new “DevOps Team,” or go searching for the best “DevOps Tool” (containerization, for example), how about starting some measurements? Measuring the value delivered to customers, and giving our managers and employees clear visibility into these measurements, can have a tremendous effect.
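As a starting point for such measurements, here is a minimal Python sketch that computes two common delivery measures, deployment frequency and median lead time from commit to production, from hypothetical deployment records. The `Deployment` record shape and the sample data are assumptions for illustration. These numbers describe how fast changes reach customers, which is only a proxy for the value delivered, so they are worth pairing with business measures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when the change reached production


def deployments_per_week(deploys: list[Deployment]) -> float:
    """Average number of deployments per week over the period the records cover."""
    if not deploys:
        return 0.0
    times = sorted(d.deployed_at for d in deploys)
    # Treat anything shorter than a week as one week, so a single deploy is not divided by zero.
    span = max(times[-1] - times[0], timedelta(weeks=1))
    return len(deploys) / (span / timedelta(weeks=1))


def median_lead_time(deploys: list[Deployment]) -> timedelta:
    """Median time from commit to production; raises StatisticsError if there are no records."""
    return median(d.deployed_at - d.committed_at for d in deploys)


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # Hypothetical sample: (days after start when committed, days until deployed).
    sample = [
        Deployment(start + timedelta(days=day), start + timedelta(days=day + lag))
        for day, lag in [(0, 3), (7, 5), (14, 4)]
    ]
    print(f"{deployments_per_week(sample):.1f} deployments/week")
    print(f"median lead time: {median_lead_time(sample)}")
```

Tracking numbers like these over time, and sharing them openly, turns the effect of a policy change (for example, moving from monthly to daily deployments) into something visible rather than a matter of intuition.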

Do we believe that quality has an effect on the bottom line? Then let us inspect our policy regarding quality. Do we have a good definition of what quality looks like? Are producing, measuring, and improving quality at the top of our company’s priorities? Is quality built into software development, or is there a policy that tries to hammer quality into software changes after they were created? Does our organization require all developers to write unit tests? How about our front-end developers? Are these tests used to provide immediate feedback and increase trust in our service’s reliability? Do we measure the effect that bad quality in our service or product has on our bottom line?

The new rules and policies cannot be one-size-fits-all; we must do some soul searching and find the standards relevant to our organization. Yes, this is hard work: it requires investing some time in thinking instead of putting out fires. It requires learning and reading the ideas of incredible thinkers like Dr. Goldratt and Dr. Deming, and trying to apply them in our own context. Dr. Goldratt, for example, has a big body of knowledge on how to handle putting out fires, as part of his “Engines of Disharmony” discussion. There is so much more in the magnificent books written by Dr. Goldratt, his followers, and his teachers.

I will not write out all the new rules and policies that a company must adopt; this is left to you, the reader. I do invite you to start a discussion on what these new policies might be. There are many lists and ideas about them around the internet, but none is without drawbacks and none fits every organization like a glove. Context is quite important, and experimenting is part of the fun!

Epilogue

Learning the powerful ideas and methods of the giants on whom the DevOps movement is built can teach us many new and interesting ways to think about and approach management and engineering.

In this post, I presented Dr. Goldratt’s four questions for evaluating a technology and offered my own interpretation of them as applied to DevOps. I originally started writing this post about a different Goldratt tool, but that will have to wait for next time.

Evgeny Zislis
Co-Founder & CTO
Evgeny is our technology ‘captain’, who brings simple solutions to complex problems to life for our clients. He has been helping developers become more productive, handling software packaging and deployment as well as production systems operations, since 2004. He is passionate about doing things that have an impact on the bottom line, based on TOC methodology, and about applying DevOps patterns and practices in any environment. He keeps himself busy organizing and speaking at meetups, presenting at various conferences, and managing the Operations Israel online community on Facebook.