devops feedback loops Archives - SD Times
https://sdtimes.com/tag/devops-feedback-loops/

DevOps Feedback Loop Explained: Weak Feedback
https://sdtimes.com/devops/devops-feedback-loop-explained-weak-feedback/ (Tue, 09 Aug 2022)

Feedback is routinely requested and occasionally considered. Actually using feedback and doing something with it is nowhere near as routine, unfortunately. Perhaps this is due to the lack of a practical approach grounded in a focused understanding of feedback loops and how to leverage them. In this series we look at feedback loops: the purposeful design of a system or process to gather feedback effectively and to enable data-driven decisions and behavior based on what is collected. We also look at some potential issues and explore countermeasures for delayed feedback, noisy feedback, cascading feedback, and weak feedback. To do this, across four parts we follow newly onboarded associate Alice through her experience with an organization that needs to accelerate its value creation and delivery processes.

Our previous stories covered delayed, noisy, and cascaded feedback loops; today we will shed light on what weak feedback means.

As you might remember from those previous articles, Alice joined a company working on a digital product, with the goal of accelerating delivery. The engineering organization was relatively small: about 50 engineers, with three cross-functional teams of six engineers plus shared services for data, infrastructure, and user acceptance testing (UAT).

Alice knew that code quality and maintainability are important attributes of fast digital delivery. A simple, clean code structure shortens the time it takes to implement a new feature. She knew the ropes thanks to Robert Martin's well-known books on the concept of clean code. So she asked the engineering teams whether they were addressing findings from the static code analysis (SCA) tools that flag code quality issues, and the teams assured her that SCA is an explicit part of the definition of done for every feature.

However, when Alice looked at the SCA report, she had a hard time finding a reasonable explanation for why there were so many open issues. When she observed how engineers followed the definition of done, she found that some strictly followed what was prescribed and some did not. This is what we call a weak feedback loop: feedback that can be skipped, or whose results can be ignored, without consequence.

The adverse effects of weak feedback are:

  • Accumulation of quality debt
  • Slower delivery, because of unplanned work later

There were several options to address such a situation. The feedback collection needs to shift left: run it as early as possible and make it a mandatory quality gate. In Alice's case, it was possible to introduce SCA as part of pull request verification, so that a merge could not be approved until the issues were resolved, or to enforce the same feedback immediately after the merge. Quality gate enforcement is the successful mitigation strategy; however, introducing it abruptly on top of the accumulated debt may lead to pushback from the business side, because cleaning up that debt and the wasted churn takes time. We would recommend enforcing the quality gate incrementally as capabilities improve.
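As an illustration only, here is a minimal sketch of what such a gate could look like in a pull request pipeline. It assumes a hypothetical SCA report written to sca_report.json and an issue budget that is lowered over time; the file name, report format, and threshold values are assumptions, not part of Alice's actual toolchain.

```python
#!/usr/bin/env python3
"""Sketch of an incrementally enforced SCA quality gate for pull requests.

Assumes the SCA tool has already run and written its findings to
sca_report.json as a JSON list of issue objects; the allowed issue budget
is lowered release by release via the MAX_SCA_ISSUES environment variable.
"""
import json
import os
import sys

# Start the budget near the current debt level and ratchet it down each release.
MAX_SCA_ISSUES = int(os.environ.get("MAX_SCA_ISSUES", "250"))

def main() -> int:
    with open("sca_report.json") as report:
        issues = json.load(report)  # hypothetical format: a list of findings

    print(f"SCA findings: {len(issues)} (budget: {MAX_SCA_ISSUES})")
    if len(issues) > MAX_SCA_ISSUES:
        print("Quality gate failed: resolve findings before merging.")
        return 1  # a non-zero exit marks the check as failed and blocks the merge
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The point is the ratchet: the budget starts close to the existing debt, so the gate is enforceable on day one, and it is tightened only as the teams pay the debt down.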

Another aspect to take into account when introducing a quality gate at the pull request level, before the code even merges into the product, is the infrastructure cost. The more engineers you have, the higher the pull request frequency, and the more robust and scalable the infrastructure running all of the required feedback activities needs to be. Fragile infrastructure leads to a noise problem, and therefore to pushback from the team, so make sure you get beyond weak feedback on a solid footing. As part of a strategy to address weak feedback, make sure that feedback noise is mitigated and the infrastructure is reliable.

To conclude this four-part series, we would like to reiterate the importance of looking at how digital product delivery work is organized through the prism of feedback loops, specifically:

  • which quality attributes are important,
  • how fast you can deliver quality feedback,
  • how accurately it reflects reality, and
  • how you manage impacts and dependencies in the case of cascaded feedback.

DevOps feedback loop explained: Cascaded feedback
https://sdtimes.com/valuestream/devops-feedback-loop-explained-cascaded-feedback/ (Thu, 28 Jul 2022)


As Alice looked at the bigger picture of the quality process, it became clear that earlier feedback influenced, and may even have created or obscured, subsequent feedback and issues.

A long-standing challenge has been realistically representing and measuring performance in anything but the simplest of processes, because most of our processes have dependencies and external influences. These were difficult to expose at best with manual tools, but process automation and the advent of observability enable a more realistic representation. Discovering obscure relationships and understanding them enables a better, more robust model for identification and measurement, which matters most for the relationships that are complex and not easily observed.

Alice realized that the feedback loops providing information to product management were frequently misunderstood, or relied on data that was not appropriate for the purpose (e.g., costs that were not fully burdened), amid the conflicting and poorly documented microservice architectures and API implementations that had proliferated in the current environment. Of course, we have long struggled with aggregating multiple KPIs that do not really reflect, or result in, the desired outcome.

As Alice explained to the product manager, the interactions among the components of a microservices environment and automated business process ecosystems form an increasingly complex web. The focus must remain on the delivered value or outcome, such as introducing market-leading capabilities faster and better than anyone else.

We can think of interdependent processes in terms of the availability impact of multiple dependent systems, using availability as an analog both for confidence in the feedback results and for likely performance expectations. This approach also identifies the relative capability improvement available with the current approach and architecture:

(Image from Standing On Shoulders: A Leader's Guide to Digital Transformation, ©2019-2020 Standing On Shoulders, LLC, used with permission. The image and table depict aggregated availability based on interdependent system availability and the resulting net total availability.)

In this example, total system availability is the product of the availability of the dependent systems for the same business process scenario, examined here by looking at component improvements and availability outcomes. The performance of otherwise independent systems can have an enormous impact on complex business processes. We must take care to understand the feedback loops and how we may amplify, or even create, subsequent noise via the cascade. Transparency can be the key.
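As a worked illustration of that product rule (the component figures below are hypothetical and are not taken from the book's table), a few lines of Python make the cascade effect concrete: four individually respectable components still yield a noticeably lower end-to-end availability.

```python
from math import prod

# Hypothetical availabilities of four interdependent systems in one business process.
component_availability = {
    "web_frontend": 0.999,
    "order_service": 0.995,
    "payment_api": 0.990,
    "inventory_db": 0.998,
}

# End-to-end availability is the product of the parts when every part is required.
total = prod(component_availability.values())
print(f"Net availability: {total:.4f}")  # ~0.9821, i.e. ~98.2% despite strong components

# Improving a single component shows the relative gain available with the current architecture.
improved = dict(component_availability, payment_api=0.999)
print(f"With improved payment_api: {prod(improved.values()):.4f}")  # ~0.9910
```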

Earlier, we talked about noise in testing and its impact on trust and confidence. That is another dimension of this same challenge, and of the same opportunity.

Alice and the product manager concluded that this might be related to their objectives of reduced firefighting and improved collaboration. Improved monitoring and, where possible, added instrumentation or telemetry could be effective countermeasures that are consistent with other ongoing work. Direct visibility of impact and alignment with the outcome is the best feedback of all, particularly when our part may be somewhat obscured or limited by other components of the stream. Understanding and modeling enable us to experiment and learn, especially with critical value systems.

Looking ahead, improved ecosystem visualization in an evolving value stream management environment, capturing and evaluating model quality and data consistency, seems within reach. It is a goal state that should be realizable soon, with dynamic traceability maturing and observability seemingly in our near future.

 

DevOps feedback loop explained: Noisy feedback
https://sdtimes.com/valuestream/devops-feedback-loop-explained-noisy-feedback/ (Thu, 21 Jul 2022)


Our previous story was devoted to delayed feedback. Today let’s look at what noisy feedback means for the speed of digital product delivery.

As you may recall from Part One, Alice joined the company to work on a digital product, with the specific goal of accelerating delivery. The engineering organization was relatively small: about 50 engineers, with three cross-functional teams of six engineers plus shared services for data, infrastructure, and user acceptance testing (UAT). Analysis showed that the largest share of time in the product delivery process went to testing after code development was completed.

Alice learned that the team had an automated regression suite that ran every night (four hours) and consistently had about a 25% failure rate across 1,000 tests. Some engineers had tried to fix the failing tests, but release deadlines and feature priorities left them no time, so no one had done anything substantial about it. To keep the ball rolling and continue feature development, it had become customary to skip over the results and move forward. It was easy to close your eyes to the noise of failed tests, especially when you knew a failure was a test defect rather than a product defect. It would have been great if automated regression had been finding defects, as it was supposed to do. Instead, failed tests mostly signaled problems in the environment in which the tests were executed: network latency leading to service timeouts, wrong versions of the components the product integrates with, network access issues, wrong libraries on the server running the application, corrupted data in the database, and so on.

To discern the root cause of each failed test, an actual defect versus environment misconfiguration or malfunction, the engineering team needed to dedicate a significant amount of time, given the accumulated volume. And as you might suspect, most of the environmental issues were under the control of the infrastructure team and the data team. Those teams were focused on the production environment and on firefighting, keeping only a small amount of capacity to support product delivery. As you can imagine, it was hard for these three groups to find a common language, since each was independently responsible for its piece of value delivery but did not recognize the importance of working together on every value increment.
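One way to cut that triage time, sketched here purely as an assumption rather than anything Alice's team had in place, is to pre-classify failures by their error signature so that engineers look first at the ones that look like real product defects. The keyword list and messages below are illustrative only.

```python
# Naive triage sketch: bucket nightly failures into likely-environment noise
# versus likely-product defects based on their error messages.
ENVIRONMENT_SIGNATURES = (
    "timeout", "connection refused", "dns", "certificate",
    "no such host", "503", "module not found", "integrity constraint",
)

def classify_failure(error_message: str) -> str:
    """Return 'environment' for infrastructure-looking failures, else 'product'."""
    msg = error_message.lower()
    if any(signature in msg for signature in ENVIRONMENT_SIGNATURES):
        return "environment"
    return "product"

failures = [
    "ReadTimeout: service 'pricing' did not respond within 30s",
    "AssertionError: expected discount 10%, got 0%",
]
for failure in failures:
    print(classify_failure(failure), "->", failure)
# environment -> the timeout, product -> the assertion failure
```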

Such a situation had several adverse consequences: 

  • Trust in automated tests deteriorated: the engineering team stopped looking at test results
  • Quality degraded, since actual defects needing attention were hidden under the noise
  • The shared teams stayed focused on firefighting, largely because no one addressed environment consistency early in the process
  • Collaboration among the teams suffered due to capacity constraints

Alice proposed to fix this fragile and inaccurate quality feedback from the nightly regression suite. She suggested gradually reducing the number of failed tests and blocking further development unless the threshold was met. Given the starting point of 25% (250 failed tests), it might be reasonable to set a target of 20% and then, in roughly 3% increments, work down to 2-3% allowed failed tests. For a specific period, the product team would therefore allocate some percentage of its capacity to address this quality debt: refactoring tests, fixing infrastructure, or addressing the data issues affecting test results. She also proposed, for the transition period, dedicating one DevOps and one data person per team for at least a sprint, so that the teams could challenge the status quo with the appropriate domain expertise. As an outcome, she expected fewer of the production incidents that were distracting all of the groups.
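For illustration, here is a minimal sketch of the ratchet Alice proposed; the exact schedule and the gate function are assumptions, not her actual tooling. The allowed failure rate starts near today's 25% and is lowered step by step, and the nightly run blocks further development whenever the suite exceeds the current allowance.

```python
# Sketch of a ratcheting failure-rate threshold for the nightly regression run.
# The schedule is an assumption: 25% -> 20%, then down ~3% at a time to 2%.
THRESHOLD_SCHEDULE = [0.20, 0.17, 0.14, 0.11, 0.08, 0.05, 0.02]

def allowed_failure_rate(release_index: int) -> float:
    """Return the failure-rate budget for a given release, clamped to the final target."""
    idx = min(release_index, len(THRESHOLD_SCHEDULE) - 1)
    return THRESHOLD_SCHEDULE[idx]

def nightly_gate(failed: int, total: int, release_index: int) -> bool:
    """True if the nightly run is within budget, False if it should block development."""
    rate = failed / total
    budget = allowed_failure_rate(release_index)
    print(f"failed {failed}/{total} = {rate:.1%}, budget {budget:.1%}")
    return rate <= budget

# Example: 230 of 1,000 tests failing in the first ratcheted release fails the 20% gate.
print(nightly_gate(failed=230, total=1000, release_index=0))  # False
```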

To justify the change from a financial point of view, she first needed to calculate what production deployments and post-deployment incidents cost to address, and also the average cost of a defect in production (revenue loss and/or the labor cost to fix the issue). Since her proposal was temporary and the release production issues were recurring, it was easy to confirm the case quickly and gain a quick benefit.

Let us take a look at the numbers: 

  • Revenue loss because of defects varied from $100 per minute to $1,000 per minute because of reputational consequences. Last year's loss was estimated as half the cost of one full-time engineer (FTE).
  • Post-release stabilization typically consumed one engineering team for a couple of days, plus support from the infrastructure and database teams. In the last reporting period it took three days, with six engineers from the product team and two engineers each from infrastructure and database: ten engineers for three days. Over the past few releases this has added up to about 120 full-time engineering days.

And the required investment:

  • The three teams would allocate about 10% of their capacity to address these issues, roughly two engineers per release. Given the initial 25% failure rate, they might need five to six releases to stabilize the regression suite, so it is about 12 full-time engineering days.

As you can see, the cost of defects leaking because of the fragile environment was substantially more than the required investment: roughly 120 full-time engineering days versus 12. After discussing this with the product manager, Alice got approval to start fixing the noisy feedback and improving its accuracy and value for the engineering team.
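Purely as an illustration of the comparison, reusing the figures quoted above (the release count behind the ~120-day total is an assumption), the arithmetic looks like this:

```python
# Back-of-the-envelope cost comparison using the figures quoted in the story.
stabilization_engineers = 10      # 6 product + 2 infrastructure + 2 database
stabilization_days = 3            # days of firefighting after the last release
releases_observed = 4             # assumption: ~4 releases behind the ~120-day total

ongoing_cost = stabilization_engineers * stabilization_days * releases_observed
cleanup_investment = 12           # engineering days quoted for stabilizing the suite

print(f"Ongoing stabilization cost: {ongoing_cost} engineering days")   # 120
print(f"One-off cleanup investment: {cleanup_investment} engineering days")
print(f"Ratio: {ongoing_cost / cleanup_investment:.0f}x")               # 10x
```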

Alice’s story didn’t end here, she also investigated several other issues known as cascaded feedback and weak feedback. We will unfold these terms in the following stories.

To summarize this story, we would emphasize the importance of a feedback-loop frame when you optimize digital product delivery. In addition to a short time to feedback, feedback accuracy also plays a vital role in ensuring the speed of delivery.

DevOps feedback loop explained: Delayed feedback
https://sdtimes.com/devops/devops-feedback-loop-explained-delayed-feedback/ (Fri, 15 Jul 2022)


Alice joined this company recently, getting a nice bump in pay and the promise of working on a cutting-edge digital product. Management recruited her aggressively to address an organizational crisis: an unacceptable speed of delivery. The challenge was to accelerate delivery from once a month to every two weeks. The engineering team was relatively small (about 50 engineers), scattered across different functional areas.

On day one, Alice learned that the product organization consisted of three cross-functional engineering teams, each with six engineers. She was excited to learn that test engineers and software engineers routinely worked together. However, it seemed strange that the organization had separate shared services for data, infrastructure, and user acceptance testing (UAT), even though data and infrastructure were parts of the product. Learning that the current release cycle reserved "at least" one week for UAT, and that the product team set aside time in the following release cycle to fix bugs based on UAT feedback, was a bit of a surprise and of immediate interest.

Alice knew that the software development process could be described as a set of feedback loops between code development activities performed by engineers and various feedback activities. These feedback activities verify the quality of implemented features from functional as well as non-functional standpoints. 

These activities are well known, with multiple approaches; they are generally designed and executed by the team's engineers (unit testing, code reviews, security testing) and sometimes by specialized engineers (performance testing, security testing, chaos engineering, and the like). These feedback activities are associated with different artifacts or manifestations: a code change, a feature branch, a build, or an integrated solution, as examples.

Feedback activities might (should) affect the whole delivery process for both effectiveness and efficiency.

Figure 1. Simplified software development process with feedback

Delayed Feedback

Delayed feedback has several adverse implications for the speed of capability delivery: 

  • While waiting for feedback on a product's qualitative attributes, engineers often continue writing code on top of a codebase they assume to be correct; therefore, the more delayed the feedback, the greater the potential for rework, and the more impactful that rework is likely to be.
  • Such rework is often not planned; therefore, it will likely delay direct and collateral product delivery plans as well as negatively impact resource utilization.

It was evident to Alice that the UAT team provided feedback to the product team very late, so eliminating that delay, or shortening the release cycle, could be a great starting point for accelerating delivery. Alice started her analysis by calculating the impact of the delayed UAT feedback on delivery.

This is easy to calculate; we just need the probability of rework feedback from a step, that is, the ratio of features with defects to all features delivered to the UAT step of the process. That gives the probability that a feature requires rework after UAT: in this case it was 30%, so the percent of complete and accurate features was 70%.
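As a minimal sketch of that metric, using the release figures given later in the story (20 features delivered per cycle, six of them needing rework):

```python
# Percent complete and accurate (%C&A) for the UAT step.
features_delivered_to_uat = 20
features_with_defects = 6        # features bounced back for rework

rework_probability = features_with_defects / features_delivered_to_uat
percent_complete_accurate = 1 - rework_probability
print(f"Rework probability: {rework_probability:.0%}")          # 30%
print(f"%C&A at UAT:        {percent_complete_accurate:.0%}")   # 70%
```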

Here is a link to the diagram shown below, created on the VSOptima platform to explore "the Alice challenge." If you like, you can run a simulation and see the implications of the rework ratio and the delayed feedback loop for overall delivery throughput, activity ratio, and lead time. What is important is that you can observe that such a feedback loop consumes the capacity of the code development activity and generates loops in the flow.

Next, we can calculate how much the rework costs the delivery team. There are two components to this cost. The first is the direct cost to address an issue. Alice learned that on average one defect costs about one day of work for the software engineers and test engineers, since they need to reproduce the issue, determine how to remediate it, write and execute tests, merge the fix back into the product, and verify that the fix didn't affect other features.

The second is the cost of product roadmap delay. If we delay the release by one day, what is the revenue loss? That is often hard to estimate, and Alice didn't get any tangible number from the product managers.

But even just the direct cost of addressing the feedback gave her excellent ammunition to defend her plan to shorten the release cycle. Of the 20 features delivered on average in each release cycle, six on average required rework. Remediation typically took six days for 12 engineers, which is about 20 percent of the release's planned capacity.

We have three teams of six engineers each, for a total of 18 engineers.

The release cycle is one month, i.e. 20 working days.

Multiplying 18 engineers by 20 working days gives a full capacity of 360 engineering working days.

Since six features need rework, we need 12 engineers for six days, which is 72 engineering working days.

So 72 out of 360 means 20% of engineering working days are spent on rework.
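The same arithmetic as a tiny script, using only the numbers stated above:

```python
# Rework as a share of release capacity, using the figures from the story.
teams, engineers_per_team = 3, 6
working_days_per_release = 20

full_capacity = teams * engineers_per_team * working_days_per_release   # 360 engineering days

rework_engineers, rework_days = 12, 6
rework_capacity = rework_engineers * rework_days                        # 72 engineering days

print(f"Capacity lost to rework: {rework_capacity / full_capacity:.0%}")  # 20%
```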

Alice set a first goal of accelerating delivery by up to 20%. She knew they could do that if she found a way to reduce the time required to produce the feedback and make it available to engineers while they were still immersed in the code.

Alice asked the UAT team to specify acceptance scenarios for every feature as part of the work definition or story, so that a cross-functional feature team could implement automated tests for those scenarios alongside its code development. The feedback loop then becomes almost instantaneous: if an acceptance test doesn't pass, the engineer can address the defect immediately, in a much faster way.
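As a sketch of what that could look like in practice (the scenario, the Cart class, and the choice of pytest are illustrative assumptions, not the team's actual stack), an acceptance scenario written into the story can be turned directly into an automated check that runs on every build:

```python
# Illustrative acceptance test derived from a story's acceptance scenario,
# written with pytest. The Cart class is a hypothetical stand-in for the
# team's real product code.
import pytest

class Cart:
    """Minimal stand-in for the feature under test."""
    def __init__(self):
        self.items = {}

    def add(self, sku: str, qty: int) -> None:
        if qty <= 0:
            raise ValueError("quantity must be positive")
        self.items[sku] = self.items.get(sku, 0) + qty

def test_adding_an_item_updates_the_cart():
    # Scenario from the story: "Given an empty cart, when the shopper adds
    # two units of a product, then the cart contains two units of it."
    cart = Cart()
    cart.add("SKU-42", 2)
    assert cart.items["SKU-42"] == 2

def test_rejects_non_positive_quantities():
    cart = Cart()
    with pytest.raises(ValueError):
        cart.add("SKU-42", 0)
```

Run as part of pull request verification, such tests give the engineer the UAT verdict in minutes instead of weeks.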

Alice also investigated several other issues known as noisy feedback, cascaded feedback, and weak feedback. We will unfold these terms in the following stories.

In summarizing this story, we would like to emphasize the importance of the feedback-loop frame when optimizing digital product delivery, and to note that the effect is not linear: the longer it takes to get feedback, the more difficult it is to address a defect, because the codebase and its complexity grow over time.

To accelerate digital product delivery, leaders should strive to eliminate or mitigate the backward cycles that generate unplanned work and consume planned capacity; instead, we should design processes where work is done correctly the first time.

 
