When I started working at my current company, nearly 14 years ago now, we had a very waterfall-aligned way of working. Back then I was working as a QA engineer, but my work was closely tied to the release schedule of the software team.

We produce MedTech devices, which come with various documentation and validation requirements that must be met before a release is approved. This affects hardware more severely than software - but we handled development of both essentially as one.

This meant that many of our (Windows Desktop) software products saw updates released every 12 to 18 months, unless there was a critical bug that needed to be fixed - which prompted a hotfix.

Fast-forwarding a few years, in 2019 we got the opportunity to make a fresh start with a project to replace an existing product. Of course this project was named Phoenix (as required by law). This coincided with the arrival of a new Engineering Manager who had big plans for our software process, which meant assembling a cross-functional agile team that worked in shorter sprints and produced regular releases.

New Times

The team decided on two-week sprints and moved from on-premise systems running TeamCity, with self-hosted agents and a big common library of MSBuild tasks, to Azure DevOps and Cake. This allowed several improvements to our workflow - one of which was that we could ensure our builds ran the same way when developing locally as when being built by our CI system.

But the greatest gain was the ambition (and support, at least from our nearest manager) to push a release with every sprint. This meant going from roughly 0.7 - 1.0 releases per year (one every 12 to 18 months) to a little over 20 releases per year.

This shortened our feedback loops with our users (and the people who set up and sold systems to our users), giving good focus to the R in R&D. It also meant that we could react much more quickly to fix bugs, even when they were not critical.

We released our MVP version to Beta users after a (for us) short 11 sprints, and then version 1.0 to general availability after 17 sprints - and very quickly our users got used to the quick release cadence. We frequently got feedback on new features being tested, and issues that were found were quickly fixed.

The Biggest Changes

The big changes we made to enable fast and frequent releases were:

Local and CI Builds

As previously mentioned, we leveraged Cake to orchestrate our builds: compiling binaries and installers, running unit tests and code coverage, static code analysis, collecting content and packaging for release, etc. We set this up in a way that when you run the build locally it will do the same operations as when run on CI. This allowed us to be more confident that when we push something to the server, it’s going to build in the same way.

Of course there are some differences we cannot escape. For example, the CI builds are signed with a live code signing certificate, while the local builds are signed with a local dummy certificate. We also block any deployments to cloud resources unless the build runs in CI. But the main part of the build process behaves exactly the same in both environments.
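To make the idea concrete, here is a minimal sketch of that split. It is written in Python rather than as an actual Cake script, and it is not our real build: the step names, the `BuildSettings` type and the certificate labels are all made up for illustration. The one real-world detail it leans on is that Azure Pipelines agents set the `TF_BUILD` environment variable, which I'm assuming is a good enough signal for "running in CI".

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildSettings:
    """The only things allowed to differ between local and CI builds."""
    signing_certificate: str   # hypothetical label, not a real certificate name
    allow_cloud_deploy: bool


def is_ci(env: dict) -> bool:
    # Assumption: Azure Pipelines agents set TF_BUILD=True; locally it is unset.
    return env.get("TF_BUILD", "").lower() == "true"


def settings_for(env: dict) -> BuildSettings:
    if is_ci(env):
        return BuildSettings(signing_certificate="live", allow_cloud_deploy=True)
    return BuildSettings(signing_certificate="local-dummy", allow_cloud_deploy=False)


# Hypothetical step names; this list is identical in both environments.
SHARED_STEPS = ("compile", "unit-tests", "static-analysis", "package")


def plan(env: dict) -> list[str]:
    """Return the ordered list of steps the build would run in this environment."""
    settings = settings_for(env)
    steps = list(SHARED_STEPS)
    steps.append(f"sign:{settings.signing_certificate}")
    if settings.allow_cloud_deploy:
        steps.append("deploy")
    return steps


if __name__ == "__main__":
    print(plan(dict(os.environ)))
```

The point of keeping the differences in one small settings object is that everything else is shared by construction, so there is only one place to look when a local build and a CI build disagree.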

Version Control Discipline

We agreed that any merge into the main branch needed to be “releasable”, meaning first the obvious things like it had to compile, the unit tests needed to pass, etc. But it also meant that there shouldn’t be any half-finished features or known (critical) bugs in the merge.

This was partly handled by having automated checks on all Pull Requests, but also by the team upholding this principle in discussions and code reviews.

Having a main branch that is always releasable means that if something bad is deployed, it’s easy to go back (or forward) to a good state again.

Feature Toggles

Something we’d never used within our department before was the concept of Feature Toggles/Switches. Using these allowed us to work on large features that would not necessarily be ready for release, and still merge the code to our main branch without actually affecting users. Our first implementation had a fairly complex setup for sending out feature configs dynamically to clients (along with bundling configs with updates), but after some time we realized that we spent a lot of time maintaining this system and rarely used it.

In the end, we removed the ability to change enabled features remotely, and now just define the features at compile time (though typically different features are enabled for different installation “tracks”).
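As a rough illustration of how simple the end result can be, here is a sketch in Python. It is not our actual code: the feature names and track names are invented, and since Python has no compile step in this sense, a module-level constant stands in for "defined at compile time".

```python
from enum import Enum


class Feature(Enum):
    # Hypothetical feature names, for illustration only.
    NEW_REPORT_VIEW = "new-report-view"
    CLOUD_EXPORT = "cloud-export"


# Fixed when the application is built; there is no remote override.
# The track names are assumptions, not our real installation tracks.
FEATURES_BY_TRACK = {
    "beta": frozenset({Feature.NEW_REPORT_VIEW, Feature.CLOUD_EXPORT}),
    "general-availability": frozenset({Feature.NEW_REPORT_VIEW}),
}


def is_enabled(feature: Feature, track: str) -> bool:
    # Unknown tracks get no optional features, which is the safest default.
    return feature in FEATURES_BY_TRACK.get(track, frozenset())
```

A call site then only needs something like `if is_enabled(Feature.CLOUD_EXPORT, track): ...`, and releasing a held-back feature to a track is a one-line change that goes through the same review and release process as any other code.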

Synchronizing With the Others

Probably the biggest pain point was synchronizing this effort with other parts of the organization, which were used to, and built around, very infrequent releases.

We had to work out new contracts with our localization partners, as we were now requesting translations several times a month, rather than 1-3 times per year.

We had to produce regression test reports every sprint, and we had to distil our (manual) regression tests to be more efficient, as they needed to be executed by someone on the team every two weeks.

We tried finding ways around other things to avoid causing overloads on other departments, like Tech Writing who did not appreciate having to update our user manual every two weeks (in the end, they didn’t, and today we update the manuals at most every quarter if something significant has changed or been added).

Post-Mortem, 5 years later

After over 80 releases to Production, the product that began as Phoenix has entered a temporary (?) state of “maintenance” - meaning we don’t work actively on it right now, and only do releases for reported bug fixes. My team was instead assigned to another product in the same “suite” of products, and ended up rewriting it almost entirely from scratch.

We’ve followed essentially the same strategy as we did 5 years earlier, with some tweaks picked up along the way. Over the last year there have been a number of initiatives across departments to coordinate better around releases. This has meant that we’ve more frequently needed to use feature toggles to hold back features from General Availability, which is working OK for us - but sometimes it feels odd holding back complete features for several months.

As of late we’ve had parts of the company pushing for essentially a more waterfall-like approach of long-term planning and doing only a few releases per year. We’re trying to reach some compromise which will let us keep up our frequent release schedule, as we believe it’s the best way to deliver continuous value to our users.