Sunday, 06 December 2015

Adventures in Feature Toggling

Continuous Delivery has proven to be an invaluable technique in rapidly developing software, but integrating code continuously at sub-feature resolution and deploying this to live (ideally with every push) presents some challenges. For all that frequent integrations with the mainline bring benefits to development, often the feature being developed is not yet customer-ready. Feature toggles are one increasingly popular response.

A feature toggle allows some functionality of a product to be enabled and disabled by configuration. It is used while developing the feature and releasing to production to enable testing before fully launching the feature, then (usually) removed at the earliest opportunity once the feature has settled.

Not all new features require this - sometimes, for example, a new feature is small enough that it can be integrated very quickly with minimal risk. Some developers prefer to take a "UI last" approach as an alternative to feature toggles, but this encourages inside-out development, whereas I find more benefit in outside-in: driving development with a failing customer test so that the lower levels can be made to directly serve the requirement without second-guessing what the UI layer will require.

There is no one, common pattern to feature toggling, though a few distinct patterns have emerged along with some third party libraries and services. This article reflects my own journey with feature toggles, and is not intended to prescribe any one approach - context is king, and how feature toggles are implemented should complement a team's mission and make-up and fit in with the technology in use.

In the beginning...

The first feature toggle system I worked with in production was based on a configuration file in the (monolithic) codebase which set the default states of the toggles. These could be switched at runtime using a user interface on the webapp itself (visible only on the company network), with the states of the toggles synchronised between instances of the app using Zookeeper.

When picking up a new story to work on we worked out whether we needed a feature toggle, and if so then adding this to the config was among the first tasks. For running locally, in the pre-production environments or running the functional tests (aka acceptance/customer tests) the toggles were all switched on. Every so often we would play a story to remove all no-longer needed feature toggles.

Interestingly, there were two feature toggles that were never removed as they were useful features to toggle off for local development. These weren't directly user-facing features, but were related to analytics gathering and the use of a CDN to load CSS and JS.

This was my introduction to the whole concept, and as I moved along the learning curve I made observations and had reservations, along with occasionally hitting situations where this approach seemed sub-optimal. This led to experimenting more with the whole concept, both in the 10% time my employer provided and also in the day-to-day work.

Homebrew it

There are libraries and services out there now intended to help with feature toggling. However, the whole concept is so simple that I question the wisdom of adding a third-party dependency and the costs that brings. Additionally, building it in-house allows it to be tailored to fit the application, team and process.

Possibly once an in-house solution is in place it could be extracted to a library (or service) for use in-house, but beware the temptation of deduplication.

(Non-)Functional Requirements

An irony of the way feature toggles were implemented was that the application was effectively stateless, built using Scala with some attention to functional programming principles - and yet here in the middle of it all was shared, global, mutable state. And it came with the classic downsides of that.

At the micro-level, unit testing is harder if the function you are testing needs to reach out to some shared global state, which you will need to mutate before invoking the function. In a large codebase with thousands of tests you are also potentially denying yourself the opportunity to parallelise those tests. Testing only with the toggle on halves the branch coverage, denying the existence of potential paths through the codebase and providing ideal locations for bugs to proliferate.
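To make the testing problem concrete, here is a minimal sketch contrasting the two styles. Everything in it is hypothetical (the greeting functions, the GlobalToggles object and its newGreeting flag are invented for illustration and are not the code from the application described here).

```scala
object ToggleTesting {
  // Shared, global, mutable state: a test must set this before calling
  // greetingGlobal and restore it afterwards, and tests that flip it
  // cannot safely run in parallel.
  object GlobalToggles {
    @volatile var newGreeting: Boolean = false
  }

  // Behaviour depends on a hidden input.
  def greetingGlobal(name: String): String =
    if (GlobalToggles.newGreeting) s"Welcome back, $name" else s"Hello, $name"

  // Same logic with the toggle state passed in explicitly:
  // both branches can be checked in isolation, with no setup or teardown.
  def greeting(name: String, newGreeting: Boolean): String =
    if (newGreeting) s"Welcome back, $name" else s"Hello, $name"
}
```

With the second form, covering both branches is two plain function calls, one with the toggle on and one with it off.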

Sure, your feature toggles are "strictly short-term", and I'm certain you've never said that about other pieces of production code that turned out to be somewhat longer term, have you? Of course, accounting for toggles in tests means slightly more work when removing the toggles later, so the risk has to be balanced, but let's be open about this.

At the macro-level there is much the same testing problem - you can toggle only globally, which may be okay in a pre-production environment with limited use and good coordination between testers, but when it comes to flipping the switch in live you can't be entirely sure that nothing will be broken by environmental factors, and you will be turning it on for everyone at once.

The solution to all of this to me seemed to be to treat your toggle code as production code, in more ways than one. First of all, acknowledge that status in automated testing, checking code branches created by the toggles. Secondly, apply the principles you believe in for production code consistently. For example if you don't like shared, global, mutable state generally then don't make an exception for feature toggles.

New features on request

In general, and not only in FP, shared, global, mutable state is best avoided (and not just because it is a mouthful of a phrase) and referential transparency is to be preferred. The output should depend on the input and not be mixed up with some other hidden input which you may be able to influence but which is also under the influence of others.

Given that the input to a webapp is an HTTP request from a browser, and the output the response, we moved to using feature toggles based on cookies. This enabled isolated testing of features in production, removing some of the risk and enabling continuous deployment. Cookies worked in this case better than a request header (which was the initial idea) as they, with some caveats, get sent with every request, so if a feature toggle also affects the response to an AJAX request then everything will be in-sync.
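As a rough illustration of reading toggles from cookies, the sketch below parses the raw value of a Cookie request header. The one-cookie-per-toggle scheme, the "ft_" name prefix, the "on" value and the toggle names are all assumptions made for this example, not the scheme the application actually used.

```scala
object CookieToggles {
  // Hypothetical toggle names; anything not listed here is ignored,
  // so a stray cookie cannot switch on behaviour that does not exist.
  val knownToggles: Set[String] = Set("pdf-preview", "new-header")

  // Assumed convention: a cookie named "ft_<toggle>" with the value "on".
  private val prefix = "ft_"

  /** Returns the toggles enabled by the raw value of a Cookie header, if any. */
  def enabledToggles(cookieHeader: Option[String]): Set[String] =
    cookieHeader.toList
      .flatMap(_.split(';').toList)   // "a=1; b=2" -> "a=1", " b=2"
      .map(_.trim.split("=", 2))      // "a=1" -> Array("a", "1")
      .collect {
        case Array(name, "on") if name.startsWith(prefix) => name.stripPrefix(prefix)
      }
      .toSet
      .intersect(knownToggles)
}
```

Because the result depends only on the header passed in, the same function serves page requests and AJAX requests alike, and a request without the cookie simply gets the empty set.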

Implementing this at the macro-level meant also moving to more referential transparency at the micro-level as the change in state would have to be passed through from the request handling level to anywhere that behaviour needed modifying. Typically this is only at higher levels anyway, but as passing such context around everywhere does not feel good it can also encourage alternative approaches to how behaviour in an application can be modified.

Burn All Flags

The cliched representation of computer programming in popular culture is binary, ones and zeroes, on-and-off, yet in software development we've generally come to realise that passing booleans to functions to modify their behaviour is a bit of an anti-pattern. What are feature toggles AKA feature flags if not a boolean passed to a function in order to modify its behaviour? We've talked about treating your feature toggle code as production code - shouldn't this be an, ahem, red flag?

My next experiment was then to stop thinking in terms of feature toggles and start thinking in terms of a feature set. If you are working with types in a language with a half-decent collection library (hopefully this is the majority of you) you can translate this idea very literally into code:

val features: Set[Feature] = activeFeatures(request)

Now the toggling is based on the presence or absence of a feature in the feature set (literally a Set of Features) for the current request. This doesn't immediately solve the flag problem - testing for the presence of a feature is logically little different to being told whether the feature is enabled - but it does open some interesting avenues.
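The one-liner above leaves Feature and activeFeatures undefined. One way they might look is sketched here; the Request case class, the launched set and the rule that a cookie named after the feature switches it on are assumptions for illustration only.

```scala
object FeatureSetSketch {
  sealed trait Feature
  case object Header extends Feature
  case object PDFDownload extends Feature
  case object PDFPreview extends Feature

  // Every feature the codebase knows about.
  val allFeatures: Set[Feature] = Set(Header, PDFDownload, PDFPreview)

  // Hypothetical: features already switched on for everyone.
  val launched: Set[Feature] = Set(Header, PDFDownload)

  // Stand-in for whatever request type the web framework provides.
  final case class Request(cookies: Map[String, String])

  // The feature set depends only on the request passed in.
  def activeFeatures(request: Request): Set[Feature] = {
    val requested = allFeatures.filter(f => request.cookies.get(f.toString).contains("on"))
    launched ++ requested
  }
}
```

A request carrying the cookie PDFPreview=on would see all three features, while any other request sees only the launched ones.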

For one, you will naturally start talking in terms of a feature set rather than feature toggles. The current and future feature sets are a business concept, feature toggles are a technical implementation concept. Aligning concepts like this helps communication.

Having a set of all features provides something you can iterate over. For example, if you have a UI to show you the state of your feature set you can make building it much more generic by iterating the collection rather than manually changing it every time to check boolean flags (error-prone and laborious). Similarly it helps when debugging if you want to log out the currently active feature set.
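As a small sketch of that iteration, the function below renders the state of every known feature for a log line or a status page. The feature names and the describe helper are hypothetical.

```scala
object FeatureReport {
  sealed trait Feature
  case object Header extends Feature
  case object PDFDownload extends Feature
  case object PDFPreview extends Feature

  // A List rather than a Set so the report has a stable order.
  val allFeatures: List[Feature] = List(Header, PDFDownload, PDFPreview)

  /** One "name=on|off" entry per known feature, comma separated. */
  def describe(active: Set[Feature]): String =
    allFeatures.map { feature =>
      val state = if (active(feature)) "on" else "off"
      s"$feature=$state"
    }.mkString(", ")
}
```

Adding a new feature to allFeatures is then enough for it to appear in the report; nothing in describe needs to change.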

By this point the approach taken in the webapp was to have a list of functions which were applied sequentially to an empty view model to build it up for rendering by a template engine. Having a set of features for the current request maps quite neatly to this - associate the functions with the features and you can modify the behaviour without passing either flags or feature sets down any further, minimising parameters and helping to preserve single responsibilities. It is eminently testable.
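A minimal sketch of that association follows. The names ViewModel, Step, steps and build are hypothetical, as is the use of a plain Map as the view model; the real application's types were certainly richer.

```scala
object ViewModelPipeline {
  sealed trait Feature
  case object Header extends Feature
  case object Description extends Feature
  case object PDFDownload extends Feature

  type ViewModel = Map[String, String]
  type Step = ViewModel => ViewModel

  // Each step knows nothing about features or toggles; it only adds its part.
  val addHeader: Step = vm => vm + ("title" -> "Example product")
  val addDescription: Step = vm => vm + ("description" -> "An example description")
  val addPdfLink: Step = vm => vm + ("pdfLink" -> "/example.pdf")

  // The only place where features and behaviour meet.
  val steps: List[(Feature, Step)] = List(
    (Header, addHeader),
    (Description, addDescription),
    (PDFDownload, addPdfLink)
  )

  /** Applies, in order, the steps whose feature is active to an empty view model. */
  def build(active: Set[Feature]): ViewModel =
    steps
      .collect { case (feature, step) if active(feature) => step }
      .foldLeft(Map.empty[String, String])((vm, step) => step(vm))
}
```

Each step can be unit tested on its own, and build can be tested with any combination of features without a boolean reaching the individual functions.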

Doing this requires a certain discipline and architecture, but those should not be dirty words.

Long-lived and Prosperous

When a feature is "done" (for some definition of done, which you probably have filed away somewhere) we are firmly advised that the toggle should be removed at the earliest convenience. But if we're talking now in terms of feature sets rather than toggles, do we say we are removing the feature? That feels... wrong. The feature is still there, after all. Is this seemingly subtle language shift telling us something?

The original application already had some long-lived toggles useful for pre-production environments. The debate raged, should they be feature toggles at all? Are they not just configuration? What is the difference? It's almost a philosophical question. The behaviour of the app can be modified by Feature Toggles but also by Configuration, both achieving the same end result, the possibility of disabling a feature (for some definition of feature). Where do you draw the line? Does there need to be a line?

If an application is the implementation of a feature set, isn't it useful to express in the current state of the codebase exactly what the feature set is? If your Feature instances are tied to the implementation of those features and all features are accounted for, then isn't that an excellent example of self-documenting code?

As we are no longer talking about toggles we no longer have branches to prune, so the code itself should be naturally fairly tidy. Much of the impetus for removing feature toggles is removed - and there may be some interesting advantages in keeping them around long term.

Graceful Degradation

Recently we discovered that one of the features of the webapp was suddenly performing poorly when under load, affecting performance more generally. Our initial workaround was to make it togglable so that we could turn it off easily any time it seemed to be a problem while we investigated the root cause. As by definition a feature toggle should not result in a broken-looking website when toggled off it's a natural fit for this kind of graceful degradation.

In this case it was a retrospective toggle, which meant work had to be done in defining how the page should look without that feature. If an app is built with long-term feature toggles in mind, then care will have to be taken that the result is not significantly compromised when a number of features are toggled off. This requires a certain discipline of development approach, but one that dovetails neatly with lean software development and progressive enhancement.

The features of a product may not, and probably don't, exist in complete isolation - there will be features which don't make sense in the absence of other features. If the features are first class objects in the codebase this relationship can be modeled:

object PDFPreview extends Feature {
  override val dependsOn: Set[Feature] = Set(PDFDownload)
}

In this case the PDF preview feature can be automatically toggled off if PDF download is toggled off - there is no point offering a preview when download is impossible. This may sound like an artificial case - why not make them part of the same more general feature, PDFVersion? But if PDF preview places more load on your servers you may wish to selectively disable it without preventing people from downloading.
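How the automatic toggling off might work is sketched below. The dependsOn member comes from the snippet above; the default empty set on the Feature trait and the resolve function are assumptions for illustration.

```scala
object FeatureDependencies {
  trait Feature {
    // Assumed default: a feature depends on nothing unless it says otherwise.
    def dependsOn: Set[Feature] = Set.empty
  }

  object Header extends Feature
  object PDFDownload extends Feature
  object PDFPreview extends Feature {
    override val dependsOn: Set[Feature] = Set(PDFDownload)
  }

  /** Drops every feature whose dependencies are not all present.
    * Repeats until nothing changes, so chains of dependencies are handled too. */
  @scala.annotation.tailrec
  def resolve(requested: Set[Feature]): Set[Feature] = {
    val satisfied = requested.filter(_.dependsOn.subsetOf(requested))
    if (satisfied == requested) requested else resolve(satisfied)
  }
}
```

Toggling PDFDownload off then removes PDFPreview as well, while toggling PDFPreview off on its own leaves PDFDownload untouched.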

In a service-oriented system PDF preview may be implemented by another service that your UI webapp calls out to. There's also a strong chance if you have such a system that you have implemented or at least heard of circuit breakers. Although it involves a step back from statelessness, these two concepts can be connected - the open circuit could result in a feature being toggled off, providing automatic graceful degradation. An alternative approach is to have the state of an app's dependencies checked externally and added to all incoming requests as headers, preserving statelessness within the app and again connecting neatly with altering app behaviour at the request level.

Features are more fine-grained than products, and there will probably be a point where if some features aren't toggled on then the app should drastically change behaviour, for example displaying a maintenance page. This is surely the definition of a Minimum Viable Product, easily expressible in code as a feature set:

val mvp: Set[Feature] = Set(Header, Description, PDFDownload)

When the possible feature set is constructed for a request, before proceeding it can be tested that mvp is a subset and if not then an appropriate alternative response produced.
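That check is a one-liner with a set. In this sketch the mvp value is the one defined above, while the Response type, its two cases and the respond function are hypothetical stand-ins for whatever the web framework provides.

```scala
object MvpCheck {
  sealed trait Feature
  case object Header extends Feature
  case object Description extends Feature
  case object PDFDownload extends Feature
  case object PDFPreview extends Feature

  val mvp: Set[Feature] = Set(Header, Description, PDFDownload)

  // Hypothetical response types, for illustration only.
  sealed trait Response
  final case class Page(features: Set[Feature]) extends Response
  case object MaintenancePage extends Response

  /** Renders the page only if every MVP feature is available for this request. */
  def respond(available: Set[Feature]): Response =
    if (mvp.subsetOf(available)) Page(available) else MaintenancePage
}
```

Losing PDFPreview still yields a page, whereas losing Description falls back to the maintenance page.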

This might also prove useful when building a new product with these techniques - if you already know what your MVP consists of, then define it upfront along with the features themselves, using a marker trait or field to indicate that a feature is not yet implemented and so can be filtered out of the feature set created for the request. Then as features get implemented at some point MVP will be reached and the product will launch itself. In this case you will need some way of overriding the fallback so that pre-MVP features can still be tested.

A vs B

As part of product development there's a chance someone will want to do some A/B testing. I've heard of this being connected to feature toggles before, and it can fit in well with modeled features.

In this case there are two alternatives - testing with the presence or absence of a feature or testing two alternative implementations of the same feature. With toggles the latter would probably have to be two separate toggles, one for each implementation, with care taken to ensure only one is toggled on at a time. With modeled features they could be expressed in a type hierarchy:

sealed abstract class PDFDownload(val buttonColour: String) extends Feature
object PDFDownload {
    object BlueButton extends PDFDownload("0000FF")
    object RedButton extends PDFDownload("FF0000")
}

It should be possible to ensure in feature set construction that there are never two variants of the same feature present, a trickier task requiring custom code each time when using boolean flags.
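One way this might be enforced generically is sketched below. The key member on Feature (naming the feature a variant belongs to) and the oneVariantEach function are assumptions added for this example; the PDFDownload hierarchy is the one shown above.

```scala
object Variants {
  trait Feature {
    // Hypothetical addition: variants of the same feature share a key.
    def key: String
  }

  sealed abstract class PDFDownload(val buttonColour: String) extends Feature {
    val key = "PDFDownload"
  }
  object PDFDownload {
    object BlueButton extends PDFDownload("0000FF")
    object RedButton extends PDFDownload("FF0000")
  }

  object Header extends Feature { val key = "Header" }

  /** Keeps at most one variant per feature: the first one in the candidate list wins. */
  def oneVariantEach(candidates: List[Feature]): Set[Feature] =
    candidates.groupBy(_.key).values.map(_.head).toSet
}
```

The candidate list would be ordered by preference, for example with the variant assigned to the current user's test group first, so the same code serves every A/B test without per-test logic.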

Bugs

What is a Bug but a Feature with Unintended?

...

No, let's not go there.

Conclusion

Continuously delivering to contribute to feature sets is what we do, and feature toggles have proven themselves a valuable technique in this. Some well understood common practices surround them.

The purpose of this article is not to launch some specific Big Alternative Idea on the world, but to illustrate one team's journey when they've pushed past the phase of being comfortable with the prevailing idea, tested some extensions and alternatives and ultimately innovated.

You may disagree with the ideas presented here - as stated at the beginning, context is king, and these were the ideas that emerged from and worked for one particular stable product team in a specific application which already had a long history. Most of the concepts presented here have been put into production in an important, high-traffic, public-facing webapp; only a couple of parts are pure speculation. I hope that they are at least thought-provoking, and that the message is clear: pushing the bounds on relatively new techniques can lead to fertile ground for innovation.