Experimentation

Complexity

Consumer markets are complex. Preferences and expectations can change rapidly. The potential influences on preferences and expectations are effectively infinite: our competitors, the economy, the weather, politics, social change, and the products we offer are but a few.

A complex system or market, such as a rainforest, the stock market, or a consumer market, is non-deterministic. No matter how deep the analysis, we cannot predict what will happen when we engage with the system. Cause and effect are unknowable. Even in hindsight, determining the complete chain of events that led to our particular outcome is impossible.

Given enough engagement with a complex system, one can identify patterns, formulate theories, and even come up with algorithms that reasonably predict macro behaviors for a short period of time. “A short period” is relative to the rate of change in the complex system.

The prevailing flora and fauna in a natural preserve tend to change over many decades. The same was once true for macroeconomic patterns, but the rate of change appears to be accelerating as the world economy grows more interconnected. Fashion trends, which used to change over a couple of decades, now change over a few years. And in emerging markets, such as electric vehicles, consumer opinions and trends change over months.

Developing software products for complex markets is inherently complex. Desired outcomes are often known, but precisely what we are developing and how the market will respond to it are not knowable until the market actually responds. If opinions and trends change over the course of months and we deliver product updates every few months, we are either delivering to yesterday’s expectations, or making all of our bets on future market conditions.

Part of the challenge is our obsession with certainty.

Certainty

We crave certainty. We admire it. We reward it. Certainty raises confidence. Confidence provides comfort. We look for leaders who are certain - who have experience and can predict precisely what will happen. We expect them to deliver on promises that were made based on those precise predictions. If they change their course, they must have been wrong. And we expect them to not be wrong.

Yet, in a complex system, we cannot be certain of cause and effect. We cannot know beforehand. And we can only guess afterward. In a complex system, certainty of future outcome is hubris.

Our certainty cannot be that we surely must know.

Our certainty must be that we surely cannot know.

In a complex system, our certainty cannot be in our ability to know what will happen, but in our ability to learn and adapt. To accelerate our learning, we need to engage with the system in small ways. We need to probe the system to see how it responds and then adjust.

In a complex system, we must experiment.

Experimentation

Experimentation does not have to adhere to strict scientific standards, but the core elements of a scientific experiment should be considered and included. Given we cannot control all variables, we are not looking to create a predictive and repeatable experiment. We are looking to create a sound and informative experiment. We want to learn.

Scientific experiments have observations, questions, hypotheses, methods, and results. Our experiments should be no different.
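
One way to make these elements concrete is to capture them in a single record before the experiment begins. The sketch below is purely illustrative; the structure and field names are assumptions, not part of any particular tool or process.

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    """One illustrative way to record the five core elements."""
    observations: list[str]   # what we can see before we start
    question: str             # the aspect being tested
    hypothesis: str           # our educated guess at an answer
    method: str               # the exact tools and steps we will use
    results: list[str] = field(default_factory=list)  # filled in after the run
```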

Observations

Observations are things about the system we can observe prior to the experiment. Observations inform our questions and hypotheses.

In a totally made-up example - we notice that many users have not set up a home location in their profile. This functionality is currently part of user profile management. Nearly 80% of users have incomplete profiles, and 50% have never updated their user profile, relying only on the data they provided in the initial app setup: name, email, password, and VIN.
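
To ground those numbers, here is a minimal sketch of how such observations might be computed from user records. The records and the notion of an "incomplete" profile are invented for illustration, not a real schema.

```python
# Minimal sketch: computing the profile observations above.
# The user records and the definition of "incomplete" are
# illustrative assumptions.
users = [
    {"id": 1, "home_location": None, "profile_updates": 0},
    {"id": 2, "home_location": "Portland, OR", "profile_updates": 3},
    {"id": 3, "home_location": None, "profile_updates": 0},
    {"id": 4, "home_location": None, "profile_updates": 1},
    {"id": 5, "home_location": "Austin, TX", "profile_updates": 0},
]

total = len(users)
incomplete = sum(1 for u in users if u["home_location"] is None)
never_updated = sum(1 for u in users if u["profile_updates"] == 0)

print(f"Incomplete profiles: {incomplete / total:.0%}")    # 60%
print(f"Never updated:       {never_updated / total:.0%}")  # 60%
```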

Question

The question is the aspect being tested. In our fictitious example, our question might be, “How can we encourage users to provide us with more profile data?”

Hypothesis

Our hypotheses are educated guesses at an answer to the question.

Some examples might be:

  • Adding more questions to the initial user setup

  • Adding profile setup reminders to the message center

  • Asking people for relevant profile data when they enter a related feature

    • Dealer search

    • Maps

  • Sending an email with profile prompts

After some discussion, the team concludes that adding profile setup reminders to the message center is the best option to start with. Reasons include that it is unobtrusive, it happens in the app, it does not impede users from accomplishing the task at hand, and it increases user awareness of their profile, which in turn should lead to more complete profiles.

Observations and Measures

Having selected a hypothesis, we want to return to our observations and measures. Of course, we will want to monitor user profile updates and profile completion percentages. What else might we want to pay attention to? Is there anything about interactions with the message center? Is there anything about frequency or duration of app use? What else might be impacted by this change and should we monitor it as a part of the experiment?
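
One lightweight way to answer these questions is to write the measures down before the experiment starts. The plan below is a hypothetical sketch; the metric names are illustrative, not drawn from any particular analytics tool.

```python
# Hypothetical measurement plan, recorded before the experiment runs.
# Metric names are illustrative assumptions.
measurement_plan = {
    "primary": [
        "profile_update_rate",       # profile updates per user per week
        "profile_completion_pct",    # average completion across profiles
    ],
    "secondary": [
        "message_center_open_rate",  # do reminders change message habits?
        "app_sessions_per_user",     # frequency of app use
        "avg_session_minutes",       # duration of app use
    ],
}
```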

Method

The method comprises the exact tools and steps that will be used to run the experiment.

For software, as it is not possible to control all of the variables, it is a good idea to have a control cohort for experiments. If you roll a feature out to the entire customer base and compare behavior against prior observations, you cannot be certain whether any change in behavior was caused by the new feature or merely coincided with it.
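
A common way to carve out a control cohort is to assign users deterministically by hashing a stable identifier, so each user always lands in the same group. A minimal sketch, assuming a 10% control split; the function and the split are illustrative choices.

```python
import hashlib

def assign_cohort(user_id: str, control_fraction: float = 0.10) -> str:
    """Deterministically assign a user to 'control' or 'treatment'.

    Hashing a stable id means the same user always lands in the
    same cohort, with no assignment state to store.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000  # uniform bucket in [0, 10000)
    return "control" if bucket < control_fraction * 10_000 else "treatment"

print(assign_cohort("user-42"))  # stable across runs and machines
```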

Say, for example, you roll out a new search feature and search click-through goes down slightly. You might conclude that the new search is performing worse than the prior version and roll it back. With a control cohort, you can validate whether the new search is the cause. You might discover that search click-through for the control cohort dropped drastically. In that case, click-through decreased overall, and the site is actually performing better under the new version. Without the control, you wouldn't have realized the change was an improvement until after you rolled it back.
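
Plugging invented numbers into that scenario shows why the control matters: a small drop in the treatment cohort against a large drop in control is a net improvement. All figures below are hypothetical.

```python
# Hypothetical click-through rates before and during the experiment.
control_before, control_during = 0.30, 0.27       # drops 10% relative
treatment_before, treatment_during = 0.30, 0.294  # drops 2% relative

control_change = (control_during - control_before) / control_before
treatment_change = (treatment_during - treatment_before) / treatment_before

# The whole market dipped; relative to control, the new search is ahead.
print(f"Control change:   {control_change:+.1%}")                     # -10.0%
print(f"Treatment change: {treatment_change:+.1%}")                   # -2.0%
print(f"Relative lift:    {treatment_change - control_change:+.1%}")  # +8.0%
```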

Results

The results are the observations and conclusions from having run the experiment. These help you make decisions about what to do next, independent of hopes, passions, and politics.

Results are neither good nor bad; they are either progressive, neutral, or regressive. All results offer a learning opportunity.

A progressive result substantiates the hypothesis and moves us toward an opportunity or a resolution.

A neutral result is inconclusive, or it invalidates the hypothesis without moving us in either direction. It is neither progressive nor regressive.

A regressive result invalidates the hypothesis and moves us in a detrimental direction.
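
One possible way to operationalize this classification is to compare the treatment cohort against control and treat statistically insignificant differences as neutral. The sketch below uses a two-proportion z-test; the test choice and thresholds are assumptions, and real experiments often warrant more rigor.

```python
import math

def classify_result(control_conv: int, control_n: int,
                    treatment_conv: int, treatment_n: int,
                    z_threshold: float = 1.96) -> str:
    """Classify a result as progressive, neutral, or regressive.

    Illustrative two-proportion z-test: insignificant differences
    are neutral; significant ones are progressive or regressive
    depending on direction.
    """
    p_control = control_conv / control_n
    p_treatment = treatment_conv / treatment_n
    pooled = (control_conv + treatment_conv) / (control_n + treatment_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treatment_n))
    z = (p_treatment - p_control) / se
    if z >= z_threshold:
        return "progressive"  # substantiates the hypothesis
    if z <= -z_threshold:
        return "regressive"   # moves us in a detrimental direction
    return "neutral"          # inconclusive

print(classify_result(300, 10_000, 360, 10_000))  # progressive
```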

Beware the one true metric

I worked with a company a few years back that generally did a fantastic job of formulating hypotheses and running experiments. They had a very sophisticated system that allowed them to run multivariate experiments and ascertain statistical significance.

The flaw in their system was their obsession with revenue. Revenue was a measure in every single experiment they ran, and as it turned out, it was the deciding measure. If an experiment did not result in a revenue increase, it was rolled back.

For the most part, this worked out very well.

  • New search provides better results leading to more sales - increased revenue.

  • New home page shows bigger pictures leading to more sales - increased revenue.

But anything that was a long play couldn't clear the immediate revenue bar. There was a mountain of evidence from similar markets that collecting data on user preferences would allow for improved personalization and lead to more sales in the long run. But any attempt to gather personal preferences from the users did not immediately increase revenue. All of the personal preference experiments were rolled back - even those for which sales remained on par with control.

Over the course of a couple of years, this market pioneer lost ground to competitors. One of the top reasons cited was that the competitors “offered better products”. In reality, the product offering from the competitors was merely a subset of what the market leader offered. They didn’t have better products; they were better able to show users the products they wanted to see, when they wanted to see them.

Related Materials

Doc’s Blog Posts - https://docondev.com/blog/tag/Experimentation+Mindset

The Experimentation Mindset - https://youtu.be/Jo5wD6D6MYY - Video of Doc’s presentation on this topic

Contextual Leadership - https://onbelay.co/articles/2017/11/18/contextual-leadership - Brief overview of complexity sense-making and probe-sense-respond

Cynefin - https://thecynefin.co/about-us/about-cynefin-framework/ - A sense-making framework

Culture of Experimentation - https://hbr.org/2020/03/building-a-culture-of-experimentation - HBR article