The Superpowers and Shadows of A/B Testing: Balancing Data-Driven Success with Bold Innovation

“There’s power in the making. You can control the conversation and bridge the communication gap by making.”


A/B testing feels like a cheat code when it works. You get clarity. You get confidence. You get a neat little chart that says the decision is “obvious.” Ryan Leffel, Head of Design at Priceline, came to Berlin to talk about why that same superpower can quietly trap teams in over-optimisation, fear of risk, and a slow drift away from bold product thinking.

Ryan borrowed the line from Marc Randolph, Netflix co-founder and first CEO, because it captures what testing does best. It humbles you. It replaces certainty with learning. It forces you to admit that instinct is only a starting point, not a strategy.

Testing Is Not Just for Bookings

Ryan opened with a simple provocation. A/B testing can power innovation, but it can also hold you back. The difference is not the tool; it is how you use it, what you measure, and what your culture rewards.

At Priceline, testing is not limited to “increase bookings.” One example was an email variation that nudged users toward Penny, Priceline’s AI travel assistant. The goal was not a conversion lift; it was support deflection. If more people self-serve, the company reduces call volume and saves money.

Another example was a product refresh that could easily have been pushed live as a single redesign. Instead, Priceline ran 22 tests across UI details like silhouettes, button corner radius, and elevation. The intent was not to chase a win. It was to make sure nothing broke. Even a small loser was acceptable if the release stayed safe.

A third example was a bolder shift: collapsing the classic travel search form into a simple search bar for some mobile users. The payoff was space. If you lift the form, you lift the content higher on the page. But rather than bet the experience on a gut feel, they tested before rolling it out fully.

Taken together, these examples made one point clearly. Testing is a way to de-risk change, not only to chase short-term growth.

The Four Superpowers of A/B Testing

Ryan framed testing through four strengths. Experimentation gives teams a low-risk way to try new ideas and learn quickly. Precision lets you isolate variables so you can understand what actually moved the metric. Evidence replaces guessing with data, especially when opinions collide. Scalability means a successful change in one area can be rolled out elsewhere. That is the promise. But Ryan’s talk was really about what happens when you treat those strengths as a guarantee rather than a capability.
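To make the “evidence” superpower concrete (this sketch is not from the talk, and the conversion counts are hypothetical), the check behind most A/B readouts is a two-proportion z-test on the control and variant conversion rates:

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical numbers: 4.8% vs 5.4% conversion on 10,000 users per arm.
z, p = two_proportion_z(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # borderline: "looks like a win" is not the same as significant
```

A dashboard would call that variant a 12.5% relative lift, yet the p-value sits just above 0.05, which is exactly the gap between a chart that looks obvious and evidence that actually is.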

Every Superpower Casts a Shadow

For each strength, Ryan named a corresponding trap. Experimentation can turn into reactivity, where teams endlessly tweak and pivot and lose direction. Precision can create tunnel vision, where you optimise a small component while missing the bigger experience. Evidence can lead to analysis paralysis, where the organisation waits for perfect certainty and stops moving. Scalability can become complexity, where teams roll out changes widely without coordination and drift out of alignment.

Ryan anchored this in the Icarus paradox. The very traits that lead to success can also lead to failure. In the myth, Icarus was given wings, warned not to fly too close to the sun, got overconfident, and crashed. For product teams, the metaphor is straightforward. A culture that becomes addicted to the feeling of “certainty” can end up flying too close to the sun.

Biases Are Quiet and Expensive

A/B testing is often sold as objective, but Ryan reminded the room that the humans running tests are not. Bias shapes what we choose to test, how we interpret results, and whether we notice warning signs early enough.

He used Groupon as the cautionary tale. Groupon discovered that email was a powerful driver, so they sent an email a day and saw good engagement. Confidence grew. They increased the frequency to multiple emails a day, eventually up to five. The early metrics made it feel like the right move. The long-term signals did not. Users burned out. Open rates dropped. Unsubscribes rose. Fatigue crept in behind the headline wins.

Three biases show up in that story.

1. Confirmation bias makes you favour data that supports what you already believe.

2. Survivorship bias makes you remember the tests that worked and forget the ones that did not.

3. Sunk cost fallacy makes it harder to stop when you have already invested and rallied the organisation.

Ryan’s countermove was practical. Look for what could prove you wrong. Pay attention to the full set of signals, not only the early positive ones. Set checkpoints so you can pivot before the damage becomes permanent.

Culture and Curiosity Beat Numbers Alone

Ryan’s most memorable section had nothing to do with charts. It was a story from the film Big, where a 12-year-old boy stuck in an adult body ends up in a boardroom at a toy company.

In the scene, executives pitch a toy idea backed by data and industry reports. Josh, the child in the room, plays with it and says, “I don’t get it.” The room expects him to be shut down. Instead, the CEO leans in and asks, “What don’t you get?” Josh explains that the toy is not fun. It is a building that turns into a robot, and he suggests a robot that turns into a bug instead.

Ryan used this to show what a healthy culture looks like. Psychological safety means it is safe to speak up, even when the question sounds naive. A learning mindset values understanding over agreement. User obsession forces teams to interpret metrics through what people actually feel and want.

He tied it back to leadership. You cannot scale curiosity without leaders who reward it. But you also do not need a title to act like a leader. Leadership is a choice, not a position.

Optimisation Is the Tool, Innovation Is the Outcome

The sharpest risk in A B testing is the local maxima trap. You can keep making small improvements until you reach a point where small changes no longer yield meaningful gains. At that point, you can spend months tuning sesame seeds on the bun and never realise the real opportunity is breakfast.
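The local maxima trap can be sketched in a few lines. The payoff curve below is a made-up toy, not anything from the talk: greedy small steps climb to the nearest peak and stall, while reaching the bigger peak requires a leap no incremental test would take.

```python
from math import exp

# Toy payoff curve (hypothetical): a modest local maximum near x = 2
# and a much larger global maximum near x = 8.
def payoff(x):
    return exp(-(x - 2) ** 2) + 3 * exp(-((x - 8) ** 2) / 4)

def hill_climb(x, step=0.1, iters=1000):
    """Greedy tweaking: move only while a small step improves the payoff."""
    for _ in range(iters):
        best = max((x - step, x, x + step), key=payoff)
        if best == x:
            break  # no small tweak helps any more: a local maximum
        x = best
    return x

print(round(hill_climb(1.0), 1))  # stalls near the local peak at x = 2
print(round(hill_climb(6.0), 1))  # starting past the valley reaches x = 8
```

Every individual step the first climber takes is a “win,” which is precisely why the trap is hard to see from inside a testing dashboard.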

Ryan told the McDonald’s story through Herb Peterson, a franchisee who created the Egg McMuffin by testing an idea that violated the corporate playbook. No breakfast hours. No eggs. No supply chain. It was risky. It also cracked the market open. Breakfast eventually became a major revenue driver and reshaped customer behaviour.

The lesson was not “be McDonald’s.” It was that optimisation only gets you so far. Sometimes the right move is not another tweak; it is a reframing of the problem.

Ryan mapped this to the Kano model. Basic attributes are table stakes. Performance features improve satisfaction. Exciters create delight, often by solving problems customers cannot clearly ask for yet. Testing tends to cluster around basics and performance because those are measurable and safe. But the real breakthroughs often live in exciters, where teams need to take bigger swings and ask what else is possible.

The Flywheel: Incremental Testing Can Lead to Disruption

Ryan made a case for a healthier relationship between testing and innovation. Incremental tests are not the enemy. When used well, they become a flywheel.

He told the story of Burbn, a cluttered app that noticed users loved the photo feature, leaned into it, dropped most of the product, and became Instagram. That arc started with observation and incremental learning, then made a decisive leap.

He also told Priceline’s own evolution. It began with a disruptive model: Name Your Own Price. Over time, the company learned that customers often wanted speed and certainty more than bidding. That insight helped drive new deal models like Express Deals, then Price Breakers, each opening fresh surfaces to test: messaging, placement, transparency, personalisation, and trust.

Disruption feeds testing. Testing feeds the next disruption. That is the loop.

Learn Before You Build the Whole Thing

Ryan shared a practical pattern for shipping bold ideas without betting the company on a hunch. Get to the first test as quickly as possible. Treat it like an MVP. You do not need perfect designs, months of research, or a fully built programme to learn.

At Priceline, before launching a loyalty programme, the team tested “VIP” messaging on existing savings claims customers already received. The market responded. People posted about being “VIP” and feeling they were part of something. That signal justified deeper investment. Testing did not replace product strategy; it reduced the risk of committing too late.

He also tackled a common question: When do you talk to customers if you test constantly? The answer was simple. Always. Before, during, and after. Testing without customer understanding becomes optimisation theatre.

Normalise Failure or You Will Stop Innovating

Ryan closed with a reminder that is easy to say and hard to live. Most tests fail. He cited a typical win rate of around 20 to 30 percent, meaning 70 to 80 percent of tests will be losers. Those failures are not a waste. They are the path to the wins.
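The arithmetic behind that claim is simple enough to sketch. The testing cadence and per-win lift below are hypothetical illustration numbers, not figures from the talk; only the roughly 20 percent win rate echoes what Ryan cited.

```python
win_rate = 0.20        # roughly the win rate cited in the talk
lift_per_win = 0.02    # hypothetical average lift from each winning test
tests_per_year = 50    # hypothetical testing cadence

# Most of those 50 tests lose, but the expected handful of winners compound.
expected_wins = win_rate * tests_per_year
compounded_lift = (1 + lift_per_win) ** expected_wins - 1

print(f"Expected wins per year: {expected_wins:.0f} of {tests_per_year}")
print(f"Compounded annual lift: {compounded_lift:.1%}")
```

Under these assumptions, 40 “failed” tests a year still buy you ten winners whose lifts compound to over 20 percent, which is why the losses are the cost of the wins rather than waste.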

He even pointed out a cultural gap many teams have. They celebrate wins in a channel, but they do not share learnings from losses. Yet without losses, there would be no wins to celebrate.

The point was not to romanticise failure. It was to model it openly, learn from it quickly, and keep the organisation brave enough to take the next swing.

What To Take Back to Your Team

Ryan’s talk landed because it did not treat A/B testing as a villain or a saviour. It treated it like power. Used well, it helps you learn fast, de-risk change, and build confidence in decisions. Used poorly, it turns teams into anxious optimisers who mistake certainty for progress.

His challenge was to beware of your strengths, because every strength has a shadow. Optimisation is a tool, not the goal. Curiosity is the unlock. Talk to customers continuously. Collaborate across disciplines so you do not bet on the wrong ideas. And when you feel trapped in small tweaks, make the breakfast sandwich.

Want to watch the full talk?

You can find it here on UXDX: https://uxdx.com/session/the-superpowers-and-shadows-of-ab-testing-balancing-data-driven-success-with-bold-innovation1/

Or explore all the insights in the UXDX USA 2025 Post Show Report: https://uxdx.com/post-show-report