Improving Website Performance by 11x

Does web performance really matter these days?

As far as I am concerned, an ideal website turns away as few users as possible, regardless of their browser, device, disabilities, or network. As you might imagine, there was culture shock when I was hired to work on a giant React single-page app. I frequently disagreed with co-workers. One of those times was when Tailwind CSS slowed our desktop time to first paint by half a second. I argued that an impact like that meant we shouldn't use it. I was countered with a fair question.

"Half a second sounded bad, but what did it actually mean to us?"

I didn't know.

So a co-worker and I founded a performance team to find out. We'd seen the stats about how faster sites improve business metrics, and we wanted those numbers for our site.

Our estimate for Kroger.com was that we lost $40,000 of yearly revenue per millisecond of load time. Then the pandemic surged our usage, since buying groceries in public was suddenly a bad idea. Whatever this number is today, it's probably way higher.

Time doesn't have a linear relationship with dollars. The closer wait time gets to zero, the more users change behaviour. Unshackling speed for users not only changes their behaviour, it opens your site up to new worlds. Minor performance improvements pay for themselves, but meaningful speedups make your site orders of magnitude more useful by finding new users who do things you can't anticipate. For example, YouTube once shipped a 90% smaller watch page, but their average performance metrics tanked because users with terrible connections could finally start watching videos.

The math says improving performance is one of the surest returns on investment a website can get. Speed is the king feature because it improves all your other features. No feature exists until it finishes loading.

What is causing sites to be slow?

For such a sure investment, we're remarkably bad at making fast websites. We keep getting slower out of proportion to most users' networks and devices. We set new devs up to fail by reaching for React by default, when keeping React fast is not a beginner topic. These problems don't have simple answers, because if they did, we'd have solved them by now.

Kroger is the biggest grocery chain you've never heard of, because it operates a ton of sub-brands. But it's number 17 on the US Fortune 500, so Kroger.com was in the bigger leagues of web dev, where nobody shuts up about scale. It was a React single-page app that was really slow, and still is according to PageSpeed Insights. To be fair to my former employer, Walmart's and Target's sites are also React and also really slow.

But it's not all just React. To make a meaningful difference in performance, you're going to need to optimise many things. Cramming a V12 into a minivan won't make it race-worthy, and swapping a site's JavaScript framework won't magically make it fast. You need to scrutinise everything from kilobytes hitting the network to pixels hitting the glass. The core idea is simple: don't make humans wait for websites. That's truly all web performance is. When you try to quantify that, it all goes to hell.

Given this evidence, was it easy to start prioritising performance improvements?

At Kroger we were having a really hard time with incremental performance improvements. People said they cared, but performance tickets always got outprioritised, speedups got hoarded in case a team needed to ram a feature through the bundle check, and even the smallest developer-experience gain always seemed to outweigh what it cost our users. Maybe proving that speed equaled money wasn't enough.

We also had to convince people emotionally, by showing everyone how much better our site could be if it were fast. So in a fit of bad judgment, I vowed to make the fastest possible version of Kroger.com.

Where did you start on your improvements?

I found the post Real-world Performance Budgets from Google's former Chief of Performance, which advocated picking a target device and network, then finding out how much data they can handle in five seconds. For maximum relevance, I chose Kroger's best-selling phone, the Hot Pepper Poblano. Its specs might have been good once; not so much now. My target network was cellular data that had filtered through our metal buildings. Walking around with a network analyzer, I discovered that the network resembled WebPageTest's Slow 3G preset. The Poblano and the target network specs resulted in a budget of about 150 kilobytes, which was not bad. The problem, though, was that Kroger.com's third-party JavaScript totaled 367 kilobytes.
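
To make that arithmetic concrete, here's a rough sketch of the budget math. The throughput, latency, and round-trip figures are my assumptions approximating a Slow 3G profile, not measurements from the original work:

```ts
// Rough performance-budget arithmetic in the spirit of the approach described
// above. The network numbers are assumptions; substitute your own measurements.
const targetSeconds = 5;   // how long users will wait for first load
const downlinkKbps = 400;  // assumed Slow 3G downlink
const rttMs = 400;         // assumed round-trip time
const roundTrips = 4;      // DNS + TCP + TLS + HTTP request, a rough guess

// Time left for actual transfer after connection setup.
const transferSeconds = targetSeconds - (roundTrips * rttMs) / 1000;

// Kilobytes we can move in that time (kilobits per second / 8 = KB per second).
const budgetKB = (downlinkKbps / 8) * transferSeconds;

console.log(`~${Math.round(budgetKB)} KB budget`); // ≈ 170 KB, the same ballpark as the ~150 KB above
```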

My boss told me in no uncertain terms which scripts I couldn't get rid of, so after further bargaining, workarounds, and compromises, my front-end code had to fit into 20 kilobytes, which is less than half the size of React. But Preact is famously small, so why not try that? A quick estimate seemed promising: the Preact ecosystem had a client-side router and a state-manager integration I could use, leaving about five kilobytes over. But is that all the JavaScript a single-page application (SPA) needs? I knew I would eventually need more code: translating UI to JSON and back again; something like React Helmet, but not the Preact Helmet package, because that one is four kilobytes for some reason; something like the Webpack module runtime, but hopefully not actually the Webpack module runtime; reimplementing browser features within the page-navigation lifecycle. And if you've ever done analytics for SPAs, you know they don't work out of the box.

I don't have size estimates for these, because by that point I had abandoned the single-page-app approach. I didn't want a toy site that was fast only because it ignored a real site's responsibilities. I see the responsibilities of grocery e-commerce as security even over access, access even over speed, and speed even over slickness. I refused to compromise on security or accessibility, and I didn't want any speed-ups that conflicted with them.

And the first conflict was security. Security isn't fundamentally different between multi-page apps and single-page apps; both need to protect against known exploits, but in multi-page apps almost all of that code lives on the server. Take cross-site request forgery protection, for example: it attaches authenticity tokens to HTTP requests. In a multi-page app, that means a hidden input in form submissions. A single-page app needs additional JavaScript for the client-side details. Not much, but repeat that for authentication, escaping, session revocation, and the other security features.
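
To make the difference concrete, here's a minimal sketch. The token name, cookie, and header are hypothetical examples, not Kroger's actual implementation:

```ts
// Multi-page app: the CSRF token rides along in server-rendered HTML,
// so no client-side JavaScript is needed:
//   <form method="post" action="/cart">
//     <input type="hidden" name="csrf_token" value="...server-generated...">
//   </form>

// Single-page app: every JavaScript-initiated request has to attach the
// token itself, so this code ships to (and runs on) the client.
function csrfToken(): string {
  const match = document.cookie.match(/(?:^|;\s*)csrf_token=([^;]+)/);
  return match ? decodeURIComponent(match[1]) : "";
}

async function postJSON(url: string, body: unknown): Promise<Response> {
  return fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-CSRF-Token": csrfToken(), // the extra client-side detail
    },
    body: JSON.stringify(body),
  });
}
```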

I had five kilobytes left over, so I could spend them on security. Discouraging, but not impossible to work around. Speaking of impossible to work around: unlike security, client-side routing has accessibility problems exclusive to it. First, you must add code to restore the accessibility of built-in page navigation. Again doable, but it means more JavaScript; usually a library, but downloaded JavaScript just the same. Worse, some SPA accessibility problems can't be fixed today. There's a proposed standard that addresses them, so once it's implemented and all assistive software has caught up, they won't be a problem anymore. But let's say I mitigate all that. Sure, it sounds difficult, but theoretically it can be done by adding client-side JavaScript.
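
As an illustration of the kind of JavaScript a client-side router has to add back, here's a minimal sketch of post-navigation focus and announcement handling. It assumes a `route-announcer` live region exists in the page shell, and it is nowhere near a complete accessibility solution:

```ts
// A sketch of what client-side routers must reimplement after swapping content.
// Built-in page navigation moves focus, announces the new page, and resets
// scroll for free; a SPA has to do it by hand.
function afterClientSideNavigation(newTitle: string): void {
  document.title = newTitle;

  // Move focus to the main heading so keyboard and screen-reader users
  // aren't stranded wherever focus happened to be on the previous view.
  const heading = document.querySelector<HTMLElement>("main h1");
  if (heading) {
    heading.setAttribute("tabindex", "-1");
    heading.focus();
  }

  // Announce the navigation via a live region (assumed to exist in the shell).
  const liveRegion = document.getElementById("route-announcer");
  if (liveRegion) {
    liveRegion.textContent = `Navigated to ${newTitle}`;
  }

  // Built-in navigation also resets scroll position.
  window.scrollTo(0, 0);
}
```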

So we're back to my original problem. Beyond the inexorable gravity of client-side JavaScript, single-page apps have other performance downsides. Memory leaks are inevitable, but they rarely matter in multi-page apps; in single-page apps, one team's leak ruins the rest of the session. And JavaScript-initiated requests have lower network priority than requests from links and forms, which even affects how the operating system prioritises your app over other programs.
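
Here's a toy example of the kind of leak that dies harmlessly with the page in a multi-page app but accumulates for the whole session in a single-page app. The names are made up for illustration:

```ts
// In a multi-page app, this leak dies with the page on the next navigation.
// In a single-page app, the view gets swapped out but the listener (and the
// big array it closes over) stays alive for the rest of the session.
function mountProductView(container: HTMLElement): void {
  const cachedResults: string[] = new Array(100_000).fill("product data");

  window.addEventListener("resize", () => {
    // Closes over cachedResults, keeping it in memory indefinitely...
    console.log("relayout", container.id, cachedResults.length);
  });
  // ...because nothing ever calls removeEventListener when this view unmounts.
}
```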

Lastly, server code can be measured, scaled, and optimised until you know it's fast enough. But client devices? Devices are unboundedly bad, with decade-old chips and terrible RAM. I figured that if I inlined CSS and sent HTML as fast as possible, the remaining overhead would be negligible compared to the network round trip.

It sounds like server-side apps solve a lot of these problems?

There was one problem with going server-side. Like many large companies' sites, Kroger.com's pages were assembled from multiple data sources, each of which could have its own team, speed, and reliability. If ten data sources each take one API call, what are the odds my server can respond quickly? Pretty bad. If 1% of all data responses are slow, then a page with ten backend sources will be slow about 9.6% of the time. And sessions load more than just one view. If a session has eight pages, that original 1% chance compounds until more than half of all sessions hit at least one slow page, which is even worse than it sounds: a one-time delay has a cooling effect on the rest of the session, even if everything afterward is fast. So I needed to prevent individual data sources from delaying the rest of the page.
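
The arithmetic behind those odds, as a quick sketch (the 1% slow-response rate is the illustrative figure from above, not a measurement):

```ts
// Chance that at least one of N independent backend calls is slow,
// given each call has a 1% chance of being slow.
const pSlow = 0.01;
const callsPerPage = 10;
const pagesPerSession = 8;

const pageSlow = 1 - (1 - pSlow) ** callsPerPage;
console.log(pageSlow); // ≈ 0.096 — roughly 1 page load in 10 is slow

const sessionHitsSlowness = 1 - (1 - pageSlow) ** pagesPerSession;
console.log(sessionHitsSlowness); // ≈ 0.55 — more than half of sessions see at least one slow page
```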

I suspect this problem alone may be why so many big sites choose single-page apps, but the same cooling effect suggests that a single-page app's slow up-front load doesn't really fix the problem either.

If both single-page and multi-page apps have performance problems, what did you do?

We had fast websites from big companies before we had single-page apps. I vaguely remembered early performance pioneers saying browsers can display a page as it's generated. It's a technique Google Search and Amazon have used since the 90s, and it's even more efficient in HTTP/2 and HTTP/3, so it's here to stay.

HTML streaming does what it says: the server streams HTML to the browser as it's generated. This lets browsers get a head start on downloading page assets, doesn't block interactivity the way hydration does, and doesn't need to block or break when JavaScript does. It's also more efficient for servers to generate. Clearly I wanted HTML streaming, but how do you do it?
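
Here's a minimal sketch of the idea using Node's built-in `http` module (not whatever stack Kroger actually ran): flush the head immediately so the browser can start fetching assets while the server is still waiting on data.

```ts
import { createServer } from "node:http";

// Minimal HTML-streaming sketch: send the <head> right away so the browser
// can start downloading CSS while the server is still fetching page data.
createServer(async (_req, res) => {
  res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });

  // The shell goes out immediately; the browser can parse it and request
  // /styles.css before our data is ready.
  res.write(`<!doctype html>
    <html>
      <head><link rel="stylesheet" href="/styles.css"></head>
      <body><header>My Store</header>`);

  // Pretend this is a slow upstream API call.
  const products = await new Promise<string[]>((resolve) =>
    setTimeout(() => resolve(["Apples", "Bread", "Milk"]), 500)
  );

  // Stream the rest as it's generated.
  res.write(`<ul>${products.map((p) => `<li>${p}</li>`).join("")}</ul>`);
  res.end(`</body></html>`);
}).listen(3000);
```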

Today, popular JS frameworks are buzzing about streaming, but at the time I could only find older platforms like PHP or Rails that mentioned it, none of which were approved technologies at Kroger. Eventually I found an old GitHub repo comparing templating languages, which had two streaming candidates: the doubly deprecated Dust, and Marko.

Disclaimer: I now work on the Marko team, but the work at Kroger predates that. Okay, so Marko could stream; that was a good start. Its client-side component runtime was half my budget, but it's zero JavaScript by default, so I didn't have to use it. It uses HTTP's built-in streaming to send a page in order as the server generates it, and it had even more tricks up its sleeve.

Let's say fetching recommended products is usually fast, but sometimes it hiccups. If you know how much money those recommendations make, you can fine-tune a timeout so that their performance cost never exceeds that revenue. But does the user really have to get nothing if the fetch was unlucky enough to take 51 milliseconds? Marko's client-reorder attribute turns it into an HTML fragment that doesn't block the rest of the page and can render out of order. That requires JavaScript, so you can weigh the trade-offs of using it versus a timeout with no fallback: client-reorder is probably a good idea on a product detail page, but on a dedicated recommendations page you'd want the recommendations to always render. Marko had already sold me, but it had another killer feature for my goal: automatic component islanding. Only the components that could actually re-render dynamically in the browser add their JavaScript to the bundle.
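
Marko expresses this declaratively, but the underlying timeout trade-off can be sketched framework-agnostically: race the data fetch against a timeout tuned to what the feature is worth, and fall back to nothing if it loses. The 50-millisecond figure and function names below are illustrative, not from the original work:

```ts
// Framework-agnostic sketch of the timeout trade-off described above:
// if recommendations aren't worth more than ~50 ms of page delay,
// race the fetch against a timeout and render nothing when it loses.
async function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  const timer = new Promise<T>((resolve) => setTimeout(() => resolve(fallback), ms));
  return Promise.race([work, timer]);
}

async function renderRecommendations(fetchRecs: () => Promise<string[]>): Promise<string> {
  // An empty array is the "you get nothing" fallback; Marko's client-reorder is
  // the alternative where the fragment still arrives, just out of order.
  const recs = await withTimeout(fetchRecs(), 50, [] as string[]);
  return recs.length
    ? `<ul>${recs.map((r) => `<li>${r}</li>`).join("")}</ul>`
    : ""; // the page continues without blocking on recommendations
}
```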

Sounds like you had your framework. How did you go about building your new version?

All right, I had my goal, my theory, and a framework designed to help with both. Now I had to write the components, style the design, and build the features. Even with a solid technical foundation, I still had to nail the details. At first I thought I'd copy the existing site's UI, but that UI and its interactions were designed with different priorities, so I had to suck it up and redesign. I'm no designer, but I had a secret weapon: users think faster sites are better designed and easier to use. My other secret weapon is that I like CSS. A bonus of doing both jobs is that I could rapidly weigh pros and cons to explore alternatives that better balanced user experience and speed, or even alternatives that improved both. My design priorities from before still held.

This website sells food, so I would ruthlessly sacrifice delight for access. For example, the product carousels didn't fit the screen, and because they took up so much real estate, trying to scroll past them would scroll-trap me like a bad Google Maps embed.

Their dimensions also took up so much space that they caused stuttering from layout cost and GPU pressure. So I went boring and simple, relying on text, a plain horizontal layout, and some enticing "see more" links instead of infinite scrolling.

Then an easy choice: no web fonts. They were 14 kilobytes I couldn't afford. Not the right choice for all sites, but remember, groceries. This industry historically prefers effective typography over beautiful typography.

Banning modals and their cousins was both a performance and a user-experience improvement. Do you like pop-ups and all their annoying friends? They also make less sense on small screens; modals in particular take up nearly the entire page anyway, so they might as well be their own page. Lastly, they're hard to make accessible, and the code required to do so really adds up; check out the sizes of some popular modal libraries. The alternatives to those widgets weren't as flashy and were sometimes harder to design, but hey, great design is all about constraints, right? That's how I justified it.

This led to a nice payoff: fast page loads can lead you to better design. Our existing checkout flow had expanding accordions, intertwined error constraints, and, because of the first two, tricky focus management. To avoid all that, I broke checkout into a series of small, quick pages. It didn't take long to code, it was surprisingly easy, and the UX results were even better. With paint holding, a full-page navigation doesn't have to feel heavyweight. I could do all this easily because I was simultaneously the designer and the developer; an intuitive understanding of what's easy versus what's hard in web code can go a long way.

All of this resulted in a time from first load to checkout that was 11 times faster than the existing frontend, for the whole journey, not just the initial load. My demo ran in a prod bucket in front of real APIs, through Akamai, over the real internet, and it was also responsive. The 11x figure actually comes from one of the more flattering recordings of the existing site; I had a lot of takes where I interacted too quickly and broke its interface before it was ready.

That is amazing. So what happened when you shipped?

Even if we only managed half the speed improvement, our $40,000-per-millisecond figure from before would estimate that this kind of speed equals another $40 million of yearly revenue. And that's assuming an 11-times speed-up wouldn't change user behavior, and you know it would.

But my frontend never shipped.

Truthfully, I don't know why. None of our proposals got specific rejections. I just never found a process that could get buy-in from all of the different stakeholders required for such a large change.

Could my demo have kept this speed in the real world? I think so. It hadn't yet withstood ongoing feature development, but far bigger and more complex software, such as web browsers themselves, successfully uses regression tracking to withstand exactly that. And thanks to the nature of multi-page apps, later features living on other pages wouldn't slow down the ones shown here.

Based on your experience what would you recommend to others?

There are only three rules:

  1. Measure what users care about, not single metrics
  2. Tools should help you, not limit you
  3. Remember, this is for everyone

Metrics such as Time to First Byte and First Meaningful Paint are for diagnosis, not meaningful payoffs; a Time to Interactive that's 20% sooner is just maintenance. If you want the speed I showed under the circumstances I targeted, I can guarantee Marko's Rollup integration has it. But your needs may differ. Don't pick a technology until it meets your goals on your users' hardware.

It's important to verify you can serve your customers by checking your assumptions on the devices they actually use. Tools like Google Lighthouse are wonderful and you should use them, but if your choices aren't also informed by hardware that represents your users, you're just posturing.

If you can avoid third parties, then perfect, do that. It would have made my life 130 kilobytes easier, but absolutism won't help when the business insists. If we don't make third-party scripts our problem, they become the user's problem. It's tempting to view third-party scripts as not your department, or as a necessary evil you can't fight, but you have to fight to eke out even acceptable performance, as evidenced by me devoting 87% of my budget to code that did nothing for users. Write down which third parties are on your site, why, and for how long, and, like all accounting, track their actual payouts, not just the projected estimates they approached you with. This can be as fancy as automated alerts or as simple as a spreadsheet. Even a little bookkeeping can avoid money sinks.

I wish I had more useful advice about processes that produce web performance, but that would be disingenuous; after all, my thing never shipped. Even if I knew, there's a more important question: if the technology is here and the users are out there, what's stopping you?