Does web performance really matter these days?
As far as I am concerned, an ideal website turns away as few users as possible, regardless of their browser, device, disabilities, or network. As you might imagine, there was culture shock when I was hired to work on a giant React single-page app. I frequently disagreed with co-workers. One of those times was when Tailwind CSS slowed our desktop time to first paint by half a second. I complained that this kind of impact meant we shouldn't use it. I was countered with a fair question: half a second sounded bad, but what did it actually mean to us? I didn't know. So a co-worker and I founded a performance team to find out. We'd seen those stats about how faster sites improve business metrics, and we wanted those numbers for our own site.
Our estimate for Kroger.com was $40,000 of yearly revenue lost per millisecond of load time. Then the pandemic surged our usage, since buying groceries in public was suddenly a bad idea. Whatever this number is today, it's probably way higher.
Time doesn't have a linear relationship with dollars. The closer wait time gets to zero, the more users change behaviour. Unshackling speed for users not only changes their behaviour, but it opens your site up to new worlds. Minor performance improvements pay for themselves, but meaningful speedups make your site become orders of magnitude more useful by finding new users that do things you can't anticipate. For example, YouTube once shipped a 90% smaller watch page, but their average performance metrics tanked because users with terrible connections could finally start watching videos.
The math says improving performance is one of the surest returns on investment a website can get. Speed is the king feature because it improves all your other features. No feature exists until it finishes loading.
What is causing sites to be slow?
For such a sure investment, we're remarkably bad at making fast websites. We keep getting slower, out of proportion to most users' networks and devices. We set new devs up to fail with React by default, when keeping React fast is not a beginner topic. These problems don't have simple answers, because if they did we'd have solved our performance problems by now.
Kroger is the biggest grocery chain you've never heard of, because it has a ton of sub-brands. But it's 17 on the US Fortune 500, so Kroger.com was in the bigger leagues of web dev where nobody shuts up about scale. It was a React single-page app that was really slow, and still is according to PageSpeed Insights. To be fair to my former employer, Walmart's and Target's sites are also React and also really slow.
Given this evidence, was it easy to start prioritising performance improvements?
At Kroger we were having a really hard time with incremental performance improvements. People said they cared, but the performance tickets always got outprioritised, speedups got hoarded in case someone needed to ram a feature through the bundle check, and even the smallest developer experience gain always seemed to outweigh what it cost our users. Maybe proving that speed equalled money wasn't enough.
We also had to convince people emotionally, to show everyone how much better our site could be if it were fast. So in a fit of bad judgment, I vowed to make the fastest possible version of Kroger.com.
Where did you start on your improvements?
I don't have size estimates for these because I had already abandoned the single-page app approach. I didn't want a toy site that was fast only because it ignored a real site's responsibilities. I see those responsibilities of grocery e-commerce as security even over access, access even over speed, and speed even over slickness. I refused to compromise on security or accessibility. I didn't want any speed-ups that conflicted with them.
Lastly, server code can be measured, scaled, and optimised until you know it's fast enough. But client devices? Devices are unboundedly bad, with decade-old chips and terrible RAM. I figured that if I inlined CSS and sent HTML as fast as possible, any overhead would be negligible compared to the network round trip.
It sounds like server-side apps solve a lot of the problems?
There was one problem with server-side rendering. Like many large companies, Kroger.com's pages were made from multiple data sources, which could each have their own teams, speed, and reliability. If these 10 data sources each take one API call, what are the odds my server can respond quickly? The odds are pretty bad. If 1% of all data responses are slow, then a page with 10 backend sources will be slow about 9.6% of the time. And sessions load more than just one view. If a session has eight pages, then that original 1% chance turns into a roughly 55% chance that the session hits at least one slow page, which is even worse than it sounds. A one-time delay has a cooling effect on the rest of the session, even if everything afterward is fast. So I needed to prevent individual data sources from delaying the rest of the page.
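The compounding above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes every backend response is slow independently with the same 1% probability; the function name `chance_of_slow` is made up for illustration.

```python
def chance_of_slow(p_slow: float, calls: int) -> float:
    """Probability that at least one of `calls` independent responses is slow."""
    return 1 - (1 - p_slow) ** calls


# One page that waits on 10 backend sources.
per_page = chance_of_slow(0.01, 10)

# A session of 8 such pages makes 80 backend calls in total.
per_session = chance_of_slow(0.01, 10 * 8)

print(f"{per_page:.1%}")     # 9.6%
print(f"{per_session:.1%}")  # 55.2%
```

Real backends are rarely independent, so treat these figures as a rough guide rather than a prediction.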
I suspect this problem alone may be why so many big sites choose single-page apps, but this cooling effect also argues that maybe a single-page app's slow load up front doesn't really fix the problem.
If both single-page and multi-page apps are bad for performance, what did you do?
We had fast websites from big companies before we had single-page apps. I vaguely remembered early performance pioneers saying browsers can display a page as it's generated. This is a technique Google Search and Amazon have used since the 90s, and it's even more efficient in HTTP/2 and HTTP/3, so it's here to stay.
Today, popular JS frameworks are buzzing about streaming, but at the time I could only find older platforms like PHP or Rails that mentioned it, none of which were approved technologies at Kroger. Eventually I found an old GitHub repo comparing templating languages, which had two streaming candidates: the double-deprecated Dust, and Marko.
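To make the idea of displaying a page while it is still being generated concrete, here is a minimal sketch of in-order streaming. It is not how Kroger.com, Google, Amazon, or Marko implement it; `fetch_section`, `stream_page`, the section names, and the delays are all hypothetical stand-ins. The point is only that the page shell goes out immediately and each section follows as soon as its data source answers.

```python
import asyncio
from typing import AsyncIterator


async def fetch_section(name: str, delay: float) -> str:
    """Stand-in for a backend call; `delay` simulates its latency."""
    await asyncio.sleep(delay)
    return f"<section id='{name}'>{name} loaded</section>"


async def stream_page(sources: dict[str, float]) -> AsyncIterator[str]:
    """Yield HTML chunks in document order as soon as each one is ready."""
    # Send the shell straight away so the browser can start painting.
    yield "<!doctype html><title>Demo</title><h1>Groceries</h1>"
    # Start every backend call at once so they run concurrently.
    tasks = [
        asyncio.create_task(fetch_section(name, delay))
        for name, delay in sources.items()
    ]
    for task in tasks:
        # Each section is yielded once it and the sections before it are done.
        yield await task
    yield "<footer>Done</footer>"


async def collect(sources: dict[str, float]) -> list[str]:
    return [chunk async for chunk in stream_page(sources)]


chunks = asyncio.run(collect({"cart": 0.01, "recommendations": 0.05}))
```

In a real server each yielded chunk would be written to the HTTP response as it is produced instead of being collected into a list.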
Sounds like you had your framework. How did you go about building your new version?
All right, I had my goal, my theory, and a framework designed to help me with both. Now I had to write the components, style the design, and build the features. Even with a solid technical foundation, I still had to nail the details. At first I thought I'd copy the existing site's UI, but that UI and its interactions were designed with different priorities, so I had to suck it up and redesign. I'm no designer, but I had a secret weapon: users think faster sites are better designed and easier to use. My other secret weapon is that I like CSS. A bonus of doing both jobs is that I could rapidly weigh pros and cons to explore alternatives that strike a better compromise between user experience and speed, or even alternatives that improve both. My design priorities from before still held.
This website sells food. I will ruthlessly sacrifice delight for access. For example, the product carousels didn't fit the screen, and because they took up so much real estate, trying to scroll past them would scroll-trap me like a bad Google Maps embed.
Their dimensions also took up so much space that they caused juddering from layout cost and GPU pressure. So I went boring and simple, relying on text's horizontal nature and some enticing "see more" links instead of infinite scrolling.
Then an easy choice: no web fonts. They were 14 kilobytes, and I couldn't afford them. That's not the right choice for all sites, but remember, groceries. This industry historically prefers effective typography over beautiful typography.
Banning modals and their cousins was both a performance and a user experience improvement. Do you like pop-ups and all their annoying friends? They also make less sense on small screens. In particular, modals take up nearly the entire page anyway, so they might as well be their own page. Lastly, they're hard to make accessible. The code required to do so really adds up. Check out the sizes of some popular modules. The alternatives to those widgets weren't as flashy and were sometimes harder to design, but hey, great design is all about constraints, right? That's how I justified it.
This led to a nice payoff. Fast page loads can lead you to design better. Our existing checkout flow had expanding accordions, intertwined error constraints, and tricky focus management because of the first two. To avoid all that, I broke checkout into a series of small, quick pages. It didn't take long to code, it was surprisingly easy, and the UX results were even better. With paint holding, a full page navigation doesn't have to feel heavyweight. I could do all this easily because I was simultaneously the designer and the developer. An intuitive understanding of what's easy versus what's hard in web code can go a long way.
All of this resulted in a time from first load to checkout that was 11 times faster than the existing frontend for the whole journey, not just the initial load. My demo was in a prod bucket, in front of real APIs, through Akamai, over the real internet, and it was also responsive. The 11x improvement actually comes from one of the more flattering recordings of the existing site. I had a lot of takes where I interacted too quickly and broke the interface before it was ready.
That is amazing. So what happened when you shipped?
Even if we only managed half the speed improvement, our $40,000 per millisecond figure from before estimates that this kind of speed would be worth another $40 million of yearly revenue. And that is assuming an 11 times speed-up wouldn't change user behaviour, and you know it would.
But my frontend never shipped.
Truthfully, I don't know why. None of our proposals got specific rejections. I just didn't find a process that worked to get buy-in from all of the different stakeholders required to make such a large change.
Could my demo have kept this speed in the real world? I think so. It hadn't yet withstood ongoing feature development, but far bigger and more complex software successfully uses regression tracking to withstand that, such as web browsers themselves. Later features that could live on other pages wouldn't slow down the one seen here thanks to the nature of multi-page apps.
Based on your experience, what would you recommend to others?
There are only three rules:
- Measure what users care about, not single metrics
- Tools should help you, not limit you
- Remember, this is for everyone
Metrics such as time to first byte and first meaningful paint are for diagnosis, not meaningful payoffs. A 20% sooner time to interactive is just maintenance. If you want the speed I showed under the circumstances I targeted, I can guarantee Marko's Rollup integration has it. But your needs may differ. Don't pick a technology until you've confirmed it meets your goals on user hardware.
It's important to verify you can serve your customers by checking your assumptions on the devices they can use. Tools like Google Lighthouse are wonderful and you should use them, but if your choices aren't also informed by hardware that represents your users, you're just posturing.
If you can avoid third parties, then perfect, do that. It would have made my life 130 kilobytes easier, but absolutism won't help when businesses insist. If we don't make third-party scripts our problem, they become the user's problem. It's tempting to view third-party scripts as not your department or as a necessary evil you can't fight, but you have to fight to eke out even acceptable performance, as evidenced by me devoting 87% of my budget to code that did nothing for users. Write down which third parties are on your site and for how long, and like all accounting, track their actual payouts, not just the projected estimates they approach you with. This can be as fancy as automated alerts or as simple as a spreadsheet. Even a little bookkeeping can avoid money sinks.
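As one way to picture the bookkeeping, here is a small sketch of a third-party ledger. Every vendor name, size, and payout figure in it is invented for illustration. The two scripts total 130 kilobytes to mirror the figure above, and the 150 kilobyte budget is an assumption chosen so that the share comes out near 87%, not a number from the Kroger.com project.

```python
from dataclasses import dataclass


@dataclass
class ThirdParty:
    name: str
    size_kb: float           # weight shipped to every user, in kilobytes
    months_on_site: int      # how long the script has been on the site
    projected_payout: float  # yearly value the vendor promised
    actual_payout: float     # yearly value actually measured


def money_sinks(ledger: list[ThirdParty]) -> list[str]:
    """Names of scripts whose measured payout falls short of the projection."""
    return [t.name for t in ledger if t.actual_payout < t.projected_payout]


def budget_share(ledger: list[ThirdParty], budget_kb: float) -> float:
    """Fraction of the page-weight budget consumed by third parties."""
    return sum(t.size_kb for t in ledger) / budget_kb


# Hypothetical entries; replace them with what is really on your site.
ledger = [
    ThirdParty("analytics-vendor", 60.0, 18, 50_000, 55_000),
    ThirdParty("chat-widget", 70.0, 6, 30_000, 4_000),
]

print(money_sinks(ledger))                 # ['chat-widget']
print(f"{budget_share(ledger, 150):.0%}")  # 87%
```

A spreadsheet with the same columns does the same job; what matters is recording measured payouts next to the promised ones.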
I wish I had more useful advice and processes that produce web performance, but that would be disingenuous. After all, my thing never shipped. Even if I knew, there's a more important question. If the technology is here and users are out there, what's stopping you?