You probably do not need scalability

As I prepare to revamp the job descriptions for the positions I am hiring for at SeatGeek, I tend to browse through similar openings in companies I admire. This morning I found something on Stripe’s job pages that made me smile:

The highlighted snippet reminds me of when I left SoundCloud to join DigitalOcean back in 2015.

I was doing an office tour as part of my onboarding. We eventually made it to the engineering floor, where I would sit. Talking to some folks, I saw a pretty Grafana dashboard on a big screen attached to the wall. It was a typical “API Requests” graph, showing how many requests were made to our API, how many failed, how long it took to process them, etc.

At some point, I asked:

— What is the unit of measure for these graphs?

— Oh, number of requests—basically queries per second

— Oh, sure, but what order of magnitude?

— Uh?

— Is it in thousands, millions…?

— It’s just the number of queries… I am not sure I understand the question?

I was confused and worried.

I thought that my biggest asset as an engineering leader was the experience and skill I acquired while leading a tiny (relative to our competitors) engineering team that built a platform serving about 300 million users. Now I was staring at a dashboard showing that my new company had less than 1% of the traffic I was used to dealing with. Was this a downgrade for my career? Did we even need this many engineers to take care of such a small footprint?

I had just moved countries for this job. Did I make a horrible mistake?

The “-ilities”

Like most things, building software is working with trade-offs. The most common manifestation of this balancing act is when a delivery team needs to prioritize which features of their product are must-haves versus those that would be great but are ultimately just nice to have.

There is a completely separate dimension of such trade-offs, usually called non-functional requirements. I love the way Martin Kleppmann talks about these in his excellent book Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems:

An application has to meet various requirements in order to be useful. There are functional requirements (what it should do, such as allowing data to be stored, retrieved, searched, and processed in various ways), and some nonfunctional requirements (general properties like security, reliability, compliance, scalability, compatibility, and maintainability). In this chapter we discussed reliability, scalability, and maintainability in detail.

Reliability means making systems work correctly, even when faults occur. Faults can be in hardware (typically random and uncorrelated), software (bugs are typically systematic and hard to deal with), and humans (who inevitably make mistakes from time to time). Fault-tolerance techniques can hide certain types of faults from the end user.

Scalability means having strategies for keeping performance good, even when load increases. In order to discuss scalability, we first need ways of describing load and performance quantitatively. We briefly looked at Twitter’s home timelines as an example of describing load, and response time percentiles as a way of measuring performance. In a scalable system, you can add processing capacity in order to remain reliable under high load.

Maintainability has many facets, but in essence it’s about making life better for the engineering and operations teams who need to work with the system. Good abstractions can help reduce complexity and make the system easier to modify and adapt for new use cases. Good operability means having good visibility into the system’s health, and having effective ways of managing it.
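To make the "describing performance quantitatively" part of the quote a bit more concrete, here is a minimal sketch in Python of response time percentiles. It uses the nearest-rank method, which is only one of several common definitions, and the sample latencies are made up for illustration. The function and variable names are hypothetical and come from neither the book nor any system mentioned in this post.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in milliseconds (invented for this example).
response_times_ms = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900]

p50 = percentile(response_times_ms, 50)  # the typical request
p99 = percentile(response_times_ms, 99)  # the slowest tail
print(f"p50={p50}ms p99={p99}ms")
```

With this invented data the median is 14 ms while the 99th percentile is 900 ms, which is why a single average is a poor summary of how a system behaves under load.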

Just like we do with functional requirements, every time we design a system we need to prioritize the non-functional requirements. A company usually has many systems, and the trade-offs and priorities vary across them. Great engineering teams make sure that any decision around these non-functional requirements follows the organization’s business strategy.

I can classify the companies I have worked for into three broad buckets with regard to their strategy. Let’s look at each bucket and see some examples of how it influenced our non-functional needs.

When you are supporting the business

Enterprise: Maintainability and Correctness

Thoughtworks: maintainability, OO design, tests; Rails packages a lot of it. From banks to MMORPGs. You are not part of the product team; that's why Product Owners exist.

When you are selling eyeballs

Eventual consistency with client-side tricks. Throughput and Replaceability. We followed the typical playbook of startups of the late 2000s: use venture capital to offer a heavily subsidized product to attract millions of users, then find a way to monetize these users. SoundCloud scale: hexagonal architecture, borders matter more; Finagle can scale, can be easily replaced. Nothing is more important than playback. Some transactions are valuable.

When you are the business

Most transactions are valuable. No interest in keeping the person online (Meetup). productivity == besat (SoundCloud API ads killed automation tools; DO made API first class terfacfrorm ecosystem). E-commerce or digital services. Ledger. Reliability and Productivity. DigitalOcean reliability: remove droplet circular dependency, separate control plane.

A company grows to have all of these eventually.

Strategy

Let me first make it clear that I left both companies a long time ago and I have no idea about the state of the business in either company after I left. As of today I also do not hold any equity, options or otherwise, in either of them. Everything I bring up here is based on information published by the press.

SeatGeek: Is it really one or the other? All of the above

SeatGeek was (2) (3). SeatGeek: all of the above?

Ok, now that disclaimers have been disclaimed, it is no secret that for many years SoundCloud struggled to make money. Meanwhile, DigitalOcean was close to $200 million in yearly revenue and had plenty of cash in the bank.

DigitalOcean’s usage was indeed many orders of magnitude smaller than SoundCloud’s when measured by something like queries per second. Still, the dollar value of each transaction on DigitalOcean was also many orders of magnitude higher than on SoundCloud. This doesn’t mean that one was harder than the other, but that they had different needs with regard to scalability and resilience.

Scaling for productivity microservices lambda

Scaling throughput

Scaling for reliability

Ok, now that disclaimers have been disclaimed, I can say it is no secret that SoundCloud struggled with money back then. Meanwhile, DigitalOcean was close to $200 million in yearly revenue and had plenty of cash in the bank for capital expenditure initiatives.