Microservices and the First Law of Distributed Objects

Note: This is an essay I wrote as the basis for a talk I gave at GOTO Chicago in February 2017. You can find the original slides here or watch the video below:

GOTO Nights & Devops Chicago - 2017-02-27 - Phil Calçado on Microservices vs The First Law of Object Design from Spantree Technology Group, LLC on Vimeo.

In the late 90s and early 2000s, this industry was obsessed with what is usually called “distributed objects”. I believe that this came to be when the cost of acquiring a second server became less than that of a second senior programmer. In what’s maybe a weird variant of Conway’s Law, we decided that, instead of trying to write more efficient software to run on a single server, we would rather divide the software across many machines. Remember, this was before multicore was a thing and back when Linux only had userland threads.

This was the golden age of middleware vendors so, sure enough, each vendor soon decided to push their own solution for the problem. Over the years we were introduced to CORBA, DCOM, EJBs, RMI, and several others. These frameworks and platforms all offered some flavour of “transparent” distribution, often by generating boilerplate code based on metadata.

The problem is that this transparency was never real. To illustrate some of the problems with this claim, let me use an example from the meatspace.

I used to live in Australia, and once ThoughtWorks allocated me to a project that would be developed between a big bank’s Sydney and London offices. As you might know from their published material, ThoughtWorks really values high-bandwidth communication and co-location for team members; we would always prefer a two-minute conversation over a Jira ticket. The problem is that the client’s contact management system, I believe it was Microsoft Exchange, wasn’t configured to let us know in which office a person was based. This led the Australian team to randomly call London team members at 3 am their time, and vice-versa. After a while, we established a protocol: we’d have one scheduled call a week, and any ad-hoc calls required confirmation by email with 24 hours’ notice.

Now this is usually the moment when engineers start yelling “remote first!111one!”, but bear with me for a second. Firstly, this was 2007 and technology wasn’t as friendly towards remote collaboration as it is now: no Google Docs or Hangouts, and not many people had fancy phones. But more importantly, there was a more fundamental problem, the same one we experience with distributed objects.

You see, object orientation is about splitting a problem into very tiny pieces. These pieces are so small that the only way they can accomplish anything is by passing messages between each other. That’s not too different from a team of software engineers, product owners, and others trying to build a system. The challenges faced in the example above are very much present in any distributed object system. Putting a continent between two people is similar to putting a network between two objects: organic communication isn’t cheap and you will often have to follow a protocol with various layers of error correction and pay the overhead price.

After various attempts by the industry at solving this problem via innovations in the underlying technology and hardware, at some point the practitioners decided that they had had it with all this mess. An iconic artefact of these times is Martin Fowler’s First Law of Distributed Objects: Don’t.

Fast forward ten years and here we are, getting ready for another GOTO Chicago, and the conference schedule has 16 occurrences of the word microservice. Microservice is yet another one of those terms in software engineering that don’t have a precise definition but are still useful. If I say to you “DigitalOcean is migrating from a monolithic to a microservices architecture”, you might not know what my actual definition of microservice is, but you can imagine that the work we are doing there is similar to what other folks like SoundCloud, Netflix, and various others have reported on. The term creates some basic shared understanding, and that in itself already makes it useful.

And this basic shared understanding of what a microservice is lets us collectively know that these things are meant to be small pieces of software that collaborate to achieve a goal. That sounds familiar, as we just discussed a similar definition for objects, but here’s the catch: microservices are meant to be distributed over a network of sorts.

It is very easy to be cynical and shrug the whole thing off as yet another instance of everything old is new again, but let’s assume for the sake of this text that there are actual benefits to a microservices architecture. If you want to know more about what these would be, I suggest you attend some of the 16 talks about it at the conference next month.

But even if we agree that microservices are desirable, how is it ok for them to break Martin’s First Law without ceremony and be applauded by us? I would argue that there are two main factors that lead us to this conclusion.

On the technology side, twenty years of improvements in hardware and software have made the cost/benefit ratio of calls between different servers much more favourable. The largest benefits come from two decades of improvements in computer architecture, networks, and basic software, Moore’s trinity when it comes to any distributed system. But by giving up on a lot of the magic promised by vendors, we practitioners were also able to create more efficient and faster cross-language serialisation formats and RPC protocols.

These improvements don’t make communication between two distributed pieces of software (objects, services, applications, or whatever unit of composition you fancy) free of cost, but when you take into consideration the various other costs and complexities around software engineering—from the ever-decreasing cost of hardware to how cloud computing works to how finding and hiring senior engineers now is even harder than before—the return on investment is very attractive to a certain class of organisations.

But don’t get me wrong, I am not saying that those improvements are enough for us to go back to distributed objects. In fact, I would argue that Martin’s Law should not be about the performance or even coding overhead of distribution, but about how objects are not a good unit of distribution.

Objects are too small and chatty to be good network citizens, and this tends to stem from the fact that a well-modelled object can do almost nothing on its own; it will always have some strong dependency on its neighbouring objects. Every time you need to make even a small change or refactoring in your system you will need to touch multiple different objects, and if these objects are distributed you then need to manage the complexities of rolling out a widely distributed system every time you want to change the name of a class or method.

Services, on the other hand, are meant to be more coarse-grained. This should help us avoid falling into the distributed object traps, but unfortunately, our confusing-but-useful term microservices introduces a new challenge with the prefix micro.
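To make the contrast between chatty objects and coarse-grained services a bit more concrete, here is a minimal sketch in Python. Nothing in it is real infrastructure: `FakeNetwork`, the `User` record, and the 50 ms latency are all assumptions made up for illustration, and the “network” merely counts round trips instead of sending anything.

```python
from dataclasses import asdict, dataclass


class FakeNetwork:
    """Hypothetical stand-in for a network: it only counts round trips."""

    def __init__(self, latency_ms: float) -> None:
        self.latency_ms = latency_ms
        self.round_trips = 0

    def call(self, handler, *args):
        # Every remote invocation pays one round trip, however small the payload.
        self.round_trips += 1
        return handler(*args)

    @property
    def total_latency_ms(self) -> float:
        return self.round_trips * self.latency_ms


@dataclass
class User:
    name: str
    email: str
    city: str


# Illustrative data only.
USERS = {1: User("Ada", "ada@example.com", "London")}


def fetch_profile_chatty(net: FakeNetwork, user_id: int) -> dict:
    # Distributed-object style: each getter is its own remote call.
    user = USERS[user_id]
    return {
        "name": net.call(lambda: user.name),
        "email": net.call(lambda: user.email),
        "city": net.call(lambda: user.city),
    }


def fetch_profile_coarse(net: FakeNetwork, user_id: int) -> dict:
    # Service style: a single request returns the whole document.
    return net.call(lambda: asdict(USERS[user_id]))


chatty_net = FakeNetwork(latency_ms=50)
coarse_net = FakeNetwork(latency_ms=50)
chatty_profile = fetch_profile_chatty(chatty_net, 1)
coarse_profile = fetch_profile_coarse(coarse_net, 1)
print(chatty_net.round_trips, coarse_net.round_trips)
```

Both functions return the same data, but the object-style version pays the latency once per attribute, while the service-style version pays it once per request. With real objects the number of fine-grained calls tends to be far larger than three.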

As an industry, we’ve been practising flavours of Service-Oriented Architecture for several decades now. As we mentioned before, this flavour generally known as microservices doesn’t have a widely accepted definition, but there is a consensus that they should be small. The reader won’t be surprised to find out that the term small also lacks proper definition. Whatever it means in practice is left as an exercise to be performed by each team or individual.

In my experience, teams always struggle in trying to find where in the spectrum between a monolith and an object their services should live. Maybe we could use a simplification of Martin’s Law as a rule of thumb to test our services’ “size”?

We know we don’t want to have a monolith, a single system or a few systems that package everything in the same software, and we also know that we don’t want to have distributed objects. What are other units of abstraction that we can use for services?

If you’ve played microservices buzzword bingo before, you know that this is probably the point when I would mention Bounded Contexts. I will not disappoint you, dear reader, but before arriving at a conclusion I would like to explore a little bit why this tool can help us here.

Eric Evans’s popular book Domain-Driven Design describes a very interesting pattern language for software. As I understand it, the fundamental rock of his work is a language, the Ubiquitous Language, that is derived from the system’s model. Eric is adamant that the team should share a single language.

This might sound familiar to those of us used to enterprise architects and their utopian unified corporate data model projects. This isn’t what Eric proposes, though. He acknowledges that a single organisation will have multiple models and subsequently languages, sometimes with conflicting terms to describe the same concept. To manage this situation, he suggests that we explicitly define the boundaries between these models, to “keep the model strictly consistent within these bounds, but don’t be distracted or confused by issues outside [it]”.

So maybe one way to go about defining what is a good-enough service is starting with the language you are modelling. From there, you should cluster related terms, and try to identify places where different terms are used to model similar concepts. You should then be able to identify some stronger and weaker relationships between clusters and terms—maybe the relationship between user, group, and login is stronger than the relationship between user and warehouse.
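One rough way to picture this clustering exercise is as a graph: terms are nodes, relationships are weighted edges, and a cluster is whatever stays connected once the weak edges are dropped. The sketch below assumes exactly that; the scores, the 0.5 threshold, and the `shipment` term are invented for illustration, and in practice these judgements come from conversations with domain experts rather than from numbers.

```python
def cluster_terms(relationships: dict, threshold: float) -> list:
    """Group terms linked by relationships at or above the threshold."""
    graph = {}
    for (a, b), strength in relationships.items():
        # Register both terms even if the edge between them is too weak to keep.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if strength >= threshold:
            graph[a].add(b)
            graph[b].add(a)

    seen = set()
    clusters = []
    for term in sorted(graph):
        if term in seen:
            continue
        # Depth-first walk over strong edges collects one cluster.
        cluster = set()
        stack = [term]
        while stack:
            current = stack.pop()
            if current in cluster:
                continue
            cluster.add(current)
            stack.extend(graph[current] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


# Made-up strengths between 0 (unrelated) and 1 (inseparable).
relationships = {
    ("user", "group"): 0.9,
    ("user", "login"): 0.8,
    ("user", "warehouse"): 0.1,
    ("warehouse", "shipment"): 0.7,
}

clusters = cluster_terms(relationships, threshold=0.5)
print(clusters)
```

With these assumed scores, user, group, and login end up in one cluster, and warehouse and shipment in another. The weak user-to-warehouse edge is the kind of place where a boundary might be drawn.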

Once you identify the different relationships, you should be able to draw a boundary between them, effectively discovering your Bounded Contexts. Importantly, you should not try to normalise the vocabulary across all Bounded Contexts; instead, you should make any necessary translation between them intentional, explicit, and testable.
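As a sketch of what an intentional, explicit, and testable translation could look like, consider two hypothetical contexts that each have their own word for the same person. The class names, fields, and badge format below are assumptions invented for this example, not something taken from Domain-Driven Design or from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityUser:
    """The person as the (hypothetical) identity context describes them."""
    user_id: int
    login: str
    groups: tuple


@dataclass(frozen=True)
class WarehouseOperator:
    """The same person in the (hypothetical) warehouse context's vocabulary."""
    badge: str
    can_dispatch: bool


def to_warehouse_operator(user: IdentityUser) -> WarehouseOperator:
    """The single, explicit place where identity terms become warehouse terms."""
    return WarehouseOperator(
        badge=f"OP-{user.user_id:05d}",
        can_dispatch="dispatchers" in user.groups,
    )


# Because the translation is a plain function, it can be tested in isolation.
operator = to_warehouse_operator(IdentityUser(42, "ada", ("dispatchers",)))
assert operator == WarehouseOperator(badge="OP-00042", can_dispatch=True)
```

Neither context has to adopt the other’s vocabulary: the warehouse code never sees a login or a group, and a change in how identity models its users only touches this one function.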

One of the reasons that I like this approach is that, in my experience, these strong relationships correlate with very chatty communication between the objects that implement the concepts. If you are able to identify and isolate these clusters, not only will you be able to create a consistent and robust model, but you will also be given an actionable blueprint of potential services to create or extract from a monolith.

As mentioned before, I find that this approach and general philosophy work very well for some classes of organisations, especially those that started as a monolith and now need to reduce time-to-market by extracting services from the big blob of legacy code. There are many occasions where this approach isn’t adequate or breaks down, though.

When you are still a small organisation or startup, all this complexity probably doesn’t make sense. Just keep focused on validating your business and product. Use whatever tools and architectures will help you validate it as soon as possible. Once you have properly sustained growth you can try to apply this to refactor the existing systems.

For larger, “web-scale” organisations, the model can also be challenging. At some level of scale and growth, the return on investment mentioned before doesn’t apply anymore. At this stage, very often you need to shape your internal services not around maintainability or approachability, but rather around performance, cost, and reliability requirements. These tend to be more extreme than any rule of thumb such as the one proposed here can help with.

For companies right in the middle of their journeys (not so small that they can afford not to think about it, and not so big that they have extreme requirements), the approach above has served me well through several hyper-growth phases. Until it breaks down and we go back to the drawing board.