How to write a Repository
Of the supporting Patterns catalogued by Eric Evans, the Repository pattern is probably the most popular, most likely because data persistence has been a hot topic in software development for a long time.
The main problem is that the most popular approach for software development, Object-Orientation, isn’t easily mapped to the format used by efficient external storage systems, like relational or even NoSQL databases. This is often called impedance mismatch, and a whole cottage industry has sprung up around tools and techniques to deal with it.
For Enterprise systems—those performing business-critical tasks but usually not at a high scale—we often address these challenges by using object mapping tools such as Hibernate. These tools usually do an excellent job in dealing with the nuts and bolts of converting data types and formats, but the developer is left with the task of how to best model the concept of persisting and retrieving objects in their Domain Model and the system’s Ubiquitous Language.
For many years, the usual way to go about persistence in the Domain Model was to use specialised objects, such as DAOs and other types of Data Mappers, to convert between objects from your domain model and their persistent equivalent (e.g. to rows in database tables). These design patterns solve several of the hard design problems around coupling and cohesion, but they belong to the system’s infrastructure layer, and as such integrating them with your Ubiquitous Language isn’t a trivial task.
The Repository pattern, as catalogued by Eric Evans and Martin Fowler, offers a good way to integrate persistence needs and the Ubiquitous Language. In his book, Domain-Driven Design, Evans defines the Repository pattern as a “mechanism for encapsulating storage, retrieval, and search behaviour which emulates a collection of objects”. These emulated collections are easily assimilated by the Ubiquitous Language and are simple for engineers to implement and domain experts to understand.
Naming
The concept of a Repository as just a list of objects sounds simple enough to understand, but somehow it is very common for the classes we write to model these to end up with methods that are not related to lists at all. Over several gigs coaching teams adopting Domain-Driven Design, I’ve seen classes that start as Repositories end up as weird versions of DAOs over and over again.
To me, one of the best ways to avoid this problem is to name your classes in a way that makes it easy to identify when a method is out of place. Years ago, Rodrigo Yoshima told me about a very interesting way to name Repositories. Instead of the usual naming style displayed below:
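A minimal sketch of that conventional style, in Python. The `Order` type, its fields, and the method names are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Order:
    # Hypothetical domain object, reduced to two fields for the example.
    number: int
    account_id: str


class OrderRepository:
    """Conventional naming: a '...Repository' class with 'get_...' methods."""

    def __init__(self, orders: List[Order]) -> None:
        self._orders = list(orders)

    def get_orders_by_account(self, account_id: str) -> List[Order]:
        return [o for o in self._orders if o.account_id == account_id]

    def get_order_by_number(self, number: int) -> Optional[Order]:
        return next((o for o in self._orders if o.number == number), None)
```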
He likes to model his classes as follows:
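A sketch of how this naming might look, again with a hypothetical `Order` type. The class is named after the collection it emulates, and each method reads as a filter on that collection:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Order:
    # Hypothetical domain object, reduced to two fields for the example.
    number: int
    account_id: str


class AllOrders:
    """Named after the collection it emulates; methods read as filters on it."""

    def __init__(self, orders: List[Order]) -> None:
        self._orders = list(orders)

    def belonging_to(self, account_id: str) -> List[Order]:
        return [o for o in self._orders if o.account_id == account_id]

    def with_number(self, number: int) -> Optional[Order]:
        return next((o for o in self._orders if o.number == number), None)


all_orders = AllOrders([Order(1, "acme"), Order(2, "globex")])
acme_orders = all_orders.belonging_to("acme")
```

Client code then reads close to a sentence: all orders belonging to this account.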
The above might look like a small change, but in my experience, it is extremely helpful in keeping Repositories sane.
As an example, let’s look at two different implementations of a Repository. Both contain a member method that I would consider to be out of place; the method should be moved to another class. If we agree that this method shouldn’t be part of this Repository, in which of the two implementations do you think it is easier to spot the problem?
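The two implementations could be sketched as below. The `User` and `Order` types and the `submit_order` method are assumptions made for this illustration; in both classes `submit_order` is the method that does not belong.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class User:
    login: str


@dataclass(frozen=True)
class Order:
    number: int
    login: str


class UserRepository:
    """First implementation: generic name, verb-prefixed methods."""

    def __init__(self, users: List[User]) -> None:
        self._users = {u.login: u for u in users}
        self.submitted: List[Order] = []

    def retrieve_user_by_login(self, login: str) -> Optional[User]:
        return self._users.get(login)

    def submit_order(self, order: Order) -> None:
        # Out of place, yet it blends in with the other verb-prefixed methods.
        self.submitted.append(order)


class AllUsers:
    """Second implementation: same behaviour, collection-style names."""

    def __init__(self, users: List[User]) -> None:
        self._users = {u.login: u for u in users}
        self.submitted: List[Order] = []

    def with_login(self, login: str) -> Optional[User]:
        return self._users.get(login)

    def submit_order(self, order: Order) -> None:
        # Out of place, and "all users, submit order" does not read as a filter.
        self.submitted.append(order)
```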
To me, using a more precise vocabulary when naming objects and operations makes it much easier to spot incongruences like the one above. As a corollary, using very generic names or prefixes like get or retrieve makes it much harder to spot these bad smells in our models.
Avoiding Method Explosion
More than just using constrained vocabulary, a well-defined Repository should expose Domain Model concepts as its published interface.
As an example, let’s assume that we have a business rule that says that every order placed on a weekend has a 10% surcharge applied to it. Now if we want to retrieve all orders that fall into this situation, we could do something like this:
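One way such client code might look, as a sketch. The `placed_on_weekday` method and the `Order` fields are hypothetical; the point is that the caller has to know what makes an order surcharged:

```python
from dataclasses import dataclass
from datetime import date
from typing import List

SATURDAY, SUNDAY = 5, 6  # date.weekday() numbering, where Monday is 0


@dataclass(frozen=True)
class Order:
    number: int
    placed_on: date


class AllOrders:
    def __init__(self, orders: List[Order]) -> None:
        self._orders = list(orders)

    def placed_on_weekday(self, weekday: int) -> List[Order]:
        return [o for o in self._orders if o.placed_on.weekday() == weekday]


all_orders = AllOrders([
    Order(1, date(2016, 11, 25)),  # a Friday
    Order(2, date(2016, 11, 26)),  # a Saturday
    Order(3, date(2016, 11, 27)),  # a Sunday
])

# The client has to know that "surcharged" means "placed on a weekend".
surcharged_orders = (
    all_orders.placed_on_weekday(SATURDAY) + all_orders.placed_on_weekday(SUNDAY)
)
```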
This may be good enough for a one-off case, but we’re leaking abstractions here, and they might come back to haunt us as the system grows. If you need to deal with surcharged orders in many different places, you will likely start duplicating the snippet above across your system. If a surcharge is an important concept in your Domain Model, you should make sure that the objects implementing this model (and the Repository is one of those objects) model this term as a first-class citizen. This is a core idea in Domain-Driven Design: make implicit domain concepts explicit, usually by modelling them as objects and methods.
I believe that something like this will be better:
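A sketch of the Repository exposing the domain term directly, assuming a hypothetical `surcharged` method and the same weekend rule:

```python
from dataclasses import dataclass
from datetime import date
from typing import List

WEEKEND = (5, 6)  # date.weekday() values for Saturday and Sunday


@dataclass(frozen=True)
class Order:
    number: int
    placed_on: date


class AllOrders:
    def __init__(self, orders: List[Order]) -> None:
        self._orders = list(orders)

    def surcharged(self) -> List[Order]:
        # The weekend rule lives in one place, behind a domain term.
        return [o for o in self._orders if o.placed_on.weekday() in WEEKEND]


all_orders = AllOrders([
    Order(1, date(2016, 11, 25)),  # a Friday
    Order(2, date(2016, 11, 26)),  # a Saturday
])
surcharged_orders = all_orders.surcharged()
```

If the rule for surcharges changes, only `surcharged` needs to change; callers keep using the domain term.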
The approach described above brings its own challenges. Assuming that surcharged is just one of many states that an order can be in, following this pattern can lead to method explosion as we will end up with a method per state. This won’t be a problem if the number of states is reasonable, but the approach itself doesn’t scale very well for attributes with a large number of possible states.
In their work, Evans and Fowler suggest a way to deal with this problem: the Specification pattern. Evans describes the Specification as “a predicate [object] that determines if an object does or does not satisfy some criteria”. To avoid method explosion in your Repository, you might want to add a method to it that takes in a Specification object and returns objects that fulfil the predicate. Here is an example of how this could work:
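A minimal sketch of a Repository that accepts Specification objects. The names `OrderSpecification`, `is_satisfied_by`, `satisfying`, `Surcharged`, and `PlacedOnOrAfter` are assumptions made for this example rather than names taken from Evans or Fowler:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Order:
    number: int
    placed_on: date


class OrderSpecification(ABC):
    """A predicate object: does this order satisfy some criterion?"""

    @abstractmethod
    def is_satisfied_by(self, order: Order) -> bool:
        ...


class Surcharged(OrderSpecification):
    def is_satisfied_by(self, order: Order) -> bool:
        return order.placed_on.weekday() in (5, 6)  # Saturday or Sunday


class PlacedOnOrAfter(OrderSpecification):
    def __init__(self, day: date) -> None:
        self._day = day

    def is_satisfied_by(self, order: Order) -> bool:
        return order.placed_on >= self._day


class AllOrders:
    def __init__(self, orders: List[Order]) -> None:
        self._orders = list(orders)

    def satisfying(self, specification: OrderSpecification) -> List[Order]:
        # One query method covers every state that has a Specification.
        return [o for o in self._orders if specification.is_satisfied_by(o)]


all_orders = AllOrders([
    Order(1, date(2016, 11, 25)),  # a Friday
    Order(2, date(2016, 11, 26)),  # a Saturday
    Order(3, date(2016, 11, 27)),  # a Sunday
])
surcharged_orders = all_orders.satisfying(Surcharged())
recent_orders = all_orders.satisfying(PlacedOnOrAfter(date(2016, 11, 27)))
```

Each new state becomes a new small class instead of a new method on the Repository, and the class name keeps the domain term visible.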
But there is another strategy that I quite like: using multiple Repositories. In our ordering example, there is no reason we can’t have two Repositories, one for AllOrders and another one that deals only with SurchargedOrders.
One way to implement this is to parameterise the Repository when you instantiate it, so that each instance is used as its own collection.
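A sketch of a parameterised Repository and its use. The `Orders` class name follows the text; the `criterion` parameter, the `belonging_to` method, and the `Order` fields are hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Order:
    number: int
    account_id: str
    placed_on: date


class Orders:
    """One class; the criterion given at construction defines the collection."""

    def __init__(
        self, source: Iterable[Order], criterion: Callable[[Order], bool]
    ) -> None:
        self._source = list(source)
        self._criterion = criterion

    def __iter__(self) -> Iterator[Order]:
        return (o for o in self._source if self._criterion(o))

    def belonging_to(self, account_id: str) -> List[Order]:
        # Filters only the orders that are part of this collection.
        return [o for o in self if o.account_id == account_id]


def is_surcharged(order: Order) -> bool:
    return order.placed_on.weekday() in (5, 6)  # Saturday or Sunday


stored = [
    Order(1, "acme", date(2016, 11, 25)),    # a Friday
    Order(2, "acme", date(2016, 11, 26)),    # a Saturday
    Order(3, "globex", date(2016, 11, 27)),  # a Sunday
]

# Two Repositories built from the same class.
all_orders = Orders(stored, lambda order: True)
surcharged_orders = Orders(stored, is_surcharged)

acme_surcharged = surcharged_orders.belonging_to("acme")
```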
We could also implement each variation as a subclass of a base class Orders. When following this design, it is important to make sure we are not replacing method explosion with class explosion.
One Type Only
Another common problem with Repositories happens when they start looking more like a generic “database” object instead of a cohesive collection. As an example, in a system I’ve worked on we had in our Domain Model the following Repository:
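The starting point might have looked roughly like the sketch below. Only the idea of a Repository of `Service` objects comes from the text; the `AllServices` name, the fields, and the methods are assumptions:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Service:
    # Hypothetical fields, for illustration only.
    number: str
    account_id: str


class AllServices:
    """A cohesive collection: every query returns Service objects."""

    def __init__(self, services: List[Service]) -> None:
        self._services = list(services)

    def belonging_to(self, account_id: str) -> List[Service]:
        return [s for s in self._services if s.account_id == account_id]

    def with_number(self, number: str) -> Optional[Service]:
        return next((s for s in self._services if s.number == number), None)
```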
After several iterations, we started playing user stories around other objects. The first time one of us had to return an object other than a Service from the database, we thought “Oh, I can’t be bothered creating a class just for this. I’ll just add a method here just for now…”. As sometimes happens, YAGNI took a wrong turn somewhere, and after a few more weeks our Repository looked like this:
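A sketch of how such a Repository can drift. The `Product` and `Account` types and the two extra methods are hypothetical stand-ins for the unrelated objects described in the text:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Service:
    number: str
    account_id: str


@dataclass(frozen=True)
class Product:
    code: str


@dataclass(frozen=True)
class Account:
    account_id: str


class AllServices:
    """Still named after services, but now a generic database object."""

    def __init__(
        self,
        services: List[Service],
        products: List[Product],
        accounts: List[Account],
    ) -> None:
        self._services = list(services)
        self._products = list(products)
        self._accounts = list(accounts)

    def belonging_to(self, account_id: str) -> List[Service]:
        return [s for s in self._services if s.account_id == account_id]

    def with_number(self, number: str) -> Optional[Service]:
        return next((s for s in self._services if s.number == number), None)

    def product_with_code(self, code: str) -> Optional[Product]:
        # Returns a Product, not a Service.
        return next((p for p in self._products if p.code == code), None)

    def account_with_id(self, account_id: str) -> Optional[Account]:
        # Returns an Account, not a Service.
        return next((a for a in self._accounts if a.account_id == account_id), None)
```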
Something interesting here is that our methods still tried to follow Yoshima’s naming conventions as closely as possible. Surely they read weird when you think about the type’s name, but something I’ve learnt is that nothing will stop an engineer when they can’t be bothered refactoring…
A good way to build intuition around this is to treat it as a design smell when a Repository’s methods return more than one type. It is probably OK to return fundamental types like integers, strings, and booleans, but if your Repository returns more than one type of domain object you may be better off splitting it into discrete collections:
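A sketch of the split, with one Repository per domain type. The `Product` and `Account` types and all method names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Service:
    number: str
    account_id: str


@dataclass(frozen=True)
class Product:
    code: str


@dataclass(frozen=True)
class Account:
    account_id: str


class AllServices:
    def __init__(self, services: List[Service]) -> None:
        self._services = list(services)

    def belonging_to(self, account_id: str) -> List[Service]:
        return [s for s in self._services if s.account_id == account_id]

    def count(self) -> int:
        # Returning a fundamental type does not break cohesion.
        return len(self._services)


class AllProducts:
    def __init__(self, products: List[Product]) -> None:
        self._products = list(products)

    def with_code(self, code: str) -> Optional[Product]:
        return next((p for p in self._products if p.code == code), None)


class AllAccounts:
    def __init__(self, accounts: List[Account]) -> None:
        self._accounts = list(accounts)

    def with_id(self, account_id: str) -> Optional[Account]:
        return next((a for a in self._accounts if a.account_id == account_id), None)
```

With the split in place the method names read naturally again: all products with code, all accounts with id.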
Not Only Persistence
The main benefit of Repositories is to make explicit where objects come from and make this place part of the Ubiquitous Language. While Repositories are often used to model object persistence to databases and such, this isn’t the only place where they can be useful. Repositories can be used to implement transient collections, they can be a great way to return ValueObjects, and even encapsulate client code used to invoke operations from a remote service.
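As an illustration of a Repository that is not backed by a database, the sketch below wraps a remote call and returns Value Objects. Everything in it is hypothetical: the `ExchangeRate` type, the `AllExchangeRates` name, the `fetch_rates` parameter standing in for a remote client, and the made-up rate values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ExchangeRate:
    # Value Object: defined only by its attributes.
    currency: str
    rate_to_usd: float


class AllExchangeRates:
    """Collection-like facade over a remote call; nothing is persisted locally."""

    def __init__(self, fetch_rates: Callable[[], Dict[str, float]]) -> None:
        # `fetch_rates` stands in for a hypothetical remote service client.
        self._fetch_rates = fetch_rates

    def for_currency(self, currency: str) -> Optional[ExchangeRate]:
        rates = self._fetch_rates()
        if currency not in rates:
            return None
        return ExchangeRate(currency, rates[currency])


def fake_remote_call() -> Dict[str, float]:
    # Stub with made-up numbers, used here instead of a real network request.
    return {"EUR": 1.06, "GBP": 1.25}


all_exchange_rates = AllExchangeRates(fake_remote_call)
euro_rate = all_exchange_rates.for_currency("EUR")
```

Client code only sees a collection of exchange rates; whether they come from a table, a cache, or a remote service stays behind the Repository.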
Revision History
- 23/12/2010 - Original Publication in http://fragmental.tw
- 24/08/2015 - Migrated to http://philcalcado.com
- 25/11/2016 - Recovered and re-published with updates