How to write a Repository

Of the supporting patterns catalogued by Eric Evans, the Repository pattern is probably the most popular, most likely because data persistence has been a hot topic in software development for a long time.

The main problem is that the most popular approach for software development, Object-Orientation, isn’t easily mapped to the format used by efficient external storage systems, like relational or even NoSQL databases. This is often called impedance mismatch, and a whole cottage industry has sprung up around tools and techniques to deal with it.

For Enterprise systems—those performing business-critical tasks but usually not at a high scale—we often address these challenges by using object mapping tools such as Hibernate. These tools usually do an excellent job in dealing with the nuts and bolts of converting data types and formats, but the developer is left with the task of how to best model the concept of persisting and retrieving objects in their Domain Model and the system’s Ubiquitous Language.

For many years, the usual way to go about persistence in the Domain Model was to use specialised objects, such as DAOs and other types of Data Mappers, to convert between objects from your domain model and their persistent equivalent (e.g. to rows in database tables). These design patterns solve several of the hard design problems around coupling and cohesion, but they belong to the system’s infrastructure layer, and as such integrating them with your Ubiquitous Language isn’t a trivial task.

The Repository pattern, as catalogued by Eric Evans and Martin Fowler, offers a good way to integrate persistence needs and the Ubiquitous Language. In his book, Domain-Driven Design, Evans defines the Repository pattern as a “mechanism for encapsulating storage, retrieval, and search behaviour which emulates a collection of objects”. These emulated collections are easily assimilated by the Ubiquitous Language and are simple for engineers to implement and domain experts to understand.

Naming

The concept of a Repository as just a list of objects sounds simple enough to understand, but somehow it is very common for the classes we write to model these to end up with methods that are not related to lists at all. Over several gigs coaching teams adopting Domain-Driven Design, I’ve repeatedly seen classes that start as Repositories and end up becoming weird versions of DAOs.

To me, one of the best ways to avoid this problem is to name your classes in a way that makes it easy to identify when a method is out of place. Years ago, Rodrigo Yoshima told me about a very interesting way to name Repositories. Instead of the usual naming style displayed below:

class OrderRepository {
    List<Order> getOrdersFor(Account a) {...}
}

He likes to model his classes as follows:

class AllOrders {
    List<Order> belongingTo(Account a) {...}
}

The above might look like a small change, but in my experience, it is extremely helpful in keeping Repositories sane.

As an example, let’s look at two different implementations of a Repository. Both contain a member method that I would consider to be out of place; it should be moved to another class. If we agree that this method shouldn’t be part of this Repository, in which of the two implementations do you think it is easier to spot the problem?

//classic naming style
class UserRepository{
 User retrieveUserByLogin(String login){...}
 void submitOrder(Order order){...}
}
 
//client code
User u = userRepository.retrieveUserByLogin("pcalcado");
userRepository.submitOrder(new Order());

//Yoshima’s naming style
class AllUsers{
 User withLogin(String login){...}
 void submitOrder(Order order){...}
}
 
//client code
User u = allUsers.withLogin("pcalcado");
allUsers.submitOrder(new Order());

To me, using a more precise vocabulary when naming objects and operations makes it much easier to spot incongruences like the one above. As a corollary, using very generic names or prefixes like get or retrieve makes it much harder to spot these bad smells in our models.

Avoiding Method Explosion

More than just using constrained vocabulary, a well-defined Repository should expose Domain Model concepts as its published interface.

As an example, let’s assume that we have a business rule that says that every order placed on a weekend has a 10% surcharge applied to it. Now if we want to retrieve all orders that fall into this situation, we could do something like this:

List<Order> surchargedOrders = allOrders.placed(user, IN_A_SATURDAY);
surchargedOrders.addAll(allOrders.placed(user, IN_A_SUNDAY));
return surchargedOrders;

This may be good enough for a one-off case, but we’re leaking abstractions here, and they might come back to haunt us as the system grows. If you need to deal with surcharged orders in many different places, you will likely start duplicating the snippet above across your system. If a surcharge is an important concept in your Domain Model, you should make sure that the objects implementing this model (and the Repository is one of those objects) model this term as a first-class citizen. This is a core idea in Domain-Driven Design: make implicit domain concepts explicit, usually by modelling them as objects and methods.

I believe that something like this will be better:

return allOrders.surchargedFor(user);
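
To make this concrete, here is a minimal in-memory sketch of what such a Repository could look like. The User and Order classes, the placedOn field and the isSurcharged method are assumptions made up for this illustration; a real implementation would most likely delegate the filtering to a database query rather than a list.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical minimal domain types, just enough for the sketch to compile.
final class User {
    final String login;

    User(String login) {
        this.login = login;
    }
}

final class Order {
    final User owner;
    final LocalDate placedOn;

    Order(User owner, LocalDate placedOn) {
        this.owner = owner;
        this.placedOn = placedOn;
    }

    // The weekend rule is stated once, in the Domain Model.
    boolean isSurcharged() {
        DayOfWeek day = placedOn.getDayOfWeek();
        return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
    }
}

// In-memory stand-in for the Repository (assumption: no real storage behind it).
final class AllOrders {
    private final List<Order> orders = new ArrayList<>();

    void add(Order order) {
        orders.add(order);
    }

    // Clients ask for the domain concept; they never mention Saturdays or Sundays.
    List<Order> surchargedFor(User user) {
        return orders.stream()
                .filter(o -> o.owner.login.equals(user.login))
                .filter(Order::isSurcharged)
                .collect(Collectors.toList());
    }
}
```

With this in place, a change to the surcharge rule touches a single method instead of every caller that used to combine Saturday and Sunday queries by hand.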

The approach described above brings its own challenges. Assuming that surcharged is just one of many states that an order can be in, following this pattern can lead to method explosion as we will end up with a method per state. This won’t be a problem if the number of states is reasonable, but the approach itself doesn’t scale very well for attributes with a large number of possible states.

In their work, Evans and Fowler suggest a way to deal with this problem: the Specification pattern. Evans describes the Specification as “a predicate [object] that determines if an object does or does not satisfy some criteria”. To avoid method explosion in your Repository, you might want to add a method to it that takes in a Specification object and returns objects that fulfil the predicate. Here is an example of how this could work:

return allOrders.thatAre(user, OrderSpecifications.SURCHARGED);
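
A rough sketch of how the pieces behind that call might fit together follows. The Specification interface and its isSatisfiedBy method use the conventional naming for this pattern, but the User and Order classes, the placedOn field and the in-memory list are assumptions made for this example only.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// A predicate object: answers whether a candidate satisfies some criteria.
interface Specification<T> {
    boolean isSatisfiedBy(T candidate);
}

// Hypothetical minimal domain types, just enough for the sketch to compile.
final class User {
    final String login;

    User(String login) {
        this.login = login;
    }
}

final class Order {
    final User owner;
    final LocalDate placedOn;

    Order(User owner, LocalDate placedOn) {
        this.owner = owner;
        this.placedOn = placedOn;
    }
}

// Named Specifications keep the business vocabulary in one place.
final class OrderSpecifications {
    static final Specification<Order> SURCHARGED = order -> {
        DayOfWeek day = order.placedOn.getDayOfWeek();
        return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
    };

    private OrderSpecifications() {
    }
}

// In-memory stand-in for the Repository (assumption: no real storage behind it).
final class AllOrders {
    private final List<Order> orders = new ArrayList<>();

    void add(Order order) {
        orders.add(order);
    }

    // One method serves any number of order states.
    List<Order> thatAre(User user, Specification<Order> specification) {
        return orders.stream()
                .filter(o -> o.owner.login.equals(user.login))
                .filter(specification::isSatisfiedBy)
                .collect(Collectors.toList());
    }
}
```

Adding a new order state then means adding a new Specification constant; the Repository’s interface stays the same.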

But there is another strategy that I quite like: using multiple Repositories. In our ordering example, there is no reason we can’t have two Repositories, one for AllOrders and another one that deals only with SurchargedOrders.

One way to implement this is to parameterise the Repository when you instantiate it. Something like this:

//a base Repository
class Orders {
 private Order.Status desiredStatus = null;
 
 public Orders(){
   this(Order.Status.ANY);
 }
 
 public Orders(Order.Status desiredStatus){
   this.desiredStatus = desiredStatus;
 }

  public List<Order> from(User user) {...}
}

And use it like this:

//instantiated somewhere as
Orders allOrders = new Orders();
Orders surchargedOrders = new Orders(Order.Status.SURCHARGED);

//returns all orders
return allOrders.from(user);
 
//returns only orders with applied surcharge
return surchargedOrders.from(user);

We could also implement each variation as a subclass of a base class Orders. When following this design, it is important to make sure we are not replacing method explosion with class explosion.
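
As a sketch of the subclass variation, the base class can expose a small hook that each subclass overrides. Everything here beyond the Orders name and the from method is an assumption for illustration: the includes hook, the REGULAR status, the shared in-memory list and the minimal User and Order classes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical minimal domain types, just enough for the sketch to compile.
final class User {
    final String login;

    User(String login) {
        this.login = login;
    }
}

final class Order {
    enum Status { REGULAR, SURCHARGED }

    final User owner;
    final Status status;

    Order(User owner, Status status) {
        this.owner = owner;
        this.status = status;
    }
}

// Base Repository: every order belonging to the user is part of the collection.
class Orders {
    private final List<Order> storage;

    Orders(List<Order> storage) {
        this.storage = storage;
    }

    // Hook for subclasses to narrow the collection.
    protected boolean includes(Order order) {
        return true;
    }

    public List<Order> from(User user) {
        return storage.stream()
                .filter(o -> o.owner.login.equals(user.login))
                .filter(this::includes)
                .collect(Collectors.toList());
    }
}

// A narrower collection that only contains surcharged orders.
class SurchargedOrders extends Orders {
    SurchargedOrders(List<Order> storage) {
        super(storage);
    }

    @Override
    protected boolean includes(Order order) {
        return order.status == Order.Status.SURCHARGED;
    }
}
```

Each subclass is only a few lines long, which is exactly why it is tempting to create too many of them.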

One Type Only

Another common problem with Repositories happens when they start looking more like a generic “database” object instead of a cohesive collection. As an example, in a system I’ve worked on we had in our Domain Model the following Repository:

public interface AllServices {
 
    List<Service> belongingTo(List<Account> accounts);
 
    Service withNumber(String serviceNumber);
 
    List<Service> relatedTo(Service otherService);
}

After several iterations, we started playing user stories around other objects. The first time one of us had to return an object other than a Service from the database, we thought “Oh, I can’t be bothered creating a class just for this. I’ll just add a method here just for now…”. As sometimes happens, YAGNI took a wrong turn somewhere, and after a few more weeks our Repository looked like this:

public interface AllServices {
 
    List<Service> belongingTo(List<Account> accounts);
 
    Service withNumber(String serviceNumber);
 
    List<Service> relatedTo(Service otherService);
 
    List<Product> allActiveProductsBelongingTo(List<Account> accounts);
 
    List<Product> allProductsBelongingTo(List<Account> accounts);
 
    ContractDetails retrieveContractDetails(String serviceNumber);
}

Something interesting here is that our methods still tried to follow Yoshima’s naming conventions as closely as possible. They certainly read weird when you think about the type’s name, but something I’ve learnt is that nothing will stop an engineer when they can’t be bothered refactoring…

// mind = blown
AllServices allProducts = new AllServices();

// ...
return allProducts.allActiveProductsBelongingTo(accounts);

A good way to build intuition around this is to treat it as a design smell when a Repository’s methods return more than one type. It is probably OK to return fundamental types like integers, strings, and booleans, but if your Repository returns more than one type of domain object you may be better off splitting it into discrete collections:

public interface AllServices {
 
    List<Service> belongingTo(List<Account> accounts);
 
    Service withNumber(String serviceNumber);
 
    List<Service> relatedTo(Service otherService);
}
 
public interface AllProducts {
 
    List<Product> activeBelongingTo(List<Account> accounts);
 
    List<Product> belongingTo(List<Account> accounts);
}
 
public interface AllContractDetails {

    ContractDetails forServiceNumber(String serviceNumber);

}

Not Only Persistence

The main benefit of Repositories is to make explicit where objects come from and make this place part of the Ubiquitous Language. While Repositories are often used to model object persistence to databases and such, this isn’t the only place where they can be useful. Repositories can be used to implement transient collections, they can be a great way to return ValueObjects, and they can even encapsulate client code used to invoke operations from a remote service.
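
As an illustration of the remote service case, the sketch below puts a Repository in front of a remote client. The AllContractDetails interface is the one from the previous section; the BillingClient interface, its fetchContract method, the "plan" key and the fields of ContractDetails are all hypothetical names invented for this example.

```java
import java.util.Map;

// Hypothetical ValueObject; the real ContractDetails would carry more data.
final class ContractDetails {
    final String serviceNumber;
    final String planName;

    ContractDetails(String serviceNumber, String planName) {
        this.serviceNumber = serviceNumber;
        this.planName = planName;
    }
}

// The Repository as seen by the Domain Model.
interface AllContractDetails {
    ContractDetails forServiceNumber(String serviceNumber);
}

// Hypothetical client for a remote billing service returning raw key/value data.
interface BillingClient {
    Map<String, String> fetchContract(String serviceNumber);
}

// Infrastructure-side implementation: callers never learn that a remote call happens.
final class ContractDetailsFromBilling implements AllContractDetails {
    private final BillingClient client;

    ContractDetailsFromBilling(BillingClient client) {
        this.client = client;
    }

    @Override
    public ContractDetails forServiceNumber(String serviceNumber) {
        Map<String, String> raw = client.fetchContract(serviceNumber);
        // Translate the remote representation into a domain ValueObject.
        return new ContractDetails(serviceNumber, raw.get("plan"));
    }
}
```

Client code depends only on AllContractDetails, so replacing the remote call with a database or a cache would not change the Domain Model.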

Revision History

23/12/2010 - Original Publication in http://fragmental.tw
24/08/2015 - Migrated to http://philcalcado.com
25/11/2016 - Recovered and re-published with updates