How To Write A Repository

Of all the supporting patterns catalogued by Eric Evans, the Repository pattern is probably the most popular.

Persistence has been a hot topic in software development for a long time. The main problem is that the most popular approach to software development these days, Object-Orientation, doesn’t map easily to efficient external storage systems like relational or even NoSQL databases.

The technical limitations were mostly solved by some fantastic object mapping tools like Hibernate, which make persisting and querying objects a breeze in most scenarios. The problem then became how to integrate the act of persisting and retrieving objects with our Domain Model and, more importantly, with our Ubiquitous Language.

Most of us use specialised objects (DAOs and Data Mappers in general) to convert business objects to and from their persistent equivalents. Those objects are often good enough for the task, but they belong to the Infrastructure Layer and don’t integrate transparently with the Ubiquitous Language.

A good way to integrate persistence needs with the Ubiquitous Language is to use what are known as Repositories. In his book, Evans defines the Repository pattern as “A mechanism for encapsulating storage, retrieval, and search behaviour which emulates a collection of objects”. This concept is easily assimilated into the Ubiquitous Language and is simple enough to implement and to explain to domain experts.

Naming

The concept of a Repository as a list of objects is not too hard to understand, but it is very common for these classes to end up with methods that have nothing to do with lists.

After coaching many teams in the adoption of a Ubiquitous Language and related patterns, I’ve found that helping people remember that Repositories are not DAO-like classes starts with how you name them.

Years ago Rodrigo Yoshima told me about his convention when naming Repositories. Instead of the more common naming style displayed below:

 
class OrderRepository {
List<Order> getOrdersFor(Account a){...}
}

He promotes this:

 
class AllOrders {
List<Order> belongingTo(Account a){...}
}
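To see how this naming reads once there is an implementation behind it, here is a minimal in-memory sketch of AllOrders. The Account and Order classes, the add method and the list used as storage are hypothetical stand-ins for the example; a real Repository would sit on top of a database or an object mapping tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical minimal domain types, just enough for the example.
final class Account {
    private final String id;

    Account(String id) { this.id = id; }

    String id() { return id; }
}

final class Order {
    private final Account owner;

    Order(Account owner) { this.owner = owner; }

    Account owner() { return owner; }
}

// The Repository is named after the collection it stands for, so
// call sites read as "all orders belonging to account a".
class AllOrders {
    // In-memory stand-in for whatever storage a real Repository would use.
    private final List<Order> orders = new ArrayList<>();

    void add(Order order) { orders.add(order); }

    List<Order> belongingTo(Account a) {
        return orders.stream()
                .filter(o -> o.owner().id().equals(a.id()))
                .collect(Collectors.toList());
    }
}
```

Client code then reads allOrders.belongingTo(account), which is close to how a domain expert would phrase the question.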

It looks like a small change, but it helps a lot. As an example, let’s look at two Repositories that contain methods that don’t belong to them. In which one is it easier to spot the problem?

 
//classic naming style
class UserRepository{
 User retrieveUserByLogin(String login){...}
 void submitOrder(Order order){...}
}
 
//client code
User u = userRepository.retrieveUserByLogin("pcalcado");
userRepository.submitOrder(new Order());

 
//Yoshima’s naming style
class AllUsers{
 User withLogin(String login){...}
 void submitOrder(Order order){...}
}
 
//client code
User u = allUsers.withLogin("pcalcado");
allUsers.submitOrder(new Order());

Keep in mind that the language you use does impact how you think (Sapir-Whorf applies to programming too). Methods whose names start with retrieve, list or get are often a bad smell.

Avoid Method Explosion

A good Repository will model domain concepts in its interface. As an example, let’s assume we have a business rule saying that every order placed on a weekend has a 10% surcharge applied to it. If we want to display all orders in this situation, we could do something like this:

 
List<Order> surchargedOrders = allOrders.placed(user, IN_A_SATURDAY);
surchargedOrders.addAll(allOrders.placed(user, IN_A_SUNDAY));
return surchargedOrders;

That works, but we’re leaking an abstraction here. The fact that surcharged orders happen to be those placed on weekends shouldn’t be exposed to clients like this. Something like the following is better:

 
return allOrders.surchargedFor(user);
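One way surchargedFor could be implemented, sketched against an in-memory list: the weekend rule lives in a private helper, so clients only ever see the domain term “surcharged”. The User and Order classes, the add method and the in-memory storage are assumptions made for the example.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical minimal domain types for the example.
final class User {
    private final String login;

    User(String login) { this.login = login; }

    String login() { return login; }
}

final class Order {
    private final User customer;
    private final LocalDate placedOn;

    Order(User customer, LocalDate placedOn) {
        this.customer = customer;
        this.placedOn = placedOn;
    }

    User customer() { return customer; }

    LocalDate placedOn() { return placedOn; }
}

class AllOrders {
    // In-memory stand-in for real storage.
    private final List<Order> orders = new ArrayList<>();

    void add(Order order) { orders.add(order); }

    // Clients ask for "surcharged" orders; how that is decided stays in here.
    List<Order> surchargedFor(User user) {
        return orders.stream()
                .filter(o -> o.customer().login().equals(user.login()))
                .filter(AllOrders::isSurcharged)
                .collect(Collectors.toList());
    }

    // The business rule from the text: orders placed on a weekend are surcharged.
    private static boolean isSurcharged(Order order) {
        DayOfWeek day = order.placedOn().getDayOfWeek();
        return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
    }
}
```

If the rule changes (say, public holidays also attract the surcharge), only isSurcharged needs to change; the callers keep asking for surcharged orders.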

The problem is that for some entities you may end up with too many query methods in a Repository. There are many ways to deal with this; you can parameterise the method call with a flag or a Specification, for example:

 
Specification surcharged = specifications.surcharged();
return allOrders.thatAre(user, surcharged);

(note: in the example above I consider the specifications object a Repository of Specifications)
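A minimal sketch of the Specification variant, assuming a generic Specification&lt;T&gt; interface with a single isSatisfiedBy method (the name commonly used for this pattern; the snippet above uses a raw Specification type). The User and Order classes, the add method and the in-memory storage are assumptions for the example.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// A Specification is a predicate object named after a domain concept.
interface Specification<T> {
    boolean isSatisfiedBy(T candidate);
}

// Hypothetical minimal domain types for the example.
final class User {
    private final String login;
    User(String login) { this.login = login; }
    String login() { return login; }
}

final class Order {
    private final User customer;
    private final LocalDate placedOn;
    Order(User customer, LocalDate placedOn) {
        this.customer = customer;
        this.placedOn = placedOn;
    }
    User customer() { return customer; }
    LocalDate placedOn() { return placedOn; }
}

// A "Repository of Specifications": the place named business rules come from.
class Specifications {
    Specification<Order> surcharged() {
        return order -> {
            DayOfWeek day = order.placedOn().getDayOfWeek();
            return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
        };
    }
}

class AllOrders {
    private final List<Order> orders = new ArrayList<>();

    void add(Order order) { orders.add(order); }

    // One generic query method instead of one method per business concept.
    List<Order> thatAre(User user, Specification<Order> specification) {
        return orders.stream()
                .filter(o -> o.customer().login().equals(user.login()))
                .filter(specification::isSatisfiedBy)
                .collect(Collectors.toList());
    }
}
```

The trade-off is that the interface of AllOrders stays small, while the vocabulary moves into the Specifications object.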

But there is another strategy that I quite like: multiple Repositories. In our ordering example there is no reason we can’t have two Repositories: AllOrders and SurchargedOrders. AllOrders represents a list containing every single order in the system; SurchargedOrders represents a subset of it. In our case we end up with something like:

 
//returns all orders
return allOrders.from(user);
 
//returns only orders with applied surcharge
return surchargedOrders.from(user);
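A minimal sketch of how SurchargedOrders might sit on top of AllOrders, assuming an in-memory store and simple delegation. Delegation is only one option; a real Subset Repository might just as well issue its own narrower query. The User and Order classes and the add method are assumptions for the example.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical minimal domain types for the example.
final class User {
    private final String login;
    User(String login) { this.login = login; }
    String login() { return login; }
}

final class Order {
    private final User customer;
    private final LocalDate placedOn;
    Order(User customer, LocalDate placedOn) {
        this.customer = customer;
        this.placedOn = placedOn;
    }
    User customer() { return customer; }
    LocalDate placedOn() { return placedOn; }
}

// Every order in the system.
class AllOrders {
    private final List<Order> orders = new ArrayList<>();

    void add(Order order) { orders.add(order); }

    List<Order> from(User user) {
        return orders.stream()
                .filter(o -> o.customer().login().equals(user.login()))
                .collect(Collectors.toList());
    }
}

// A subset of AllOrders: only the orders that carry the weekend surcharge.
class SurchargedOrders {
    private final AllOrders allOrders;

    SurchargedOrders(AllOrders allOrders) { this.allOrders = allOrders; }

    // Same vocabulary as AllOrders, narrower collection.
    List<Order> from(User user) {
        return allOrders.from(user).stream()
                .filter(SurchargedOrders::isSurcharged)
                .collect(Collectors.toList());
    }

    private static boolean isSurcharged(Order order) {
        DayOfWeek day = order.placedOn().getDayOfWeek();
        return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
    }
}
```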

The “Subset Repositories” will often be modelled as classes, but they can also be just parameterised instances of a base Repository. For example, we can have something like:

 
//a base Repository
class Users {
 private User.Status desiredStatus = null;
 
 public Users(){
   this(User.Status.ANY);
 }
 
 public Users(User.Status desiredStatus){
   this.desiredStatus= desiredStatus;
 }
 //methods go here...
}
 
//instantiated somewhere as
private Users allUsers = new Users();
private Users activeUsers = new Users(User.Status.ACTIVE);
private Users inactiveUsers = new Users(User.Status.INACTIVE);
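Filling in the “methods go here” part, a runnable version of the parameterised Repository could look like the following. It assumes the instances share an in-memory List&lt;User&gt; passed to the constructor (the constructors above take no storage argument) and that withLogin returns an Optional; both are choices made for this sketch.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class User {
    enum Status { ANY, ACTIVE, INACTIVE }

    private final String login;
    private final Status status;

    User(String login, Status status) {
        this.login = login;
        this.status = status;
    }

    String login() { return login; }

    Status status() { return status; }
}

class Users {
    // Shared backing store; a stand-in for the database in this sketch.
    private final List<User> storage;
    private final User.Status desiredStatus;

    Users(List<User> storage) {
        this(storage, User.Status.ANY);
    }

    Users(List<User> storage, User.Status desiredStatus) {
        this.storage = storage;
        this.desiredStatus = desiredStatus;
    }

    // Every query goes through this filter, so each instance only "sees" its subset.
    private boolean inSubset(User user) {
        return desiredStatus == User.Status.ANY || user.status() == desiredStatus;
    }

    List<User> all() {
        return storage.stream().filter(this::inSubset).collect(Collectors.toList());
    }

    Optional<User> withLogin(String login) {
        return storage.stream()
                .filter(this::inSubset)
                .filter(u -> u.login().equals(login))
                .findFirst();
    }
}
```

Because the status filter is applied in one place, adding a new query method automatically respects the subset each instance represents.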

Obviously there is always the risk that we replace method explosion with class explosion, but in my experience, if subsets are modelled after the Ubiquitous Language, this does not become a problem.

One Type Only

Another common problem with Repositories happens when they start looking more like a bag of stuff than a collection. The naming strategy described above can help make the issue visible, but in many cases it is not seen as a big deal until you end up with a Repository with a thousand lines of code and efferent coupling so high that every single check-in includes changes to this class.

Here is an example from a system I’ve worked on:

 
public interface AllServices {
 
    List<Service> belongingTo(List<Account> accounts);
 
    Service withNumber(String serviceNumber);
 
    List<Service> relatedTo(Service otherService);
 
    List<Product> allActiveProductsBelongingTo(List<Account> accounts);
 
    List<Product> allProductsBelongingTo(List<Account> accounts);
 
    ContractDetails retrieveContractDetails(String serviceNumber);
}

It looks like the developers started by applying Yoshima’s naming convention but eventually began placing all sorts of related methods in the Repository. It’s not modelling a collection anymore, and the name AllServices no longer makes sense.

A design smell to look for in Repositories is methods that return more than one type. It is probably fine to return fundamental types like strings and booleans, but if your Repository returns more than one type of domain object, you may be better off splitting the bag of stuff into separate collections:

 
public interface AllServices {
 
    List<Service> belongingTo(List<Account> accounts);
 
    Service withNumber(String serviceNumber);
 
    List<Service> relatedTo(Service otherService);
}
 
public interface AllProducts {
 
    List<Product> activeBelongingTo(List<Account> accounts);
 
    List<Product> belongingTo(List<Account> accounts);
}
 
public interface AllContractDetails {
   ContractDetails forServiceNumber(String serviceNumber);
}

Repositories often end up like this because a given class has access to everything required to return more than one type of object, and it would be wasteful to create a wrapper for each type. In this case you should probably still model your Repositories as separate interfaces and have them all implemented by that one class, like this:

 
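public class BillingSystemGateway implements AllServices, AllProducts, AllContractDetails {

    // Caveat: as declared above, AllServices.belongingTo and AllProducts.belongingTo
    // take the same parameter type but return different types, so Java will not let
    // one class implement both; in real code one of them needs a different name.

    public List<Service> belongingTo(List<Account> accounts) {...}

    public Service withNumber(String serviceNumber) {...}

    public List<Service> relatedTo(Service otherService) {...}

    public List<Product> activeBelongingTo(List<Account> accounts) {...}

    public List<Product> belongingTo(List<Account> accounts) {...}

    public ContractDetails forServiceNumber(String serviceNumber) {...}
}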

This shouldn’t be very common, though. If you are facing this scenario too frequently it may be a good time to revisit your integration code.
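The single-class arrangement described above can be sketched in runnable form, trimmed to two of the interfaces and with an in-memory map standing in for the real billing system. The Optional return types and the register method and ContractReport class are assumptions for this sketch. The point is on the client side: ContractReport only knows about AllContractDetails, even though the object behind it also implements AllServices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical minimal domain types for the example.
final class Service {
    private final String number;
    Service(String number) { this.number = number; }
    String number() { return number; }
}

final class ContractDetails {
    private final String terms;
    ContractDetails(String terms) { this.terms = terms; }
    String terms() { return terms; }
}

// Two narrow collections, each returning a single type of domain object.
interface AllServices {
    Optional<Service> withNumber(String serviceNumber);
}

interface AllContractDetails {
    Optional<ContractDetails> forServiceNumber(String serviceNumber);
}

// One gateway class implements both, because it has access to everything.
class BillingSystemGateway implements AllServices, AllContractDetails {
    // In-memory stand-in for the external billing system.
    private final Map<String, String> termsByServiceNumber = new HashMap<>();

    void register(String serviceNumber, String terms) {
        termsByServiceNumber.put(serviceNumber, terms);
    }

    @Override
    public Optional<Service> withNumber(String serviceNumber) {
        return termsByServiceNumber.containsKey(serviceNumber)
                ? Optional.of(new Service(serviceNumber))
                : Optional.empty();
    }

    @Override
    public Optional<ContractDetails> forServiceNumber(String serviceNumber) {
        return Optional.ofNullable(termsByServiceNumber.get(serviceNumber))
                .map(ContractDetails::new);
    }
}

// Client code depends only on the collection it actually needs.
class ContractReport {
    private final AllContractDetails allContractDetails;

    ContractReport(AllContractDetails allContractDetails) {
        this.allContractDetails = allContractDetails;
    }

    String termsFor(String serviceNumber) {
        return allContractDetails.forServiceNumber(serviceNumber)
                .map(ContractDetails::terms)
                .orElse("no contract on file");
    }
}
```

If the integration code is later split into separate classes, ContractReport does not need to change.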

Not Only Persistence

Repositories are often used to model object persistence, but that doesn’t have to be their only use. They are quite useful for system integration, simple in-memory collections and even for returning Value Objects.
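As an illustration of the last two cases, here is a sketch of a Repository that is just a fixed in-memory collection of Value Objects. Country and AllCountries are hypothetical names chosen for the example; nothing is persisted, yet clients still have one named place in the Ubiquitous Language where countries come from.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// A Value Object: two instances with the same attributes are equal.
final class Country {
    private final String isoCode;
    private final String name;

    Country(String isoCode, String name) {
        this.isoCode = isoCode;
        this.name = name;
    }

    String isoCode() { return isoCode; }

    String name() { return name; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Country)) return false;
        Country that = (Country) other;
        return isoCode.equals(that.isoCode) && name.equals(that.name);
    }

    @Override
    public int hashCode() { return Objects.hash(isoCode, name); }
}

// A Repository backed by a fixed, read-only in-memory list; no persistence involved.
class AllCountries {
    private final List<Country> countries = Collections.unmodifiableList(Arrays.asList(
            new Country("AU", "Australia"),
            new Country("BR", "Brazil"),
            new Country("DE", "Germany")));

    List<Country> all() { return countries; }

    Optional<Country> withIsoCode(String isoCode) {
        return countries.stream()
                .filter(c -> c.isoCode().equals(isoCode))
                .findFirst();
    }
}
```

Swapping the fixed list for a call to a reference-data service later would not change the interface clients see.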

Keep in mind that the main benefit of Repositories is to have an explicit place where objects come from and to make that place part of the Ubiquitous Language. The actual implementation of a Repository may have a big impact on how we model its interface, but at the end of the day we should aim to keep it as close to a list of domain objects as possible.