Internal Data Transfer Objects

The Layers Pattern aims to minimise complexity by grouping objects with similar responsibilities. One common question that arises when people apply this Pattern is how to integrate the Presentation and Business Layers, the two topmost Layers in the diagram below.

I find it curious that the relationship between these two Layers in particular sparks debate, whereas interactions between other Layers, such as Business and Persistence, seem much more straightforward. I believe we should apply the same mindset to the relationship between any two of these Layers: the lower Layer defines an API and the upper Layer uses it. This API takes in and returns regular objects, and those objects encapsulate the implementation details of the Layer they come from. The API is usually implemented as a Façade, and it returns proper, non-anaemic objects with both state and behaviour.
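To make this concrete, here is a minimal Java sketch of such a Façade. The names `Order` and `OrderFacade`, the in-memory map standing in for persistence, and the prices in cents are all hypothetical, chosen only for illustration. The point is that the Façade hands back a real domain object that enforces its own rules, rather than a bag of data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain object: it carries state and behaviour (non-anaemic).
final class Order {
    private final String id;
    private final List<Long> linePricesInCents = new ArrayList<>();
    private boolean shipped;

    Order(String id) {
        this.id = id;
    }

    String id() {
        return id;
    }

    // The business rule is enforced by the object itself, not by its callers.
    void addLine(long priceInCents) {
        if (shipped) {
            throw new IllegalStateException("cannot change a shipped order");
        }
        linePricesInCents.add(priceInCents);
    }

    long totalInCents() {
        long total = 0;
        for (long price : linePricesInCents) {
            total += price;
        }
        return total;
    }

    boolean isShipped() {
        return shipped;
    }

    void ship() {
        shipped = true;
    }
}

// Hypothetical Façade exposed by the Business Layer. It takes in and returns
// regular domain objects; the Layer above never sees a DTO.
final class OrderFacade {
    // In-memory stand-in for whatever the Persistence Layer really does.
    private final Map<String, Order> orders = new HashMap<>();

    Order placeOrder(String id, long... linePricesInCents) {
        Order order = new Order(id);
        for (long price : linePricesInCents) {
            order.addLine(price);
        }
        orders.put(id, order);
        return order;
    }

    Order findOrder(String id) {
        Order order = orders.get(id);
        if (order == null) {
            throw new IllegalArgumentException("unknown order: " + id);
        }
        return order;
    }
}
```

A Presentation Layer using this API would call something like `facade.findOrder("A-1").totalInCents()` and work with the returned object directly.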

But this is not necessarily a popular perspective. It is somewhat common to use Data Transfer Objects as the communication medium between Layers.

I think that using DTOs in this way has produced some of the most confusing architectures I have ever been exposed to.

DTO Quick Intro

Let’s quickly review what Data Transfer Objects are all about. Imagine a system with two nodes (e.g. virtual machines, processes, servers, web services — here, a “node” is something that has its own address space). Now imagine that you want to share data structures between these two nodes.

One popular approach is to send a proxy of the object to the client, which then treats this proxy as if it were the actual object. One problem I have found with this approach is that, following the usual Object-Oriented guidelines, an object should not execute much logic on its own but should instead collaborate with other objects to accomplish its goals. Those collaborating objects might not have been copied to the client node, so operations performed by the local proxy or copy may incur expensive RPC or IPC calls as it reaches out to its constellation of collaborators to perform even the most basic tasks, such as building the result of a toString() call.
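The following Java sketch illustrates this chattiness under clearly artificial assumptions. `RemoteChannel` is a hypothetical stand-in that performs no networking and merely counts round-trips; `OrderProxy` and its collaborators are likewise invented for the example. In this sketch, a single `toString()` call already costs three remote calls.

```java
// Hypothetical stand-in for an RPC/IPC channel. It does no real networking;
// it only counts how many round-trips the caller would have paid for.
final class RemoteChannel {
    private int roundTrips;

    String fetch(String remoteObjectId, String attribute) {
        roundTrips++; // in a real system each call would cross the node boundary
        return attribute + " of " + remoteObjectId;
    }

    int roundTrips() {
        return roundTrips;
    }
}

// Local proxy for a remote Order. Its collaborators (customer, shipping
// address) live on the other node, so even toString() has to reach them remotely.
final class OrderProxy {
    private final RemoteChannel channel;
    private final String orderId;

    OrderProxy(RemoteChannel channel, String orderId) {
        this.channel = channel;
        this.orderId = orderId;
    }

    @Override
    public String toString() {
        String status = channel.fetch(orderId, "status");
        String customer = channel.fetch("customer:" + orderId, "name");
        String city = channel.fetch("address:" + orderId, "city");
        return "Order " + orderId + " [" + status + "] for " + customer + " in " + city;
    }
}
```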

Martin Fowler catalogued the Data Transfer Object Pattern in his seminal book Patterns of Enterprise Application Architecture. It is a Distribution Pattern in which, instead of sending a proxy of the object to the client, you send a data structure that packs pretty much everything the object needs to perform its tasks into a single object optimised for distribution over the network. I like to think of it as a .tar.gz containing the objects we need.

Although there are many good reasons to use this Pattern, it also has some undesirable effects. The most important one, to me, is that suddenly you have to maintain two different and heavily coupled hierarchies of objects: the DTOs and the business objects. You also need to write and maintain the mapping logic that converts between one and the other.
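A minimal Java sketch of the two hierarchies and the mapping between them might look like the following. `Customer`, `Address`, `CustomerDTO` and `CustomerAssembler` are hypothetical names, and an assembler class is just one common way to write the mapping logic. Note that the DTO flattens data from the collaborating `Address` into a single object, and that renaming or adding a field on either side means touching the assembler too.

```java
import java.io.Serializable;

// Hypothetical domain objects: a Customer collaborating with an Address.
final class Address {
    private final String city;

    Address(String city) {
        this.city = city;
    }

    String city() {
        return city;
    }
}

final class Customer {
    private final String name;
    private final Address address;

    Customer(String name, Address address) {
        this.name = name;
        this.address = address;
    }

    String name() {
        return name;
    }

    Address address() {
        return address;
    }

    // Behaviour that the DTO does not (and should not) have.
    boolean livesIn(String city) {
        return address.city().equalsIgnoreCase(city);
    }
}

// The DTO: a flat, behaviour-free structure that packs what the remote side
// needs, including data from collaborators, into one serialisable object.
final class CustomerDTO implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    String city;
}

// The mapping logic that must change whenever either Customer/Address
// or CustomerDTO changes.
final class CustomerAssembler {
    static CustomerDTO toDto(Customer customer) {
        CustomerDTO dto = new CustomerDTO();
        dto.name = customer.name();
        dto.city = customer.address().city();
        return dto;
    }

    static Customer fromDto(CustomerDTO dto) {
        return new Customer(dto.name, new Address(dto.city));
    }
}
```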

This cost is all the more significant because DTOs represent accidental complexity: they are not part of your domain model.

DTOs exist because they are one of the few practical ways in which we can make distributed systems work. Without something like them, most distributed operations would be extremely slow and inefficient.

DTOs are generally useful when building distributed systems, but I have found that they are rarely needed for local communication, such as when two Layers of the same application interact.

In the next sections, let’s explore some of the reasons people give when asked why they still use internal DTOs.

“Because MVC Requires DTOs”

I would argue that there is some general confusion about the MVC Pattern in our industry, especially about what the Model in it actually means. Let’s revisit some of the foundational literature to see whether we can find some clarity.

The original MVC paper describes the Model as:

A Model is an active representation of an abstraction in the form of data in a computing system

When cataloguing the MVC Pattern, Martin Fowler says:

In its most pure OO form the model is an object within a Domain Model. You might also think of a Transaction Script as the model, providing that it contains no UI machinery.

And in Refactoring: Improving the Design of Existing Code, he also says:

The gold at the heart of MVC is the separation between the user interface code (the view, these days often called the presentation) and the domain logic (the model). The presentation classes contain only the logic needed to deal with the user interface. Domain objects contain no visual code but all the business logic. This separates two complicated parts of the program into pieces that are easier to modify. It also allows multiple presentations of the same business logic. Those experienced in working with objects use this separation instinctively, and it has proved its worth.

Exploring these authors’ work, we find no direct relation between Layers and MVC. You can use one without the other. Applying a layered architecture together with MVC can be useful, though. I like the way Craig Larman explains this interplay in his book:

This is a key principle in the Pattern Model-View-Controller (MVC). MVC was originally a small-scale Smalltalk-80 pattern, and related data objects (models), GUI widgets (views), and mouse and keyboard event handlers (controllers). More recently, the term “MVC” has been co-opted by the distributed design community to also apply on a large-scale architectural level. The Model is the Domain Layer, the View is the UI Layer, and the Controllers are the workflow objects in the Application layer.

Following Larman, we can see MVC as an interesting way to organise Layers and draw a picture like this:

This leads us to the conclusion that, when applying both Layers and MVC, our Domain Model plus the whole infrastructure that supports it is our Model in MVC parlance.

Returning to DTOs, let’s look at how the original MVC paper describes the relationship between Model and View:

VIEW

DEFINITION

To any given Model there is attached one or more Views, each View being capable of showing one or more pictorial representations of the Model on the screen and on hardcopy. A View is also able to perform such operations upon the Model that are reasonably associated with that View.

[…]

VIEWS

A view is a (visual) representation of its model. It would ordinarily highlight certain attributes of the model and suppress others. It is thus acting as a presentation filter. A view is attached to its model (or model part) and gets the data necessary for the presentation from the model by asking questions. It may also update the model by sending appropriate messages. All these questions and messages have to be in the terminology of the model; the view will therefore have to know the semantics of the attributes of the model it represents. (It may, for example, ask for the model’s identifier and expect an instance of Text; it may not assume that the model is of class Text.)

So the View in MVC is not only tied to the Model, but is also responsible for filtering its data and displaying only what is relevant. That is very interesting because creating different perspectives over the domain model is one of the main reasons people use DTOs internally:

But what we have just read about MVC tells us that this is the View’s role. This makes the DTO in the previous diagram completely unnecessary.

Therefore, there is nothing in the MVC Pattern per se that would require you to use internal DTOs. The View is responsible for accessing the Model and extracting what should be displayed.
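As a rough illustration, the Java sketch below attaches two hypothetical Views to the same Model. Each View asks the Model questions in the Model’s own terms and shows only the attributes relevant to it, which is exactly the filtering that internal DTOs are often introduced to perform. The class names and the preferred-customer threshold are invented for the example.

```java
// Hypothetical Model: a domain object with state and behaviour.
final class Customer {
    private final String name;
    private final String email;
    private final long lifetimeSpendInCents;

    Customer(String name, String email, long lifetimeSpendInCents) {
        this.name = name;
        this.email = email;
        this.lifetimeSpendInCents = lifetimeSpendInCents;
    }

    String name() {
        return name;
    }

    String email() {
        return email;
    }

    // Domain rule; the threshold is an arbitrary example value.
    boolean isPreferred() {
        return lifetimeSpendInCents >= 100_000L;
    }
}

// Two Views over the same Model. Each highlights some attributes and
// suppresses others, acting as a presentation filter; no DTO sits in between.
final class CustomerBadgeView {
    String render(Customer customer) {
        return customer.isPreferred() ? customer.name() + " (preferred)" : customer.name();
    }
}

final class CustomerContactView {
    String render(Customer customer) {
        return customer.name() + " <" + customer.email() + ">";
    }
}
```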

Using DTOs to Prevent Calls to Dangerous Methods

Another common justification for internal DTOs is preventing UI code (i.e. MVC’s View and Controller) from calling “dangerous” business methods on domain model objects. Over time, I have figured out that by “dangerous” people usually mean operations with side effects.

DTOs allow you to hide those methods and, theoretically, people developing the front end will not be able to call them.

At first this sounds reasonable. Unless you are building super-rich desktop applications, there is probably no good reason to call these methods from the Presentation Layer. But I would like to offer an alternative solution to this issue: just don’t call the bloody methods! Your developers are not children; they should not need fancy training wheels.

And if, for some reason, you cannot trust the development team, DTOs will not save you: it does not matter how many layers of DTOs you use to hide your business objects, a developer can always find a way to call those methods.

And if you really, really want to enforce such a rule, there are other means. The simplest solution I can think of is to define Checkstyle (or equivalent) rules that forbid those calls and break the build. If you want to get fancy, define an interface without the “dangerous” methods and use a tool such as Macker to forbid Presentation code from referencing the implementation class.
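The interface-based option could be sketched in Java as follows. `ReadOnlyOrder`, `Order` and `OrderPresenter` are hypothetical, and the build-time rule that would stop Presentation code from referencing `Order` (via Checkstyle, Macker, or a similar tool) is not shown, since its configuration depends on the tool you choose.

```java
// Hypothetical narrow interface: side-effect-free queries only.
interface ReadOnlyOrder {
    String id();

    long totalInCents();

    boolean isCancelled();
}

// The full domain object also has the "dangerous" operation with side effects.
final class Order implements ReadOnlyOrder {
    private final String id;
    private final long totalInCents;
    private boolean cancelled;

    Order(String id, long totalInCents) {
        this.id = id;
        this.totalInCents = totalInCents;
    }

    @Override
    public String id() {
        return id;
    }

    @Override
    public long totalInCents() {
        return totalInCents;
    }

    @Override
    public boolean isCancelled() {
        return cancelled;
    }

    // Side effect: Presentation code is not supposed to call this.
    void cancel() {
        cancelled = true;
    }
}

// Presentation code depends only on the interface, so cancel() is out of reach
// unless someone casts to Order, which is what a build rule would forbid.
final class OrderPresenter {
    String describe(ReadOnlyOrder order) {
        String text = order.id() + ": " + order.totalInCents() + " cents";
        return order.isCancelled() ? text + " (cancelled)" : text;
    }
}
```

Nothing in the type system alone stops a cast from `ReadOnlyOrder` back to `Order`, which is why the enforcement has to come from the build rather than from the interface.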

Loose Coupling

When the Presentation Layer is too tightly coupled to the Business Layer, we might need to change the UI code whenever there is a change in the business objects, even if the change itself should not affect the user experience. DTOs are one way people have tried to minimise this pain. I personally think they make matters worse.

Prior to introducing DTOs, we would have two components: Business and Presentation. Whenever we change Business, it is possible that we need to change Presentation.

When adding a DTO, we now have three components: Presentation, DTO, and Business. When Business changes, it is very likely that the DTO will also need to change. And when the DTO changes, it will potentially require a change in Presentation anyway.

The DTO here does not solve the coupling problem; it only adds another moving piece. Instead of keeping two components in sync, now you have to do that with three.

This reminds me of a classic quote by David Wheeler:

Any problem in computer science can be solved with another layer of indirection. But that usually creates another problem.

The key to reducing coupling between Presentation and Business is to define a good API between them. Make sure you are not leaking too much detail about how the Business Layer is implemented into the Presentation Layer, or vice versa. If your Presentation Layer has to be extremely decoupled from the Domain Model, think about adopting a proper Presentation Model.
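A Presentation Model holds the state and behaviour of a screen independently of any UI toolkit. A minimal Java sketch, with hypothetical names (`Order`, `OrderScreenModel`) and a made-up ship button, might look like this. Unlike a DTO, it has behaviour and delegates to the live domain object rather than copying its data.

```java
import java.util.Locale;

// Hypothetical domain object.
final class Order {
    private final String id;
    private final long totalInCents;
    private boolean shipped;

    Order(String id, long totalInCents) {
        this.id = id;
        this.totalInCents = totalInCents;
    }

    String id() {
        return id;
    }

    long totalInCents() {
        return totalInCents;
    }

    boolean isShipped() {
        return shipped;
    }

    void ship() {
        if (shipped) {
            throw new IllegalStateException("already shipped");
        }
        shipped = true;
    }
}

// Hypothetical Presentation Model: the state and behaviour of one screen.
// Widgets bind to this class; only this class talks to the domain object.
final class OrderScreenModel {
    private final Order order;

    OrderScreenModel(Order order) {
        this.order = order;
    }

    String title() {
        return "Order " + order.id();
    }

    // Assumes a non-negative total; formats cents as "units.cents".
    String totalLabel() {
        long cents = order.totalInCents();
        return String.format(Locale.ROOT, "%d.%02d", cents / 100, cents % 100);
    }

    boolean isShipButtonEnabled() {
        return !order.isShipped();
    }

    // Screen behaviour: clicking is ignored once the order has shipped.
    void onShipClicked() {
        if (isShipButtonEnabled()) {
            order.ship();
        }
    }
}
```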

Concluding

I am not sure why Data Transfer Objects are such a popular Pattern when integrating Presentation and Business Layers. I think there are two main drivers:

Data Transfer Objects are often misused as records. Even after decades of using Object-Oriented tools and languages, people still unconsciously fall back to a procedural style, in which a problem is solved by dumb data structures plus smart procedures. There is nothing wrong with going procedural if you know what you are doing, but that is rarely the case here.
Sun evangelised the use of what it called Transfer Objects (previously called Value Objects) in its EJB 2.x architecture. These are internal or remote DTOs used to work around problems introduced by Entity Beans and J2EE technology in general. In newer versions of the EJB specification, and in applications that do not use that technology at all (e.g. applications using Hibernate instead), the Pattern is not only unnecessary but also introduces new problems.
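The contrast between dumb data structures plus smart procedures and proper objects can be sketched in Java as below. Both halves implement the same hypothetical invoice-payment rule; `InvoiceRecord`, `InvoiceProcedures` and `Invoice` are invented names used only for comparison.

```java
// Procedural style: a dumb data structure (essentially a record)...
final class InvoiceRecord {
    long amountInCents;
    boolean paid;
}

// ...and a smart procedure that holds the logic and mutates the record.
final class InvoiceProcedures {
    static void pay(InvoiceRecord invoice, long paymentInCents) {
        if (invoice.paid) {
            throw new IllegalStateException("already paid");
        }
        if (paymentInCents != invoice.amountInCents) {
            throw new IllegalArgumentException("payment must match the amount due");
        }
        invoice.paid = true;
    }
}

// Object-oriented style: the same rule lives with the data it protects.
final class Invoice {
    private final long amountInCents;
    private boolean paid;

    Invoice(long amountInCents) {
        this.amountInCents = amountInCents;
    }

    void pay(long paymentInCents) {
        if (paid) {
            throw new IllegalStateException("already paid");
        }
        if (paymentInCents != amountInCents) {
            throw new IllegalArgumentException("payment must match the amount due");
        }
        paid = true;
    }

    boolean isPaid() {
        return paid;
    }
}
```

In the procedural version any code holding an `InvoiceRecord` can set `paid = true` directly and bypass the rule; in the object version the only way to change that state is through `pay()`.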