Internal Domain-Specific Languages at PhilCalcado.com

Domain-Specific Languages (DSLs) are languages designed for writing programs that solve problems in a specific domain. Within that problem domain the language is extremely productive, and sometimes it is meaningful (or even readable) to the domain experts.

The main problem in a DSL-based development model is the creation and maintenance of the language itself. Designing a programming language is not an easy task, and the required skills are well outside the average developer’s toolbox (see Nakatani’s work).

The new language would need an execution platform, which means either a compiler that creates the executable artifact or an interpreter that executes the source code. It would also need a standard API containing all the common primitives, such as looping, subroutines and I/O.

Also, the language represents the model, and whenever the model changes (something very common in software development projects) the language has to be updated. If the DSL user requires a new feature, they have to ask the DSL developer to change the language.

Given these difficulties, an approach that has become increasingly popular as DSLs reach the mainstream is to build the new language on top of another language. These languages created inside other languages are known as Internal Domain-Specific Languages, a term popularized by Martin Fowler, although similar concepts form the basis of Domain-Specific Embedded Languages, a much older term.

Languages Inside Languages

When following the Internal DSL path, the DSL designer uses the Host Language’s constructs to implement the new language. The new language is syntactically compatible with its Host Language, so it can be executed by the same infrastructure, i.e. compiled by the same compiler or interpreted by the same interpreter. The final DSL can be an extension or a reduction of the Host Language.

As an extension, the new Domain-Specific concepts are made available to the Host Language. The final result is a new language that still uses the Host Language’s resources but adds Domain-Specific ones.

As a Domain-Specific reduction, the new language is specialized in the domain, and the language designer hides (or even forbids) most of the Host Language’s constructs that are not relevant to the DSL. The final result is a new language that mixes a subset of the Host Language’s syntax (and maybe style) with the new Domain-Specific concepts.

As an example, think of an object-oriented, class-based language like Java or Ruby. In such a language the abstractions used to create models are objects. If you want to model a date, say 21st Feb 1983, you use objects.

Time.mktime(1983,02,21)
#=> Mon Feb 21 00:00:00 +1100 1983

Suppose that instead of modeling time as objects you decide to extend the language so that it has a construct that actually is a time representation. Instead of directly instantiating the object and handling it, you add a new construct that represents dates as literals (using a quick’n’dirty trick; this snippet is an example, do not try this in your real-world program!).

21/02/1983
#=> Mon Feb 21 00:00:00 +1100 1983

You created a new concept, specific to a domain. Your Host Language already has the semantics and syntax for modeling problems using objects; you added the concept of a date.

Since you implemented this concept in a way that plays well with the Host Language, you have extended the Host Language with it.
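
To make the idea of extending the host a bit more concrete, here is a minimal sketch of one way such a date literal could be made to work. It is not necessarily the trick referred to above: it assumes Ruby 2.4 or later and uses a refinement, so the change to integer division only applies to files that opt in. The names DayMonth and DateLiterals are made up for this illustration. The month is written as 2 rather than 02, because a leading zero makes Ruby read the number as octal.

```ruby
# Hypothetical helper: remembers day and month until the year arrives.
DayMonth = Struct.new(:day, :month) do
  # Handles the second slash: (21/2) / 1983
  def /(year)
    Time.mktime(year, month, day)
  end
end

# Hypothetical refinement: changes Integer#/ only where `using` is called.
module DateLiterals
  refine Integer do
    # Handles the first slash: 21 / 2
    def /(month)
      DayMonth.new(self, month)
    end
  end
end

using DateLiterals

date = 21/2/1983
puts date.strftime('%d %b %Y')  # prints "21 Feb 1983"
```

Every integer division in a file that activates the refinement now builds a date instead of dividing, which is exactly why this kind of trick should stay out of real-world programs.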

On the other hand, suppose that you need to create a configuration file for a system. You realize that it would be easier if it were implemented as a program that your system executes. Since you don’t want the system administrator to bother with objects, classes and the like, you use a language that is a subset of your Host Language.

describe_node{
    ip [200,122,122,122]
    hostname ['admin-node']
    system_port 3000
    alternative_port 3001
    security_method :SSL
}

describe_node{
    ip [200,122,122,102]
    hostname ['indexer-node']
    system_port 3000
    alternative_port 3001
    security_method :none
    java_version JDK_1_4
}

In this example you are using Ruby as a base, but nothing in it reads like Ruby code. The semantics of the whole little language have changed from message-passing object-orientation to the definition of computer nodes. You’ve reduced the Host Language to the very minimum needed by your domain, including changing the style from an imperative Host Language to a declarative DSL.

Just as you don’t know what the Ruby interpreter does to run your code, the user of this DSL doesn’t know whether the describe_node call creates a Ruby object or anything else. Since objects are not part of the language’s domain this shouldn’t be an issue, just as allocating memory is not an issue in Ruby.

The final language follows Ruby’s syntactic rules but uses Domain-Specific constructs. It is not Ruby anymore; it just borrows its syntax in order to be executable by the same environment.
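
For the curious, the describe_node blocks above could be backed by something like the following. This is only a sketch under my own assumptions: the NodeDescription class, the NODES list and the value given to JDK_1_4 are invented for the illustration, and a real system would do more with the collected settings than store them in a hash.

```ruby
# Hypothetical constant so that `java_version JDK_1_4` resolves. Constants in
# the block are looked up where the block is written, not in the builder.
JDK_1_4 = :jdk_1_4

# Hypothetical builder: every word of the DSL is a method that stores a value.
class NodeDescription
  WORDS = [:ip, :hostname, :system_port, :alternative_port,
           :security_method, :java_version].freeze

  attr_reader :settings

  def initialize
    @settings = {}
  end

  WORDS.each do |word|
    define_method(word) { |value| @settings[word] = value }
  end
end

# Collects the settings of every node described so far.
NODES = []

# Runs the block with a fresh NodeDescription as self, so bare words such as
# `ip` or `system_port` are sent to the builder instead of the caller.
def describe_node(&block)
  node = NodeDescription.new
  node.instance_eval(&block)
  NODES << node.settings
  node.settings
end

describe_node{
    ip [200,122,122,102]
    hostname ['indexer-node']
    system_port 3000
    alternative_port 3001
    security_method :none
    java_version JDK_1_4
}

puts NODES.first[:hostname].first  # prints "indexer-node"
```

Listing the allowed words explicitly, instead of catching everything with method_missing, is what makes this a reduction: a word that is not part of the domain fails with an error rather than being silently accepted.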

Noise and Signal

The main motivation for Domain-Specific Languages is to reduce the accidental noise caused by General Purpose Languages.

Using an Internal DSL it is possible to reduce the Semantic Noise. The concepts you use in your code are actually the domain concepts; you don’t have to translate from domain concepts to an object model when coding.

Nonetheless, you still have problems with Syntactic Noise. Representing abstract real-world concepts in a format a computer understands (i.e. a programming language) requires artificial symbols and syntax. When designing a completely new language, one can choose a syntax that alleviates this problem, but reusing an existing language requires you to stay compatible with the Host Language’s syntax, which wasn’t designed with the domain in mind.

A New Language?

A question that always arises when talking about Internal DSLs is: are they new languages at all? If the Host Language is flexible enough to allow these modifications, isn’t it just syntactic sugar?

I see an Internal DSL as a different language. You can’t introduce new concepts into a language and still say it is the same.

Take Java, for example. Java 1.5 is considered the most important release of the Java language since version 1.2 precisely because it introduced new concepts. You could emulate most of those concepts using standard Java 1.4 code, but once they became part of the language they shifted the way developers think about problems.

Another interesting example is C++. C++ is largely a superset of C, and still they are considered different languages. You can emulate Object-Orientation in C and use classes and objects as your abstractions, but if you really want to do that it is better to have those resources directly in your language. A language that includes the concepts of Object-Orientation will probably be more effective and expressive.

The Domain-Specific concepts you introduce change the way you develop software. Instead of generic constructs like objects or functions you use domain constructs, implemented as primitives in your DSL. Instead of sending a message to an object that represents the business concept, you deal with the business concept directly in your keywords.

Just as it is more effective to have Object-Orientation constructs in a language than to emulate them, it is more effective to have the domain concepts built into the language.

This text is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Australia license.