Some weeks ago I joined a handful of ThoughtWorkers invited to test Google AppEngine’s new Java API. Unfortunately I had a project requiring a lot of attention during most of this period, but once back on the beach I found some time to play around with it.
Google AppEngine (GAE) is Google’s shot at the cloud-computing segment. Using the service you deploy your system on Google’s infrastructure and are allowed to use BigTable, MapReduce and other tools that the Internet giant has developed over the past years.
Google’s take on cloud-computing is to offer a full development platform and not only “somewhere to deploy to”. Using Amazon AWS, for example, you have access to virtual servers but Amazon has very little control over how you develop your application.
I like to think about Amazon and Google’s approaches as:
- The Cloud as Deployment Strategy: That is Amazon. Your application doesn’t need anything to be cloud’ed, the cloud is just a deployment option.
- The Cloud as the Development Platform: That is Google. The vendor doesn’t offer only a deployment solution but also influences how you develop your application, supplying you with tools and services but often limiting your options to whatever is officially supported.
Google’s model can be a bad thing when it gets in your way, but in GAE’s case it is often just a matter of enforcing security and scalability best practices. Some of them may feel really odd for a simple application, though.
This model has a big advantage: as Google has control over the environment it can offer some services and optimisations that deployment-only clouds can’t. Amazon is aware of that and just released a MapReduce service.
A terrible limitation of the platform strategy followed by Google is that you can only use the tools they support. For some time Python was the only language supported, and that may explain why so few applications use GAE.
This changes dramatically with the new Java support. We all know Java is probably the most popular language in our industry, but there is more to it than that: during the past years more and more languages were created on or ported to the JVM. By supporting Java, Google automatically adds support for Groovy, Ruby, Scala and many other languages, as long as they can work under the restrictions imposed by the platform.
My Experience so Far
As I said before, I didn’t have a lot of time to play around with GAE. In the last two days I wrote a simple Clojure application that talks to Twitter and performs some very basic data transformation. The code is available on Github. The app is just a toy and works on the local server, but I’m still adding Twitter authentication to make it work properly on GAE. Feedback welcome.
The first time I tried using Clojure on GAE, weeks back, I had so many Classloader issues that I thought about giving up. During the past weeks both GAE and Clojure were updated many times and it seems that the class loading issues were resolved (maybe some Clojure committer was part of the GAE alpha testing program too).
GAE works using a Java Servlets 2.5 environment. My goal was to avoid the Java language entirely, and to avoid web frameworks like Compojure or Ring as well, so the easiest way to get this running was to export a namespace as a Servlet, something quite common in the community:
    (ns BestMateServlet
      (:gen-class :extends javax.servlet.http.HttpServlet)
      (:use mate))

    (defn- write-to-resp [resp text]
      (. (. resp getWriter) println text))

    (defn -doGet [_ req resp]
      (write-to-resp resp (process-request req)))
And then using this Servlet in the web.xml file:
    <servlet>
      <servlet-name>mybestmate</servlet-name>
      <servlet-class>BestMateServlet</servlet-class>
    </servlet>
    <servlet-mapping>
      <servlet-name>mybestmate</servlet-name>
      <url-pattern>/user/*</url-pattern>
    </servlet-mapping>
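With the /user/* mapping, whatever follows /user in the URL reaches the Servlet as the request's path info (what HttpServletRequest.getPathInfo returns). The mate namespace is not shown in this post, so the following is only a sketch of how a handler might pull a username out of that value. The namespace example.path and the function username-from-path are hypothetical names, not part of the actual application.

```clojure
(ns example.path
  (:require [clojure.string :as string]))

;; Hypothetical helper: not taken from the application described in the post.
(defn username-from-path
  "Returns the first non-empty segment of path-info, or nil when there is none.
   path-info is a string shaped like the servlet path info, e.g. \"/someuser\"."
  [path-info]
  (when path-info
    (first (remove string/blank? (string/split path-info #"/")))))

;; Usage: inside -doGet this could be called as
;; (username-from-path (.getPathInfo req))
(username-from-path "/someuser")
```

Keeping the parsing in a plain function of a string means it can be exercised at the REPL without a running Servlet container.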
That works fine. The second step was to try to get rid of Eclipse. GAE has a fully functional Eclipse plugin, but when I am writing anything but Java I use Emacs. The GAE documentation points to some nice Ant macros that allow you to start your local dev environment (using a modified Jetty), deploy to production or even download logs. It shouldn’t be that hard to convert some of those to Capistrano.
There are many limitations on the classes you can use. As an example, I tried to use the Apache Commons HTTP client in my project, but it uses raw sockets, something GAE forbids. My next option was to use Lazy XML, but GAE’s SecurityManager will not allow you to use Clojure agents, so the call to parse-seq dies with a “java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThreadGroup)”.
My next try was using Duck Streams. It sort of worked, but I had problems closing buffers, and I am still not sure if it was my ignorance or something that doesn’t work right in GAE. So I went back to Google’s advice and used the URL class. It was actually pretty easy:
    ;; Assumes the ns form imports java.io.BufferedReader,
    ;; java.io.InputStreamReader and java.net.URL.
    (defn- GET-body [uri]
      (with-open [reader (BufferedReader. (InputStreamReader. (. (URL. uri) openStream)))]
        (apply str (line-seq reader))))

    (defn- twitter-page-for [username]
      (GET-body (twitter-page-url username)))
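One detail worth knowing about GET-body: line-seq drops the line terminators, so apply str glues all lines together with nothing in between. That is fine for scraping a page, but if line breaks matter, a variant along these lines keeps them. This is a minimal sketch, not the code from the application: the namespace example.fetch and the name slurp-lines are made up for illustration, and it uses clojure.string, which assumes a Clojure version that ships that namespace.

```clojure
(ns example.fetch
  (:require [clojure.string :as string])
  (:import (java.io BufferedReader Reader StringReader)))

;; Hypothetical helper: reads everything from rdr and keeps line breaks.
(defn slurp-lines
  "Reads rdr to the end and returns its lines joined with a newline.
   Closes rdr when done."
  [^Reader rdr]
  (with-open [reader (BufferedReader. rdr)]
    ;; string/join builds the whole string before with-open closes the reader
    (string/join "\n" (line-seq reader))))

;; Usage with an in-memory reader, so no network access is needed:
(slurp-lines (StringReader. "first\nsecond"))
```

Taking a Reader instead of a URI keeps the function independent of the network, so the same code works on an InputStreamReader wrapped around a URL stream.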
Besides those issues I did not have many problems running Clojure on GAE. One may argue that it is a big issue that you can’t use agents inside the server, and I have to agree with that. Google’s answer makes sense, though, and if you check the documentation you will see that there are many ways to get high performance without using them.
The problem is that Clojure shines at single-JVM concurrency. Lots of libraries rely on agents and the like being available, and life is not that sweet when you can’t use them.
Development Feedback Cycle
As I said before, you have a nice Eclipse plugin (nice enough for a beta), but you can use Ant to do pretty much all you need during development. That helps a lot, but the feedback cycle is still a bit too long.
The local environment tries to be very close to production, but problems will always arise when you finally deploy. It is probably not a good idea to deploy frequently during development, but I think that in a real project deployment should at least be a step in the Continuous Integration build. Even during local development you have to constantly stop and start the local server.
I think that the new platform is pretty promising. Google has some advantages over Amazon in the multiple services it makes available. At the same time Amazon is a really strong player.
As pricing strategies for both options are really similar I think that right now the choice is between services provided by Google or flexibility provided by AWS. Amazon is adding services to their platform and Google certainly will evolve their platform in the near future.
I have been using AWS for a while now and I am really satisfied in most aspects. Even with services like S3 you have enough freedom to use them in multiple ways. My experience with GAE was very positive, though, and I will definitely think about it for my next project.
Update: Coverage by fellow ThoughtWorkers: