Archive for the ‘spring’ Category

From Amazon EC2 to Google Appengine

Wednesday, April 7th, 2010

Amazon EC2

I was very impressed when Amazon launched its EC2 cloud infrastructure. Eager to try it, I started up some servers and installed my koopjeszoeker application on them. Until then, this Java application had been running on a private server in Brussels. That was almost 2 years ago.

Everything went reasonably well. I liked being able to install a new version on a separate server and then simply use the Elastic IP address feature to switch production over to that new server. My problem was running a database server that could scale along with the application. Luckily, Amazon seemed to read my mind every time I needed something: they released SimpleDB, a scalable database that was enough for my needs. Later on, they released the Relational Database Service, but I haven’t needed it yet.

The whole setup was maybe a bit overkill for my site, but it was a nice test setup for learning more about working with this infrastructure. Over the next 2 years, I added CloudFront and used S3 as a backup solution. I also set up Amazon Elastic Load Balancing with Auto Scaling enabled for traffic peaks. I wanted a server solution that just worked, so I wouldn’t need to spend too much time on system maintenance.

Switching to Appengine

I was able to lower my monthly bill for hosting the zamtam sites (koopjeszoeker.be, koopjeszoeker.com, fr.zamtam.be, zamtam.fr and, recently, the beta sites zamtam.co.uk and zamtam.de) by switching from Amazon EC2 to Google Appengine. The Amazon bill was around $180 a month. That covered a constantly running High-CPU Medium instance, plus S3 traffic, SimpleDB, CloudFront and an occasional test instance. My Amazon server ran for almost 2 years. My Google Appengine bill is now around 40 cents a day, or roughly $12 a month. That is 15 times less!

I think the main benefit of Appengine over EC2 in my case is that I don’t need a constantly running server, but I do need enough capacity to handle peak traffic (mainly in the evenings and on weekends). With EC2, this means starting more servers, either manually or automatically with Auto Scaling behind Elastic Load Balancing. Appengine handles this automatically. You (roughly) only pay for the extra CPU time consumed.

Until a month ago, the only thing keeping me from using Appengine was the lack of non-blocking IO support in Java. Luckily, this has been resolved, although silently (I only found out by reading the detailed release notes): you can now use URLFetchService.fetchAsync()!

Lessons Learned

Some things I’d like to share about my experience with AppEngine:

The free 1 GB of storage is a lot of space. Don’t optimize for storage size when 200 GB of storage costs about $1 a day. A typical application won’t need more than 10 GB, which costs $1.5 a month. Similarly, one million tasks a day is a lot. Don’t prematurely optimize by packing a lot of work into one task when you can spread it over many small concurrent tasks. As Chris Anderson puts it in his book “Free”: “when something’s free, people tend to treat it like it’s indefinitely available”. (I couldn’t find the exact quote, since I listened to the audiobook in the car and that isn’t searchable yet.)
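The storage figures above fit together if storage costs about $0.15 per GB per month. That rate is inferred from the post’s own “$1.5 for 10 GB”; it is not taken from a price list:

```latex
\begin{aligned}
\text{implied rate} &= \frac{\$1.5}{10\ \text{GB}} = \$0.15\ \text{per GB per month}\\
200\ \text{GB} \times \$0.15 &= \$30\ \text{per month} \approx \$1\ \text{per day}
\end{aligned}
```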

The 6.5 free CPU hours a day already allow for a lot of work. I handle around 10,000 visitors a day, plus many URL fetches and image transformations. I only occasionally need more than the free quota.

Startup time can be an issue, so I removed all unneeded jars from WEB-INF/lib and did some lazy loading. Startup time mainly matters during low-traffic periods, because Appengine stops and starts instances according to traffic. A visitor who hits an app that is just starting has to wait longer and sometimes gets an error page. Once your app is up and handling a steady amount of traffic, the server instances seem to stay up. You can monitor this in the logs with a ServletContextListener that logs an event in its contextInitialized() and contextDestroyed() methods.

The task queues are really useful for doing work asynchronously. Examples are cleaning up the datastore (removing all thumbnails older than 30 days) or executing long-running cron jobs. Requests made by the task queue carry some headers, one of which lets you retry a task only 3 times. I check this retry-count header in the catch block. When it equals 3, I no longer throw an exception, so the task is removed from the queue.
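Here is a minimal sketch of that retry check in plain Java, so it runs outside App Engine. It assumes the retry count arrives in the X-AppEngine-TaskRetryCount request header, which is the name App Engine’s task queue documentation uses; check it against your SDK version. The class name TaskRetryPolicy is hypothetical. The check uses >= rather than the post’s “equal to 3”, so a skipped count can’t cause endless retries.

```java
import java.util.logging.Logger;

/**
 * Decides whether a failed task should be retried.
 * ASSUMPTION: the queue sends the retry count in the X-AppEngine-TaskRetryCount header.
 */
final class TaskRetryPolicy {
    static final String RETRY_HEADER = "X-AppEngine-TaskRetryCount";
    private static final Logger LOG = Logger.getLogger(TaskRetryPolicy.class.getName());

    private final int maxRetries;

    TaskRetryPolicy(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /**
     * True: rethrow, so the queue schedules another attempt.
     * False: swallow the error, so the queue considers the task done and drops it.
     */
    boolean shouldRethrow(String retryHeaderValue) {
        int retryCount = parseRetryCount(retryHeaderValue);
        if (retryCount >= maxRetries) {
            LOG.warning("Giving up after " + retryCount + " retries");
            return false;
        }
        return true;
    }

    /** A missing or malformed header is treated as the first attempt (0). */
    static int parseRetryCount(String value) {
        if (value == null) {
            return 0;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }
}
```

In the task’s Servlet or Spring Controller, the catch block would call policy.shouldRethrow(request.getHeader(TaskRetryPolicy.RETRY_HEADER)). It rethrows only when that returns true; otherwise it returns normally, so the queue drops the task.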

There are workarounds for the 30-second execution limit. Mine is to do a small amount of work in a Servlet (Spring Controller). It then adds the same URL to a task queue, with different parameters such as a database cursor.

You don’t need a database for everything. I moved some tables whose contents never change into my Spring config XML, which avoids datastore lookups.
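Here is a runnable sketch of that self-re-enqueueing pattern, with App Engine left out so it runs anywhere. Each simulated request handles one small batch and returns the cursor for the next task. On App Engine, the cursor would be a datastore cursor string and the next URL would go onto a task queue. Here the cursor is just a list index. ChunkedJob, the /tasks/cleanup path and the cursor parameter name are all hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;
import java.util.function.Consumer;

final class ChunkedJob {
    /** Outcome of one short request: items handled and where the next task resumes. */
    static final class Step {
        final int processed;
        final Integer nextCursor; // null once every item has been handled

        Step(int processed, Integer nextCursor) {
            this.processed = processed;
            this.nextCursor = nextCursor;
        }
    }

    /** Handles at most batchSize items starting at cursor, so each request stays short. */
    static <T> Step runChunk(List<T> items, int cursor, int batchSize, Consumer<? super T> work) {
        int end = Math.min(cursor + batchSize, items.size());
        for (int i = cursor; i < end; i++) {
            work.accept(items.get(i));
        }
        return new Step(end - cursor, end < items.size() ? Integer.valueOf(end) : null);
    }

    /** URL for the next task: same path, new (URL-encoded) cursor parameter. */
    static String nextTaskUrl(String path, String cursorToken) {
        try {
            return path + "?cursor=" + URLEncoder.encode(cursorToken, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

A driver that keeps calling runChunk until nextCursor is null plays the role of the task queue here. In production, each call would be a separate request triggered by the URL the previous one enqueued, so no single request gets near the time limit.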

Your application needs to handle sudden shutdowns and startups without errors, since a user may end up on a different server instance for every request. I decided not to use HttpSessions (I almost never use them anyway).

The URLFetchService caches responses by default.  You need to add your own no-cache request headers to get fresh results.
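Here is a small sketch of adding those no-cache headers using only java.net. On App Engine’s Java runtime, java.net.URL connections are carried out by the URL Fetch service, so the same request headers apply. With the low-level URLFetchService API, you would add them to the HTTPRequest instead. FreshFetch is a hypothetical name. Whether these two headers bypass the fetch cache in every case is an assumption based on the post.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class FreshFetch {
    /** Opens (but does not yet send) a GET request that asks caches for a fresh copy. */
    static HttpURLConnection openUncached(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setUseCaches(false);                              // don't consult a locally installed ResponseCache
        conn.setRequestProperty("Cache-Control", "no-cache");  // HTTP/1.1 caches must revalidate
        conn.setRequestProperty("Pragma", "no-cache");         // same request for HTTP/1.0 caches
        return conn; // the request is sent when getInputStream() or getResponseCode() is called
    }
}
```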

Subscribe to the Appengine downtime notification feed; you can also check the system status page. True to Murphy’s law, during my first week on Appengine, the whole thing that’s not supposed to die went down. Google did provide a detailed post-mortem explaining everything. As long as they’re the ones who need to solve the infrastructure problems and not me, I’m happy with that. I’m modest enough to know I couldn’t possibly match their expertise.

It is possible to set up multiple custom domains, so you’re not stuck with myapp.appspot.com. I also use 4 hostnames for thumbnails, like thumbs1.zamtam.com, thumbs2.zamtam.com, …. A hash of the filename determines which hostname should serve each image.
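Here is a sketch of the hostname selection. The thumbsN.zamtam.com pattern comes from the post. Using String.hashCode() is an assumption (any stable hash works), and ThumbHosts is a hypothetical name. What matters is that the same filename always maps to the same host; otherwise, browsers would download the same thumbnail from several hosts.

```java
final class ThumbHosts {
    private final String prefix;
    private final String domain;
    private final int count;

    ThumbHosts(String prefix, String domain, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.prefix = prefix;
        this.domain = domain;
        this.count = count;
    }

    /** Same filename, same host: String.hashCode() is fully specified, so this is stable across JVMs. */
    String hostFor(String filename) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        int index = Math.floorMod(filename.hashCode(), count) + 1;
        return prefix + index + "." + domain;
    }

    String urlFor(String filename) {
        return "http://" + hostFor(filename) + "/" + filename;
    }
}
```

With 4 hosts, new ThumbHosts("thumbs", "zamtam.com", 4).hostFor("a") returns thumbs2.zamtam.com. That is because "a".hashCode() is 97, and 97 mod 4 is 1.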

I created a small Java class, AppengineUtils.java, with some useful methods; feel free to use it. I add the app version to the URL of my JavaScript file, so the URL changes each time I deploy a new version. That lets me set much longer cache headers for it. I also check whether I’m running on the development server, so I can show some buttons in the HTML that don’t appear in the production version.
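The two helpers described above could look roughly like this. This is a hypothetical stand-in, not the author’s AppengineUtils.java. It reads the system properties that the App Engine Java runtime sets, which the SDK’s SystemProperty class wraps. Check the property names against your SDK version.

```java
/**
 * HYPOTHETICAL helpers. Property keys as set by the App Engine Java runtime:
 *   com.google.appengine.runtime.environment -> "Production" or "Development"
 *   com.google.appengine.application.version -> the deployed version, e.g. "3.3409..."
 */
final class AppengineUtilsSketch {
    static final String ENVIRONMENT_KEY = "com.google.appengine.runtime.environment";
    static final String VERSION_KEY = "com.google.appengine.application.version";

    /** True on the local development server, e.g. to render debug-only buttons. */
    static boolean isDevServer() {
        return "Development".equals(System.getProperty(ENVIRONMENT_KEY));
    }

    /**
     * Appends the deployed version to a static URL, so every deploy gets a new URL
     * and the response can carry long-lived cache headers.
     */
    static String versioned(String path) {
        String version = System.getProperty(VERSION_KEY);
        if (version == null || version.isEmpty()) {
            return path; // not running on App Engine: leave the URL alone
        }
        String separator = path.contains("?") ? "&" : "?";
        return path + separator + "v=" + version;
    }
}
```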

Improvements

The dashboard resets every morning at 9 AM CET, and there is no way to see the quota details for previous days.

The times shown in the logs are confusing, since they are not in my local time zone. An option in the Appengine settings to set the local time zone would be handy.

The blobstore (still in beta) is missing some features, like an easy way to store data fetched with the URLFetchService in the blobstore. Luckily, my URL fetches are smaller than 1 MB, so I can store them in the datastore.

The Google Accounts integration is sometimes confusing. I use Appengine from my Google Apps domain (onthoo.com), but my site runs on different hostnames (koopjeszoeker.be, zamtam.fr, …). So I needed to add (verify) these domains in my Google Apps domain. That part succeeded. The problem is that I want to send e-mail with the Mail API, and this service only allows outgoing mail from accounts that are developers of the app. I want to add a developer with an e-mail address like noreply at zamtam.com (an extra domain of my onthoo.com Google Apps domain) instead of noreply at onthoo.com. I can’t seem to do it: I get the developer confirmation e-mail, but the link goes through a series of redirects and ends on an error page.

I think my whole Appengine setup is a bit messed up, since I currently have 9 apps deployed and it still shows 4 remaining (you can have a maximum of 10 apps). It may have something to do with the fact that I have a Google Apps account and a Google Account with the same e-mail address. I have to be careful to log in through https://appengine.google.com/a/<YOURDOMAIN.COM>/ instead of https://appengine.google.com .

The URLFetchService is limited to 10 simultaneous asynchronous fetches, while I currently need 12. An increase would be nice, although I know my case is probably an exceptional one.
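One way to stay under such a limit is a sliding window over futures. You start fetches until the limit is reached, then wait for the oldest one before starting the next. The sketch below works with anything that returns a java.util.concurrent.Future, which is what URLFetchService.fetchAsync() returns. On App Engine, the start function would be something like url -> fetchService.fetchAsync(url). WindowedFetch is a hypothetical name, and it starts no threads of its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.Function;

final class WindowedFetch {
    /**
     * Starts one async operation per input, but never has more than maxInFlight
     * outstanding: before starting another, it waits for the oldest one.
     * Results are returned in input order.
     */
    static <I, O> List<O> fetchAllWindowed(List<I> inputs,
                                           Function<? super I, ? extends Future<O>> start,
                                           int maxInFlight)
            throws InterruptedException, ExecutionException {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be >= 1");
        }
        List<O> results = new ArrayList<>(inputs.size());
        Deque<Future<O>> inFlight = new ArrayDeque<>();
        for (I input : inputs) {
            if (inFlight.size() == maxInFlight) {
                results.add(inFlight.removeFirst().get()); // window full: wait for the oldest
            }
            inFlight.addLast(start.apply(input));
        }
        while (!inFlight.isEmpty()) {
            results.add(inFlight.removeFirst().get()); // drain the rest in order
        }
        return results;
    }
}
```

Waiting on the oldest future keeps the results in input order and the code simple. The trade-off is that one slow fetch at the head of the window can delay the next start, even if later fetches have already finished.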

The limit of 30 active dynamic requests is sometimes an issue for me. I use the Images API to generate thumbnails on the fly, which takes a bit longer: fetch the image URL, resize it, store it in the datastore and return it. Because I use different hostnames for the thumbs (like thumbs1.zamtam.com, thumbs2.zamtam.com, …), a single page can trigger up to 10 thumbnail requests at a time. So just 3 users requesting a page at the same time are enough to reach the limit of 30. I cache the thumbs so they’re only generated once, but this doesn’t handle all cases. This is something I need to investigate further; maybe I should ask for an increase?

‘Naked domains’ are no longer supported, so serving a site directly from zamtam.co.uk, for example, isn’t possible. This makes the DNS setup a bit more complex.

Conclusion

A lot of exciting things can be done with Appengine. Especially when you run a website rather than long-running batch operations, Appengine can turn out to be a lot cheaper than Amazon EC2. EC2 lets you do much more, in whatever way you prefer. Appengine pushes you a bit to do things its way, which in turn makes things easier for you. With Appengine, you also don’t need to think about scaling MySQL, load-balancing Apache or updating Linux.

One benefit can’t be stressed enough: you don’t need to plan your server capacity beforehand, since Appengine handles this automatically. Deploying a new version is also easy: upload it, test it and, when ready, switch the default version to the new one. No downtime, no worries (you can always go back to the previous version if a problem with the new one shows up later).

Spring sample project

Monday, April 2nd, 2007

I found a nice (and small) Spring sample project on the Interface21 Team Blog.

I learned about @NotNull annotations and how to define and process them in the applicationContext.xml. I also found this interesting remark about registering PropertyEditors:

The JavaBeans package uses a small little convention to resolve property editors. If a conversion is needed for a specific class, the JavaBeans package searches (amongst other) for a class in the same package named after the to be converted class, appended with ‘Editor’. Therefore, the CarModelEditor does NOT have to be registered; it’s found automatically!
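The convention in the quote can be seen with the plain JavaBeans API. PropertyEditorManager.findEditor() locates an editor named after the target class plus “Editor”, without any registration. CarModel and its editor here are made up for illustration. They are nested classes, so the name that gets looked up is the binary name EditorConventionDemo$CarModelEditor. The editor must be public with a public no-argument constructor so it can be instantiated reflectively.

```java
import java.beans.PropertyEditor;
import java.beans.PropertyEditorManager;
import java.beans.PropertyEditorSupport;

class EditorConventionDemo {
    /** Made-up value type to convert from a String. */
    public static class CarModel {
        private final String name;

        public CarModel(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    /** Never registered: found only because its name is CarModel + "Editor". */
    public static class CarModelEditor extends PropertyEditorSupport {
        @Override
        public void setAsText(String text) {
            setValue(new CarModel(text.trim()));
        }

        @Override
        public String getAsText() {
            Object value = getValue();
            return value == null ? "" : ((CarModel) value).getName();
        }
    }

    /** Converts text the way a bean container would: look up the editor, then setAsText. */
    static CarModel convert(String text) {
        PropertyEditor editor = PropertyEditorManager.findEditor(CarModel.class);
        if (editor == null) {
            throw new IllegalStateException("No editor found for CarModel");
        }
        editor.setAsText(text);
        return (CarModel) editor.getValue();
    }
}
```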

I suggest you download the code and have a look at the xml and java files.

Spring Cache 2.4.1

Thursday, December 7th, 2006

I just released a new version of Spring AOP Cache.

I’ve updated the code, so from now on EHCache 1.2.3 is required. EHCacheInterceptor has a new property, ‘overFlowToDisk’, which defaults to ‘false’. Objects don’t need to be Serializable anymore. They only need to be Serializable if you enable overFlowToDisk.
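The Serializable requirement follows from what a disk store has to do. An in-memory cache can keep any object reference, but overflowing to disk means turning each entry into bytes. Java serialization rejects objects that don’t implement Serializable. The helper below is hypothetical, not part of Spring AOP Cache or EHCache. It only checks whether a value survives serialization, which you might do in a test before enabling overFlowToDisk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

final class DiskOverflowCheck {
    /** True if the value can be written with Java serialization, as a disk store would need. */
    static boolean survivesSerialization(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false; // fine for a memory-only cache, not for overflow to disk
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```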

See the new release: spring-cache-2.4.1.jar.

XML Syntax Sugar in Spring 2.0

Thursday, November 30th, 2006

A nice feature which makes Spring configuration files shorter is described by Rod Johnson himself here: Interface21 Team Blog » XML Syntax Sugar in Spring 2.0.

Spring LocaleContextHolder

Monday, October 30th, 2006

In a Spring Controller, you can access your Locale with RequestContextUtils, like this:

Locale locale = RequestContextUtils.getLocale(request);

However, sometimes you don’t want to pass your request around. In that case, you can use this nice class, which uses a ThreadLocal to hold the locale:

Locale locale = LocaleContextHolder.getLocale();
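The idea behind LocaleContextHolder can be sketched with a plain ThreadLocal. Something at the start of the request stores the locale for the current thread (in Spring MVC, the dispatcher servlet does this). Any code on that thread can then read it later. SimpleLocaleHolder is a hypothetical class, not Spring’s implementation, which stores a LocaleContext and can also use an inheritable thread-local.

```java
import java.util.Locale;

/** HYPOTHETICAL per-thread locale slot illustrating the LocaleContextHolder idea. */
final class SimpleLocaleHolder {
    private static final ThreadLocal<Locale> CURRENT = new ThreadLocal<>();

    /** Called once per request, before controller code runs. */
    static void setLocale(Locale locale) {
        CURRENT.set(locale);
    }

    /** Falls back to the JVM default when nothing was set on this thread. */
    static Locale getLocale() {
        Locale locale = CURRENT.get();
        return locale != null ? locale : Locale.getDefault();
    }

    /** Call when the request ends: servlet containers reuse pooled threads. */
    static void reset() {
        CURRENT.remove();
    }
}
```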

Spring-cache 2.3

Tuesday, August 29th, 2006

Today I released a new version of Spring AOP Cache.

The main changes are:
- Added OSCacheFlusher
- Removed the default port from ClusterBroadcasterURLImpl; the port must now be specified for each cluster member individually
- Improved logging

The release can be downloaded from SourceForge.

Using Spring AOP Cache flusher in a cluster

Wednesday, August 2nd, 2006

I created a sample Spring XML file that gives a brief overview of how caches can be cleared in a cluster when using my AOP Cache framework.

See cluster-clear-caches-sample.xml

Please note that this is only limited documentation; check the source code to see how things work.

Spring AOP Cache on SourceForge

Monday, May 22nd, 2006

I finally created a SourceForge project for the Spring AOP cache code. Files can be downloaded from http://sourceforge.net/project/showfiles.php?group_id=167220 and anonymous CVS access is described at http://sourceforge.net/cvs/?group_id=167220.

I hope this gives the project more visibility and support!

Spring AOP Cache and SourceForge

Tuesday, May 9th, 2006

I finally managed to create a SourceForge project for the Spring AOP Cache project I’m working on.

There was a problem with the SourceForge “UNIX-name” when I tried to register a few days ago: whatever I entered in this field, it was always rejected. This problem seems to be fixed now.

Once the project is approved by the SourceForge reviewers, I will host the project there.

Spring AOP Cache v2.0.18

Tuesday, March 21st, 2006

I finally uploaded a new version of my Spring AOP Cache. I didn’t keep track of the changes, but most of the code should still work, although packages may have changed. I really need to update the documentation, but for now the Javadoc should make things clear.

I suggest you browse through the code to see any changes.

Please also take a look at the broadcast package in the Javadoc. It should make it possible to flush caches in a cluster by calling URLs on each server. This is a very naïve implementation, but it should be enough for most projects. It clears the whole cache in a cluster by publishing a CacheRefreshNeededEvent, which is then sent to each member of the cluster.
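The naive broadcast can be sketched as a loop that calls a flush URL on every member. ClusterMember, UrlBroadcaster, the /flushCaches path and the pluggable sender are hypothetical illustrations, not the classes in the broadcast package. Each member carries its own port, in line with the later change that removed the default port from ClusterBroadcasterURLImpl.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class UrlBroadcaster {
    /** One server in the cluster; the port is required, there is no default. */
    static final class ClusterMember {
        final String host;
        final int port;

        ClusterMember(String host, int port) {
            this.host = host;
            this.port = port;
        }

        String url(String path) {
            return "http://" + host + ":" + port + path;
        }
    }

    private final List<ClusterMember> members;
    private final Predicate<String> sender; // calls the URL, returns true on success

    UrlBroadcaster(List<ClusterMember> members, Predicate<String> sender) {
        this.members = new ArrayList<>(members);
        this.sender = sender;
    }

    /** Calls the flush URL on every member, including this one, and returns the URLs that failed. */
    List<String> broadcast(String flushPath) {
        List<String> failed = new ArrayList<>();
        for (ClusterMember member : members) {
            String url = member.url(flushPath);
            if (!sender.test(url)) {
                failed.add(url);
            }
        }
        return failed;
    }
}
```

A real sender could open each URL with java.net.HttpURLConnection and return true for a 200 response. Keeping the sender pluggable makes the loop testable without a cluster.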

If you don’t have a cluster, you can instantiate EHCacheFlusher in your Spring configuration. When a CacheRefreshNeededEvent is published with applicationContext.publishEvent(), this class will clear all EHCaches.
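The single-server case follows the same publish/listen shape. In Spring, applicationContext.publishEvent() hands the event to every bean that implements ApplicationListener, which is presumably how EHCacheFlusher receives it. The sketch below replaces Spring and EHCache with a tiny event bus and plain maps so it runs on its own. Every class in it is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

final class LocalCacheFlushDemo {
    /** Stand-in for CacheRefreshNeededEvent. */
    static final class RefreshNeeded {
    }

    interface Listener {
        void onEvent(Object event);
    }

    /** Stand-in for the application context's event publishing. */
    static final class EventBus {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();

        void register(Listener listener) {
            listeners.add(listener);
        }

        void publish(Object event) {
            for (Listener listener : listeners) {
                listener.onEvent(event);
            }
        }
    }

    /** Stand-in for EHCacheFlusher: clears every cache it knows about on a refresh event. */
    static final class Flusher implements Listener {
        private final List<Map<?, ?>> caches;

        Flusher(List<Map<?, ?>> caches) {
            this.caches = caches;
        }

        @Override
        public void onEvent(Object event) {
            if (event instanceof RefreshNeeded) { // ignore unrelated events
                for (Map<?, ?> cache : caches) {
                    cache.clear();
                }
            }
        }
    }
}
```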

You can download the Eclipse project, view the Javadoc or simply download the jar.