Archive for the ‘java’ Category

From Amazon EC2 to Google Appengine

Wednesday, April 7th, 2010

Amazon EC2

I was very impressed when Amazon launched its EC2 cloud infrastructure.  Eager to test it, I started up some servers and tried to install my koopjeszoeker application on them.  Until then this Java application had been running on a private server (in Brussels).  That was almost 2 years ago.

Everything went reasonably well and I liked being able to install a new version on a separate server and then just use the Elastic IP address feature to switch the production version to this new server.  The problem I had was running a database server that could also scale with the application.  Luckily, Amazon seemed to read my mind every time I needed something.  They released SimpleDB, a scalable database that was enough for my needs.  Later on, they released the Relational Database Service, but I haven’t needed it yet.

The whole setup for my site was maybe a bit overkill, but it was a nice test setup for learning more about working with this infrastructure.  During the next 2 years, I added Cloudfront and used S3 as a backup solution. I also set up Amazon Elastic Load Balancing with autoscaling enabled for traffic peaks. I wanted a server solution that just worked so I wouldn’t need to spend too much time on system maintenance.

Switching to Appengine

I was able to lower my monthly bill for hosting the zamtam sites (koopjeszoeker.be, koopjeszoeker.com, fr.zamtam.be, zamtam.fr and, recently, the beta sites zamtam.co.uk and zamtam.de) by switching from Amazon EC2 to Google Appengine.  The Amazon bill (a constantly running High-CPU Medium instance with S3 traffic, SimpleDB, Cloudfront and now and then a test instance) was around $ 180 a month.  My Amazon server ran for almost 2 years.  My Google Appengine bill is now around 40 cents a day, which makes around $ 12 a month.  This is 15 times less!

I think the main benefit of Appengine versus EC2 in my case was that I don’t need a constantly running server, but I do need enough capacity to handle peak traffic (mainly in the evening and the weekends).  In the EC2 case, this means you need to start more servers (manually or with elastic load balancing) while Appengine handles this automatically.  You (roughly) only pay for the extra CPU time consumed.

For me the only reason not to use Appengine until a month ago was the lack of Java non-blocking IO support.  Luckily, this issue was resolved (silently, I only found out about it by reading the detailed release notes) and you can now use URLFetchService.fetchAsync()!

Lessons Learned

Some things I’d like to share about my experience with AppEngine:

1 GB is a lot of space.  Don’t optimize for storage size when you have 200 GB a month for $ 1 a day.  A typical application won’t need more than 10 GB, which costs $ 1.5 a month.  Similarly, one million tasks a day is a lot.  Don’t prematurely optimize by putting a lot of work in one task when you can spread it over many small concurrent tasks. As Chris Anderson puts it in his book “Free” (I couldn’t find the exact quote since I listened to the audio book in the car and this isn’t searchable yet): “when something’s free, people tend to treat it like it’s indefinitely available”.

6.5 free CPU hours already allow for a lot of work.  I handle around 10,000 visitors a day, a lot of URL fetches and many image transformations, and only now and then do I need more than this.

Startup time can be an issue, so I removed all unneeded jars from WEB-INF/lib and did some lazy loading.  This startup time is, however, mainly an issue during lower traffic times because Appengine stops and starts instances according to the traffic.  A visitor who hits a just-starting app needs to wait longer and sometimes gets an error page.  Once your app is up and handles a steady amount of traffic, the server instances seem to stay up.  You can monitor this in the logs by using a ServletContextListener and logging the event in the contextInitialized() and contextDestroyed() methods.

The task queues are really useful for doing work asynchronously, like cleaning up the datastore (remove all thumbnails older than 30 days) or executing long-running cron jobs.  Requests called by the task queue provide some headers that are useful for retrying a task at most 3 times. I check the retry-count header in the catch block and when it is equal to 3, I don’t throw an exception anymore, so the task is removed from the queue.
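The retry cap described above could be sketched roughly like this. The header name `X-AppEngine-TaskRetryCount` is an assumption (the post only mentions "some headers"), and `TaskRetryPolicy` and `shouldRetry` are hypothetical names. The helper takes the raw header value so it does not depend on the servlet API.

```java
/** Hypothetical helper: decides whether a failed task should be retried. */
final class TaskRetryPolicy {
    // Assumed header name; the post does not spell it out.
    static final String RETRY_COUNT_HEADER = "X-AppEngine-TaskRetryCount";
    static final int MAX_RETRIES = 3;

    /**
     * Returns true while the task has been retried fewer than MAX_RETRIES times.
     * A missing or unparsable header is treated as "never retried yet".
     */
    static boolean shouldRetry(String retryCountHeader) {
        int retries = 0;
        if (retryCountHeader != null) {
            try {
                retries = Integer.parseInt(retryCountHeader.trim());
            } catch (NumberFormatException e) {
                retries = 0; // unknown value: assume first attempt
            }
        }
        return retries < MAX_RETRIES;
    }
}
```

In a catch block you would rethrow only when `shouldRetry(request.getHeader(TaskRetryPolicy.RETRY_COUNT_HEADER))` is true, and otherwise log the failure and return normally so the task leaves the queue.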

There are workarounds around the 30 second execution limit.  My workaround is to do a small amount of work in a Servlet (Spring Controller) and then add the same url with some other parameters (like a database cursor) to a task queue.
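To make the chaining idea concrete, here is a small in-memory simulation. It is only a sketch: the `Deque` stands in for the task queue, an integer index stands in for the datastore cursor, and all names (`ChunkedJob`, `processBatch`, `run`) are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Simulates splitting one long job into many short, chained requests. */
final class ChunkedJob {
    static final int DONE = -1;

    /** Handles at most batchSize items starting at cursor; returns the next cursor or DONE. */
    static int processBatch(List<String> items, int cursor, int batchSize, List<String> processed) {
        int end = Math.min(cursor + batchSize, items.size());
        for (int i = cursor; i < end; i++) {
            processed.add(items.get(i)); // the "small amount of work"
        }
        return end < items.size() ? end : DONE;
    }

    /** Drives the chain; every queue entry represents one short request. Returns the request count. */
    static int run(List<String> items, int batchSize, List<String> processed) {
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(0); // the first request starts at the beginning
        int requests = 0;
        while (!queue.isEmpty()) {
            int cursor = queue.poll();
            requests++;
            int next = processBatch(items, cursor, batchSize, processed);
            if (next != DONE) {
                queue.add(next); // "add the same url with a new cursor parameter"
            }
        }
        return requests;
    }
}
```

Each simulated request does a bounded amount of work, so none of them comes near the time limit, while the chain as a whole still covers every item.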

You don’t need a database for everything.  I moved some tables that would never change to my Spring config XML which avoids datastore lookups.

Your application needs to be able to handle sudden shutdowns and startups without error.  A user may arrive on a different server instance for every request.  I decided not to use HttpSessions (I almost never use them anyway).

The URLFetchService caches responses by default.  You need to add your own no-cache request headers to get fresh results.
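A minimal sketch of adding such headers, using plain `java.net` (which, as far as I know, is backed by URL Fetch on Appengine's Java runtime). The exact header values the service honors are an assumption here, and `FreshFetch` and `openUncached` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper that prepares a request asking for an uncached response. */
final class FreshFetch {
    static HttpURLConnection openUncached(String url) {
        try {
            // openConnection() only creates the object; nothing is sent yet.
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            // Assumed header values; adjust to whatever the fetch service respects.
            conn.setRequestProperty("Cache-Control", "no-cache, max-age=0");
            conn.setRequestProperty("Pragma", "no-cache");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller then reads from `conn.getInputStream()` as usual; the headers are sent with the request at that point.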

Subscribe to the Appengine downtime notification feed; you can also check the system status.  According to Murphy’s law, the first week when I ran on Appengine the whole thing that’s not supposed to die went down.  Google did provide a detailed post mortem explaining everything.  As long as they’re the ones who need to solve the infrastructure problems and not me, I’m happy with that.  I’m modest enough to know I couldn’t possibly match their expertise.

It is possible to set up multiple custom domains, so you’re not stuck with myapp.appspot.com.  I also use 4 hostnames for thumbs, like thumbs1.zamtam.com, thumbs2.zamtam.com, … and a hash on the filename to determine which hostname should serve the image.
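Picking a hostname from a hash of the filename can be done in a couple of lines. This is a sketch, not the code running on the site: `ThumbHosts` and `hostFor` are hypothetical names, and using `String.hashCode()` as the hash is an assumption.

```java
/** Hypothetical helper that maps an image filename to one of the thumbs hostnames. */
final class ThumbHosts {
    static final int HOST_COUNT = 4;

    static String hostFor(String filename) {
        // floorMod keeps the result in 0..HOST_COUNT-1 even for negative hash codes.
        int shard = Math.floorMod(filename.hashCode(), HOST_COUNT) + 1;
        return "thumbs" + shard + ".zamtam.com";
    }
}
```

Because the hash depends only on the filename, the same image always gets the same hostname, so browser and CDN caches stay effective across page views.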

I created a small Java class, AppengineUtils.java, with some useful methods; feel free to use it.  I add the app version to the url of my javascript file, so the url changes each time I deploy a new version and the cache headers for it can be set to a much longer time.  I also check whether I’m running in the development server to show some buttons in the html that don’t show up in the production version.
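The cache-busting part could look something like the sketch below. It is not the contents of AppengineUtils.java: `VersionedAssets` and `versioned` are hypothetical names, and passing the version as a `v` query parameter is an assumption. On Appengine the version string would presumably come from the SDK (I believe `SystemProperty.applicationVersion.get()`), but here it is just a parameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that appends the deployed app version to a static asset url. */
final class VersionedAssets {
    static String versioned(String path, String appVersion) {
        String encoded;
        try {
            encoded = URLEncoder.encode(appVersion, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        // Reuse an existing query string if the path already has one.
        String separator = path.contains("?") ? "&" : "?";
        return path + separator + "v=" + encoded;
    }
}
```

Every deploy then produces a new url for the same file, which is what makes a far-future cache header safe.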

Improvements

The dashboard resets every morning at 9 AM CET.  There is no way to see the quota details for the previous days.

The time mentioned in the logs is confusing since it is not my local time.  An option in the Appengine settings to set the local time would be handy.

The blobstore (still in beta) misses some features, like an easy way to store data fetched with the UrlFetchService to the blobstore.  Luckily my url fetches are smaller than 1 MB so I can store them in the datastore.

The Google Accounts integration is sometimes confusing.  I use Appengine from my Google Apps domain (onthoo.com) but my site runs on different hostnames (koopjeszoeker.be, zamtam.fr, …).  So I needed to add (verify) these domains to my Google Apps domain.  This part succeeded.  The problem is that I want to send an e-mail from the Mail API, but this service only allows outgoing mails from accounts that are developers for the app.  I can’t seem to add a developer who has an e-mail address like noreply at zamtam.com (an extra domain for my onthoo.com Google Apps domain) instead of noreply at onthoo.com.  I get the developer confirmation e-mail, but the link goes through a series of redirects to end in an error page.  I think my whole Appengine setup is a bit messed up since I currently have 9 apps deployed and it still shows I have 4 remaining (you can have a maximum of 10 apps).  It may have something to do with the fact that I have a Google Apps account and a Google Account with the same e-mail address.  I have to be careful to log in through https://appengine.google.com/a/<YOURDOMAIN.COM>/ instead of https://appengine.google.com.

The URLFetchService is limited to 10 asynchronous fetches at a time, while I need 12 at the moment.  An increase would be nice, although I know my case is probably an exceptional one.

The limit of 30 active dynamic requests is sometimes an issue for me, since I use the image api to generate thumbs on the fly, which takes a bit longer (fetch the image url, resize it, store it in the datastore and return it).  Since I’m using different hostnames for the thumbs (like thumbs1.zamtam.com, thumbs2.zamtam.com, …) I get up to 10 requests at a time for a page.  You see the problem when I have 3 users requesting a page at the same time… I cache the thumbs so they’re only generated once, but this doesn’t handle all the cases.  This is something I need to investigate further and maybe I should ask for an increase?

‘Naked domains’ are not supported anymore, so using zamtam.co.uk for example isn’t possible.  This makes the DNS setup a bit more complex.

Conclusion

A lot of exciting things can be done with Appengine. Especially when you run a website instead of long-running batch operations, Appengine can turn out to be a lot cheaper than Amazon EC2.  While EC2 allows you to do much more and in the way you prefer, Appengine pushes you a bit to do it their way, which makes it easier for you.  With Appengine, you also don’t need to think about scaling MySQL, load-balancing Apache or updating Linux.

One benefit can’t be stressed enough: you don’t need to plan your server capacity beforehand since Appengine does this automatically.  Also, deploying a new version is easy: upload it, test it and when ready, switch the default version to the new version.  No downtime, no worries (you can always go back to the previous version if something shows up later with the new version).

URL.equals()

Monday, June 8th, 2009

Apparently in Java, two URLs are considered equal if their hosts resolve to the same IP address, so the following test will succeed (kapaza.be and kapaza.nl have the same IP address).

public void testURLEquals() throws MalformedURLException {
  assertEquals(new URL("http://www.kapaza.be"), new URL("http://www.kapaza.nl"));
}

Just so you know when you get strange results when putting URLs in a Set…  It’s even worse, since this means that comparing URLs needs name resolution, which is a slowdown.  More in the Javadocs.

One solution is to use a URI instead of a URL, since URIs are compared without any name resolution.  This test will fail:

public void testURIEquals() throws URISyntaxException {
  assertEquals(new URI("http://www.kapaza.be"), new URI("http://www.kapaza.nl"));
}
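For the Set case mentioned above, a small sketch with `java.net.URI` shows the behaviour you probably expect. `UriSetDemo` and `distinctCount` are hypothetical names; no network access is involved.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

/** Counts distinct addresses using URI equality, which never does a DNS lookup. */
final class UriSetDemo {
    static int distinctCount(String... addresses) {
        Set<URI> seen = new HashSet<>();
        for (String address : addresses) {
            seen.add(URI.create(address)); // throws IllegalArgumentException on bad syntax
        }
        return seen.size();
    }
}
```

With this, the two kapaza addresses count as two entries regardless of their IP address, while hosts that differ only in letter case still count as one, since URI compares hosts case-insensitively.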

Permgen space

Friday, January 9th, 2009

Personal note to self: use these JVM options to increase memory in JBoss and work a bit longer before you get a PermGen space error (java.lang.OutOfMemoryError: PermGen space).

-Xms128m -Xmx1024m -XX:PermSize=64m -XX:MaxPermSize=256m

Internal Refactoring

Thursday, April 10th, 2008

For my 10-day visit to Tokyo, Kyoto, Nara, Hakone and Chiba (my brother-in-law’s wedding), I needed to refactor my internal programming a bit to avoid OutOfMemoryExceptions.

// This class is package protected to avoid
// external programs messing up.
class Brain {

public void handleEvent(MeetNewPersonEvent event) {
  ...
  if (getLocation().equals(Locations.JAPAN)) {
    fireEvent(new BowEvent(event));
  }
}

public void handleEvent(BowEvent event) {
  // Avoid infinite loop.  The problem is the
  // 'esteemed higher' part, the person
  // for whom you're bowing may think the same.
  if (event.isPersonEsteemedHigher() && !event.hasBowedTooMuch()) {
    bow();
  } else {
    nod();
  }
}

protected void bow() {
  lookSincere();
  smile();
  bendForward();
}

I also needed to reprogram the eating subroutines.

public void handleEvent(FeelingHungryEvent event) {
  ...
  if (getLocation().equals(Locations.JAPAN)) {
    // This was a tricky one to handle,
    // the implementation is left
    // as an exercise to the reader.
    uploadChopsticksRoutine();
  }
}

The RunForTrainEvent and especially the WaitForTrainEvent could be canceled out, since public transportation is much better than in the location I originally wrote them for (Belgium).

public void handleEvent(RunForTrainEvent event) {
 if (getLocation().equals(Locations.JAPAN)) {
   stopRunning();
   relax();
   Thread.sleep(5*Timer.MINUTE);
 }
}

Finally, I needed to handle the RunningNoseEvent (extends HasColdEvent) better.

public void handleEvent(RunningNoseEvent event) {
  ...
  if (getLocation().equals(Locations.JAPAN)) {
    // Blowing your nose in public is NOT DONE
    // in Japan.  This is considered
    // a bit the same as burping.
    // Public humiliation is your part when
    // this is not checked.
    dipNose();
  } else {
    blowNose();
  }
}

protected Location getLocation() {
  ...
  if (isCurrentLocationUnknown()) {
    if (bodyTallerThanMostOthers() && friendlyPeople()
        && metroEvery2Minutes()
        && dressCode.equals(Dresscodes.COSTUME)
        && eatingCode.equals(EatingCodes.CHOPSTICKS)) {
      return Locations.JAPAN;
    }
  }
}

} // End class

This is released under an Apache license. Please notify me if these changes are of any use to you.

www.koopjeszoeker.com

Monday, June 18th, 2007

After tryouts with script.aculo.us, mootools and DWR, I finally decided to go with Google Web Toolkit (GWT) for the (badly needed) update of my koopjeszoeker website.

GWT allows me to make changes much faster than I could do it with DWR or other frameworks. Writing javascript from Java code seems strange at first, but it’s much like writing a Swing application.

On top of that, I get the following benefits:

  • I don’t need to worry about versioning, GWT uses hashes which create new filenames for every release. So I can cache all files “until the sun explodes”.
  • GWT compresses and obfuscates the generated javascript
  • Internationalisation is fully supported as in Java with properties files
  • GWT creates separate files for every browser and language, which means that a Dutch user on Firefox doesn’t need to download code that can be used on an English Safari.
  • The javascript is compatible on all major browsers

My experiences:

  • During development, compile to Firefox for one language instead of to all browsers for all languages. You can do this by adding this to your module xml file:
    <extend-property name="locale" values="nl_BE"/>
    <set-property name="user.agent" value="gecko"/>
  • Get a faster computer, it really helps. On my Pentium 4 2.8 GHz, it took 30 seconds to compile. On my brand new Dual Core 2.6 GHz with 2 RAID disks, it takes 5 seconds. Considering the number of times you compile, it really makes it much more fun to develop.
  • There is also a “hosted mode” which runs the code as Java, without compiling to javascript. I don’t use this anymore because (for my project) it takes longer to start up than simply compiling to javascript and it’s more difficult to integrate with a backend.
  • I did integration with Spring for the RPC calls by extending RemoteServiceServlet and overriding the method processCall:
    public String processCall(String payload) throws SerializationException {
        initialize();
        try {
            RPCRequest rpcRequest = RPC.decodeRequest(payload, AjaxService.class);
            return RPC.invokeAndEncodeResponse(this.ajaxService, rpcRequest.getMethod(),
                rpcRequest.getParameters());
        } catch (IncompatibleRemoteServiceException ex) {
            return RPC.encodeResponseForFailure(null, ex);
        }
    }

  • AJAX pages don’t work well with Google indexing and Google Adsense. For that reason, I created separate pages for the search results instead of putting the whole site on one page. I also added noscript tags to the html with a plain html result.