Archive for the ‘webdevelopment’ Category

From Amazon EC2 to Google Appengine

Wednesday, April 7th, 2010

Amazon EC2

I was very impressed when Amazon launched its EC2 cloud infrastructure.  Eager to test it, I started up some servers and tried to install my koopjeszoeker application on them.  Until then, this Java application had been running on a private server (in Brussels).  That was almost 2 years ago.

Everything went reasonably well, and I liked being able to install a new version on a separate server and then use the elastic IP address feature to switch production over to it.  The problem I had was running a database server that could also scale with the application.  Luckily, Amazon seemed to read my mind every time I needed something: they released SimpleDB, a scalable database that was enough for my needs.  Later on, they released Relational Database Service, but I haven’t needed it yet.

The whole setup for my site was maybe a bit overkill, but it was a nice test setup for learning more about working with this infrastructure.  During the next 2 years, I added Cloudfront and used S3 as a backup solution. I also set up Amazon Elastic Load Balancing with autoscaling enabled for traffic peaks. I wanted a server solution that just worked so I wouldn’t need to spend too much time on system maintenance.

Switching to Appengine

I was able to lower my monthly bill for hosting the zamtam sites (koopjeszoeker.be, koopjeszoeker.com, fr.zamtam.be, zamtam.fr and recently the beta sites zamtam.co.uk and zamtam.de) by switching from Amazon EC2 to Google Appengine.  The Amazon bill (a constantly running High-CPU Medium instance with S3 traffic, SimpleDB, Cloudfront and now and then a test instance) was around $ 180 a month.  My Amazon server ran for almost 2 years.  My Google Appengine bill is now around 40 cents a day, which comes to around $ 12 a month.  That is 15 times less!

I think the main benefit of Appengine versus EC2 in my case was that I don’t need a constantly running server, but I do need enough capacity to handle peak traffic (mainly in the evening and the weekends).  In the EC2 case, this means you need to start more servers (manually or with elastic load balancing) while Appengine handles this automatically.  You (roughly) only pay for the extra CPU time consumed.

For me, the only reason not to use Appengine until a month ago was the lack of non-blocking IO support in Java.  Luckily, this issue was resolved (silently; I only found out about it by reading the detailed release notes) and you can now use URLFetchService.fetchAsync()!

Lessons Learned

Some things I’d like to share about my experience with AppEngine:

1 GB is a lot of space.  Don’t optimize for storage size when you have 200 GB a month for $ 1 a day.  A typical application won’t need more than 10 GB which costs $ 1.5 a month.  Similarly, one million tasks a day is a lot.  Don’t prematurely optimize by putting a lot of work in one task when you can spread it over many small concurrent tasks. As Chris Anderson puts it in his book “Free” (I couldn’t find the exact quote since I listened to the audio book in the car and this isn’t searchable yet): “when something’s free, people tend to treat it like it’s indefinitely available”.

6.5 free CPU hours already allow for a lot of work.  I handle around 10,000 visitors a day, a lot of URL Fetches and many image transformations, and only now and then do I need more than this.

Startup time can be an issue, so I removed all unneeded jars from WEB-INF/lib and did some lazy loading.  Startup time is mainly an issue during lower traffic times, however, because Appengine stops and starts instances according to the traffic.  A visitor who hits an app that is just starting needs to wait longer and sometimes gets an error page.  Once your app is up and handles a steady amount of traffic, the server instances seem to stay up.  You can monitor this in the logs by using a ServletContextListener and logging the event in the contextInitialized() and contextDestroyed() methods.

The task queues are really useful for doing work asynchronously, like cleaning up the datastore (removing all thumbnails older than 30 days) or executing long-running cron jobs.  Requests called by the task queue carry some headers that are useful for limiting a task to 3 retries. I check the retry header in the catch block and when it is equal to 3, I don’t throw an exception anymore so the task is removed from the queue.
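
The retry check described above can be sketched roughly like this.  It is only an illustration, not the code running on the site: the class and method names are made up, and the header name X-AppEngine-TaskRetryCount is an assumption about which task queue header carries the retry count.

```java
/** Sketch of the "give up after 3 retries" check for a task queue request. */
class TaskRetryPolicy {
    // Assumption: the task queue sends the retry count in this request header.
    static final String RETRY_HEADER = "X-AppEngine-TaskRetryCount";
    static final int MAX_RETRIES = 3;

    /**
     * Returns true when the failure should be rethrown so the queue retries the task,
     * false when the task should end quietly and be removed from the queue.
     */
    static boolean shouldRetry(String retryCountHeader) {
        int retries = 0;
        if (retryCountHeader != null) {
            try {
                retries = Integer.parseInt(retryCountHeader.trim());
            } catch (NumberFormatException e) {
                retries = 0; // unreadable header: treat it as the first attempt
            }
        }
        return retries < MAX_RETRIES;
    }
}
```

In a servlet, the catch block would then do something like: if (TaskRetryPolicy.shouldRetry(request.getHeader(TaskRetryPolicy.RETRY_HEADER))) throw e; and otherwise just log the failure and return normally.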

There are ways around the 30 second execution limit.  My workaround is to do a small amount of work in a Servlet (Spring Controller) and then add the same url with some other parameters (like a database cursor) to a task queue.
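
To make the chaining idea concrete, here is a small stand-alone simulation in plain Java.  Everything in it is hypothetical: the in-memory queue stands in for the real task queue, the /cleanup url and the batch size are invented, and the "cursor" is just a list index instead of a datastore cursor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch: split a long job into short requests that re-enqueue themselves. */
class ChainedWork {
    static final int BATCH_SIZE = 3; // hypothetical; small enough to finish well within the limit

    // Stand-in for the task queue: holds the urls that still have to be called.
    final Deque<String> queue = new ArrayDeque<>();
    final List<String> processed = new ArrayList<>();

    /** One "request": handle a batch starting at the cursor, then enqueue the follow-up url. */
    void handle(List<String> items, int cursor) {
        int end = Math.min(cursor + BATCH_SIZE, items.size());
        for (int i = cursor; i < end; i++) {
            processed.add(items.get(i));
        }
        if (end < items.size()) {
            queue.add("/cleanup?cursor=" + end); // same url, new cursor
        }
    }

    /** Drains the queue the way the task queue service would call each url in turn. */
    void run(List<String> items) {
        queue.add("/cleanup?cursor=0");
        while (!queue.isEmpty()) {
            String url = queue.poll();
            int cursor = Integer.parseInt(url.substring(url.indexOf('=') + 1));
            handle(items, cursor);
        }
    }
}
```

Each call to handle() does a bounded amount of work, so no single request comes near the limit, and the job as a whole still runs to completion.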

You don’t need a database for everything.  I moved some tables that would never change to my Spring config XML which avoids datastore lookups.

Your application needs to be able to handle sudden shutdowns and startups without error.  A user may arrive on a different server instance for every request.  I decided not to use HttpSessions (I almost never use this).

The URLFetchService caches responses by default.  You need to add your own no-cache request headers to get fresh results.
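
As a rough sketch of what adding such headers looks like, the snippet below uses the standard java.net.HttpURLConnection rather than the URLFetchService classes, so that it runs anywhere.  The exact header values are an assumption (ordinary HTTP no-cache headers), not something confirmed by the Appengine documentation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch: prepare a fetch that asks intermediate caches for a fresh copy. */
class FreshFetch {
    /** Builds the request (without sending it) and adds no-cache request headers. */
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Assumption: these standard HTTP headers are enough to bypass the fetch cache.
            conn.setRequestProperty("Cache-Control", "no-cache, max-age=0");
            conn.setRequestProperty("Pragma", "no-cache");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the URLFetchService itself, the same two headers would be added to the request object before calling fetch() or fetchAsync().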

Subscribe to the Appengine downtime notification feed; you can also check the system status.  According to Murphy’s law, the first week when I ran on Appengine the whole thing that’s not supposed to die went down.  Google did provide a detailed post mortem explaining everything.  As long as they’re the ones who need to solve the infrastructure problems and not me, I’m happy with that.  I’m modest enough to know I couldn’t possibly match their expertise.

It is possible to set up multiple custom domains, so you’re not stuck with myapp.appspot.com.  I also use 4 hostnames for thumbs, like thumbs1.zamtam.com, thumbs2.zamtam.com, … and a hash on the filename to determine which hostname should serve the image.
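
Picking the hostname from a hash of the filename could look something like the sketch below.  The class name is made up, and String.hashCode() is only one possible hash; the point is that the same filename always maps to the same hostname, so browser caching keeps working.

```java
/** Sketch: spread thumbnails over thumbs1..thumbs4 based on the filename. */
class ThumbHosts {
    static final int HOST_COUNT = 4;

    /** Returns the hostname that should serve the given thumbnail. */
    static String hostFor(String filename) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(filename.hashCode(), HOST_COUNT) + 1;
        return "thumbs" + index + ".zamtam.com";
    }
}
```

For example, ThumbHosts.hostFor("a") gives thumbs2.zamtam.com, because "a".hashCode() is 97 and 97 mod 4 is 1.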

I created a small java class AppengineUtils.java with some useful methods, feel free to use it.  I add the app version to my javascript file so this has a different url for each time I deploy a new version and the cache headers for this url can be set to a much longer time.  I check if I run in the development server to show some buttons in the html that don’t show up in the production version.
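
The versioned javascript url can be illustrated with a tiny helper like the one below.  This is not the actual AppengineUtils.java; the class name, the v parameter and the way the version string is obtained are all assumptions made for the example.

```java
/** Sketch: make a static file url change with every deployed version. */
class AssetUrls {
    /**
     * Appends the app version as a query parameter, so each deploy yields a new url
     * and the old one can safely be cached for a very long time.
     */
    static String versioned(String path, String appVersion) {
        String separator = path.contains("?") ? "&" : "?";
        return path + separator + "v=" + appVersion;
    }
}
```

A page template would then emit something like AssetUrls.versioned("/js/site.js", currentVersion) in the script tag, where currentVersion is whatever version identifier the running app reports.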

Improvements

The dashboard resets every morning at 9 AM CET.  There is no way to see the quota details for the previous days.

The time mentioned in the logs is confusing since it is not my local time.  An option in the Appengine settings to set the local time would be handy.

The blobstore (still in beta) misses some features, like an easy way to store data fetched with the URLFetchService in the blobstore.  Luckily my url fetches are smaller than 1 MB so I can store them in the datastore.

The Google Accounts integration is sometimes confusing.  I use Appengine from my Google Apps domain (onthoo.com) but my site runs on different hostnames (koopjeszoeker.be, zamtam.fr, …).  So I needed to add (verify) these domains to my Google Apps Domain.  This part succeeded.  The problem is that I want to send an e-mail from the Mail API, but this service only allows outgoing mails from accounts that are developers for the app.  I can’t seem to add a developer who has an e-mail address like noreply at zamtam.com (an extra domain for my onthoo.com Google Apps domain) instead of noreply at onthoo.com.  I get the developer confirmation e-mail, but the link goes through a series of redirects and ends in an error page.  I think my whole Appengine setup is a bit messed up since I currently have 9 apps deployed and it still shows I have 4 remaining (you can have a maximum of 10 apps).  It may have something to do with the fact that I have a Google Apps account and a Google Account with the same e-mail address.  I have to be careful to log in through https://appengine.google.com/a/<YOURDOMAIN.COM>/ instead of https://appengine.google.com .

The URLFetchService is limited to 10 asynchronous fetches at a time, while I need 12 at the moment.  An increase would be nice, although I know my case is probably an exceptional one.

The 30 active dynamic request limit is for me sometimes an issue, since I use the image api to generate thumbs on the fly, which takes a bit longer (fetch the image url, resize it, store it in the datastore and return it).  Since I’m using different hostnames for the thumbs (like thumbs1.zamtam.com, thumbs2.zamtam.com, …) I get up to 10 requests at a time for a page.  You see the problem when I have 3 users requesting a page at the same time… I cache the thumbs so they’re only generated once, but this doesn’t handle all the cases.  This is something I need to investigate further and maybe I should ask for an increase?

‘Naked domains’ are not supported anymore, so using zamtam.co.uk for example isn’t possible.  This makes the DNS setup a bit more complex.

Conclusion

A lot of exciting things can be done with Appengine. Especially when you run a website instead of long-running batch operations, Appengine can turn out to be a lot cheaper than Amazon EC2.  While EC2 allows you to do much more and in the way you prefer, Appengine pushes you a bit to do it their way, which makes things easier for you.  With Appengine, you also don’t need to think about scaling MySQL, load-balancing Apache or updating Linux.

One benefit can’t be stressed enough: you don’t need to plan your server capacity beforehand since Appengine does this automatically.  Also, deploying a new version is easy: upload it, test it and when ready, switch the default version to the new version.  No downtime, no worries (you can always go back to the previous version if something shows up later with the new version).

Koopjeszoeker.be supports rich snippets

Wednesday, October 28th, 2009

Since today, koopjeszoeker.be (and koopjeszoeker.com, fr.zamtam.be and www.zamtam.fr) supports Google rich snippets.  Not much happens yet, but you can have a look at the parsed data in this testing tool.

At the moment I still get “Insufficient data to generate the preview” from this tool although it seems to be able to parse the data, no idea what it means…

Grails and Google AppEngine

Monday, June 8th, 2009

I’ve created a small demo showing Grails on Google AppEngine.

The site is a showcase of Grails on Google AppEngine with the Grails AppEngine plugin.

Places are stored in the AppEngine datastore and a taglib is added for rendering the login button and the currently logged in user.

Additionally, an integration with Google AJAX Search API is done when adding a place.

GWT: follow-up

Tuesday, February 26th, 2008

This is a follow-up post on Why I dumped GWT.

First of all, I had a really long week + weekend and I was tired when I wrote the post. I apologize for the rather in-your-face title. I should have chosen a more subtle wording, especially since I really appreciate all the work that developers donate in their free time to open source projects.

I didn’t expect my blog post would end up as the main article on ongwt.com. I use my blog mainly to communicate with colleagues about my work. It probably has something to do with my recent switch to feedburner. Hooray for feedburner!

Like I wrote, I like GWT and its approach. It’s really nice to see how intelligently the development team approached and solved the problem at hand. The image handling (sprites), js compression and http round-trip optimisations are really clever.

I’ll start by describing how my site came to what it is now. The site started as a playground for me. I wanted to try the latest new thing (ajax!) and so I first started with scriptaculous. I didn’t succeed in getting the layout right with pure css (after all, I’m only a Java developer) and I stumbled upon GWT. The mail app demo is really nice and this gave me the idea to start a site that searches on-line marketplaces and lets users treat the classifieds as e-mail: with the possibility to delete, mark items as read/unread and star them. Much like Google Reader.

For this, GWT was the perfect match. Everything went as I expected, sometimes with some cursing about why my onclick events were not fired and why a non-existing background image in css stopped hosted mode from working, but all in all it was very good.

After some (positive) discussion with my other half, I wanted the work I put in it to give me some return-on-investment (money!). It turned out after some basic user testing (she sitting at the keyboard and I shouting “why would you do that?” and “that’s not meant to be used like that”) that the whole idea was too complex for a standard user (no offense to my super-intelligent girlfriend) who stumbles upon my site. So week after week I removed some of the functionality to make the page less overwhelming. Until I finally found myself using GWT only for the autocompleter, which clearly wasn’t the intention of the GWT framework. This, together with the remarks I gave in the previous post (adsense, analytics and seo) made me decide to temporarily stop developing with GWT.

I expect to start again with GWT once the site “gains some momentum”, and then I will re-enable those more complex features which should be easier than with mootools. I’ll probably ask some advice from a usability expert about how to design the page with all this functionality without overwhelming first-time users. And I will check out MyGWT and GWT-Ext more thoroughly.

Why I dumped GWT

Sunday, February 24th, 2008

I’ve used GWT for over half a year now on koopjeszoeker.be. Two weeks ago I decided to stop development with GWT and go with plain HTML and mootools for the autocompleter. I’ve used mootools already a lot and I’m really getting the hang of it.

Why? Why did I spend all this time developing in GWT and why did I decide to stop?

First of all, GWT is a fantastic framework for doing web development. I think it’s the best tool at the moment if you want to build the next GMail or an intranet application. For all those slow and lousy web interfaces (for timesheets, CMS, …), GWT could come to the rescue. But my site is completely different.

Some of the reasons below are not really related to GWT, but more to using ajax in general. It is my opinion however, that these problems are easier to solve with ‘standard’ javascript libraries like mootools, prototype, dwr or scriptaculous since these have a nice way to add some ajax to certain DOM elements. For example, in GWT I had to subclass the autocompleter textbox so I could attach it to an input field that already existed in the HTML. Maybe all of this could be solved if GWT had constructors that accept a DOM id too.

SEO

I’m entering a highly competitive segment where SEO is really important. Since most of the html is built with GWT, you end up with a pretty empty page for Google. I added some noscript tags, but this was not really helpful.

Adsense

Another problem was my adsense banners. Since I didn’t have a lot of content on the page, the banners were sometimes off topic. An even bigger problem was that the banners stayed the same when people searched for different keywords (since the ajax refresh didn’t trigger an adsense refresh). I solved this by doing the search with a page refresh instead of an ajax call. The ajax part of the site was limited to sorting, faceting, i18n and displaying tips.

Google Analytics

I’m also using Google Analytics. Although no real evidence exists, it would be naive to think that Google isn’t using this data. But because of the ajax calls, I don’t get as many pageviews as a static version of competing sites. Every visitor is seen as doing 1 page visit, while he may have browsed several pages. This makes my bounce rate in Google Analytics really high. This can’t be good for my Google rankings.
In Belgium we have CIM Metriweb, a kind of archaic tracking system that is used when marketeers look for sites that have many hits. I’m not currently using this, but this thing depends on pageviews if you want the big guys to donate to your site.

What now?

I wanted a fully functional HTML version, where GWT was injected in some places to replace the full page loads with ajax calls. However, I couldn’t find an easy way to do this. And once I succeeded, I found that I had almost no code left in GWT that justified using it instead of mootools. So now, after a lot of research and experimenting, I have decided to go the plain-old html way and spice up some parts with ajax (like the “so 2007” textbox autocompleter).

I discovered the Blueprint CSS framework (version 0.7 now has semantic classes) and CSS sprites. I’ve used Kuler and read a lot about CSS tips and tricks. I even read a bit about usability.

And since I spend 3 hours a day on the train, I have time to redesign the site. Using blueprint, it really was easy and the result is a much better looking, stable, fast site. Check the homepage: it only has 1 css, 1 javascript, 1 gif and 1 jpeg, but there are 25 images! Ah, the magic of blueprint, sprites and jawr…

Update: please see GWT follow-up

Compress Javascript and CSS with Jawr

Monday, February 18th, 2008

Today I used the nice Jawr taglib which compresses javascript and css files. There’s enough information on the Jawr website about how to configure everything, so I won’t write about this.

Things to remember are:

  • Better structuring / versioning of your development javascript and css versions while still publishing them as 1 file
  • Gzip support for compliant browsers
  • Give the css and js files cache headers ‘until the sun explodes’
  • When you deploy a new version of your site, a new css and js version will be downloaded by the browser

Net result: our YSlow score went from 49 to 69!

2 Things I want in CSS 3

Sunday, February 17th, 2008

I’ve done some html/css restyling lately and there are some things I would like to see added to CSS 3. The process to request some changes to be incorporated into CSS 3 is a bit overwhelming to me, so I just post them here and hope they will be picked up by someone.

CSS variables

CSS variables would be nice. I want a way so I can easily change all colors in my CSS with one adjustment, not by searching for the color in the file and replacing it with the new value. I also think this would increase readability of the file.
This would make it possible to define recurring parts of the layout in the CSS file like this:

var backgroundcolor : #FFFFFF;
var border: 1px solid #CCCCCC;

.container {
background-color: $backgroundcolor;
border: $border;
}

.navbar{
border: $border;
}

Path variables

If I could rename a path selector to a variable, I could remove a lot of classes from the html and still be able to easily change the css.
Take this example:

.navbar ul li a {
text-decoration: none;
}

.navbar ul li a:hover {
text-decoration: underline;
}

This would become:

var listItem: .navbar ul li;

$listItem a {
text-decoration: none;
}

$listItem a:hover {
text-decoration: underline;
}

These examples look simple, but my experience is that a lot of the same values can be found in many CSS files. Wouldn’t it be nice if we have a block of variable definitions at the top so we only have to specify once that the color of all borders should be changed?

This would also make it easier to let users (on a blog for example) override the colors with their own stylesheet, which overrides the variables with their settings.

The only way I know of to achieve this at the moment is by generating the CSS with a templating framework like JSP or Velocity (or why not PHP), but this seems like overkill to me.
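
To give an idea of what such a generation step involves, here is a minimal sketch in plain Java instead of JSP or Velocity.  It understands only the made-up var name: value; and $name syntax from the examples above (which is not real CSS), and the class name is invented for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: expand the hypothetical "var name: value;" / "$name" syntax into plain CSS. */
class CssVariables {
    // A declaration takes a whole line, e.g. "var border: 1px solid #CCCCCC;"
    private static final Pattern DECLARATION =
            Pattern.compile("(?m)^var[ \\t]+(\\w+)[ \\t]*:[ \\t]*(.+?);[ \\t]*$\\R?");
    private static final Pattern REFERENCE = Pattern.compile("\\$(\\w+)");

    static String expand(String source) {
        // First pass: collect all variable declarations.
        Map<String, String> variables = new LinkedHashMap<>();
        Matcher declaration = DECLARATION.matcher(source);
        while (declaration.find()) {
            variables.put(declaration.group(1), declaration.group(2).trim());
        }
        // Second pass: drop the declaration lines and substitute the references.
        String body = DECLARATION.matcher(source).replaceAll("");
        Matcher reference = REFERENCE.matcher(body);
        StringBuffer out = new StringBuffer();
        while (reference.find()) {
            String value = variables.get(reference.group(1));
            // Unknown names are left untouched instead of being replaced by nothing.
            String replacement = value != null ? value : reference.group();
            reference.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        reference.appendTail(out);
        return out.toString();
    }
}
```

Because the substitution is purely textual, the same code handles both the value variables and the path variables ($listItem a becomes .navbar ul li a).  A user stylesheet could override the colors by supplying its own block of declarations before the expansion runs.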

So, anyone with the power to move the W3C board, go on (and let me know of the results)!

Firefox add-ons

Tuesday, October 23rd, 2007

In the series “which Firefox add-ons do you need as a web developer”, here’s my list:
- Firebug
- Web developer toolbar
- Download statusbar

Book list

Friday, October 12th, 2007

I just ordered 2 new books at Amazon:

Squid, Definitive Guide
Release It!

I’m curious when they’ll arrive!

For the interested: my current library (ahum):
Building Scalable Websites
Professional Java Development with the Spring Framework
Expert One-on-One J2EE Development without EJB
Hibernate in Action

I created these links to Amazon with the Amazon Affiliate program, so if you buy a book through these links, you’re helping a poor Java developer…

GWT and IE: The Story Continues

Friday, September 28th, 2007

This is the second time I have lost more than a day searching for why Internet Explorer choked on my GWT application.

Finally, I replaced my whole GWT code with the latest working version, but still without result. The only difference between the working version and the broken one was the CSS file. After a boring process of commenting and uncommenting CSS lines, I found that the guilty part was this:

#content{
border: 1px solid #3F8FB6;
}

I still don’t get it…