Archive for the ‘server’ Category

From Amazon EC2 to Google Appengine

Wednesday, April 7th, 2010

Amazon EC2

I was very impressed when Amazon launched its EC2 cloud infrastructure.  So, eager to test this, I started up some servers and tried to install my koopjeszoeker application on them.  Until then this Java application was running on a private server (in Brussels).  That was almost 2 years ago.

Everything went reasonably well and I liked being able to install a new version on a separate server and then just use the Elastic IP address feature to switch production to this new server.  The problem I had was running a database server which could also scale with the application.  Luckily, Amazon seemed to read my mind every time I needed something: they released SimpleDB, a scalable database which was enough for my needs.  Later on, they released Relational Database Service, but I haven’t needed this yet.

The whole setup for my site was maybe a bit overkill, but it was a nice test setup for learning more about working with this infrastructure.  During the next 2 years, I added Cloudfront and used S3 as a backup solution. I also set up Amazon Elastic Load Balancing with autoscaling enabled for traffic peaks. I wanted a server solution that just worked so I wouldn’t need to spend too much time on system maintenance.

Switching to Appengine

I was able to lower my monthly bill for hosting the zamtam sites (koopjeszoeker.be, koopjeszoeker.com, fr.zamtam.be, zamtam.fr and recently beta sites zamtam.co.uk and zamtam.de) by switching from Amazon EC2 to Google Appengine.  The monthly Amazon bill (a constantly running High-CPU Medium instance with S3 traffic, SimpleDB, Cloudfront and now and then a test instance) was around $ 180 a month.  My Amazon server ran for almost 2 years.  My Google Appengine bill is now around 40 cents a day, which makes around $ 12 a month.  This is 15 times less!

I think the main benefit of Appengine versus EC2 in my case was that I don’t need a constantly running server, but I do need enough capacity to handle peak traffic (mainly in the evening and the weekends).  In the EC2 case, this means you need to start more servers (manually or with elastic load balancing) while Appengine handles this automatically.  You (roughly) only pay for the extra CPU time consumed.

For me the only reason not to use Appengine until a month ago was the lack of Java non-blocking IO support.  Luckily, this issue was resolved (silently, I only found out about it by reading the detailed release notes) and you can now use URLFetchService.fetchAsync()!

Lessons Learned

Some things I’d like to share about my experience with AppEngine:

1 GB is a lot of space.  Don’t optimize for storage size when you have 200 GB a month for $ 1 a day.  A typical application won’t need more than 10 GB which costs $ 1.5 a month.  Similarly, one million tasks a day is a lot.  Don’t prematurely optimize to put a lot of work in one task when you can spread it over many small concurrent tasks. As Chris Anderson puts it in his book “Free” (I couldn’t find the exact quote since I listened to the audio book in the car and this isn’t searchable yet): “when something’s free, people tend to treat it like it’s indefinitely available”.

6.5 free CPU hours already allow for a lot of work.  I handle around 10.000 visitors a day, a lot of URL Fetches and many image transformations, and only now and then do I need more than this.

Startup time can be an issue, so I removed all unneeded jars from WEB-INF/lib and did some lazy loading.  This startup time is however mainly an issue during lower traffic times because Appengine stops and starts instances according to the traffic.  A visitor who hits a just starting app needs to wait longer and sometimes gets an error page.  Once your app is up and handles a steady amount of traffic, the server instances seem to stay up.  You can monitor this in the logs by using a ServletContextListener and logging the event in the contextInitialized() and contextDestroyed() methods.

The task queues are really useful to do work asynchronously, like cleaning up the datastore (remove all thumbnails older than 30 days) or executing long running cron jobs.  Requests called by the task queue carry a header with the retry count, which is useful to retry a task at most 3 times. I check this header in the catch block and when it is equal to 3, I don’t throw an exception anymore so the task is removed from the queue.
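
The retry cap can be sketched roughly as follows.  This is a minimal illustration under assumptions, not the exact code running on the site: the header name X-AppEngine-TaskRetryCount should be verified against the task queue documentation, and TaskRetryPolicy, MAX_RETRIES and shouldRethrow are hypothetical names.

```java
/** Hypothetical helper: decides whether a failed task should be retried. */
final class TaskRetryPolicy {

    /** Assumed name of the header carrying the retry count; verify against the docs. */
    static final String RETRY_COUNT_HEADER = "X-AppEngine-TaskRetryCount";

    /** After this many retries the exception is swallowed so the task leaves the queue. */
    static final int MAX_RETRIES = 3;

    private TaskRetryPolicy() {}

    /**
     * @param headerValue raw value of the retry-count header, may be null or malformed
     * @return true if the exception should be rethrown so the queue retries the task
     */
    static boolean shouldRethrow(String headerValue) {
        int retries = 0;
        if (headerValue != null) {
            try {
                retries = Integer.parseInt(headerValue.trim());
            } catch (NumberFormatException e) {
                // An unparsable header is treated like a first attempt.
                retries = 0;
            }
        }
        return retries < MAX_RETRIES;
    }
}
```

In the catch block of the task handler, the exception would then only be rethrown when TaskRetryPolicy.shouldRethrow(request.getHeader(TaskRetryPolicy.RETRY_COUNT_HEADER)) returns true.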

There are workarounds around the 30 second execution limit.  My workaround is to do a small amount of work in a Servlet (Spring Controller) and then add the same url with some other parameters (like a database cursor) to a task queue.
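
The chaining idea can be sketched like this.  It is a simplified stand-in under stated assumptions: a plain list offset plays the role of the database cursor, and ChunkedJob, BATCH_SIZE, processBatch and handle are hypothetical names.  The returned URL is what would be added to the task queue for the next step.

```java
import java.util.List;

/** Hypothetical sketch: do a small batch per request, then hand off the rest. */
final class ChunkedJob {

    /** Assumed batch size, small enough to finish well within the request limit. */
    static final int BATCH_SIZE = 100;

    private ChunkedJob() {}

    /**
     * Processes one batch starting at the given cursor.
     *
     * @return the URL to enqueue for the next batch, or null when all work is done
     */
    static String processBatch(List<String> items, int cursor, String baseUrl) {
        int end = Math.min(cursor + BATCH_SIZE, items.size());
        for (int i = cursor; i < end; i++) {
            handle(items.get(i));
        }
        // Pass the new position along as a request parameter of the follow-up task.
        return end < items.size() ? baseUrl + "?cursor=" + end : null;
    }

    /** Placeholder for the real per-item work. */
    static void handle(String item) {
        // intentionally empty in this sketch
    }
}
```

Each request stays short, and the job as a whole can run for as long as there are items left.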

You don’t need a database for everything.  I moved some tables that would never change to my Spring config XML which avoids datastore lookups.

Your application needs to be able to handle sudden shutdowns and startups without error.  A user may arrive on a different server instance for every request.  I decided not to use HttpSessions (I almost never use this).

The URLFetchService caches responses by default.  You need to add your own no-cache request headers to get fresh results.
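
A minimal sketch of such a request, assuming the fetch goes through the standard java.net classes rather than the low-level URLFetchService API.  The exact header values are an assumption (common no-cache headers), and FreshFetch and open are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: prepares a request that asks caches for a fresh response. */
final class FreshFetch {

    private FreshFetch() {}

    /** Opens (but does not yet send) a request carrying no-cache headers. */
    static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        // Assumed header values; adjust if the fetch service expects something else.
        conn.setRequestProperty("Cache-Control", "no-cache, max-age=0");
        conn.setRequestProperty("Pragma", "no-cache");
        conn.setUseCaches(false);
        return conn;
    }
}
```

The request is only sent once the response is read, for example via conn.getInputStream().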

Subscribe to the Appengine downtime notification feed; you can also check the system status.  According to Murphy’s law, the first week when I ran on Appengine the whole thing that’s not supposed to die went down.  Google did provide a detailed post mortem explaining everything.  As long as they’re the ones who need to solve the infrastructure problems and not me, I’m happy with that.  I’m modest enough to know I couldn’t possibly match their expertise.

It is possible to set up multiple custom domains, so you’re not stuck with myapp.appspot.com.  I also use 4 hostnames for thumbs, like thumbs1.zamtam.com, thumbs2.zamtam.com, … and a hash on the filename to determine which hostname should serve the image.
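
Picking the hostname from a hash can be sketched as follows.  The important property is that the same filename always maps to the same hostname, so browsers can keep caching the image.  The choice of String.hashCode() as the hash is an assumption, and ThumbHost and hostFor are hypothetical names.

```java
/** Hypothetical helper: spreads thumbnails over a fixed set of hostnames. */
final class ThumbHost {

    /** Number of thumbs hostnames (thumbs1 to thumbs4). */
    static final int HOST_COUNT = 4;

    private ThumbHost() {}

    /** Returns the hostname that should serve the given image file. */
    static String hostFor(String filename) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(filename.hashCode(), HOST_COUNT) + 1;
        return "thumbs" + index + ".zamtam.com";
    }
}
```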

I created a small Java class AppengineUtils.java with some useful methods, feel free to use it.  I add the app version to the URL of my JavaScript file, so it gets a different URL every time I deploy a new version and the cache headers for this URL can be set to a much longer time.  I also check whether I’m running in the development server, to show some buttons in the HTML that don’t show up in the production version.

Improvements

The dashboard resets every morning at 9 AM CET.  There is no way to see the quota details for the previous days.

The time mentioned in the logs is confusing since it is not my local time.  An option in the Appengine settings to set the local time would be handy.

The blobstore (still in beta) misses some features, like an easy way to store data fetched with the UrlFetchService to the blobstore.  Luckily my url fetches are smaller than 1 MB so I can store them in the datastore.

The Google Accounts integration is sometimes confusing.  I use Appengine from my Google Apps domain (onthoo.com) but my site runs on different hostnames (koopjeszoeker.be, zamtam.fr, …).  So I needed to add (verify) these domains to my Google Apps Domain.  This part succeeded.  The problem is that I want to send an e-mail from the Mail API, but this service only allows outgoing mails from accounts that are developers for the app.  I can’t seem to add a developer who has an e-mail address like noreply at zamtam.com (an extra domain for my onthoo.com Google Apps domain) instead of noreply at onthoo.com.  I get the developer confirmation e-mail, but the link goes through a series of redirects and ends in an error page.  I think my whole Appengine setup is a bit messed up since I currently have 9 apps deployed and it still shows I have 4 remaining (you can have a maximum of 10 apps).  It may have something to do with the fact that I have a Google Apps account and a Google Account with the same e-mail address.  I have to be careful to log in through https://appengine.google.com/a/<YOURDOMAIN.COM>/ instead of https://appengine.google.com .

The URLFetchService is limited to 10 asynchronous fetches at a time, while I need 12 at the moment.  An increase would be nice, although I know my case is probably an exceptional one.

The 30 active dynamic request limit is for me sometimes an issue, since I use the image api to generate thumbs on the fly, which takes a bit longer (fetch the image url, resize it, store it in the datastore and return it).  Since I’m using different hostnames for the thumbs (like thumbs1.zamtam.com, thumbs2.zamtam.com, …) I get up to 10 requests at a time for a page.  You see the problem when I have 3 users requesting a page at the same time… I cache the thumbs so they’re only generated once, but this doesn’t handle all the cases.  This is something I need to investigate further and maybe I should ask for an increase?

‘Naked domains’ are not supported anymore, so using zamtam.co.uk for example isn’t possible.  This makes the DNS setup a bit more complex.

Conclusion

A lot of exciting things can be done with Appengine. Especially when you run a website instead of long-running batch operations, Appengine can turn out to be a lot cheaper than Amazon EC2.  While EC2 allows you to do much more and in the way you prefer, Appengine pushes you a bit to do it their way, which makes it easier for you.  With Appengine, you also don’t need to think about scaling MySQL, load-balancing Apache or updating Linux.

One benefit can’t be stressed enough: you don’t need to plan your server capacity beforehand since Appengine does this automatically.  Also, deploying a new version is easy: upload it, test it and when ready, switch the default version to the new version.  No downtime, no worries (you can always go back to the previous version if something shows up later with the new version).

How to lower the load average on a server by more than 50% in 10 seconds

Friday, May 9th, 2008

$uptime
… load average: 0.54, 0.40, 0.36

$sudo vim /etc/fstab
(add the noatime option)

# /dev/sda3
UUID=8623d9e3... / ext3 defaults,errors=remount-ro,noatime 0 1

$sudo mount -o remount /

$uptime
… load average: 0.24, 0.16, 0.17

So much for this completely unscientific proof.

More info about the noatime option.

Blocking bad bots

Wednesday, April 9th, 2008

Today I blocked some bad bots that were spidering some of my sites. Most notably Custo, which downloads your entire site.

An interesting solution is posted here (I used the mod_rewrite option). You can test this by changing your user agent in Firefox.

This guy seems to be following bad bots.

I added Java, Nutch, Jakarta, Vagabondo and an empty bot name to the list of bad bots.

New server almost complete

Monday, December 3rd, 2007

I bought (together with my brother) a new server. The old one is definitely ready for retirement: in one month, pets.be got 120.000 visits, 1.600.000 pages and 50.000.000 hits (not counting frequent Google crawls and the integration with SMS services and Nieuwsblad.be). That was a bit too much for 1 GB RAM on a hyperthreaded processor which also runs some other websites, and now my koopjeszoeker.be site, which definitely needs more memory and faster disks.

The investment wasn’t small, but should be worth it: 2 servers, each with 2 quad-core CPUs and 4 GB RAM, all in one unit. I ordered the server on a Tuesday morning and could pick it up the same evening. 3 weeks without free time later, the server is ready to be shipped from under my bed (the noise!) to the data center. Ubuntu, Varnish, Apache 2, Tomcat, MySQL, Subversion, CVS, Firehol, … all is installed and (a little bit) tested.

Those dreaded “server busy” messages should be gone soon and koopjeszoeker.be will be ready to go out of beta! (Yay!)

Ubuntu or CentOS or …

Thursday, October 25th, 2007

So, if one day I have my new dual quad-core server, what do I install on it? Fedora made maintenance on my current server a bit hard because I had to go through long steps to go from one Fedora Core release to the next every 6 months (and sometimes a trip to Brussels to press the reset button when I messed up).

Ubuntu seems easy to install and has long support for the 6.06 version (till 2011).

On the other hand, CentOS seems reasonable too, since I know of some bigger companies who use it in production. I personally don’t know any companies running Ubuntu (I’m sure there are).

Does anybody have any experience with the Ubuntu server version? I already installed it on an old computer at home, which worked ok, but what about multi-core processors?

What type of new server should I choose?

Thursday, October 18th, 2007

I’m seriously considering buying a new server. My current server (Pentium 4, 3 GHz with 1 GB RAM) is currently a bit too busy to be healthy.

A double dual quad core (2 servers in 1 unit, each with 2 quad-core CPUs, in total 32 GHz processing power) may be a little bit overkill (4000 €).

I’m however seriously considering a dual quad core setup with 8 GB RAM which should be enough to handle the load for the next year(s). I looked up some information to find comparisons between a faster single core CPU and a slower dual or quad core CPU. The conclusion was that for desktops a single faster CPU is sometimes better (because most desktop applications are not multi-threaded), but for servers, which are mostly multi-process systems, you get slower response times but also higher throughput. Since the response times are not really the problem, I think the multiple core setup will be the best choice.

I’m still not sure if I should install Xen for virtualization or not. A benefit would be that I can install MySQL on one virtual server and assign it 4 processors for example. Squid, Apache, Tomcat, Postfix, CVS, … can all get their own virtual instance. But wouldn’t such a virtual-server-per-process setup be a bit hard to maintain?

I’m not sure if virtualization would really give me any benefit, besides the fact that I can isolate some processes (like Postfix and CVS) that shouldn’t be affected when the websites are under heavy load. On the other hand, it seems a bit of a waste to reserve one cpu for these processes that really don’t require so much cpu time.

Website hosting

Sunday, March 11th, 2007

After 303 days of uptime, I decided to reboot my onthoo.com server. The load was getting a bit too high, especially since my brother’s website pets.be was mentioned in some national newspapers (Het Laatste Nieuws and La Meuse).

This was the last uptime message after a long time without any reboots:

16:22:13 up 303 days, 21:43, 1 user, load average: 0.31, 12.81, 57.29

Memory was constantly at 800 MB used, without any significant processes running. After the reboot, it was only 250 MB…

Nevertheless a good uptime for this server, which has to handle quite a load these days.

I still have to figure out what exactly went wrong, since the server didn’t respond to HTTP, SSH or SMTP. After some hours, everything came up like nothing had happened. In the Apache logs I found a lot of OutOfMemory errors; maybe the server was just constantly swapping without any time left for handling connections.

I tweaked some Apache parameters, but apparently this wasn’t enough. If anyone knows of a way to prevent Apache from taking too much memory, please let me know!

Fedora Core 5

Friday, May 5th, 2006

I upgraded my server to Fedora Core 5 with the aid of the excellent guide on http://www.brandonhutchinson.com.

I already used this guide to upgrade from FC2 to FC3 and to FC4. Back then I ran into trouble because of an installation problem I had had from the beginning: my /boot directory was mapped differently at boot time than at runtime of the OS (because of the RAID). So although I updated the kernel in the /boot dir, it wasn’t seen at startup and gave compatibility problems (it read the /boot directory from the other disk in the RAID).

The upgrade went fine, but I got a lot of config files that were saved as .rpmnew and I am now trying to set these up correctly again. But hey, if you really want Apache 2.2 and MySQL 5 (like me), it’s normal to expect some work…

Open relay

Thursday, January 20th, 2005

Today I got an e-mail from my daily logwatch that it could not process the log files because they were too big. A bit surprised about this sudden load on my server, I took a look at the files and discovered that my mail.log was 300 MB… Apparently a spammer had found a way to abuse my server to send spam.

I tested my server with various sites to see if it was an open relay, but none of these tests managed to relay mail through it. I used http://abuse.net/relay.html, but this isn’t a complete test. A better one is http://www.ordb.org/submit/, but this one will record your server in its database if it turns out to be an open relay. This database is used by spam filters and mail servers to reject incoming mail.
You can use this ordb.org database yourself by adding the following to the Postfix main.cf:
smtpd_client_restrictions = reject_rbl_client relays.ordb.org

Since I could not find how the spammer was abusing my server, I blocked the ip responsible for sending the e-mails by following this post: http://www.linuxquestions.org/questions/history/277040

iptables -I INPUT -s 83.217.36.171/255.255.255.255 -j DROP

I also set this parameter in /etc/postfix/main.cf, since I only use my server to send mail from within squirrelmail (= webmail) or from a script in a cgi-bin dir:

mynetworks_style = host

I think the problem is related to a cgi script that is used by one of the sites that I host and that sends e-mail. I renamed the script to make sure this wouldn’t happen again (my apache logs showed 404 errors from spiders that look for all kinds of cgi programs, probably in order to abuse them). I will also investigate how this cgi program can be abused and if there is an update available for it.

The damage isn’t that great, since I think approximately 200.000 messages were put in the mail queue, but I couldn’t find one that wasn’t rejected by the receiving server (my spamassassin filter had already marked them as spam before sending them out).

Anyway, I feel like I’ve been robbed…

Server time accuracy

Saturday, November 6th, 2004

I finally managed to set my server’s clock right automatically with ntpdate. I did this by following this guide. A list of publicly available servers is available here.