Open source web servers are still buggy

Last night

Upgraded my test box's lighty to the latest 1.4.10 via Gentoo Portage (I have lighttpd keyword masked). Suddenly none of my CGI scripts worked. The server just kept on complaining:

2006-02-16 23:24:18: (mod_cgi.c.1186) cgi died ?
2006-02-16 23:27:07: (mod_cgi.c.1186) cgi died ?
2006-02-16 23:27:14: (mod_cgi.c.1186) cgi died ?
2006-02-16 23:32:19: (mod_cgi.c.1186) cgi died ?

I couldn't work out why -- 1.4.10 was supposed to have CGI working again (after it was broken in 1.4.9), and I don't even have the SCGI module loaded. So instead of doing some coding, I played around with various configurations for an hour, gave up, and re-emerged 1.4.8, which was my last known-working version. CGI is happy again.

(Update: Another way to get 1.4.10 working is by starting mod_fastcgi before mod_cgi.)

What a waste of time...

This morning

Upgraded our work's public webserver this morning (something that I should have done yonks ago) from Apache 2.0.54 to 2.0.55. Because Gentoo changed the Apache configuration path somewhere in the middle, I ended up having to upgrade other things like PHP (which then triggered other upgrades). Luckily, with Gentoo you can compile the upgrade into binary packages first without installing them, so that an incompatible upgrade can be swapped in quickly. Our public site came back up in under a minute. Not bad.

But that's the end of my good luck.

Suddenly none of our CGIs were working! All there was in the error_log was "Premature end of script headers", with nothing else written to stderr! Putting debugging statements into our CGIs showed that they were never executed, and then I found out that the suexec wrapper was spitting the dummy.

It appears that the suexec compiled by Gentoo has its minimum UID set to 1000, but our server was set up in the good old Mandrake days when UIDs started from 500 -- no wonder it kept on complaining. Commenting out the line that loads suexec_module didn't seem to help, so I ended up having to re-emerge Apache again with the minimum UID adjusted. Aargh!
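
In hindsight, a quick check of the account list against the new suexec minimum would have flagged this before the upgrade. Here is a minimal sketch of such a check, not something I ran at the time. The 1000 is the minimum UID from the Gentoo build described above; the cut-off of 99 for "system accounts" and the function names are my own assumptions, so adjust them for your box.

```python
import pwd


def accounts_below_uid_min(accounts, uid_min, system_uid_max=99):
    """Return (name, uid) pairs whose UID is below the suexec minimum.

    accounts: iterable of (name, uid) pairs.
    uid_min: minimum UID the suexec binary was compiled with.
    system_uid_max: UIDs at or below this are treated as system accounts
        and skipped (an assumed cut-off, not something suexec defines).
    """
    return sorted(
        (name, uid)
        for name, uid in accounts
        if system_uid_max < uid < uid_min
    )


def local_accounts():
    """Read (name, uid) pairs from the local password database."""
    return [(entry.pw_name, entry.pw_uid) for entry in pwd.getpwall()]


if __name__ == "__main__":
    # 1000 is the minimum UID reported for the Gentoo-built suexec.
    for name, uid in accounts_below_uid_min(local_accounts(), uid_min=1000):
        print(f"{name} (uid {uid}) is below the suexec minimum")
```

Any account it prints is one whose CGIs suexec would refuse to run, which on a box with UIDs starting at 500 would have been pretty much everyone.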

Then came reports that uploading files through the Apache reverse proxy to our application server had stopped working. Running tcpdump showed that the HTTP headers got all jumbled up with the actual content, and I couldn't figure out what was going on. A search on Google revealed this mod_proxy + mod_ssl bug. That basically means:

You cannot use Apache 2.0.55 as a reverse proxy + SSL handler for your application server!

I thought this usage pattern would be very common, but apparently the bug wasn't discovered before 2.0.55 was released. So again I re-emerged 2.0.54 and everyone is happy again.

What a waste of time... Actually it took me the whole morning to get these issues resolved, when initially I thought I could upgrade the web server in 5 minutes.

Verdict

Open source web servers -- more eyeballs looking through the code doesn't mean that it is bug free. Upgrading without testing can mean disaster and a waste of time -- don't assume that everything is well tested when a new version is released.
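
For what it's worth, "testing" here doesn't have to be fancy. Below is a rough sketch of a smoke test that could be run against a test box after an upgrade: it requests a handful of URLs and lists the ones that don't answer 200. The function names and the URLs in the usage example are made up for illustration. A CGI that dies with "Premature end of script headers" comes back as a 500, so it would show up in the list.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # No usable answer at all: connection refused, timeout, DNS failure.
        return None


def smoke_test(urls, fetch=fetch_status):
    """Return a list of (url, status) for every URL that did not answer 200."""
    failures = []
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures.append((url, status))
    return failures
```

Something like `smoke_test(["http://testbox.example/", "http://testbox.example/cgi-bin/hello.cgi"])` (hypothetical URLs) should return an empty list if all is well. The `fetch` parameter is only there so the check can be tried without a real server.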

But I'd still pick Apache or lighty if I had to choose between them and IIS or ISA.