PHP -- Good or Bad?

It all started with Tim Bray's little rant on PHP a few days ago. Tim can't stand PHP because

...all the PHP code I've seen in that experience has been messy, unmaintainable crap. Spaghetti SQL wrapped in spaghetti PHP wrapped in spaghetti HTML, replicated in slightly-varying form in dozens of places.

I have seen some well-designed PHP code, but in general Tim's observation holds.

The problem goes beyond the widespread spaghetti PHP code found in many open source applications. Programs are bound to be a mess if they were never designed, but hacked together by non-programmers trying to quickly (and often temporarily) fill their own needs. Nothing wrong with that -- you can write spaghetti code in any language of your choice (debugging spaghetti Python is part of my daily job). In fact it might be a compliment to PHP that it is so easy to get up and running. Even a non-programmer can add a bit of "dynamics" to their own website, without begging or paying the professionals.

Things only get messy when your temporary "Hello World" application starts to need database support, needs to manage large XML documents, needs to talk in different charsets, and needs to scale to hundreds of requests per second.

However, what I see as the fundamental problem of PHP is not the zillions of lines of messy code out there, but the language itself. Scanning through the comments on Tim's post, two other write-ups caught my eye.

First of all, Jonas Maurus wrote "PHP sucks". It's a bit technical, but I very much agree with what he has said. There are simply too many "warts" in PHP-land that can catch even a seasoned developer by surprise. For example, objects are copied on assignment in PHP4 but handled like references in PHP5, and you usually need to use === (3 equal signs) to correctly test a value against boolean false, because == also treats 0 and the empty string as equal to false.
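
The === wart is easier to see with something runnable. Python happens to share this one quirk (0 == False is true there as well), so here is a rough analogue rather than real PHP. The strpos helper below is my own hypothetical imitation of PHP's function of the same name, not a real library call.

```python
def strpos(haystack, needle):
    """Hypothetical imitation of PHP's strpos: index of the match, or False."""
    index = haystack.find(needle)
    return False if index == -1 else index


found_at_start = strpos("php sucks", "php")  # match at index 0
not_found = strpos("php sucks", "ruby")      # no match at all

# Loose comparison cannot tell "found at position 0" from "not found".
loose_start = (found_at_start == False)
loose_missing = (not_found == False)

# The strict check (PHP's ===, roughly Python's "is") can.
strict_start = (found_at_start is False)
strict_missing = (not_found is False)

print(loose_start, loose_missing)    # both look like "not found"
print(strict_start, strict_missing)  # only the real miss is False
```

With the loose comparison both calls look like a miss, even though the first one matched at the very start of the string. Only the strict check separates the two cases.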

Another nice write-up about the PHP language comes from Aristotle Pagaltzis -- "Why PHP is good but bad". I especially agree with the point "The easy, obvious way to do things is often the incorrect one".

There are lots of tutorials which will either not tell you to quote user input before interpolating it into SQL statements at all, or tell you to use addslashes for the purpose. In either case you are open to injection attacks -- either gaping wide or just wide...

What annoys me is that far too many PHP programmers (who probably have no other SQL experience) automatically equate 'addslashes' with escaping an SQL string literal, and so think they are safe from SQL injection once they add slashes everywhere in their SQL concatenations. The related "magic quotes" feature, which gets applied to all incoming GET/POST/Cookie data, also does more voodoo than good.
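
To make the addslashes point concrete, here is a minimal sketch in Python with the standard sqlite3 module. The addslashes function is my own rough imitation of the PHP one, and the users table and the hostile input are made up for illustration. SQLite, like standard SQL, does not treat a backslash as an escape character inside a string literal, so the "escaped" quote still ends the literal.

```python
import sqlite3


def addslashes(text):
    """Rough imitation of PHP's addslashes (hypothetical helper)."""
    for ch in ("\\", "'", '"'):
        text = text.replace(ch, "\\" + ch)
    return text.replace("\0", "\\0")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

hostile = "nobody' OR 1=1 --"

# Concatenation plus addslashes: the backslash means nothing to SQLite,
# so the quote still closes the literal and "OR 1=1" becomes live SQL.
unsafe_sql = "SELECT name FROM users WHERE name = '" + addslashes(hostile) + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Bound parameter: the driver passes the value as data, never as SQL text.
safe = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()

print(len(leaked), "rows leaked by the concatenated query")
print(len(safe), "rows returned by the parameterized query")
```

The concatenated query hands back every row in the table, while the parameterized one finds nobody with that odd name. Which characters need escaping depends on the database, which is exactly why a generic slash-adding function is the wrong tool.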

And Aristotle sums up the problem with PHP:

... the morass is simply the result of the language being a templating system that grew too big for its breeches. I don't believe the problems can be corrected in any sensible fashion; PHP will always be a templating system, however much it may be straining against its clothes.

Yes -- it was initially designed as a template language that grew out of Perl CGI scripts. New functions were added to a flat namespace so they could be easily called from inside the templates. It is working hard to become a full object-oriented language, but the baggage is still very visible.

On the other hand, you have Harry Fuecks with his pro-PHP rant. "Shared nothing" is one of his main arguments for why PHP is scalable. And it works -- with Apache and mod_php -- as the web development platform it was designed to be.

Whether "shared nothing" is scalable is quite arguable, and I believe it really depends on the application. Your cluster of Apache/PHP nodes pushes all state into the database, assuming that the DB is going to scale -- and that is a very big assumption. It only really works for applications that do far more reading than writing, where you can easily set up clusters of slave DBs. But heh, that covers most of the "Web 2.0" style of applications, and I guess that's what they care about.

Moreover, "shared nothing" is a design philosophy that is not unique to PHP. I can have a JSP application, or a Django application behind FastCGI, that does exactly the same thing. Now, for other applications where a per-process in-memory cache would be very useful, what options do I have with PHP? Most caching options I have seen rely on external processes or external storage.
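
By a per-process in-memory cache I mean something like the sketch below, written in Python for a long-lived worker process (a FastCGI worker, say). The function names and the fake database lookup are hypothetical; only functools.lru_cache is a real standard library call.

```python
import functools

DB_CALLS = {"count": 0}


def load_profile_from_db(user_id):
    """Hypothetical stand-in for an expensive database query."""
    DB_CALLS["count"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


@functools.lru_cache(maxsize=1024)
def get_display_name(user_id):
    # The cache lives in this process's memory and survives across requests
    # for as long as the worker process stays alive.
    return load_profile_from_db(user_id)["name"]


def handle_request(user_id):
    """Hypothetical request handler called once per incoming request."""
    return f"Hello, {get_display_name(user_id)}"


# Three "requests" for the same user, served by the same worker process.
responses = [handle_request(7) for _ in range(3)]
print(responses)
print("database calls:", DB_CALLS["count"])
```

Only the first request touches the "database"; the other two are served from memory. The flip side is that every worker process keeps its own copy, so invalidation becomes your problem, and none of this helps when the process is thrown away after each request.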

The rest of Harry's article talks about how PHP is playing catch-up -- generic database support (I was using Perl DBI in 1997), SPL (how long have we had STL?), XML (finally!), Unicode (finally 2!), etc.

Language-wise, I do not think PHP's Schwarz is bigger. It is getting there, but I'd rather code in Python or Ruby than PHP. The only redeeming value I see in PHP is -- it works (most of the time). Thanks to mod_php, which gets installed almost everywhere (and becomes a resource hog with Apache's prefork MPM), your PHP application is most likely to run on the $10/month budget shared hosts.

Yahoo's PHP development centre has just gone live. It is not hard to see that PHP has Yahoo's backing. Well, they have Rasmus -- maybe that's why. At the same time, Google has got Guido. Interesting competition ahead :)