What do we mean by large-scale applications?
Well, I don't know who "we" is, but I mean applications big enough to require thinking about how the application is architected. Of course, it's more than just thinking about how the application will work, but also how it will be tested, how it will be maintained, how it will grow, how security policies will be implemented, and how this can all be done as efficiently (and non-repetitively) as possible. These concerns typically push developers toward an existing framework or into the typically underestimated effort of writing their own.
Web application frameworks traditionally start with a high-level look at how requests are handled by the server and turn that into abstraction points. Typically this ends up as one of the many interpretations of Model-View-Controller, and the general phases of processing look something like this (a rough sketch in Python follows the list):
- Incoming request is dispatched (maybe via mod_rewrite) to a single handler script (the FrontController pattern).
- A routing sub-system looks at the request (usually the requested path) and determines what piece of server-side code should handle it.
- The request is probably further processed for things like authentication requirements and then (if other checks pass) handed off to the server-side processing code (sometimes called an Action, sometimes the Controller, sometimes a View).
- The processing code will perform the "meat" of the processing (a typical application will probably query the database, for example) and produce some sort of response that should get sent back to the client.
- Typically there is a final phase where a more abstract response is encoded into the format that the client expects (e.g. JSON or XML); alternatively, for more traditional HTML applications, the response data may get passed as the context for a template.
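To make those phases concrete, here is a minimal sketch of that pipeline as a bare WSGI callable. The route table, the `list_widgets` handler, and the authentication check are hypothetical placeholders of mine, just enough to show where each phase lives:

```python
import json

# Hypothetical handler: the "meat" of the processing. A real application
# would likely query the database here.
def list_widgets(environ):
    return {"widgets": ["spanner", "sprocket"]}

# Hypothetical route table: path -> (handler, requires_auth)
ROUTES = {
    "/widgets": (list_widgets, True),
}

def application(environ, start_response):
    # 1. Front controller: every request enters through this single callable.
    path = environ.get("PATH_INFO", "/")

    # 2. Routing: map the requested path to a piece of server-side code.
    route = ROUTES.get(path)
    if route is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    handler, requires_auth = route

    # 3. Pre-processing: e.g. an authentication check before the handler runs.
    if requires_auth and "HTTP_AUTHORIZATION" not in environ:
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"Unauthorized"]

    # 4. The handler produces an abstract response (a plain dict here).
    result = handler(environ)

    # 5. Encode the abstract response into the format the client expects.
    body = json.dumps(result).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]
```

A real framework wraps each of these points in an extension mechanism (routing tables, middleware, template engines), but the shape of the pipeline is the same.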
And this is why I think that PHP's share-nothing architecture is really a pretty dubious "feature". Practically, it just means that all those resources your framework needs have to be set up again on every single request. Let's be honest here: this is not a feature; this is a pretty severe limitation. It's doublespeak that turns fundamental problems like "not thread-safe" and "leaks memory like a wet paper bag" into features.
Now, there is a real shared-nothing architecture that describes an approach to developing scalable, concurrent software. It has very little to do with how the term has been used to describe PHP's architecture. Furthermore, there's nothing about PHP that makes it uniquely able to "support" [its understanding of] share-nothing architecture; it simply doesn't have the language or interpreter support to do anything else. It's like saying that a single-speed bicycle is better than a geared bicycle because it's easier to understand.
So, enter Python. Python is certainly not unique in its deployment paradigm, but it does provide a healthy contrast to PHP. To the point here: in Python you can share stuff. So, if you are using Apache with mod_wsgi (a popular Python hosting option, especially for frameworks), you can run Apache with the multi-threaded worker MPM and all those overhead resources only need to be initialized once per process -- not per request. Of course, if you wanted to, you could make it behave like PHP, but no one would do that; it'd be inefficient.
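Here is a minimal sketch of what per-process sharing looks like in a WSGI app under mod_wsgi (or any long-lived Python server); `expensive_setup` and `SHARED_CONFIG` are hypothetical stand-ins for whatever framework and resource initialization your application actually does:

```python
import time

# Module-level code runs once when the server imports this file into a
# worker process -- not on every request.
def expensive_setup():
    # Stand-in for config parsing, ORM setup, template compilation, etc.
    time.sleep(1)
    return {"greeting": "Hello from a shared, per-process resource"}

SHARED_CONFIG = expensive_setup()  # paid once per process

def application(environ, start_response):
    # Each request simply reads the already-initialized resource.
    body = SHARED_CONFIG["greeting"].encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```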
So what is the price to pay for this sharing? Well, there is some additional complexity. If you are running a multi-threaded environment (e.g. Apache's worker MPM or another multi-threaded Python server), you do need to make sure that those shared resources (db connections, log handles, etc.) are thread-safe. Typically in Python they are, but one does need to understand what that means. So it will demand a little more of you, but for those of you developing full-stack frameworks in PHP, you know that you've already left the Green Zone.
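To show the kind of care that means, here's a small sketch of a per-process resource shared across worker threads and guarded with a lock; `SharedCounter` is an invented stand-in for something like a shared log handle or connection wrapper:

```python
import threading

class SharedCounter:
    """A per-process resource used by many request threads at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = 0

    def increment(self):
        # Without the lock, concurrent threads could interleave the
        # read-modify-write and lose updates.
        with self._lock:
            self._hits += 1
            return self._hits

HITS = SharedCounter()  # initialized once per process, shared by all threads

def application(environ, start_response):
    count = HITS.increment()
    body = f"request #{count} in this process\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```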
So, while we're all excited about applying the DRY mantra to our software design, I think it's worth stopping to consider whether there's a similar principle that could be applied to the server architecture. Sharing resources between requests is a powerful feature. In single-process (multi-threaded) systems it makes it possible to share state without persisting to an external store; in multi-process, multi-threaded systems, it provides a huge efficiency improvement by doing the parsing and app setup / resource initialization only once per process.
Share. You'll feel better.
Of course, sometimes simpler is better. If you don't need an application framework, then you probably aren't concerned with eliminating repetitive overhead code. To go back to our bicycle analogy, I actually do ride a single-speed bicycle to work because my commute is relatively flat and fewer mechanical parts means fewer parts to replace.
PHP has an option comparable to mod_wsgi as well: there has been a php-fpm (FastCGI process manager) patch around for years, and since PHP 5.3 it has been integrated into PHP. The speedup this brings is quite significant, and I haven't found anything about threading issues of the kind that makes mod_php discouraged in an Apache worker environment.
I have both Apache/mod_php and Nginx/php-fpm running in production and both are perfectly stable. I will be investigating running Apache worker/php-fpm because I need things like .htaccess support that Nginx doesn't offer.