Monday, February 15, 2010

Coming from PHP: Share Something

This is the first in a series of posts I'd like to do about Python from the perspective of someone coming from PHP development. Others have certainly posted articles on similar topics; heck, there's even a fantastic site dedicated to providing the Python equivalent of PHP functionality. While I'm sure that I'll talk a bit about some of the building blocks of the Python language, these posts will focus on the language features, interpreter implementations, and deployment platforms that lend themselves to Python's use in large-scale applications.

What do we mean by large-scale applications?

Well, I don't know who "we" is, but I mean applications big enough to require thinking about how the application is architected. Of course, it's more than just thinking about how the application will work; it's also how it will be tested, how it will be maintained, how it will grow, how security policies will be implemented, and how this can all be done as efficiently (and non-repetitively) as possible. These concerns usually push developers toward an existing framework or into the typically underestimated effort of writing their own.

Web application frameworks traditionally start with a high-level look at how requests are handled by the server and turn that into abstraction points. Typically this ends up as one of the many interpretations of Model-View-Controller, and the general phases of processing look something like this:
  1. The incoming request is dispatched (maybe via mod_rewrite) to a single handler script (the Front Controller pattern).
  2. A routing subsystem looks at the request (usually the requested path) and determines which piece of server-side code should handle it.
  3. The request is probably processed further for things like authentication requirements and then (if those checks pass) handed off to the server-side processing code (sometimes called an Action, sometimes the Controller, sometimes a View).
  4. The processing code will perform the "meat" of the processing (a typical application will probably query the database, for example) and produce some sort of response that should get sent back to the client.
  5. Typically there is a final phase where a more abstract response is encoded into the format that the client expects (e.g. JSON or XML); alternatively, for more traditional HTML applications, the response data may get passed as the context for a template.
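To make those phases concrete, here is a minimal sketch of them as a single Python WSGI callable. It is not how any particular framework does it: the `ROUTES` table, the `list_users` action, and the `X-Api-Key` header check are all hypothetical placeholders, and the "database" is a hard-coded list.

```python
import json


def list_users(environ):
    # Phase 4 (hypothetical action): a real app would query the database here.
    return {"users": ["alice", "bob"]}


# Phase 2: a hypothetical routing table mapping request paths to handlers.
ROUTES = {
    "/users": list_users,
}


def is_authenticated(environ):
    # Phase 3: placeholder check against a hypothetical X-Api-Key header.
    return environ.get("HTTP_X_API_KEY") == "secret"


def application(environ, start_response):
    # Phase 1: the WSGI server hands every request to this one callable.
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        status, data = "404 Not Found", {"error": "not found"}
    elif not is_authenticated(environ):
        status, data = "403 Forbidden", {"error": "forbidden"}
    else:
        status, data = "200 OK", handler(environ)
    # Phase 5: encode the abstract response into the format the client expects.
    body = json.dumps(data).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

Under these assumptions you could try it locally with the standard library's `wsgiref.simple_server.make_server("", 8000, application).serve_forever()`.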
There's a lot of boilerplate code there, and a lot of resources need to get loaded to process a request: routing, authentication & session management, logging, business logic, model, template rendering, encoding, etc. Even when not doing any work, the typical framework application will load scores of classes, connect to the database, and open file handles for logging. Heaven forbid your webapp needs to do anything like open a socket connection.

And this is why I think that PHP's share-nothing architecture is really a pretty dubious "feature". Practically, it just means that all those resources your framework needs have to be set up on every single request. Let's be honest here: this is not a feature; this is a pretty severe limitation. This is doublespeak that turns fundamental problems like "not thread-safe" and "leaks memory like a wet paper bag" into features.

Now, there is a real shared-nothing architecture that describes an approach to developing scalable & concurrent software. It really has very little to do with how the term has been used to describe PHP's architecture. Furthermore, there's nothing about PHP that makes it uniquely able to "support" [its understanding of] share-nothing architecture. It simply doesn't have the language or interpreter platform support to do anything else. It's like saying that a single-speed bicycle is better than a geared bicycle because it's easier to understand.

So, enter Python. Python is certainly not unique in its deployment paradigm, but it does provide a healthy contrast to PHP. To the point here: in Python you can share stuff. So, if you are using Apache with mod_wsgi (a popular Python hosting option, especially for frameworks), you can run Apache with the multi-threaded worker MPM, and all those overhead resources only need to be initialized once per process, not per request. Of course, if you wanted to, you could make it behave like PHP, but no one would do that; that'd be inefficient.
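Here is a rough sketch of what "once per process" looks like in practice, assuming the usual arrangement where the server (mod_wsgi, for example) loads the WSGI script once per process and then calls `application` for each request. `expensive_setup` and `APP_STATE` are hypothetical stand-ins for whatever your framework loads at startup, and the `SETUP_CALLS` counter exists only to show that setup ran once.

```python
import logging

# Module-level code: runs when the process first loads this script,
# not again for each request that follows.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("myapp")  # hypothetical logger name

SETUP_CALLS = 0


def expensive_setup():
    """Hypothetical stand-in for parsing config, building routes, warming caches."""
    global SETUP_CALLS
    SETUP_CALLS += 1
    return {"greeting": "Hello from a long-lived process"}


APP_STATE = expensive_setup()


def application(environ, start_response):
    # Only per-request work happens here; APP_STATE is reused as-is.
    log.info("handling %s", environ.get("PATH_INFO", "/"))
    body = APP_STATE["greeting"].encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

The PHP-style alternative would be to call `expensive_setup()` inside `application`, paying its cost on every request.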

So what is the price to pay for this sharing? Well, there is some additional complexity. If you are running a multi-threaded environment (e.g. Apache worker MPM or another multi-threaded Python server), you do need to make sure that those resources (db connections, log handles, etc.) are thread-safe. Typically in Python they are, but one does need to understand what that means. So it will demand a little more, but for those of you developing full-stack frameworks in PHP, you know that you've already left the Green Zone.
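As one example of what "thread-safe" means in practice, Python's `sqlite3` connections by default refuse to be used from any thread other than the one that created them. A common pattern is to give each thread its own connection via `threading.local`, so the setup cost is paid once per thread rather than once per request. This is only a sketch: `DB_PATH` is a placeholder, and a production app might instead use a connection pool provided by its database library.

```python
import sqlite3
import threading

# Each thread sees its own independent attributes on this object.
_local = threading.local()

DB_PATH = ":memory:"  # placeholder; a real app would point at a file or DB server


def get_connection():
    """Return this thread's connection, creating it on first use.

    By default sqlite3 raises ProgrammingError if a connection is used from a
    thread other than its creator, so one connection per thread shares the
    setup cost without sharing the connection object itself.
    """
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(DB_PATH)
        _local.conn = conn
    return conn
```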

So, while we're all excited about applying the DRY mantra to our software design, I think it's worth stopping to consider whether a similar principle could be applied to the server architecture. Sharing resources between requests is a powerful feature. In single-process (multi-threaded) systems it makes it possible to share state without persisting to an external store; in multi-process & multi-threaded systems, it provides a huge efficiency improvement by handling the parsing and app setup / resource initialization only once per process.
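For the single-process case, here is a minimal sketch of state that lives in memory and is shared by every request thread, with no external store. `HitCounter` and `HITS` are hypothetical names; the `threading.Lock` is what keeps concurrent increments from being lost.

```python
import threading


class HitCounter:
    """Hypothetical in-process state shared by all request threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def hit(self, path):
        # The lock makes this read-modify-write atomic across threads.
        with self._lock:
            self._counts[path] = self._counts.get(path, 0) + 1
            return self._counts[path]


# Created once per process; every request thread sees the same object.
HITS = HitCounter()


def application(environ, start_response):
    count = HITS.hit(environ.get("PATH_INFO", "/"))
    body = f"Seen {count} time(s)".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Note that this state is per process: in a multi-process deployment each process keeps its own counts, which is where an external store comes back in.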

Share.  You'll feel better.

Of course, sometimes simpler is better. If you don't need an application framework, then you probably aren't concerned with eliminating repetitive overhead code. To go back to our bicycle analogy, I actually do ride a single-speed bicycle to work because my commute is relatively flat, and fewer mechanical parts means fewer parts to replace.