If you’ve been doing software development for any reasonable length of time, especially for web applications, you’re likely familiar with the Local (or Development), Staging (or Test), and Production environments.

If not:

  • Local or Development refers to the machine on which you’re actually building the product.
  • Staging or Test refers to the server designed to represent Production, though it is only accessible by developers, testers, clients, and perhaps some of the end users who evaluate features prior to the official rollout.
  • Production is the live version of the site. No development occurs on this server.

Most developers who are in the business or closely working with their client follow this particular setup.

In the past couple of months, there have been a few times when a single Production rollout has fallen short, either revealing bugs that were not caught in Staging or failing to hold up under Production-level loads.

As frustrating as that can be, I’ve ended up using a sort of two-phase Production deployment plan to help mitigate this.

The Typical Setup

A typical setup for a Development / Staging / Production Environment.

The usual setup for anyone who’s working with Development, Staging, Production, and source control may look something like the diagram above.

That is, the workflow is likely to consist of the following:

  • All work is done on Development and is then committed to Source Control.
  • After a certain milestone is reached, the version of the code in Source Control is deployed to Test. At this point, the code may or may not be tagged as a certain version. Personally, I tag these based on the milestone that’s being deployed.
  • Once Test has been approved, source control is tagged as whatever version (say 1.0, 1.1, or 2.0) and then deployed to Production.

This isn’t really anything new, but following this process even for the smallest of projects can pay dividends in maintenance and in making sure that you’re meeting your client’s requirements properly.

On Staging and Replication

But there’s an assumption buried in this particular process: that Staging accurately represents what’s in Production.

To phrase it another way: Staging should represent Production as closely as possible, so much so, perhaps, that the user can’t even tell the difference between the two environments.

For small-to-mid-sized sites, this is no problem. Setting up a reasonable number of users and replicating the contents of the Production database isn’t terrible, but if you’re working on a site that has at least 20,000 hits per day and several thousand active users, that can become a bit more of a challenge.

Here’s why:

  • The customer has a set of requirements and expectations for what to add to the site.
  • You develop the functionality and deploy it to Staging, the customer signs off, and you roll it out to Production.
  • Users then begin to contact support about various problems that are showing up: some with the new feature itself, some that are collateral damage of the new feature.

 


A Real World Example

For anyone who’s been working in the space long enough, sometimes this stuff just happens. In fact, it happened in a recent project in which users are able to sign up for a site and, while selecting their username, are free to use any letters and numbers.

Through the use of custom rewrite rules, users are able to access their profile by navigating to /user/username.

The regular expression powering the rewrite rule didn’t accurately capture numbers at the end of usernames so some user profiles were redirecting to the next closest match.
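
The actual rewrite rule isn’t shown in this post, so the patterns below are hypothetical. As a minimal sketch in Python (not necessarily the language the site was built in), this is how a pattern that captures only letters can silently drop trailing digits and resolve to a different profile:

```python
import re

# Hypothetical patterns; the post does not show the actual rewrite rule.
# Buggy: captures letters only and is not anchored at the end of the path,
# so trailing digits are silently dropped.
BUGGY_RULE = re.compile(r"^/user/([A-Za-z]+)")
# Fixed: captures letters and digits and anchors the end of the path.
FIXED_RULE = re.compile(r"^/user/([A-Za-z0-9]+)/?$")


def resolve_username(rule, path):
    """Return the username captured from the path, or None if no match."""
    match = rule.match(path)
    return match.group(1) if match else None


print(resolve_username(BUGGY_RULE, "/user/tom2"))  # tom
print(resolve_username(FIXED_RULE, "/user/tom2"))  # tom2
```

With the buggy pattern, a request for /user/tom2 resolves to the profile for tom, which is the kind of “next closest match” behavior described above.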

This was one of those bugs that slipped by testing and QA and ended up making it to Production. Tracing it down to the regular expression was easy enough, but had this been a larger problem, the most reasonable thing to do would’ve been to roll back to the last known good version of the codebase.

Granted, that’s part of the reason that we have source control, but it is a bit time-consuming.

So what would it look like if we were to invert that process? That is, rather than rolling something out with the ability to roll back the entire codebase, what if we were simply able to turn new functionality on and off?

A Production Deployment Plan

Rather than launching the latest version of the code with the ability to roll back, what about the alternative of launching the latest version of the code with the ability to enable and/or disable new functionality?

This isn’t necessarily a new idea (in fact, I’ve used it in previous jobs), but it saves a lot of the time otherwise spent dealing with snapshots of code.

Here’s how it works:

  • When you begin working on a new feature, you introduce a flag for it, and the site or application is aware of whether it is running in Development, Staging, or Production.
  • The flag can exist in a text file, a database option, or a query string flag.
  • The flag is committed to source control and deployed to each environment. By default, it’s on for Development and Staging but is off for Production.
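
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the flag name `new_profile_page`, the `APP_ENV` environment variable, and the JSON overrides file are all hypothetical, and a real project might store the flag in a database option instead.

```python
import json
import os

# Hypothetical defaults, committed to source control:
# on for Development and Staging, off for Production.
DEFAULT_FLAGS = {
    "development": {"new_profile_page": True},
    "staging": {"new_profile_page": True},
    "production": {"new_profile_page": False},
}


def current_environment():
    """Read the environment name from APP_ENV (an assumed variable name)."""
    return os.environ.get("APP_ENV", "development").lower()


def is_enabled(flag, environment=None, overrides_path=None):
    """Return True if the flag is on for the given environment.

    An optional JSON file of {"flag_name": true/false} overrides the
    committed defaults, so a flag can be toggled without a redeploy.
    """
    environment = environment or current_environment()
    flags = dict(DEFAULT_FLAGS.get(environment, {}))
    if overrides_path and os.path.exists(overrides_path):
        with open(overrides_path) as handle:
            flags.update(json.load(handle))
    return bool(flags.get(flag, False))


if is_enabled("new_profile_page", environment="production"):
    print("render the new profile page")
else:
    print("render the existing profile page")
```

Unknown flags and unknown environments default to off here, which is a deliberate choice: a typo in a flag name should never turn a feature on in Production.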

At this point, you and the client determine when to activate the new feature(s). You then simply toggle the flag to on (clear the cache if need be) and then watch as the new feature is being used.

If a bug arises, you simply toggle the flag again. No mad dash to the server to tweak the file. No taking the site down to revert to a previous version. Simply toggle the flag and clear the cache.

Much easier, isn’t it?

Using a flag also offers finer control: we could activate the feature for a subset of users or a particular type of user, which lets us diagnose issues in the Production environment even while most users aren’t able to see the feature.
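
One way to limit a flag to a subset or a type of user is sketched below. The role name, flag name, and percentage are hypothetical, and hashing the user ID is just one approach I’m assuming for illustration; it keeps each user in the same group from one request to the next.

```python
import hashlib


def bucket_for(user_id, flag):
    """Map a user to a stable bucket from 0 to 99 for the given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled_for(user_id, role, flag, allowed_roles=(), percent=0):
    """Enable the flag for certain roles, or for a percentage of users."""
    if role in allowed_roles:
        return True
    return bucket_for(user_id, flag) < percent


# Hypothetical usage: administrators always see the feature,
# and roughly 10% of everyone else does.
print(is_enabled_for(42, "administrator", "new_profile_page",
                     allowed_roles=("administrator",), percent=10))
```

Setting `percent` to 0 restricts the feature to the allowed roles only, and raising it gradually widens the audience without another deployment.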

A Word About Overhead

There’s one challenge that comes with doing this: Keeping track of multiple flags.

As a codebase grows, multiple flags are bound to be introduced, and they can quickly become waste if they aren’t managed properly as each new phase or milestone is developed.

The more waste that’s in the system, the more potential there is for something to go wrong when trying to clean up flags. As such, I like to try to keep the number of active flags to no more than two.

Once a feature has proven itself, I’ll remove the flag and then redeploy the codebase.
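
To keep that limit honest, a small check can run as part of the build or test suite. The sketch below is an assumption about how you might track flags, not something from a particular tool; the registry contents and flag names are hypothetical.

```python
MAX_ACTIVE_FLAGS = 2  # the limit mentioned above; adjust to taste

# Hypothetical registry: each flag records the milestone that introduced it.
FLAG_REGISTRY = {
    "new_profile_page": {"milestone": "1.1"},
    "bulk_export": {"milestone": "1.2"},
}


def check_flag_budget(registry, limit=MAX_ACTIVE_FLAGS):
    """Raise if more flags are registered than the team agreed to carry."""
    if len(registry) > limit:
        names = ", ".join(sorted(registry))
        raise RuntimeError(
            f"{len(registry)} flags registered (limit {limit}): {names}"
        )
    return len(registry)


print(check_flag_budget(FLAG_REGISTRY))  # 2
```

Adding a third flag without retiring an old one makes the check fail, which is a nudge to remove a proven flag before introducing a new one.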

Anyway, all of this will look a bit different depending on your process, configuration, and clients, but the bottom line is that there is a stronger alternative to simply performing code rollbacks.

Though it’s not without its overhead, I’ve found that this has been much less of a headache than dealing with the alternative, and it results in greater peace of mind when doing rollouts, especially for large sites.

With that said, I’d love to hear what you guys have used in your own projects. Even if you don’t dig this idea, I’m always up for hearing what alternatives exist, so feel free to leave a comment.

Join the conversation! 9 Comments

  1. Hey Tom,

    Thanks for writing this article. I’ve been “wanting” to set up my own development platform for a little while, however the want has yet to become a “need”. I like the idea of the 3+ phases (dev, staging/testing, source control, live) but need some more knowledge to get it right. And now you’ve added the totally groovy concept of flagging changes and being able to control just their rollouts. Any quality resources you might point one seeking such knowledge? Much appreciated!

    • It’s one of those things that, I think, you really just have to start doing rather than waiting for it to be a need, you know?

      There are a couple of tools that I can recommend. I think I’ll try to cover those in an upcoming blog post.

  2. For releasing features into production we’ve used that similar approach extensively in my company where we have global sites spanning many datacenters. We call it “Flipping the BRS” or Big Red Switch.

    • Yeah – we did the exact same thing at my previous job.

      In the context of contract projects – at least for me – the principle is still basically the same, but the implementation varies now based on the size of the site, type of server, and all that fun stuff.

  3. “Even the smallest of projects” – so true…

    I recently met people who just “do it in production” and hope it all goes well.

    I was like that, but learned my lesson painfully (MSN 2.5, December 18, 1998, never forgot that).

    Now, my question is this: I run everything locally (updated hosts file, Apache virtual host), and when I do things, they use my domainname.dev (locally). What is the best process for pushing a local site to a production domain so that all uploads, links, etc. are updated to the new domain?

    • I normally deploy from my local machine to the staging site and then do one of two things, depending on the project:

      If there’s no data in the staging environment that’s relevant to the production environment, then I’ll deploy from staging to production.
      If the staging environment has gotten cluttered, then I’ll deploy from the latest tag in git to production, either via FTP or even a git deployment (if a server will allow it).

      That said, I try to keep things really simple for uploading. Things such as configuration files, host files, etc., I don’t bother pushing. I configure each of my environments so they mimic each other as closely as possible, so all I really have to push is the “app layer” or the code layer.

      Hopefully this makes sense.

      Of course, now I’m curious about what you did at MSN. . . :)

      • That makes sense… I think I’m still wondering about details, because if you are working on a dev environment the domain is not the same as production (unless you spoof it) and if you make modifications using the dashboard, how do they get pushed, specifically image links. FYI, I’ll be at the meetup tomorrow and tell you the story then. :-) Thanks!

        • WordPress has API methods for things like:

          get_template_directory_uri()
          home_url()
          And so on…

          These make it really easy to work with certain assets programmatically.

          If you’re actually going to be deploying the database to the server, there are only two values you’d really need to change (if I recall off the top of my head correctly): take a look at the wp_options table and you’ll see where the site URLs are stored.

          Change those in production (and keep your localhost config mapped to your domain – or even use a plugin for mass replacement before deployment) and you’re money.
