Nodejs Production Deployment

Recently, we’ve embarked on a journey down the nodejs path.

One of our new projects is written entirely in nodejs.  It was fun learning nodejs and it was a good fit for the project.  In the end, we achieved a very stable and very fast product that I think will serve us well.

However, it wasn’t all sunshine and roses.  The biggest hurdle we had to overcome was how to deploy our product to our production environment and maintain the high service standards that we require of all our products.  Nodejs’s biggest issues for production deployments are utilizing multi-core hardware resources and the fact that the application is the service.

The Application is the Service

This is an issue for production deployments because it means that the developer is writing the service that handles every request, not just the script that serves a single request.  That means that a mistake on the developer’s part can take the entire service down rather than just the request where the mistake occurs.  There’s not a lot that can be done about this (as it’s part of the design of nodejs), but it can be mitigated by running multiple instances and load balancing them behind a Virtual IP address (VIP).
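To make the failure mode concrete, here is a minimal sketch of one small mitigation: wrapping a request handler so that a synchronous exception becomes a 500 response instead of an uncaught error that kills the process.  The name safeHandler is made up for illustration and is not a nodejs API.  It does not catch errors thrown later from asynchronous callbacks, so it complements running multiple instances rather than replacing it.

```javascript
// Wrap a request handler so a synchronous throw is turned into a
// 500 response instead of crashing the whole service.
// safeHandler is a hypothetical helper name used for illustration.
function safeHandler(handler) {
  return function (req, res) {
    try {
      handler(req, res);
    } catch (err) {
      // Log the failure and fail only this request.
      console.error('request failed:', err.message);
      res.statusCode = 500;
      res.end('Internal Server Error');
    }
  };
}
```

It would be used as http.createServer(safeHandler(myHandler)).listen(8000), where myHandler is your own request handler and 8000 is an arbitrary port.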

Utilizing Multi-Core Hardware

Because nodejs runs your application code on a single thread, one process cannot take advantage of all the cores on a multi-core system.  This matters because any blocking call stalls every request in that process, and in our experience most nodejs http frameworks (even express) load static content with blocking calls to the file system, which effectively kills your page load times on a website.  This is generally solved by running multiple instances of the application on each server (e.g. via supervisor), but that causes other issues, such as how to bind multiple instances to the same port(s) and what happens when you use sticky sessions with your load balancer (hint: you’re back to using one instance to load everything again).

Our Search

Coming from a PHP background, we naturally went looking for nodejs’s equivalent of php-fpm (PHP’s FastCGI Process Manager).  For PHP, php-fpm solves all of the issues above, so we were hopeful that there would be something similar for nodejs (especially since there are some very large organizations using it).  Our hopes were dashed.  There was nothing like php-fpm for nodejs.  Oh, there were nodejs process managers and clustering modules, but they all required us to implement something per application, or were abandoned, or failed to meet our other requirements.  This really seemed crazy.  How could it be that there was no way to easily provide a single point of entry to a pool of processes running our application?  We did some research and found that nodejs does support this type of access pattern with the built-in cluster module, but it is just the foundation and still needs a house.
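For readers who have not used it, the foundation the cluster module provides looks roughly like the sketch below: a parent process forks one worker per CPU, and every worker listens on the same port.  The port number and the helper names (isPrimary, workerCount, startPool) are assumptions chosen for illustration.  Everything a real process manager adds (configuration, logging, signal handling, restart throttling) is left out.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

const PORT = 8000; // assumed port, for illustration only

// Newer nodejs releases call the parent process "primary"; older ones "master".
function isPrimary() {
  return cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
}

// Fork at least one worker, even if the CPU count is unavailable.
function workerCount(cpuCount) {
  return Math.max(1, cpuCount || 0);
}

// Call once from the application's entry point.
function startPool() {
  if (isPrimary()) {
    const count = workerCount(os.cpus().length);
    for (let i = 0; i < count; i++) {
      cluster.fork();
    }
    // Replace any worker that dies so the pool stays at full size.
    cluster.on('exit', (worker, code, signal) => {
      console.error(`worker ${worker.process.pid} exited (${signal || code}); forking a replacement`);
      cluster.fork();
    });
  } else {
    // Every worker listens on the same port; the cluster module shares the socket.
    http.createServer((req, res) => {
      res.end(`handled by worker ${process.pid}\n`);
    }).listen(PORT);
  }
}
```

Calling startPool() at the bottom of the file starts the pool.  Note that this code has to live inside each application, which is exactly the per-application work we wanted to avoid.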

Our Solution

We wrote our own nodejs process manager based on the nodejs cluster module with the goal of being the nodejs equivalent of php-fpm.

Introducing node-pm, a simple and easy way to run nodejs applications in production environments with everything you’d expect from a product running in a high traffic web environment.  It’s released under the MIT license and you can check out the source on github: http://github.com/sazze/node-pm

It’s really easy to get started.  Just execute the following two commands on the command line:

npm install -g node-pm
node-pm app.js

The last command above will run your application (app.js) using node-pm.  The effect is that node-pm will spawn one process per CPU on your system, all listening on the same address and port to serve requests.  node-pm will also enforce a 30-minute maximum lifetime for each process and gracefully restart them to help mitigate the effects of memory leaks.  These (and many other) settings are fully configurable and you can tweak them to match your application and environment.  Check out the docs (or run node-pm with no arguments) to see the full list of options.
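To give a feel for what a maximum lifetime with graceful restarts involves, here is a rough sketch built directly on the cluster module.  This is not node-pm’s actual implementation; forkWithLifetime and MAX_LIFETIME_MS are hypothetical names, and the approach (start a replacement, wait for it to listen, then disconnect the old worker) is just one reasonable way to do it.

```javascript
const cluster = require('cluster');

const MAX_LIFETIME_MS = 30 * 60 * 1000; // 30 minutes, the default described above

// Fork a worker and schedule its graceful replacement.
// Must be called from the primary (master) process only.
function forkWithLifetime(maxLifetimeMs = MAX_LIFETIME_MS) {
  const worker = cluster.fork();

  const timer = setTimeout(() => {
    // Start the replacement first so the pool never shrinks.
    const replacement = forkWithLifetime(maxLifetimeMs);
    replacement.once('listening', () => {
      // Ask the old worker to close its servers; it normally exits
      // once its existing connections have finished.
      if (worker.isConnected()) {
        worker.disconnect();
      }
    });
  }, maxLifetimeMs);

  // If the worker dies early there is nothing left to recycle.
  worker.once('exit', () => clearTimeout(timer));
  return worker;
}
```

In the primary process, calling forkWithLifetime() once per CPU would give a pool whose workers are each recycled after the lifetime expires.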

We are running the latest version of node-pm in production (0.8.0 as of this writing).

Please feel free to check out node-pm.  We’d love your feedback.

Storage in Cloud Architectures

Storage is probably one of the most outdated technologies that is still essential to everything we do.  As everything else in the tech universe gets faster, more parallel, and cheaper, storage just gets bigger and cheaper.  The exception to this is SSD.  I long for the day when this new type of storage will be viable to completely replace all other existing hard drives... anyway, back to the point.

As a company that is deploying a private cloud to run our services, capacity is really a secondary concern.  We don’t need 50PB of storage space to host virtual machine images (well, not yet anyway).  A few TB gets us a good number of machines.  What we care about is performance.

From personal experience, I’ve found that we run out of IO performance long before we run out of space when it comes to hosting virtual machine images.  This is a huge problem for cloud architectures because the storage layer needs to be global and shared to take full advantage of the model.  We need a single storage pool that not only has the capacity to hold all our virtual machine instances, but most importantly, has the performance to run them all without negatively affecting any other virtual machine.

This means two things: 1) it must allow parallel access to the file system, and 2) it must be scalable!  When we add nodes to the pool, it must not only scale in capacity, but also in IO performance.

When I talk to storage vendors, they still don’t get it.  They are still thinking of storage in terms of size instead of performance.  They want to know how many files we need to store or how many TB/PB of space we require.  The questions these vendors should be asking are: how many machines do we need to host off their storage platform, and what IO performance do we expect?  I could easily fit 20 virtual machine images on the 2TB NAS they want to sell, but the disks/controllers just aren’t up to the task of running them (a 2TB NAS will choke and die long before the 20 machine mark, unless I’m willing to pay a ridiculous amount of money for it).  The problem is that performance has taken a back seat to capacity, and that needs to stop.
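A back-of-the-envelope calculation shows why IO runs out before space.  The sketch below uses made-up but plausible inputs: a 2TB NAS with roughly 400 IOPS in total, and virtual machines that each need 100GB and 50 IOPS.  All four numbers are assumptions for illustration, not measurements from our environment.

```javascript
// How many VMs fit by capacity, how many by IO, and which limit bites first.
function vmLimits({ capacityGb, totalIops, vmSizeGb, vmIops }) {
  const byCapacity = Math.floor(capacityGb / vmSizeGb);
  const byIops = Math.floor(totalIops / vmIops);
  return { byCapacity, byIops, limit: Math.min(byCapacity, byIops) };
}

// Hypothetical 2TB NAS: every figure here is an illustrative assumption.
const nas = vmLimits({ capacityGb: 2000, totalIops: 400, vmSizeGb: 100, vmIops: 50 });
console.log(nas);
```

With these inputs the NAS has room for 20 images but only enough IO for 8 running machines, which is the gap the capacity-only conversation misses.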

To this end, I’ve been taking a long hard look at GlusterFS.  This open source file system, built to remove the IO bottleneck from supercomputing clusters, seems to be a very appropriate solution to the cloud storage issue.  It is a parallel, distributed file system that scales in both capacity and performance.

We will be deploying GlusterFS to store and host virtual machine images in our production environment this year.  I’ll let you know how it goes.

TCP Window Scaling

Recently, a small (but still far too large) number of our users have been having issues connecting to our websites.  This has been a very frustrating problem, as there was no pattern of location, browser, OS, ISP, or any other normal factor related to connection issues.

In the end, it turned out to be a problem with TCP window scaling.  If you don’t know what this is, don’t worry, it is very technical and I’m not going to go into details here ;)  Basically, this setting is turned on by default in all modern Linux/Unix kernels and makes your internet connection faster (when it works).  Unfortunately, there is equipment out there on the internet that does not handle TCP window scaling correctly, and if you are unlucky enough to have it between your computer and the website you are trying to connect to, then you will experience intermittent issues accessing the site.

Now, this is all very well documented, and googling it will present a wealth of knowledge about how to turn off TCP window scaling on your computer so you don’t have these problems anymore.  But what about the servers hosting these websites?  We can’t tell all our users to turn off TCP window scaling on their computers.  Shouldn’t there be something we can do on our end to prevent this problem from happening?  As it turns out, there is: turn off TCP window scaling and TCP timestamps on all our public-facing equipment.  Below are the commands to do that on Linux (RedHat flavors):

sysctl net.ipv4.tcp_window_scaling=0
sysctl net.ipv4.tcp_timestamps=0

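Turning off TCP timestamps is the part that is missing from all the online information and what is absolutely essential for fixing this issue on the server side (it’s not necessary on the client side).  Note that settings changed with sysctl on the command line last only until the next reboot; add the same two settings to /etc/sysctl.conf to make them permanent.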
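If you want to verify the settings on a server, the values can be read back from /proc/sys, where each sysctl key appears as a file with the dots replaced by slashes.  The following is a small sketch of such a check; the function names are made up for illustration, and on a non-Linux machine the values will simply come back as null.

```javascript
const fs = require('fs');

// sysctl keys map onto files under /proc/sys, with dots replaced by slashes.
function sysctlPath(key) {
  return '/proc/sys/' + key.split('.').join('/');
}

// Read one setting; returns null if it cannot be read (e.g. not on Linux).
function readSysctl(key) {
  try {
    return fs.readFileSync(sysctlPath(key), 'utf8').trim();
  } catch (err) {
    return null;
  }
}

// Report whether both settings from this post are disabled (value "0").
function checkTcpSettings() {
  const keys = ['net.ipv4.tcp_window_scaling', 'net.ipv4.tcp_timestamps'];
  return keys.map((key) => {
    const value = readSysctl(key);
    return { key, value, disabled: value === '0' };
  });
}
```

Running console.log(checkTcpSettings()) on a server prints the current value of each setting and whether it is turned off.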

Event Driven Architecture

Recently, I’ve been reading about Event Driven Architecture (EDA).  This is really exciting stuff and I’m convinced that it will be the future of the data center.

Combine this with virtualization and configuration management tools (like Chef) and EDA provides the mechanism for intelligent architecture that is automated and flexible.  Imagine an infrastructure that can not only alert you when a machine fails, but know what it means and trigger the actions necessary to fix it.  The problem could be fixed automatically before the notification email is delivered to your inbox!  This is the first stepping stone to true artificial intelligence at the infrastructure level.  Just as event driven programming transformed software applications, EDA will transform the data center!

I’ve incorporated EDA into my vision for our infrastructure and determined the tools necessary to start building the foundation of our EDA.  The first thing you need to build an EDA is a message bus that is accessible across the entire infrastructure.  RabbitMQ seems to be a great fit for this part of the EDA model.  It is a redundant, fault tolerant, high performance messaging queue.  It is built with the AMQP messaging protocol in mind and is ideal for the system wide messaging infrastructure that my vision requires.
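The pattern itself is easy to sketch.  In the example below, nodejs’s built-in EventEmitter stands in for the message bus; it is an in-process substitute for illustration only, not RabbitMQ or AMQP.  The event names, the host name, and the "actions" list are all made up.  The point is simply that a failure event triggers the fix, and the notification is just another event.

```javascript
const { EventEmitter } = require('events');

// In-process stand-in for the infrastructure-wide message bus.
const bus = new EventEmitter();

// Record of the steps taken (a stand-in for real automation).
const actions = [];

// React to a failed machine by requesting a rebuild.
bus.on('machine.failed', (event) => {
  actions.push(`rebuild ${event.host}`);
  bus.emit('machine.rebuild.requested', { host: event.host, reason: event.reason });
});

// Notification happens after the fix has already been triggered.
bus.on('machine.rebuild.requested', (event) => {
  actions.push(`notify ops about ${event.host}`);
});

// A monitoring system would publish this event; here we emit it by hand.
bus.emit('machine.failed', { host: 'web-03', reason: 'health check timeout' });
console.log(actions);
```

With a real message bus, the publisher and the handlers would live on different machines, but the flow of events would look the same.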

Once the messaging queue (or message bus) is in place, we can proceed to the next step in implementing our EDA infrastructure.  Stay tuned!

Stocu.com Closed Beta Invite

Last week I announced the launch of the Stocu.com closed beta and promised to post my invite link.

Well, promise kept!  Below is my invite link to the Stocu.com closed beta.  Just click it to sign up.

Hurry, only the first 8 people to sign up with this link will get in! (after that, I’m all out of invites)

Stocu.com Closed Beta

Yesterday afternoon we launched Stocu.com in closed beta.

This is a new project that we (Sazze, Inc.) are incubating and that has been in the works for a couple of months.

This is the first major new project that has completely leveraged our frameworks and infrastructure platforms that we have been developing over the past 2 years.  It was a complete success!  Normally, a project like this would have taken months to develop, but we went from concept to closed beta in 3 weeks!

For those of you who would like to know what Stocu.com is all about, here’s an official description from our copy experts:

Stocu.com started out as an idea for a “stock picking game for fame,” and quickly grew into the platform for a social network where users are able to predict where stocks will close in either a day or a week, and gain market insight from comments and predictions made by their fellow users.

If that sounds interesting to you, there are two options:

  1. Head over to Stocu.com and request an invite
  2. Keep checking out my blog (I will post an invite code later this week)

Win a Brand New Car from dealspl.us!

I hinted about it in the last post, and now it is here!

We are giving away a brand new 2010 Ford Focus!  It’s official, we are insane……insane about deals! :D

Anyway, go here to find out the rules and enter the giveaway:

Win a Brand New Car from dealspl.us!

dealspl.us – Looking Better than Ever

dealspl.us has a brand new look!

We’ve been working hard for a while now on a complete redesign of dealspl.us and today is the magic day!  Head on over there and check it out.  We want to know what you think, so feel free to comment here, or, better yet, comment on dealspl.us.

Also, there is going to be a really awesome giveaway in celebration of this redesign, so stay tuned…..

dealspl.us on your phone

So, it’s been about a year since my last post…….shame on me.

I could give the standard excuses about being too busy, etc, etc, but I won’t.  Instead, I’m going to dust this blog off and breathe some more life into it with this announcement:

dealspl.us has launched a version of its site specifically for the mobile phone!  To check it out, point your mobile phone browser to: m.dealspl.us

This site is specifically tailored for the mobile phone screen (touch screens in particular).  You can view all the great deals and coupons on dealspl.us quickly and easily on your phone when you’re out and about.

We’ve even had reports from some of our users that retailers have accepted coupons on their phones instead of paper coupons!  This is really exciting as we’re always trying to improve the way you can save money and make better purchasing decisions.

So check it out.  You just might save some money ;)

Flast

I’m excited to announce the official release of Flast! (my very first open source project!)

For those of you who don’t know (everyone), Flast is an open source framework for PHP version 5.3.  It is focused on performance (fast) and removing restrictions on developers (flexible).

You can check out Flast here: http://sourceforge.net/projects/flast

Your feedback is much appreciated.

Here’s a little history about why I decided to create Flast…..

While working on DealsPlus and Sazze, we evaluated many PHP frameworks, but decided to create our own because they were all too slow and/or forced us to use a particular coding methodology (e.g. MVC).  After successfully creating a very useful framework that actually improves performance, I wanted to give back to the open source community by using my experience (and the awesome new features in PHP 5.3) to create a framework that gives the developer complete control over performance and functionality.

Right now, Flast is in a very early pre-Alpha phase, but it should be useful by the time a production ready PHP 5.3 is released.
