Category: System Administration


Storage is probably one of the most outdated technologies that is still essential to everything we do.  As everything else in the tech universe gets faster, more parallel, and cheaper, storage just gets bigger and cheaper.  The exception to this is SSD.  I long for the day when this new type of storage is viable enough to completely replace all existing hard drives... anyway, back to the point.

For a company like ours that is deploying a private cloud to run its services, capacity is really a secondary concern.  We don’t need 50PB of storage space to host virtual machine images (well, not yet anyway).  A few TB gets us a good number of machines.  What we care about is performance.

From personal experience, I’ve found that we run out of IO performance long before we run out of space when it comes to hosting virtual machine images.  This is a huge problem for cloud architectures because the storage layer needs to be global and shared to take full advantage of the model.  We need a single storage pool that not only has the capacity to hold all our virtual machine instances, but most importantly, has the performance to run them all without negatively affecting any other virtual machine.

This means two things: 1) it must allow parallel access to the file system, and 2) it must be scalable!  When we add nodes to the pool, it must not only scale in capacity, but also in IO performance.

When I talk to storage vendors, they still don’t get it.  They are still thinking of storage in terms of size instead of performance.  They want to know how many files we need to store or how many TB/PB of space we require.  The questions these vendors should be asking are: how many machines do we need to host on their storage platform, and what IO performance do we expect?  In terms of capacity, I could easily host 20 virtual machine images on the 2TB NAS they want to sell me, but the disks/controllers just aren’t up to that task (a 2TB NAS will choke and die long before the 20 machine mark, unless I’m willing to pay a ridiculous amount of money for it).  The problem is that performance has taken a back seat to capacity and that needs to stop.

To this end, I’ve been taking a long hard look at GlusterFS.  This open source file system, built to remove the IO bottleneck from supercomputing clusters, seems to be a very appropriate solution to the cloud storage issue.  It is a parallel, distributed file system that scales in both capacity and performance.

We will be deploying GlusterFS to store and host virtual machine images in our production environment this year.  I’ll let you know how it goes.

TCP Window Scaling

Recently, a small (but still far too large) number of our users have been having trouble connecting to our websites.  This has been a very frustrating problem as there was no pattern of location, browser, OS, ISP, or any other normal factor related to connection issues.

It turned out to be a problem with TCP window scaling.  If you don’t know what this is, don’t worry, it is very technical and I’m not going to go into details here ;)  Basically, this setting is turned on by default in all modern Linux/Unix kernels and makes your internet connection faster (when it works).  Unfortunately, there is equipment out there on the internet that does not handle TCP window scaling correctly, and if you are unlucky enough to have it between your computer and the website you are trying to connect to, you will experience intermittent issues accessing the site.

Now, this is all very well documented and googling it will present a wealth of knowledge about how to turn off TCP window scaling on your computer so you don’t have these problems anymore.  But what about the servers hosting these websites?  We can’t tell all our users to turn off TCP window scaling on their computers.  Shouldn’t there be something we can do on our end to prevent this problem from happening?  As it turns out, there is: turn off TCP window scaling and TCP timestamps on all public-facing equipment.  Below are the commands to do that on Linux (Red Hat flavors):

sysctl net.ipv4.tcp_window_scaling=0
sysctl net.ipv4.tcp_timestamps=0

Turning off TCP timestamps is the part that is missing from all the online information, and it is absolutely essential for fixing this issue on the server side (it’s not necessary on the client side).
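Values set with sysctl on the command line are lost at the next reboot.  Here is a minimal sketch of one way to make them stick, assuming your distribution reads /etc/sysctl.conf at boot (true of Red Hat flavors).  The helper name persist_sysctl is my own invention for illustration, not a standard tool:

```shell
#!/bin/sh
# Append "key = value" to a sysctl config file unless the key is already set there.
# $1 = config file (for example /etc/sysctl.conf), $2 = key, $3 = value
persist_sysctl() {
    conf_file=$1
    key=$2
    value=$3
    # Escape the dots in the key so grep treats them literally.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$conf_file" 2>/dev/null; then
        echo "$key is already present in $conf_file; edit it by hand" >&2
        return 0
    fi
    printf '%s = %s\n' "$key" "$value" >> "$conf_file"
}

# Example (as root):
#   persist_sysctl /etc/sysctl.conf net.ipv4.tcp_window_scaling 0
#   persist_sysctl /etc/sysctl.conf net.ipv4.tcp_timestamps 0
#   sysctl -p                              # reload /etc/sysctl.conf
#   sysctl net.ipv4.tcp_window_scaling     # print the current value to verify
```

The helper deliberately refuses to touch a key that is already in the file, so an existing setting is never silently overridden.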

Recently, I’ve been reading about Event Driven Architecture (EDA).  This is really exciting stuff and I’m convinced that it will be the future of the data center.

Combine this with virtualization and configuration management tools (like Chef) and EDA provides the mechanism for intelligent architecture that is automated and flexible.  Imagine an infrastructure that can not only alert you when a machine fails, but know what it means and trigger the actions necessary to fix it.  The problem could be fixed automatically before the notification email is delivered to your inbox!  This is the first stepping stone to true artificial intelligence at the infrastructure level.  Just as event driven programming transformed software applications, EDA will transform the data center!

I’ve incorporated EDA into my vision for our infrastructure and determined the tools necessary to start building the foundation of our EDA.  The first thing you need to build an EDA is a message bus that is accessible across the entire infrastructure.  RabbitMQ seems to be a great fit for this part of the EDA model.  It is a redundant, fault tolerant, high performance messaging queue.  It is built with the AMQP messaging protocol in mind and is ideal for the system wide messaging infrastructure that my vision requires.

Once the messaging queue (or message bus) is in place, we can proceed to the next step in implementing our EDA infrastructure.  Stay tuned!

PHP Memcached Manager

We’ve been using memcached on both our sites for a while now to help alleviate database load and speed things up in general.

However, we’ve been lacking a good web-based manager to see the cache status and manually clear the cache.  (I’ve been doing this via telnet on the command line and have been too busy to write my own script...)

Today, I stumbled across this gem: http://livebookmark.net/journal/2008/05/21/memcachephp-stats-like-apcphp/

It’s a simple GUI for memcached that is written in PHP and was exactly what I was looking for!
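For reference, the manual telnet routine mentioned above can be scripted roughly like this.  It is only a sketch: it assumes nc (netcat) is installed and that memcached is listening on its default port, 11211, on localhost.  The function name memcached_cmd is made up for this example; stats and flush_all are commands from memcached’s plain-text protocol.

```shell
#!/bin/sh
# Send one command to memcached over its plain-text protocol and print the reply.
# $1 = host, $2 = port, $3 = command (for example "stats" or "flush_all")
memcached_cmd() {
    # Commands end with CRLF; "quit" asks the server to close the connection
    # so that nc returns instead of waiting for more input.
    printf '%s\r\nquit\r\n' "$3" | nc "$1" "$2"
}

# Examples:
#   memcached_cmd localhost 11211 stats        # show cache statistics
#   memcached_cmd localhost 11211 flush_all    # invalidate everything in the cache
```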

With Citrix releasing their XenServer hypervisor as a free product with virtually no limitations, my thoughts have turned to implementing our own private cloud in our data center.

Just now?  Yep, cloud computing as a service has been around for a while and marketed to death (in my opinion).  It has big players like Amazon and Google, but it has never been a viable option (and won’t be for a while).  Any serious web 2.0 company knows that one of the keys to success is to minimize downtime, and using a cloud service takes control of your uptime away from you.  Granted, not all startups are able to run their own data center, but I don’t think it’s a coincidence that every big name web site runs its own data center(s).  Think about it.  How do you guarantee uptime when you have no control over a critical service?

Anyway, the idea of the cloud is very interesting.  To have a utility-like service that allows you to tap computing power without caring about individual servers is almost like computing nirvana.  And the thing that makes it practical?  With Citrix’s move to a free offering, this type of cloud can be easily (and cheaply!) deployed in your own environment that you control (at least according to Citrix).

I can see it now... web server instances being provisioned and deprovisioned in response to demand, server instances migrating to functioning nodes in response to hardware failures (eliminating service downtime), performing hardware maintenance during business hours without affecting services... whoa there, I think I might be drooling.

I really, really want to start trying this out as I’m beginning to wrap my mind around a rough idea of how this would integrate into and ultimately transform our environment.  It’s like that vague shape at the edge of your vision that you know is there, but if you look at it, it slips away.  Ah well, I know it will come if I just have a little patience.  In the meantime... back to my dream.

Windows Installer Error 1719

So I just got a new work computer with Windows Vista (64 bit) and have been transferring all my programs and data from my old computer onto the new one.

I’d just about completely finished the transfer when, all of a sudden, I started getting error message 1719 from windows installer whenever I tried to install or uninstall something.

If you do a Google search for the error message you get all kinds of results with people saying “do this and do that” and others saying “I tried this and tried that and nothing works”.

Unfortunately, I fell into the latter category and nothing I tried seemed to make a difference.  Then, just as I was about to bite the bullet and do a clean install, I stumbled upon a wonderful post that fixed my problem!

http://www.vistax64.com/vista-installation-setup/96680-repair-windows-installer-service-vista-all-versions.html#post470746

The very first post by The_CAT is the one you want.  Some others have added some additional info based on their experience using The_CAT’s instructions if you need it, but the first post was all I needed.

Just in case that link disappears for some odd reason, here are the instructions from The_CAT that solved my problem:

Here’s the easy steps:
1. Go to a Windows Vista (Any Version) computer that has the Windows Installer service running correctly and run regedit (Start-Run-Regedit)
2. Go to the location HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\msiserver
3. Right click on this key and select “Export” and save the key to a Flash Drive or other.
4. Run sfc /scannow on the damaged Vista computer – you won’t need the install disk as it goes to backup files on your HD. Do not reboot when complete
5. Double click saved .reg file from working machine and import registry settings into damaged Vista computer.
6. Now reboot and try to install/uninstall

As I mentioned at the end of my previous post (HA Network Attached Storage (NAS)), the vendor whose product I’m hoping will solve my HA NAS woes put together a demo for me last week and I must say that I was impressed!

Let’s face it, we’re only planning on spending around 6K (total) and they put together a demo for me.  That’s a class act for you.  The only other vendors even remotely interested in providing a demo were the ones charging 75K or more for their “entry-level” product that targets the SMB.  I wonder how well that marketing strategy is working?  Do you know many SMBs looking to spend 75K on a storage system?  I don’t.

Anyways, the demo was good and the servers passed my fail over tests, so we’re purchasing them.  It’s an Active-Backup config, but it will do nicely.  Especially since we’re running our current file server into the ground with our traffic growth.  :)
 
Oh yeah, the vendor is Aberdeen.  Check ‘em out.

HA Network Attached Storage (NAS)

Happy Halloween!

So, I know I promised this towards the beginning of the month, but it’s proven to be trickier than I initially expected (and I expected it to be tricky!)

I’ve been doing a lot of research and a lot of trial and error and I’m pretty sure I know what it takes to configure this correctly, but I’m waiting on some new servers before I set this up for real.

Since we mount our shares using NFS (see NFS for HA Lighttpd), there’s a little more involved than just replicating the volumes between two NAS servers and using a load balancer to distribute traffic between them.

Here is a great reference that really helped me get a good understanding of what’s involved in HA NFS: http://www.linux-ha.org/HaNFS (this site also has a lot of good HA info for Linux admins)

First Attempt:

First, I attempted to create an Active – Active, multi-master configuration between our existing NAS servers (two Linksys NSS6000s that I will call NAS1 and NAS2).

Unfortunately, the NSS6000 is not configured to do synchronization between NAS servers and since you have very little access to the OS, it’s not really a good option to try to hack it to enable synchronization.

To get around this, I set up Unison on a third server (SRV1) that NFS mounts the volumes from NAS1 and NAS2 and handles the synchronization.  At this point, I realized that an Active – Active setup would not be possible using this configuration (as Unison must be scheduled using cron and can’t detect file/directory changes as they occur).

So I settled for an Active – Passive config: I set up SRV1 to synchronize the volumes between NAS1 and NAS2, set NAS1 to be the primary server, set NAS2 as the secondary server, and used our load balancer to handle fail over.

Unison was handling the file synchronization just fine (though with considerable network traffic) and the multi-master relationship was solid, but we had a couple instances where NAS1 locked up and fail over to NAS2 was unsuccessful.

Dang!  Back to the drawing board….

Bottom Line:

If you check out the link concerning HA NFS, I’m sure you’ll find a bunch of reasons why this config was not successful.  HA NAS with Linux servers really should be using Heartbeat to handle the complexities that HA NAS involves (especially during fail over).

However, rather than beating my head against the wall trying to make an HA config work with servers that weren’t designed for HA, I decided to purchase new NAS servers that were designed for HA NAS.

I’ll be demoing the new NAS servers next week to make sure they can achieve an Active – Passive configuration with fail over (as the vendor literature says they can).

Cross your fingers!  ;)

I’m hoping this is the answer to this problem.

NFS for HA Lighttpd

Currently, we have lighttpd deployed in a high availability (HA) configuration.  In order to provide our users with a consistent experience, regardless of which web server they are connected to, we use NAS and NFS to provide identical content to each web server.

The diagram below should make our setup a little clearer:

Configuration Diagram


Last week, we encountered some problems where requests for our website would hang.  Lighttpd had nothing to say about this problem and restarting it did nothing.  After about 10-20 minutes, the problem would resolve itself and everything would go back to business as usual.

Personally, this type of problem really frustrates me, so I made it a mission to figure out what was going wrong and fix it. (not that I had much of a choice, this is a production setup) ;)
 
I had some initial hunches to check first:

  1. some sort of short term DoS attack
  2. a traffic spike
  3. lighttpd and php (running in fcgi mode) running out of available file descriptors

Our monitoring software showed no signs of abnormal traffic behavior during the times when the problem was occurring.  So that ruled out the first two possible causes.

The last possible cause is more difficult to track down.  Lighttpd’s own documentation says that the error messages regarding a shortage of file descriptors may not be written to the error log and may only show up in test cases.  Well, doesn’t that just take the cake?!  I checked anyway, but found nothing of interest in lighttpd’s error logs.

Next step?  Calculate the number of file descriptors currently being used by lighttpd and php under normal load and compare them to the maximums defined by lighttpd’s configuration file and the Operating System.

It turns out that lighttpd runs pretty light (pun intended) in terms of file descriptors.   PHP, on the other hand, uses (at least) 7 file descriptors per child!   Since we’re running 4 php processes that each have 128 children, I decided to increase the OS file descriptor limit from 1024 per process to 32768 per process.   Also, just to be safe, I increased lighttpd’s server.max-fds configuration option to 16384.
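To put rough numbers on that: 4 processes × 128 children × 7 descriptors comes to at least 3,584 descriptors across all PHP children.  If you want to measure this yourself, here is a minimal sketch for Linux that counts the entries under /proc/<pid>/fd.  The function names are made up for illustration, and the process name passed to pgrep (php-cgi below) is an assumption that depends on how PHP is launched on your system.

```shell
#!/bin/sh
# Print the number of open file descriptors for one PID (Linux only).
# $1 = process ID
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Print the total number of open descriptors across all processes
# whose name matches exactly.
# $1 = process name as seen by pgrep (for example "lighttpd")
total_fds_for() {
    total=0
    for pid in $(pgrep -x "$1"); do
        total=$((total + $(count_open_fds "$pid")))
    done
    echo "$total"
}

# Examples (run as root to see other users' processes):
#   total_fds_for lighttpd
#   total_fds_for php-cgi
#   ulimit -n          # per-process descriptor limit of the current shell
```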

Unfortunately, this didn’t solve the problem.  We had another incident the day after I made the above changes.

Not to be deterred, I went digging in the syslog (/var/log/messages on Red Hat variants) and found errors regarding lockd and our NFS mounts that corresponded to the times of the incidents.  Aha!  Now we’re on to something.  I added the nolock option to the mounting options in fstab and remounted the directories.

This seems to have solved the problem.  It’s been a couple days (with constantly increasing traffic) and we’ve had no more incidents.  Bottom line?  When using NFS in an HA configuration you should consider mounting the shares with the nolock option.
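If you have several web servers to check, a small helper can list the NFS mounts that are still missing the option.  This is only a sketch: the function name is my own, and the server and path names in the sample line are hypothetical.  Keep in mind that with nolock, file locks are only visible on the machine that took them, so this is not a good fit for applications that rely on locking across web servers.

```shell
#!/bin/sh
# List the mount points of NFS entries in an fstab-style file whose
# options do not include "nolock".
# $1 = file to scan (defaults to /etc/fstab)
nfs_without_nolock() {
    awk '$1 !~ /^#/ && $3 ~ /^nfs/ && $4 !~ /(^|,)nolock(,|$)/ { print $2 }' \
        "${1:-/etc/fstab}"
}

# A hypothetical fstab entry with the option in place:
#   nas1:/export/www  /var/www  nfs  rw,nolock  0 0
#
# Example:
#   nfs_without_nolock /etc/fstab
```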

The next step is to set up HA NAS.  I’ll post again when we’ve accomplished this task.

Sendmail is one of those applications that can do just about anything (as long as you know the secret handshake that gets you access to detailed information about its configuration).

I’ve been using it for about a year to allow my company’s website to send email to our users.  To avoid all the headaches involved in making sure our emails don’t get rejected as spam, we use googlemail as a smart host since they host the email accounts for our domain.

This has been working great, but recently we ran into a limitation that would be a deal breaker if it couldn’t be solved.

The Problem:

  1. When using googlemail as your smart host, you must use authentication and Google rewrites the from address of all your emails to the user’s email address that you use to authenticate.  Thus, if you authenticate with webmaster@example.com’s credentials, all email from your server will seem to come from webmaster@example.com.  Even if you specify a different sender!
  2. When sendmail is configured to use a smart host that requires authentication (like googlemail), it chooses the authentication credentials based on the host name or IP address of the smart host.  So, unless you use multiple smart hosts, sendmail will only use 1 set of authentication credentials when connecting to the smart host.
  3. We are now hosting 2 websites on the same server with completely different domain names that each need to send email from different addresses that belong to their respective domains.

When you combine all three parts, the problem becomes clear:

How do I send email from info@yyy.com when sendmail will always use info@zzz.com’s credentials for smart host authentication, thus making Google rewrite the from address so that all email from my server appears to come from info@zzz.com?

The Solution:

To me, the best solution would be for sendmail to choose smart host authentication credentials based on the sender address of each email.

For example:

  1. Sendmail receives an email message to deliver from info@yyy.com
  2. Instead of looking up “smtp.googlemail.com” in authinfo, it looks up “info@yyy.com”

Seems simple enough, but, after hours of searching the web, I found that sendmail is just plain not configured to do this.

However, everything it needs to be able to do this is already available, it just doesn’t use it.  So I decided to make sendmail use it by editing the sendmail.cf file.

I know, I know, I’m supposed to use sendmail.mc, but darn it all if I wasn’t able to get this to work using the macro file.

So, without further ado, here’s what I did to make sendmail behave the way I wanted it to.

The How To:

Assumptions:

I’m assuming that you already have sendmail configured to use a smart host that requires authentication.  It is beyond the scope of this document to cover that.

Instructions:

1) Edit the authinfo section of sendmail.cf so that it looks like this:

######################################
### authinfo: lookup authinfo in the access map
###
###     Parameters:
###             $0: {f}
###             $1: {server_name}
###             $2: {server_addr}
######################################
Sauthinfo
R$*        $: <$(authinfo AuthInfo:$&{f} $: ? $)>
R<?>      $: <$(authinfo AuthInfo:$&{server_name} $: ? $)>
R<?>      $: <$(authinfo AuthInfo:$&{server_addr} $: ? $)>
R<?>      $: <$(authinfo AuthInfo: $: ? $)>
R<?>      $@ no                      no authinfo available
R<$*>    $# $1

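The key here is the $&{f} macro.  In sendmail, it stands for the sender’s address from the envelope of the email.  (If you copy the rules above, remember that sendmail.cf requires tab characters, not spaces, between the left-hand and right-hand side of each rule.)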

The above code tells sendmail to look for entries in the authinfo file that match the sender’s email address.  If no match is found, then check for entries that match the smart host’s name/IP.

In effect, this preserves sendmail’s original behavior as the fall back if no match is found in the authinfo file for the sender’s address.

2) Add sender credentials to the authinfo file (its location is defined by the authinfo feature in sendmail.mc):

AuthInfo:info@ex.com "U:info@ex.com" "P:xx" "M:PLAIN"
AuthInfo:info@yyy.com "U:info@yyy.com" "P:xx" "M:PLAIN"
AuthInfo:info@zzz.com "U:info@zzz.com" "P:xx" "M:PLAIN"

AuthInfo:smtp.sh.com "U:info@ex.com" "P:xx" "M:PLAIN"
AuthInfo: "U:info@ex.com" "P:xx" "M:PLAIN"

In the example above, the first 3 entries are credentials for different senders that use the server to send email (they take advantage of the changes we made in step 1).  The last 2 entries are the original smart host authentication configuration which ensures that sendmail will still be able to connect to the smart host and send the email using a default set of credentials, even if the sender’s address does not match any of the other entries.

3) Remake the authinfo database using the “makemap” command.

4) Make sure to comment out any lines in your init script for sendmail that recompile sendmail.cf using the sendmail.mc script.  The location of this file on Red Hat variants is: /etc/init.d/sendmail

(if you don’t do this, your changes to sendmail.cf will be lost when you restart sendmail!)

5) Restart sendmail.
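Steps 3 and 5 might look roughly like this.  It is a sketch under a few assumptions: the plain-text authinfo file lives at /etc/mail/authinfo, the authinfo feature in your sendmail.mc uses a hash map (so sendmail reads authinfo.db), and sendmail is restarted through the init script mentioned in step 4.  The function name rebuild_authinfo is made up for this example.

```shell
#!/bin/sh
# Rebuild the hashed map that sendmail reads from a plain-text authinfo file.
# $1 = path to the text file; makemap writes the map next to it as "$1.db".
rebuild_authinfo() {
    makemap hash "$1" < "$1"
}

# Example (as root):
#   rebuild_authinfo /etc/mail/authinfo
#   /etc/init.d/sendmail restart
```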

You should now be able to control which user’s credentials are used to authenticate with the smart host on a case by case basis by changing the sender’s address on the email.

The sender’s address is set with the -f flag when calling sendmail on the command line (which is how PHP’s mail() function invokes it; the flag can be passed through mail()’s fifth argument).
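To try this from a shell, something like the following should do.  It is a minimal sketch: send_as is a made-up helper name, the recipient address is a placeholder, and it assumes the sendmail binary is on your PATH (on many systems it lives in /usr/sbin).

```shell
#!/bin/sh
# Send a message read from stdin, using the given address as both the
# envelope sender (-f) and the From: header.
# $1 = sender address, $2 = recipient address, $3 = subject
send_as() {
    sender=$1
    recipient=$2
    subject=$3
    {
        printf 'From: %s\n' "$sender"
        printf 'To: %s\n' "$recipient"
        printf 'Subject: %s\n' "$subject"
        printf '\n'
        cat    # message body from stdin
    } | sendmail -f "$sender" "$recipient"
}

# Example: with the authinfo entries above, this message should be
# authenticated with info@yyy.com's credentials.
#   echo "Hello from yyy.com" | send_as info@yyy.com someone@example.org "Test"
```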

Final Thoughts:

While this gets the job done, it would be nice to be able to incorporate the changes in step 1 into the sendmail.mc file.  This would make it much easier to maintain since anyone who runs the “make” command on the /etc/mail directory will destroy the changes we made to sendmail.cf in step 1.
