Planning Hardware Upgrades

TrustNet is coming along very nicely, but I’m beginning to notice that I need some slight hardware improvements to have room for growth.

First of all, the current server is perfectly adequate and is expected to scale well. Here’s what TrustNet is currently running on:

  • Pentium 4 2.4GHz
  • 1GB ECC RAM, in dual channel
  • 2 x 200GB SATA drives
  • 1 x 250GB SATA drive
  • All 3 drives are in a RAID-1.

The actual problem is the case I picked for it. It is indeed silent, but the design is weird and non-standard. Worst of all, it uses a very odd power supply whose fans are dying and can’t easily be replaced. The supply itself seems to be impossible to find here, and the way the case is assembled makes it pretty much impossible to swap the supply without removing the motherboard. And if that weren’t enough, the case design results in lots of dust getting sucked into it.

So the case is going to be the first thing to go, replaced with a rack case. Along with this, I’m also in the process of getting a rack to house it. Having to replace fans on the server while it was running, and oiling a fan on a running power supply (eek!), sure made me start to really appreciate the advantages of rack-mounted hardware.

The other change I’m going to make is upgrading the RAM to 3GB total. The current amount is sufficient, but a bit tight. The database is getting larger by the day, and more RAM is always a good thing. The server runs Linux with the overcommit_memory parameter set to 2, which increases memory requirements.

The overcommit_memory kernel parameter can be changed by writing the new value to /proc/sys/vm/overcommit_memory. It determines whether the kernel lets programs allocate more memory than is actually available. The default setting on most systems is 0. Here are the possible values:

  • 0: Allow programs to allocate more memory than exists, within a heuristic limit.
  • 1: Make malloc() always succeed, regardless of how much memory is available.
  • 2: Allow allocating memory only up to swap space plus the percentage of physical memory defined in overcommit_ratio.
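
To make the value 2 limit concrete, here is a minimal Python sketch that reads the current settings and estimates the limit. The helper names read_int and commit_limit_kb are made up for illustration, the 2GB swap figure is only an example, and the formula is the simplified one from the list above (it ignores details such as huge pages and the overcommit_kbytes setting).

```python
from pathlib import Path


def read_int(path, default):
    """Return the integer stored in a /proc file, or default if unreadable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return default


def commit_limit_kb(ram_kb, swap_kb, ratio_percent):
    """Simplified limit under value 2: swap plus ratio% of physical RAM."""
    return swap_kb + ram_kb * ratio_percent // 100


if __name__ == "__main__":
    mode = read_int("/proc/sys/vm/overcommit_memory", None)
    ratio = read_int("/proc/sys/vm/overcommit_ratio", 50)  # kernel default is 50
    print("overcommit_memory =", mode)
    print("overcommit_ratio  =", ratio)
    # Example figures only: 1GB of RAM and an assumed 2GB of swap, in kB.
    print("estimated limit (kB):", commit_limit_kb(1048576, 2097152, ratio))
```

On a running system, the kernel's own figure for this limit is reported as CommitLimit in /proc/meminfo.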

Why do values 0 and 1 exist? Because many programs allocate a large chunk of memory and then leave much of it unused. This happens because of fixed allocations made larger than necessary “just in case”, allocations made for code paths that never end up running, memory leaks, etc. Letting programs allocate more than they will really use makes it possible to run more programs on the system.

The problem is, of course, what happens if an application asks for 768MB of RAM on a 512MB system (which can succeed under value 0, and always does under value 1), then actually goes and uses it? What happens is quite bizarre: the infamous OOM Killer rises from the nether depths of the Linux kernel, scores all running processes with a magic formula, and decides which one of them to kill to free some memory. Note that the process being killed doesn’t have to be the one that made the system run out of memory! This means that a completely innocent process can get sacrificed to allow the memory hog to keep running. To add to the weirdness of the situation, the process gets a SIGKILL. That means the program being terminated can’t do anything about it, and has no chance to shut down gracefully.
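
The kernel exposes the result of that formula for each process in /proc/<pid>/oom_score, where a higher score means a more likely victim. As a rough sketch, this is how the current candidates could be listed in Python. The function name oom_scores and its proc_root parameter are my own choices for illustration.

```python
import os


def oom_scores(proc_root="/proc"):
    """Return (pid, oom_score) pairs, highest score first."""
    scores = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return scores  # no procfs available (for example, not on Linux)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "oom_score")) as f:
                scores.append((int(entry), int(f.read().strip())))
        except (OSError, ValueError):
            continue  # the process exited or the file is unreadable
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Show the five processes the OOM Killer would currently favor.
    for pid, score in oom_scores()[:5]:
        print(pid, score)
```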

Since having random processes die suddenly is a very bad thing on a server, mine runs with overcommit_memory=2. Under this setting, the kernel will never allow allocating more memory than is available, and if that’s attempted, malloc() will fail. Important things, like database servers, are written to deal with memory allocation failure in a sane way, so this is a much better alternative. It does mean, however, that less memory than usual is available, so I need to add more RAM to have a comfortable margin.
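
To illustrate what dealing with allocation failure can look like, here is a small Python sketch. It is not how any particular database server does it. The names allocate_buffer and load_batch are hypothetical, and the strategy of retrying with half the size is just one possible policy. In Python, a failed allocation surfaces as a MemoryError exception instead of a NULL return from malloc().

```python
def allocate_buffer(size_bytes, allocator=bytearray):
    """Return a buffer of size_bytes, or None if the allocation fails."""
    try:
        return allocator(size_bytes)
    except MemoryError:
        return None


def load_batch(size_bytes, allocator=bytearray):
    """Try the full size, then keep halving until an allocation succeeds."""
    while size_bytes > 0:
        buf = allocate_buffer(size_bytes, allocator)
        if buf is not None:
            return buf
        size_bytes //= 2  # degrade gracefully instead of crashing
    raise MemoryError("could not allocate even a minimal buffer")


if __name__ == "__main__":
    buf = load_batch(1024 * 1024)
    print("got a buffer of", len(buf), "bytes")
```

The allocator parameter exists only so the failure path can be exercised without actually exhausting memory.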

I will attempt to perform the upgrade during a grid downtime, which will be next Wednesday if I have everything I need by then.

