Monday, January 14, 2013

Building a 32-Core, 16-Node Home HPC Cluster


Building a 32-Core, 16-Node Windows Server HPC Cluster

After 20 years with Microsoft, last December I found myself in the unusual situation of being "in between jobs". What a great time to start something new and use some spare time for experimentation. While working at Microsoft I had the chance to play with many different Windows HPC clusters, from small configurations to large ones such as a 1,800-core cluster. Since those days are now behind me, I set myself the goal of creating a small home cluster for development and small-scale benchmarks.

Since I have some free time, I'm also taking this opportunity to share some of my experience building this cluster. Here are some initial considerations about the construction of this home HPC cluster.

Cluster under construction


Final Result


16 Nodes Windows HPC Cluster up and running


The 32-core, 16-node cluster cost a little less than US$2,000 to build: at most $62.50 per core or $125.00 per node (including virtualized nodes). That's not bad, considering that a similar configuration from a brand-name PC maker would cost at least double. Of course, I don't want to fool myself or anybody else: this cluster was built mostly with lower-end components and can't be compared to a robust, server-grade cluster. For example, to keep costs low I opted for non-ECC memory, which is not a wise decision if you plan to run in production. Even so, the cluster is fully functional and serves its intended purpose well: developing and running parallel code. I'll share some code, benchmark results, and conclusions in subsequent posts. For now, here's the cluster configuration:

Computer
- ATX Mid Tower Case - 400W PSU, 2x Int 5.25" x 1x Ext 3.5", 2x Int 3.5", 2x Front USB Ports
- Motherboard: ASUS M5A97 LE R2.0
- Processor: AMD FX-8320, AM3+, Eight-Core, 3.5GHz, 16MB, 125W, Unlocked
- Memory: 16GB Desktop - DDR3, 2 x 8GB, 240 Pin, DIMM, XMP Ready
- Video Card: Asus ATI Radeon HD5450 Silence - 1 GB DDR3 VGA/DVI/HDMI
- Hard drive - 500 GB - internal - 2.5" - SATA-300 - 7200 rpm - buffer: 16 MB

Network
- Network adapter: Realtek PCIe GBE Family Controller (two per computer)
- Switch: TRENDnet 8-Port Gigabit GREENnet Switch
- CAT 6 network cables

Software
- Windows Server 2012 Standard Evaluation, HPC Pack 2012

Other
- IOGEAR 4 Port USB Cable KVM Switch
- Old Gateway netbook - two-core Atom CPU
- Old Linksys WRT160N

I also considered building this cluster from lower-powered, passively cooled computers, but in the end their cost per core was higher than that of the higher core density solution I built, based on prices for lower-powered Intel Atom machines as of December 2012. A few other cost-cutting options did not prevail, for different reasons: headless installation (nodes without a video card) and diskless nodes.

Additionally, the cluster cost does not include the almighty Active Directory / Internet gateway server that you can see in the cluster photo. For this server I'm using an old dual-core Atom-based netbook that had been sitting unused for some time. It actually works pretty well for this small HPC cluster.
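The per-core and per-node figures quoted earlier are plain division, and the same arithmetic is handy when comparing alternative builds. Here is a minimal Python sketch of it. The `ClusterBudget` class and its method names are made up for illustration, and the $2,000 total is the rounded upper bound from this post rather than an itemized bill, so the results are ceilings, not exact costs.

```python
from dataclasses import dataclass


@dataclass
class ClusterBudget:
    """Hypothetical helper for comparing builds; not part of any HPC tool."""
    total_cost_usd: float  # hardware cost, excluding the AD / gateway netbook
    cores: int             # physical cores across all machines
    nodes: int             # compute nodes, including virtualized ones

    def cost_per_core(self) -> float:
        return self.total_cost_usd / self.cores

    def cost_per_node(self) -> float:
        return self.total_cost_usd / self.nodes

    def cores_per_node(self) -> float:
        return self.cores / self.nodes


# Figures from this post: "a little less than" $2,000, 32 cores, 16 nodes.
home = ClusterBudget(total_cost_usd=2000.0, cores=32, nodes=16)

print(f"cost per core:  ${home.cost_per_core():.2f}")
print(f"cost per node:  ${home.cost_per_node():.2f}")
print(f"cores per node: {home.cores_per_node():.0f}")
```

To compare against another option, such as an Atom-based build, create a second `ClusterBudget` with that option's real price and core count and compare the two `cost_per_core()` values.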

It's been years since I last built a custom computer. In all honesty, I had a lot of fun putting this cluster together from scratch: researching and buying the hardware parts, assembling the cluster (machines, network), and installing and configuring the software (Windows Server, DNS, AD, DHCP, Hyper-V, HPC Pack). Considering that I'm not an infrastructure or system admin guy, I think it all went pretty well, and I'm really satisfied with the results.

What's next? Let's put my new toy to work! I'm finishing a small C# program that runs file unzipping (www.7-zip.org) in parallel using Windows HPC. This is a typical embarrassingly parallel problem and a good fit for some initial tests. I'll have three different C# implementations to demonstrate distinct approaches to implementing the parallel unzip:

- Using the Windows HPC Scheduler API - let's put the Windows HPC scheduler under some stress.
- Using Service Oriented Architecture HPC - I expect to see the best performance of the three approaches; we'll see...
- Traditional MPI based approach - you gotta have the classic represented.

In subsequent posts I'll share the code and the benchmark results of those three implementations.
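Until then, here is a rough sketch of why unzipping is embarrassingly parallel: each archive can be extracted without knowing anything about the others. This is not one of the three C# implementations. It is a single-machine Python illustration in which a thread pool stands in for the cluster nodes and Python's standard `zipfile` module stands in for 7-Zip. The function names `unzip_one` and `parallel_unzip` are made up for this example.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def unzip_one(archive: Path, dest_root: Path) -> int:
    """Extract one archive into its own folder; return its entry count."""
    target = dest_root / archive.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return len(zf.namelist())


def parallel_unzip(archives, dest_root, workers=4):
    """Unzip every archive independently; no task depends on another.

    Assumption: a local thread pool plays the role that the HPC scheduler,
    SOA service, or MPI ranks would play on the real cluster.
    """
    dest_root = Path(dest_root)
    archives = [Path(a) for a in archives]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda a: unzip_one(a, dest_root), archives)
        # Consume the results while the pool is still open.
        return dict(zip((a.name for a in archives), counts))
```

Because the tasks share nothing, the only coordination needed is handing out archive names and collecting results, which is exactly the part the three approaches handle differently.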