Monday, January 14, 2013

Building a 32-Core, 16-Node Home HPC Cluster


Building a 32-Core, 16-Node Windows Server HPC Cluster

After 20 years with Microsoft, last December I found myself in the unusual situation of being “in between jobs”. What a great time to start something new and use some spare time for experimentation. While working at Microsoft I had the chance to play with many different Windows HPC clusters, from small configurations to large ones such as an 1,800-core cluster. Since those days are now behind me, I set myself the goal of creating a small home cluster for development and small-scale benchmarks.

Since I have some free time, I’m also taking this opportunity to share some of my experience building this cluster. Here are some initial considerations about the construction of this home HPC cluster.

Cluster under construction


Final Result


16 Nodes Windows HPC Cluster up and running


The 32-core, 16-node cluster configuration cost a little less than US$ 2,000 to build; that's $62.50 per core or $125.00 per node (including virtualized nodes). That's not bad, considering that a similar configuration from a brand-name PC maker would cost at least double. Of course, I don't want to fool myself or anybody else: this cluster was built mostly with lower-end components and can't be compared to a robust, server-grade cluster. Just as an example, to keep costs low I opted for non-ECC memory, which is not a wise decision if you plan to run in production. In spite of that, the cluster is fully functional and serves its intended purpose well: developing and running some parallel code. I'll start to share some code, benchmark results, and conclusions in subsequent posts. For now, here's the cluster configuration:

Computer
- ATX Mid Tower Case - 400W PSU, 2x Int 5.25" x 1x Ext 3.5", 2x Int 3.5", 2x Front USB Ports
- Motherboard: ASUS M5A97 LE R2.0
- Processor: AMD FX-8320, AM3+, Eight-Core, 3.5GHz, 16MB, 125W, Unlocked
- Memory: 16GB Desktop - DDR3, 2 x 8GB, 240 Pin, DIMM, XMP Ready
- Video Card: Asus ATI Radeon HD5450 Silence - 1 GB DDR3 VGA/DVI/HDMI
- Hard drive - 500 GB - internal - 2.5" - SATA-300 - 7200 rpm - buffer: 16 MB

Network
- Network adapter: Realtek PCIe GBE Family Controller (two per computer)
- Switch: TRENDnet 8-Port Gigabit GREENnet Switch
- CAT 6 network cables

Software
- Windows Server 2012 Standard Evaluation, HPC Pack 2012

Other
- IOGEAR 4 Port USB Cable KVM Switch
- Old Gateway netbook - dual-core Atom CPU
- Old Linksys WRT160N

I also considered building this cluster from lower-powered, passively cooled computers, but in the end the cost per core was higher than for the higher core density solution I built (based on prices for low-power Intel Atom machines as of December 2012). Other cost-cutting options that, for different reasons, didn’t prevail: headless installation (nodes without a video card) and diskless nodes.

Additionally, the cluster cost does not take into account the almighty Active Directory / Internet gateway server that you can see in the cluster photo. For this server I'm taking advantage of an old dual-core Atom netbook that had not been in use for some time. Actually, it works pretty well for this small HPC cluster.
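As a quick sanity check on the per-core and per-node figures quoted earlier, here is a small Python sketch of the arithmetic. It assumes the total is rounded up to exactly $2,000 (the real cost was a little less) and uses the layout described in the comments: 4 physical eight-core machines, each hosting 4 Hyper-V virtual machines. The variable names are purely illustrative.

```python
# Assumption: "a little less than US$ 2,000" is rounded up to 2000.
TOTAL_COST_USD = 2000.0
PHYSICAL_MACHINES = 4    # physical computers in the cluster
CORES_PER_MACHINE = 8    # AMD FX-8320 is an eight-core CPU
VMS_PER_MACHINE = 4      # Hyper-V virtual machines per physical computer

total_cores = PHYSICAL_MACHINES * CORES_PER_MACHINE   # 32
total_nodes = PHYSICAL_MACHINES * VMS_PER_MACHINE     # 16 (virtual) nodes
cores_per_node = total_cores // total_nodes           # 2

cost_per_core = TOTAL_COST_USD / total_cores
cost_per_node = TOTAL_COST_USD / total_nodes

print(f"{total_cores} cores, {total_nodes} nodes, {cores_per_node} cores per node")
print(f"${cost_per_core:.2f} per core, ${cost_per_node:.2f} per node")
# prints:
# 32 cores, 16 nodes, 2 cores per node
# $62.50 per core, $125.00 per node
```

The gateway netbook, KVM switch, and router are not included in these numbers, since they were already on hand.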

It’s been a long time, years in fact, since my last custom-built computer. In all honesty, I had a lot of fun putting this cluster together from scratch: researching and buying the hardware parts, assembling the cluster (machines, network), and installing and configuring the software (Windows Server, DNS, AD, DHCP, Hyper-V, HPC Pack). Considering that I'm not an infrastructure or system admin guy, I think it all went pretty well and I'm really satisfied with the results.

What's next? Let's put my new toy to work! I'm completing a small C# program that unzips files (using 7-Zip, www.7-zip.org) in parallel on Windows HPC. This is a typical embarrassingly parallel problem and a good fit for some initial tests. I'll have three different C# implementations to demonstrate distinct approaches to the parallel unzip:

- Using the Windows HPC Scheduler API - let's put the Windows HPC scheduler under some stress.
- Using Service Oriented Architecture HPC - I expect to see the best performance of the three approaches; we'll see...
- Traditional MPI based approach - you gotta have the classic represented.

In subsequent posts I'll share the code and the benchmark results of those three implementations.
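To illustrate why parallel unzipping counts as embarrassingly parallel, here is a minimal single-machine sketch in Python. It is not one of the three C# / Windows HPC implementations mentioned above, and it uses the standard library `zipfile` module with a thread pool rather than 7-Zip or the HPC scheduler. The function names `extract_one` and `parallel_unzip` are hypothetical, chosen only for this example. The point is that each archive is an independent task that needs no communication with any other.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_one(archive: Path, dest_root: Path) -> Path:
    """Extract one archive into its own sub-directory and return that directory."""
    target = dest_root / archive.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


def parallel_unzip(archives, dest_root, workers=4):
    """Extract every archive as an independent task; results keep input order."""
    dest_root = Path(dest_root)
    paths = [Path(a) for a in archives]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No task depends on another, so the work can be split across any
        # number of workers without coordination.
        return list(pool.map(lambda p: extract_one(p, dest_root), paths))
```

A call such as `parallel_unzip(["a.zip", "b.zip"], "out", workers=4)` would extract into `out/a` and `out/b`. On a cluster, the thread pool would be replaced by whatever distributes tasks to the nodes, but the one-task-per-archive decomposition stays the same. Note that two archives with the same file name stem would share a target directory in this sketch.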

7 comments:

  1. Dear Mr. Manfredi,
    Your HOME HPC Cluster looks great.
    We plan to deploy similar cluster for training purpose.
    Please advise us what manual to use?
    We have found "DIY Supercomputing: How to Build a Small Windows HPC Cluster".
    I see 4 nodes in your cluster. Where are the rest?
    With best wishes,
    G. Oyunbayar
    G-Mobile LLC www.g-mobile.mn
    R#506, ChD,
    15160 Ulaanbaatar
    Mongolia
    Phone: +976 93111000
    Fax: +976 11311195

  2. Hi Oyunbayar,

    The DIY documentation you mentioned is a very good place to start building your cluster. In my configuration I have 4 physical computers with eight cores each; on each physical computer I created 4 virtual machines using Hyper-V. That's how I got 16 nodes.

    Let me know if you have any other questions.

    Best regards,
    Pedro Manfredi

  3. Thanks for sharing, Pedro. Can you explain the purpose of artificially increasing the number of nodes?

    1. Hi,

      The primary reason for using this configuration is to enable testing scenarios such as deployment and performance benchmarking. In some cases this configuration might also be desirable if your cluster will be used by multiple users and you want additional levels of isolation in security, CPU use, etc. In any case, this is very flexible: you can use physical nodes, virtual nodes, or any combination.

      Hope this helps,
      Pedro

  4. Hello, great job! Can you please tell me whether it will help reduce video rendering time? Basically I want it for animation and video editing purposes. Please help.

    1. Hi Santhal,

      You need to check whether your rendering software supports Windows HPC. RenderMan, I believe, currently supports Windows HPC.

  5. Very nice work, dear Pedro. I have a lot of old PCs with dual-core Duo CPUs, DDR2 RAM, and up to 4 GB per PC. I don't know if these PCs can be put together to build a working cluster (maybe using Windows Server 2008) and, if so, how I can then test it and which tools I need to build small parallel applications.

    Your idea is small for one person, but huge for others.

    :)


    Regards

    derar
