Anindya Roy
Sometimes we go a step further and mix and match existing technologies. The result is almost always a path-breaking idea, also known as 'innovation', and this story is one such innovation. Last month we promised to discuss a different type of HPC cluster or supercomputer technology, and how to implement it, every month. This time we take that forward and do something new with the cluster we built last month: we create a NAS box from a set of standard PCs in our cluster setup and reserve another set of PCs for running stress tests on the first set. The aim is to find out whether such a NAS box is feasible and to check the performance of our cluster. The results we got were impressive.
The idea was to aggregate the free disk space of 9 standard P4 machines running as a cluster. The resulting cluster had the computing power of all 9 P4 processors but the storage of only 6 machines, because three of the PCs were diskless nodes running a Linux distro.
The setup
For this setup, we used the OpenMosix clustering software for the HPC layer. This time, however, we modified the OM (OpenMosix) cluster from last month by configuring MFS (Mosix File System) on top of it; see below for more on what MFS is. We then used OpenAFS (Open Andrew File System) to aggregate the disk space of the 6 machines. OpenAFS provides a distributed file system layer on top of the cluster.
You have to install PCQLinux 2004 or 2005 and configure OM on it. For details on installing and configuring OpenMosix, read last month's story or visit http://openmosix.sf.net.
MFS
If you think that MFS has something to do with NFS, you are wrong. MFS is a 'cluster-wide' file system. For the last few months, we have been talking about OpenMosix a lot. If you have used the Cluster Knoppix live distro, you must have noticed that when you run a data-crunching process, it is sometimes executed on a completely different machine in the cluster.
Have you ever wondered how OM is able to read data from one machine and do the crunching on another? MFS makes this possible. When you install OpenMosix, a kernel patch is also installed on the machine, and this patch provides an MFS extension to the Linux file system.
When you run OM, a new file system is created at '/mfs'. This directory has the individual file systems of all the cluster nodes mounted as '/mfs/NODE_NUMBER'. For example, if you copy a file into the /tmp directory on node 5, you can then access that file from any node through the /mfs/5/tmp directory.
For MFS to work properly, DFSA must also be enabled on your cluster's nodes.
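The path mapping described above can be sketched as a small shell helper. This is only an illustration: mfs_path is a hypothetical function name, and the file name report.txt is made up for the example. The only thing taken from the setup is the /mfs/NODE_NUMBER layout.

```shell
# mfs_path NODE LOCAL_PATH
# Print the cluster-wide path under /mfs for a file that lives at
# LOCAL_PATH on node NODE. Hypothetical helper, not part of OpenMosix.
mfs_path() {
    node=$1
    local_path=$2
    # Node numbers are plain integers; reject anything else.
    case $node in
        ''|*[!0-9]*) echo "node must be a number" >&2; return 1 ;;
    esac
    # Strip leading slashes so the result has exactly one separator.
    while [ "${local_path#/}" != "$local_path" ]; do
        local_path=${local_path#/}
    done
    printf '/mfs/%s/%s\n' "$node" "$local_path"
}

# A file saved as /tmp/report.txt on node 5, as seen from any other node:
mfs_path 5 /tmp/report.txt
```

On a running cluster, the printed path is what you would pass to commands such as ls or cp from another node.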
DFSA
DFSA stands for Direct File System Access, and it has to be enabled on all the cluster machines to make the whole thing work. When we installed OpenMosix on top of PCQLinux 2004, DFSA was enabled by default. You can check whether it is enabled by running the following command:
# cat /proc/hpc/admin/version
DFSA allows the cluster node to run direct I/O operations on any share mounted using MFS. Without DFSA, MFS will be nothing more than any other network file system.
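As a complementary check, you can look at how MFS is mounted. The sketch below assumes that your installation mounts MFS through an /etc/fstab entry of type mfs carrying a dfsa=1 option. That is an assumption about the setup, not something we verified on our test bed, and has_mfs_dfsa is a hypothetical helper name.

```shell
# has_mfs_dfsa FILE
# Succeed if FILE (in fstab format) contains a non-comment entry whose
# file system type is "mfs" and whose options include "dfsa=1".
# ASSUMPTION: MFS is mounted via such an fstab entry on your nodes.
has_mfs_dfsa() {
    [ -r "$1" ] || return 1
    awk '
        /^[[:space:]]*#/ { next }          # skip comment lines
        $3 == "mfs" {
            n = split($4, opts, ",")       # options are comma-separated
            for (i = 1; i <= n; i++)
                if (opts[i] == "dfsa=1") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

if has_mfs_dfsa /etc/fstab; then
    echo "fstab has an MFS entry with dfsa=1"
else
    echo "no MFS entry with dfsa=1 found in fstab"
fi
```

A negative result only means the entry was not found in that file; it does not prove DFSA is disabled in the kernel.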
Installing OpenAFS
To install OpenAFS, you can either recompile your kernel with OpenAFS support, or download and install the OpenAFS rpm from ftp://rpmfind.net/linux/dag/redhat/9/e/i386/dag/RPMS/openafs-1.2.10-0.dag.rh90.i386.rpm. Run the following command to install OpenAFS:
# rpm -ivh openafs-1.2.10-0.dag.rh90.i386.rpm
Keep in mind that OpenAFS must be installed after booting the system into the OpenMosix kernel. Otherwise, OpenAFS support is added to the default PCQLinux 2004 kernel, and you will not be able to use OpenAFS when you boot into the OM kernel.
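A quick guard before installing could look like the sketch below. It assumes that the release string of your OpenMosix kernel (as printed by uname -r) contains the text "openmosix"; check this on your own build, since the naming depends on how the kernel was packaged. is_om_kernel is a hypothetical helper name.

```shell
# is_om_kernel RELEASE
# Succeed if the kernel release string mentions openmosix (any case).
# ASSUMPTION: your OM kernel's release string contains "openmosix".
is_om_kernel() {
    release=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    case $release in
        *openmosix*) return 0 ;;
        *)           return 1 ;;
    esac
}

if is_om_kernel "$(uname -r)"; then
    echo "OpenMosix kernel is running; OK to install OpenAFS"
else
    echo "Not an OpenMosix kernel; reboot into the OM kernel first"
fi
```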


Configuring the partitions
The OpenAFS file server must have at least one partition or logical volume dedicated to storing AFS volumes. Each server partition is mounted at a directory named /vicepxx, where 'xx' is one or two lowercase letters. These /vicepxx directories must reside in the file server's root directory.
Now create a directory called /vicepxx for each AFS server partition you are configuring (there must be at least one). To do so, run the following command, repeating it for each partition:
# mkdir /vicepxx
Add a line in the following format to the /etc/fstab file for each directory you just created. Each entry must appear on a single line.
/dev/disk /vicepxx ext2 defaults 0 2
Next, create a file system on each partition to be mounted at a /vicepxx directory. For this, run the following command:
# mkfs -v /dev/disk
Mount all partitions by running the following command, which takes its parameters from the /etc/fstab file as modified above.
# mount -a
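When several partitions are involved, it helps to print the whole plan before touching any disk. The sketch below only prints what would be done and changes nothing. plan_vicep is a hypothetical helper, and /dev/hdb1 and /dev/hdc1 are placeholder device names, so substitute your own.

```shell
# plan_vicep SUFFIX DEVICE
# Print the steps for one AFS server partition: the mkdir command, the
# /etc/fstab entry, and the mkfs command. Dry run only; nothing is executed.
plan_vicep() {
    suffix=$1
    device=$2
    # The suffix must consist of lowercase letters only.
    case $suffix in
        ''|*[!abcdefghijklmnopqrstuvwxyz]*)
            echo "suffix must be lowercase letters" >&2; return 1 ;;
    esac
    # OpenAFS naming allows at most two letters after /vicep.
    if [ "${#suffix}" -gt 2 ]; then
        echo "suffix must be at most two letters" >&2; return 1
    fi
    dir=/vicep$suffix
    printf 'mkdir %s\n' "$dir"
    printf 'fstab: %s %s ext2 defaults 0 2\n' "$device" "$dir"
    printf 'mkfs -v %s\n' "$device"
}

# Placeholder devices; replace with the partitions you are dedicating to AFS.
plan_vicep a /dev/hdb1
plan_vicep b /dev/hdc1
echo 'mount -a'
```

Review the printed lines, then run the commands by hand as root and add the fstab entries yourself.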
After you are done with this, start the AFS server process by running the bosserver command, which launches the Basic OverSeer (BOS) Server. Include the '-noauth' flag to disable authorization checking. The command looks like this:
# /usr/afs/bin/bosserver -noauth &
The flag is needed because you have not yet configured your cell's AFS authentication and authorization mechanisms, so the BOS Server cannot perform authorization checks as it does during normal operation. In this no-authorization mode, it does not verify the identity or privilege of the issuer of a bos command, and so performs any operation for anyone. As it initializes for the first time, the BOS Server creates the following directories and files:
/usr/afs/db
/usr/afs/etc/CellServDB
/usr/afs/etc/ThisCell
/usr/afs/local
/usr/afs/logs
It then sets the owner to the local superuser (root) and sets the mode bits to limit the ability to write (and in some cases, read) them. This environment is OK for testing, but we do not recommend that you continue using it this way in a production scenario. You should set up proper PAM authentication for your server.
We did this just to test the performance of OpenAFS running over a cluster. If you really want to use OpenAFS in a production environment, secure the AFS file system: go through the documentation at http://www.openafs.org/doc/index.htm carefully and follow the PAM authentication path according to your network needs, because sharing data over the network can be risky.
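To confirm that the BOS Server's first start created the directories and files listed above, a check along these lines may help. missing_bos_paths is a hypothetical helper; the optional ROOT argument exists only so the function can be tried against a scratch directory.

```shell
# missing_bos_paths [ROOT]
# Print every path from the BOS Server's first-start list that does not
# exist under ROOT (default: the real file system root).
missing_bos_paths() {
    root=${1%/}
    for p in /usr/afs/db /usr/afs/etc/CellServDB /usr/afs/etc/ThisCell \
             /usr/afs/local /usr/afs/logs; do
        [ -e "$root$p" ] || printf '%s\n' "$p"
    done
}

# No output means everything from the list is present.
missing_bos_paths /
```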

Tests and Results
After configuring both OM and OpenAFS, the NAS cluster is ready for use. To test it, we put together an 11-node load setup (10 load nodes plus 1 controller). We ran the same suite of tests that we use for benchmarking standard NAS boxes and servers in our regular reviews. This involved running NetBench with 1, 5, 10, 20, 30 and 40 engines.
We noted the throughput given by this cluster. As we had 10 load nodes and 6 disks in our cluster, there were on average roughly two clients (load nodes) per hard drive. The results were pretty good. We got a maximum throughput of 452.5 Mbps, which the cluster produced with 30 clients running simultaneously. When we upped the load to 40 clients, the throughput dropped slightly to 436 Mbps. For our test, the cluster ran on a heterogeneous network with a mix of Gigabit and 100 Mbps LAN cards, while the load nodes had a dedicated 1 Gbps network. If all the cluster's cards were changed to 1 Gbps, performance could increase even more. Compared against the results of our recent server shootout, this cluster would rank second among those servers.
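The arithmetic behind these figures can be reproduced with a couple of awk one-liners. The helper names ratio and mbps_to_mbytes are hypothetical; the inputs are the numbers quoted above.

```shell
# ratio NUM DEN: print NUM / DEN rounded to two decimals.
ratio() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

# mbps_to_mbytes MBPS: convert megabits per second to megabytes per
# second (8 bits per byte), rounded to two decimals.
mbps_to_mbytes() {
    awk -v m="$1" 'BEGIN { printf "%.2f\n", m / 8 }'
}

echo "load nodes per disk: $(ratio 10 6)"
echo "peak throughput in MB/s: $(mbps_to_mbytes 452.5)"
```

Ten load nodes over six disks works out to about 1.67 per disk, which is where the "roughly two clients per drive" figure comes from.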
The key point here is not just the performance and throughput of the cluster. The total CPU and RAM usage of the cluster during the test was merely 10%. This means that despite the heavy data transfer, the cluster can still work as an HPC system and crunch huge amounts of data.
To test the performance further, we ran a data-crunching task alongside NetBench: converting 100 WAV files to OGG. This used 80% of the cluster's capacity for around 16 minutes while NetBench was running. The NetBench result was almost identical, at 427 Mbps with 30 clients. This is pretty good.
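To put "almost identical" into numbers, the relative drop from the 452.5 Mbps peak can be computed as below. pct_drop is a hypothetical helper; the inputs are the throughput figures reported in this story.

```shell
# pct_drop BASELINE VALUE: print how far VALUE falls below BASELINE,
# as a percentage of BASELINE, rounded to one decimal.
pct_drop() {
    awk -v b="$1" -v v="$2" 'BEGIN { printf "%.1f\n", (b - v) * 100 / b }'
}

echo "40 clients, idle cluster: $(pct_drop 452.5 436)% below peak"
echo "30 clients, cluster busy: $(pct_drop 452.5 427)% below peak"
```

Both drops stay in the single digits, even with the encoding job occupying most of the cluster.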
Source: PCQuest