
Scaling of codes on CHPC clusters

The performance of the following codes, namely NAMD, WRF, DL_POLY_2 and DL_POLY_3, was tested on both the Sun and GPU clusters. The scalability of these codes was calculated using the following formula:

S(P) = T(1) / T(P)


The formula reads as follows: the speed-up on P processors, S(P), is the ratio of the execution time on 1 processor, T(1), to the execution time on P processors, T(P). Some of the benchmark results were calculated using nodes or GPUs instead of processors; in that case, one needs to replace processors with nodes or GPUs in the above formula to obtain the results. Figure 1 shows the scalability of NAMD tested on processors of the CHPC GPU cluster.
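As an illustration of the speed-up ratio described above, the following minimal Python sketch computes S(P) from a set of measured run times. The function name `speedup` and the timings in `example_timings` are hypothetical and chosen only for illustration; they are not CHPC benchmark results. The key P can stand for processors, nodes or GPUs, as long as the baseline run with P = 1 is included.

```python
def speedup(timings):
    """Return S(P) = T(1) / T(P) for every P in `timings`.

    `timings` maps a resource count P (processors, nodes or GPUs)
    to the measured execution time T(P), in any consistent unit.
    """
    if 1 not in timings:
        raise ValueError("the baseline time T(1) is required")
    t1 = timings[1]
    # Sort by P so the result reads in order of increasing resources.
    return {p: t1 / t for p, t in sorted(timings.items())}


# Hypothetical wall-clock times in seconds (illustrative values only).
example_timings = {1: 1000.0, 2: 500.0, 4: 250.0, 8: 200.0}

if __name__ == "__main__":
    for p, s in speedup(example_timings).items():
        print(f"P = {p:2d}   S(P) = {s:.2f}")
```

With these made-up numbers the speed-up doubles from 1 to 2 and from 2 to 4, then flattens at 8, which is the kind of behaviour the scalability figures are meant to reveal.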


Figure 1: Scalability of NAMD on GPU cluster (Processors)

Figure 1 depicts the scalability of NAMD when running on the InfiniBand and Ethernet networks of the GPU cluster. It further shows that the model performs much better when run on 80 processors on both the InfiniBand and Ethernet networks. Based on these results, users are advised to use at least 32 processors, which is a more reasonable allocation and leaves room for other users running on the system. Figure 2 shows the scaling results of NAMD run on GPUs (NVIDIA cards).



Figure 2: Scalability of NAMD on GPU cluster (GPUs)

Figure 2 illustrates the scaling of NAMD when run on different numbers of GPUs (NVIDIA cards) in the GPU cluster. In particular, the scalability results show that the model does not scale as expected from 1 GPU up to 4 GPUs on either the InfiniBand or the Ethernet network. Thereafter, the performance starts to increase from about 8 GPUs up to 20 GPUs on both networks. It is therefore recommended that this kind of model be run on as many GPUs as the availability of the system allows. Another molecular dynamics code, DL_POLY 2.18, was also tested (Figure 3).



Figure 3: Scalability of DL_POLY 2.18 on Sun cluster

Figure 3 shows the scaling results of DL_POLY 2.18 when run on two different architectures of the Sun cluster, namely Nehalem and Harpertown. In summary, the model performed well on the Nehalem system, and it also scaled reasonably on Harpertown, where performance continued to increase as the number of nodes increased. To allow proper sharing of resources, it is recommended that DL_POLY 2 users run on at least 4 Nehalem compute nodes, or use Harpertown if the system is busy. Results for another version of this molecular dynamics code, DL_POLY 3.09, are presented in Figure 4.



Figure 4: Scalability of DL_POLY 3.09 on Sun Microsystems cluster

Figure 4 outlines the scalability of DL_POLY 3.09 when run on the Nehalem and Harpertown architectures of the Sun system. In particular, the scalability results show that the performance of the model was comparable from 1 to 2 nodes on Nehalem and increased slightly as the number of nodes increased. The Harpertown system follows the Nehalem trend and begins to scale properly as the number of nodes increases. Depending on which system is available (Nehalem or Harpertown), users of this model may run on at least 8 nodes for simulations of more than 60,000 atoms. Figure 5 shows the performance of WRF run on the Sun system.


Figure 5: Scalability of WRF on Sun: Nehalem and Harpertown cluster

Figure 5 describes the scaling of WRF tested on the Sun Microsystems cluster (Nehalem and Harpertown systems). The scaling results show that the speed-up of WRF was almost as expected from 1 to 2 nodes and thereafter started to decrease rapidly from about 4 to 16 nodes; however, Harpertown achieved much better performance than the Nehalem system. Based on the scaling results of this study, it is appropriate for WRF users to use at least 16 nodes when running a configuration that covers a period of 1 month or more. Figure 6 shows the performance of WRF on the Sun Dell system.



Figure 6: The performance of WRF on Sun Dell system

Figure 6 shows the scaling of the weather model (WRF) tested on different computational resources, from 1 to 16 compute nodes connected to each other via the InfiniBand and Ethernet networks respectively. The performance of this weather simulation is comparable from 1 to 2 nodes and thereafter decreases slightly from 2 to 4 nodes on both the InfiniBand and Ethernet network interfaces. The model's performance then starts to increase from 8 up to 16 nodes. WRF scales better on the Sun Dell system than in the tests performed on the Sun Nehalem and Harpertown systems discussed under Figure 5. It is therefore recommended that WRF users use 16 nodes to run simulations on the InfiniBand network of the cluster.

For more information about the configuration of all the codes, please click here
