Abstract
As the market share of high performance computing (HPC) continues to grow, so does
the need for cheaper and more widely available alternatives to expensive, typical
HPC systems. A promising alternative is the cloud computing environment, which is
cheap, flexible, scalable and available.
However, the cloud is based on virtualization, which increases the latency of
access to processing and network resources because those resources are shared. Also,
in contrast to traditional HPC systems, which run on homogeneous, high-cost servers
with fast networking that provides predictable performance, the cloud’s underlying
hardware is heterogeneous, with slower network connections. This makes the cloud
an unpredictable environment for long-running programs such as HPC applications
and reduces the performance of communication-intensive parallel applications on the
cloud. Hence, modelling and understanding performance is essential for exploiting
such an environment.
In this thesis, we introduce an analytical performance model of the execution of such
long-running, communication-intensive applications on the cloud. The model
accounts for both the communication-intensive parallel workloads and the
cloud’s processing and network resources. It does so by treating
the cloud resources as a queueing network and the parallel applications as jobs
contesting for the shared resources. Based on the proposed model, we also introduce a
predictor for the execution time of message passing interface (MPI) based
applications on the cloud, as they are a major class of HPC applications. The prediction
process considers different configurations of workloads and processing resources.
The accuracy of predictions based on the proposed model is measured on both a cluster
of bare-metal servers and a group of virtual machines. The overall accuracy of this
prediction is 88% across 10 benchmarks: 5 from SPEC-MPI and 5
from the NASA Advanced Supercomputing (NAS) Parallel Benchmarks suite (NPB).
Moreover, a thorough analysis of the experimental results is presented.