Runtime compilation offers opportunities to parallelize code that are generally not available to static parallelization approaches. However, the parallelized code may slow down execution due to unforeseen parallel overheads, such as the synchronization and speculation support required by the chosen parallelization strategy and the underlying parallel platform. Moreover, with the widespread use of heterogeneous architectures, these choices become even more consequential. In this paper, we consider, for the first time, an adaptive form of the parallelization operation. We propose a method for performing on-stack de-parallelization of a parallelized binary loop at runtime, allowing the loop to be rapidly replaced with a more optimized version. Specifically, we consider a particular loop parallelization strategy and propose a corresponding de-parallelization method. The method relies on stopping execution at safe points, gathering the threads' states, producing corresponding serial code, and continuing execution serially. The decision to de-parallelize is based on the anticipated speedup. To assess the potential of our approach, we conducted an initial study on a small set of programs with various parallelization overheads. Results show up to a 4× performance improvement for a synchronization-intensive program on a 4-core Intel processor.
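
As a rough illustration of the speedup-based decision mentioned above, the following C sketch compares the measured throughput of a running parallel loop against an estimated serial throughput and chooses to de-parallelize only when the anticipated speedup exceeds a safety margin. All names and the margin value are our own illustrative assumptions, not the paper's implementation.

    /* Illustrative sketch only; the names (loop_profile, should_deparallelize,
     * DEPAR_MARGIN) and the margin value are hypothetical. */
    #include <stdbool.h>
    #include <stdio.h>

    struct loop_profile {
        double parallel_iters_per_sec; /* measured from the running parallel loop */
        double serial_iters_per_sec;   /* estimated rate of the serial version,
                                          i.e. without synchronization or
                                          speculation overhead */
    };

    /* Hypothetical safety margin: only de-parallelize when the anticipated
     * speedup clearly outweighs the one-time cost of stopping at a safe point,
     * gathering thread states, and switching to the serial loop. */
    #define DEPAR_MARGIN 1.10

    static bool should_deparallelize(const struct loop_profile *p)
    {
        double anticipated_speedup =
            p->serial_iters_per_sec / p->parallel_iters_per_sec;
        return anticipated_speedup > DEPAR_MARGIN;
    }

    int main(void)
    {
        /* Example: a synchronization-heavy loop whose parallel version is
           slower than its estimated serial version. */
        struct loop_profile p = { .parallel_iters_per_sec = 2.0e6,
                                  .serial_iters_per_sec   = 7.5e6 };
        printf("de-parallelize? %s\n",
               should_deparallelize(&p) ? "yes" : "no");
        return 0;
    }

In this sketch the margin accounts for the one-time switching cost; in practice such a threshold would be tuned to the runtime system and platform.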