Simulation-Based Code Duplication in a Dynamic Compiler
Dynamic compilers perform a wealth of optimizations to improve the performance of the generated machine code. They inline functions, unroll, peel and vectorize loops, remove allocations, perform instruction selection and scheduling, register allocation and code duplication. All optimizations serve the goal to improve the performance of the generated code along as many success metrics as possible, including latency, throughput, memory usage, cache behavior, micro-code usage, security and many others. In this process of transforming a source program to optimized machine code a typical compiler performs a multitude of decisions when applying optimizations. Many optimizations not only have positive impacts on a compilation unit, but can have negative effects as well, on any of the success metrics. Since it is infeasible for a compiler to produce the optimal code for a given source program on a given system, compilers resort to modeling optimization decisions via heuristic functions that are typically hand-tuned to a given set of benchmarks, in order to produce the fastest possible artifact. Duplicating code into different control-flow branches opens the potential for applying context- specific optimizations, which would not be possible otherwise. However, duplication-based optimizations, including tail-duplication and loop unrolling, can have negative impacts on the performance of the generated machine code of a program. However, in many cases they are still able to improve performance significantly. This imposes a challenge on modern compilers: Duplicating instructions at every control flow merge is not possible because it leads to uncontrolled code growth and compile time increases. Yet, not duplicating code and missing out on performance increases is also not a suitable option. Therefore, compilers need to determine the performance and code-size impacts of a duplication transformation, before performing it. Answering the question of the impact of a single duplication transformation on the optimization potential of an entire compilation unit typically requires compile-time-intensive analysis unsuitable for dynamic compilation. Consequently, dynamic compilers commonly resort to simple heuristics modeling beneficial and harmful impacts of a duplication optimization. However, heuristics are never complete and often miss modeling aspects crucial for the performance of a program. To tackle the shortcomings of duplication-based optimizations in a dynamic compiler we propose simulation-based code duplication, a three-tier optimization scheme that allows a compiler to find duplication optimization candidates (1), trade-off their expected impacts between different candidates (2) and only perform those duplication transformations that are considered beneficial (3). Simulation-based code duplication is precise, meaning that all simulated performance improvements are applicable after duplication. Additionally, it is complete, meaning it allows to simulate the effect of any given duplication-dependent optimization on a compilation unit after duplication. We implemented our simulation-based code duplication approach on top of the Graal Virtual Machine and applied it to two code-duplication based optimizations: tail-duplication and loop unrolling for non-counted loops. We show that our simulation-based code duplication scheme outperforms hard-coded heuristics and can significantly improve performance of the generated machine code. Large parts of our work have been integrated into Oracle Labs Graal Virtual Machine and are commercially available.
PhD thesis, Johannes Kepler University Linz, August 2019Download als pdf