Binary translation is the process of taking a program compiled for a given CPU architecture and translating it to run on another platform without compromising its functionality. This paper describes a technique for improving the runtime performance of statically translated programs.
First, the program to be translated is analyzed to detect function boundaries. Then, each function is cloned, isolated, and disentangled from the rest of the executable code. This process is called function isolation, and it divides the code into two separate portions: the isolated realm and the non-isolated realm.
Isolated functions have simpler control flow, which allows much more aggressive compiler optimizations that increase performance but may compromise functional correctness. To address this risk, this work proposes a mechanism based on stack unwinding that allows a seamless transition between the two realms, preserving the program's semantics whenever an isolated function jumps to an unforeseen target. In this way, the program runs in the isolated realm, with improved performance, for most of the time, and falls back to the non-isolated realm only when necessary to preserve semantics.
The proposed stack unwinding mechanism is portable across multiple CPU architectures. The binary translation and function isolation passes are based on state-of-the-art, industry-proven open-source components (QEMU and LLVM), which makes them stable and flexible. The presented technique is robust: it works regardless of the quality of the function boundary detection. We measure the performance improvements on the SPECint 2006 benchmarks, showing an average improvement of 42% while still passing the functional correctness tests.
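
The fallback between the two realms can be illustrated with a minimal sketch. It is not the implementation described in this paper: Python exceptions stand in for stack unwinding, and every name in it (`UnexpectedJump`, `dispatcher`, `isolated_main`, the block addresses, and the `state` dictionary modeling the emulated CPU state) is hypothetical and chosen only for illustration. The sketch assumes that the emulated state lives outside the host call stack, so that discarding the isolated frames loses no program state.

```python
class UnexpectedJump(Exception):
    """Raised by an isolated function that meets an unforeseen jump target."""

    def __init__(self, target):
        super().__init__(f"unexpected jump to {target:#x}")
        self.target = target


# Non-isolated realm: each block updates the shared state and returns the
# address of the next block, or None to halt.
def block_0x10(state):
    state["acc"] += 1
    state["trace"].append("non-isolated:0x10")
    return 0x20


def block_0x20(state):
    state["acc"] *= 2
    state["trace"].append("non-isolated:0x20")
    return None


BLOCKS = {0x10: block_0x10, 0x20: block_0x20}


def dispatcher(state, pc):
    """Run non-isolated code one block at a time, starting at address pc."""
    while pc is not None:
        pc = BLOCKS[pc](state)


# Isolated realm: ordinary functions with direct calls.
KNOWN_TARGETS = {0x30}  # targets foreseen when the function was isolated


def isolated_callee(state, indirect_target):
    state["trace"].append("isolated:callee")
    if indirect_target not in KNOWN_TARGETS:
        # Unforeseen target: abandon the isolated frames.
        raise UnexpectedJump(indirect_target)
    state["acc"] += 100


def isolated_main(state, indirect_target):
    state["trace"].append("isolated:main")
    isolated_callee(state, indirect_target)
    state["trace"].append("isolated:main-return")


def run(indirect_target):
    """Start in the isolated realm; fall back to the dispatcher if needed."""
    state = {"acc": 0, "trace": []}  # emulated state, kept off the host stack
    try:
        isolated_main(state, indirect_target)
    except UnexpectedJump as jump:
        # The host stack has been unwound; resume at the unforeseen target.
        dispatcher(state, jump.target)
    return state
```

Calling `run(0x30)` stays entirely in the isolated functions, whereas `run(0x10)` unwinds both isolated frames and continues in the dispatcher from address `0x10`, operating on the same `state`.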