A Case for Runtime Code Generation

Author
David Keppel
Author
Susan J. Eggers
Author
Robert R. Henry,
Local Title
TR 91-11-04
Abstract
We define {\em runtime code generation} (RTCG) as dynamically adding code to the instruction stream of an executing program. It has been in use since the 1940s, but fell from favor because changing hardware and software technologies made it less profitable and its {\em ad-hoc} implementation made it nonportable. In this paper we argue for it's return. We base our arguments on a new set of changes in hardware technology, software technology, and workloads. Each of these changes brings about circumstances in which dynamically optimized code can execute fast enough to pay for the runtime overhead of creating it. We support our analysis with concrete examples. We compare static-code and RTCG implementations of several applications and show that RTCG can lead to performance improvements.

Improving the Performanceof Runtime Parallelization

Author
LEUNG
Author
ZAHORJAN
Local Title
TR 92-10-05
Ftp
cs.washington.edu/tr/1992/10/UW-CSE-92-10-05.PS.Z
Abstract
When the inter-iteration dependency pattern of the iterations of a loop can not be determined statically, compile time parallelization of the loop is not possible. In these cases, runtime parallelization [8] is the only alternative. The idea is to transform the loop into two code fragments: the inspector and the executor. When the program is run, the inspector examines the iteration dependencies and constructs a parallel schedule. The executor subsequently uses that schedule to carry out the actual computation in parallel. In this paper, we show how to reduce the overhead of running the inspector through its parallel execution. We describe two related approaches. The first, which emphasizes inspector efficiency, achieves nearly linear speedup relative to a sequential execution of the inspector, but produces a schedule that may be less efficient for the executor. The second technique, which emphasizes executor efficiency, does not in general achieve linear speedup of the inspector, but is guaranteed to produce the best achievable schedule. We present these techniques, show that they are correct, and compare their performance to existing techniques using a set of experiments. Because in this paper we are optimizing inspector time, by leaving the executor unchanged, the techniques we present have the most dramatic effect when the inspector must be run for each invocation of the source loop. In a companion paper, we explore techniques that build upon those developed here to also improve executor performance.

Reordering Iterations in Run-time Loop Parallelization

Author
ZAHORJAN
Author
LEUNG
Local Title
TR 92-12-07
Ftp
cs.washington.edu/tr/1992/12/UW-CSE-92-12-07.PS.Z
Abstract
When a loop in a sequential program is parallelized, it is normally guaranteed that all flow dependencies and anti-dependencies are respected so that the result of parallel execution is always the same as sequential execution. In some cases, however, the algorithm implemented by the loop allows the iterations to be executed in a different sequential order than the one specified in the program. This opportunity can be exploited to expose parallelism that exists in the algorithm but is obscured by its sequential program implementation. In this paper, we show how parallelization of this kind of loop can be integrated into the runtime parallelization scheme of Saltz et al. [17, 18]. Runtime parallelization is a general technique appropriate for loops whose dependency structures cannot be determined at compile time. The compiler generates two pieces of code: the inspector examines dependencies at run time and computes a parallel schedule; the executor executes itertions in parallel according to the computed schedule. In our case, the inspector has to solve two problems: choosing an appropriate sequential order for the iterations and computing a parallel schedule. The two problems are treated as a single graph

Adhara: Runtime Support for Dynamic Space-Based Applications on Distribute

Author
ZAHORJAN
Author
ASHOK
Local Title
TR 93-04-01
Ftp
cs.washington.edu/tr/1993/04/UW-CSE-93-04-01.PS.Z
Abstract
Dynamic space-based applications are simulations of objects moving through a closed k-dimensional space subject to mutual forces. There are a wide variety of such applications, differing in the kinds of objects and forces being simulated. Dynamic space-based applications exhibit strong data locality patterns, but these patterns change during the computation as objects change position in the simulated space. To achieve good performance when run on distributed memory multiprocessors, two conflicting goals much be addressed: the strong spatial data locality must be exploited, and the computational load must be balanced across the processors. These optimizations can be done statically, by either the programmer or the compiler, or dynamically, by handwritten application code or a runtime system. Relying on the programmer for these functions imposes an unreasonable burden on her, as it is a hard and time consuming process to develop the code required. At the same time, the compiler cannot be expected to automatically generate code leading to good performance, since it cannot extract the space-based data dependencies from the program source. Thus, the most appropriate approach to providing the needed functions is through a specialized runtime system that can be used with any dynamic, space-based application. In this paper we describe the design and the programming interface of ADHARA, a runtime system tailored to the needs of dynamic space-based applications. Using a specific plasma physics application as an example, we show that ADHARA enables the development of efficient parallel codes that involve very few additional lines of code and very few additional concepts beyond those required to construct a sequential version of the application. 2. Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay. Run-time parallelization and scheduling of loops. (technical) IEEE Transactions on Computers v40, n5 (May, 1991):603 (10 pages). Pub Type: Technical. Type D 2 AB to see abstract. AT: UCI Sci Lib TK 7885 A1 I2 Drum Latest in Curr Per Rm B2-42(1953-93).B43N1-5,11- 12(1994).B44(1995) (PE title: IEEE transactions on computers.) 1. Rauchwerger, Lawrence; Amato, Nancy M.; Padua, David A. A scalable method for run-time loop parallelization. International Journal of Parallel Programming v23, n6 (Dec, 1995):537 (40 pages). Type D 1 AB to see abstract. AT: UCI Sci Lib QA 76.5 I56 Drum Latest in Curr Per Rm B15-22(1986-94) (PE title: International journal of parallel programming.)