A Case for Runtime Code Generation
- Author
- David Keppel
- Author
- Susan J. Eggers
- Author
- Robert R. Henry,
- Local Title
- TR 91-11-04
- Abstract
- We define {\em runtime code generation} (RTCG) as dynamically
adding code to the instruction stream of an executing program. It has
been in use since the 1940s, but fell from favor because changing
hardware and software technologies made it less profitable and its {\em
ad-hoc} implementation made it nonportable. In this paper we argue for
it's return. We base our arguments on a new set of changes in hardware
technology, software technology, and workloads. Each of these changes
brings about circumstances in which dynamically optimized code can
execute fast enough to pay for the runtime overhead of creating it. We
support our analysis with concrete examples. We compare static-code
and RTCG implementations of several applications and show that RTCG can
lead to performance improvements.
Improving the Performanceof Runtime Parallelization
- Author
- LEUNG
- Author
- ZAHORJAN
- Local Title
- TR 92-10-05
- Ftp
- cs.washington.edu/tr/1992/10/UW-CSE-92-10-05.PS.Z
- Abstract
- When the inter-iteration dependency pattern of the iterations of a
loop can not be determined statically, compile time parallelization of
the loop is not possible. In these cases, runtime parallelization [8]
is the only alternative. The idea is to transform the loop into two
code fragments: the inspector and the executor. When the program is
run, the inspector examines the iteration dependencies and constructs
a parallel schedule. The executor subsequently uses that schedule to
carry out the actual computation in parallel.
In this paper, we show how to reduce the overhead of running the
inspector through its parallel execution. We describe two related
approaches. The first, which emphasizes inspector efficiency, achieves
nearly linear speedup relative to a sequential execution of the
inspector, but produces a schedule that may be less efficient for the
executor. The second technique, which emphasizes executor efficiency,
does not in general achieve linear speedup of the inspector, but is
guaranteed to produce the best achievable schedule. We present these
techniques, show that they are correct, and compare their performance
to existing techniques using a set of experiments.
Because in this paper we are optimizing inspector time, by leaving the
executor unchanged, the techniques we present have the most dramatic
effect when the inspector must be run for each invocation of the
source loop. In a companion paper, we explore techniques that build
upon those developed here to also improve executor performance.
Reordering Iterations in Run-time Loop Parallelization
- Author
- ZAHORJAN
- Author
- LEUNG
- Local Title
- TR 92-12-07
- Ftp
- cs.washington.edu/tr/1992/12/UW-CSE-92-12-07.PS.Z
- Abstract
- When a loop in a sequential program is parallelized, it is normally
guaranteed that all flow dependencies and anti-dependencies are
respected so that the result of parallel execution is always the same
as sequential execution. In some cases, however, the algorithm
implemented by the loop allows the iterations to be executed in a
different sequential order than the one specified in the program.
This opportunity can be exploited to expose parallelism that exists in
the algorithm but is obscured by its sequential program
implementation.
In this paper, we show how parallelization of this kind of loop can be
integrated into the runtime parallelization scheme of Saltz et al.
[17, 18]. Runtime parallelization is a general technique appropriate
for loops whose dependency structures cannot be determined at compile
time. The compiler generates two pieces of code: the inspector examines
dependencies at run time and computes a parallel schedule; the
executor executes itertions in parallel according to the computed
schedule.
In our case, the inspector has to solve two problems: choosing an
appropriate sequential order for the iterations and computing a
parallel schedule. The two problems are treated as a single graph
Adhara: Runtime Support for Dynamic Space-Based Applications on Distribute
- Author
- ZAHORJAN
- Author
- ASHOK
- Local Title
- TR 93-04-01
- Ftp
- cs.washington.edu/tr/1993/04/UW-CSE-93-04-01.PS.Z
- Abstract
- Dynamic space-based applications are simulations of objects moving
through a closed k-dimensional space subject to mutual forces. There
are a wide variety of such applications, differing in the kinds of
objects and forces being simulated.
Dynamic space-based applications exhibit strong data locality
patterns, but these patterns change during the computation as
objects change position in the simulated space. To achieve
good performance when run on distributed memory multiprocessors,
two conflicting goals much be addressed: the strong spatial data
locality must be exploited, and the computational load must be
balanced across the processors. These optimizations can be done
statically, by either the programmer or the compiler, or dynamically,
by handwritten application code or a runtime system. Relying on the
programmer for these functions imposes an unreasonable burden on her,
as it is a hard and time consuming process to develop the code required.
At the same time, the compiler cannot be expected to automatically
generate code leading to good performance, since it cannot extract
the space-based data dependencies from the program source. Thus,
the most appropriate approach to providing the needed functions is
through a specialized runtime system that can be used with any
dynamic, space-based application.
In this paper we describe the design and the programming interface of
ADHARA, a runtime system tailored to the needs of dynamic space-based
applications. Using a specific plasma physics application as an
example, we show that ADHARA enables the development of efficient
parallel codes that involve very few additional lines of code and very
few additional concepts beyond those required to construct a
sequential version of the application.
2. Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay.
Run-time parallelization and scheduling of loops. (technical)
IEEE Transactions on Computers v40, n5 (May, 1991):603 (10 pages).
Pub Type: Technical.
Type D 2 AB to see abstract.
AT: UCI Sci Lib TK 7885 A1 I2 Drum Latest in Curr Per Rm
B2-42(1953-93).B43N1-5,11-
12(1994).B44(1995)
(PE title: IEEE transactions on computers.)
1. Rauchwerger, Lawrence; Amato, Nancy M.; Padua, David A.
A scalable method for run-time loop parallelization.
International Journal of Parallel Programming v23, n6 (Dec, 1995):537 (40
pages).
Type D 1 AB to see abstract.
AT: UCI Sci Lib QA 76.5 I56 Drum Latest in Curr Per Rm
B15-22(1986-94)
(PE title: International journal of parallel programming.)