APRIL: A Processor Architecture for Multiprocessing
(1991)
Processors in large-scale multiprocessors must be able to tolerate large communication latencies and synchronization delays. This paper describes the architecture of a rapid-context-switching processor called APRIL with ...
Automatic Partitioning of Parallel Loops for Cache-coherent Multiprocessors
(1992-12)
This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. The framework introduces the notion of uniformly intersecting ...
Analyzing Multiprocessor Cache Behavior Through Data Reference Modeling
(1993-02)
This paper develops a data reference modeling technique to estimate with high accuracy the cache miss ratio in cache-coherent multiprocessors. The technique involves analyzing the dynamic data referencing behavior of ...
Communication-Minimal Partitioning of Parallel Loops and Data Arrays for Cache-Coherent Distributed-Memory Multiprocessors
(1995-01)
Harnessing the full performance potential of cache-coherent distributed shared memory multiprocessors without inordinate user effort requires a compilation technology that can automatically manage multiple levels of memory ...
Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory
(1992-12)
This paper presents a hierarchical approach for compiling macro dataflow graphs for multiprocessors with local memory. Macro dataflow graphs comprise several nodes (or macro operations) that must be executed subject to ...
Low-cost Support for Fine-grain Synchronization in Multiprocessors
(1992-06)
As multiprocessors scale beyond the limits of a few tens of processors, they must look beyond traditional methods of synchronization to minimize serialization and achieve the high degrees of parallelism required to utilize ...
Performance Tradeoffs in Multithreaded Processors
(1991-04)
High network latencies in large-scale multiprocessors can cause a significant drop in processor utilization. By maintaining multiple process contexts in hardware and switching among them in a few cycles, multithreaded ...
Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-memory Multiprocessors
(1995-09)
This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have looked at hyperplane ...