Congratulations to Eric Anger on successfully defending his PhD proposal “Application-level Modeling and Analysis of Time and Energy for Optimizing Power-constrained Extreme-scale Applications”

Posted on Aug 31, 2016 in News

The objective of the proposed research is to create a methodology for modeling and characterizing extreme-scale applications that operate within power limitations, in order to guide their optimization. Forthcoming high-performance machines are likely to operate under stringent power caps, which ties system performance to energy efficiency. Optimizing extreme-scale applications to run within these limits will require new techniques for understanding the relationships among application characteristics, performance, and energy. The main contributions of this work are: 1) a methodology for time and energy modeling of high-performance computing applications that scales to a large number of nodes, 2) a characterization of the different ways in which time and energy are affected by the degree of parallelism and the processor clock frequency, and 3) optimization of performance under a power cap when scheduling applications that follow either bulk-synchronous or data-parallel task-based models.

Congratulations to Minhaj Hassan on successfully defending his PhD thesis “Exploiting On-Chip Memory Concurrency in 3D Manycore Architectures”

Posted on Aug 31, 2016 in News

Many congratulations, Dr. Hassan!

The objective of this thesis is to optimize the uncore of 3D many-core architectures. More specifically, we note that technology trends point to large increases in memory-level concurrency, which in turn affect the design of the many-core interconnect and the organization of the memory hierarchy. This work addresses the need to re-optimize these components in the presence of this increased memory-system concurrency.

First, we observe that 2D network latency and inefficient parallelism management are the main bottlenecks preventing current 3D designs from fully exploiting the potential of 3D integration. To address this, we propose an extremely low-latency, low-power, high-radix router and present versions of it for different network topologies and configurations. We also explore optimizations and techniques to reduce network traffic. Second, we propose a reorganization of the memory hierarchy that uses simple address-space translations to regulate locality, bandwidth, and energy trade-offs in highly concurrent 3D memory systems. Third, we analyze the rise in temperature of 3D memories and propose variable-rate, per-bank refresh management that exploits temperature variability to reduce 3D DRAM’s refresh power and extend its operating range to higher temperatures.

Congratulations to Si Li, William Song and Minhaj Hassan for their papers accepted by the IEEE International Reliability Physics Symposium (IRPS)!

Posted on Apr 18, 2016 in News

The paper “Software-based Dynamic Reliability Management for GPU Applications”, co-authored by Si Li, Vilas Sridharan, Sudhanva Gurumurthi and Sudhakar Yalamanchili, has been accepted by the IEEE International Reliability Physics Symposium. Congratulations to Si!

The paper “Reliability-Performance Tradeoff between 2.5D and 3D-Stacked DRAM Processors”, co-authored by William J. Song, Syed Minhaj Hassan, Saibal Mukhopadhyay and Sudhakar Yalamanchili, has been accepted by the IEEE International Reliability Physics Symposium as a short paper/poster. Congratulations to William and Minhaj!

The conference was held in April 2016.

Prof. Yalamanchili gave a keynote “Implications of Memory-Centric Computing Architectures for Future NoCs” at the 9th International Symposium on Networks-on-Chip (NOCS).

Posted on Jan 10, 2016 in News

Prof. Yalamanchili presented the keynote “Implications of Memory-Centric Computing Architectures for Future NoCs” at the 9th International Symposium on Networks-on-Chip (NOCS) in Vancouver, Canada, on September 29, 2015.

Paper “General-Purpose Join Algorithms for Large Graph Triangle Listing on Heterogeneous Systems” accepted by GPGPU-9.

Posted on Jan 10, 2016 in News

The paper “General-Purpose Join Algorithms for Large Graph Triangle Listing on Heterogeneous Systems”, co-authored by Daniel Zinn, Haicheng Wu, Jin Wang, Molham Aref and Sudhakar Yalamanchili, was accepted by GPGPU-9. This work was a collaboration with LogicBlox. Congratulations to the authors!

Congratulations to William Song for his paper “Amdahl’s Law for Lifetime Reliability Scaling in Heterogeneous Multicore Processors” accepted by HPCA 2016!

Posted on Jan 10, 2016 in News

William’s paper “Amdahl’s Law for Lifetime Reliability Scaling in Heterogeneous Multicore Processors”, co-authored with Saibal Mukhopadhyay and Sudhakar Yalamanchili, was accepted by HPCA 2016. Congratulations!

Congratulations to Haicheng Wu on successfully defending his thesis “Acceleration and Execution of Relational Queries using General Purpose Graphics Processing Unit (GPGPU)”!

Posted on Nov 24, 2015 in News

Haicheng Wu successfully defended his thesis on Nov 5, 2015. Congratulations, Dr. Wu!

The abstract of Haicheng’s thesis goes as follows:

This thesis first maps relational computation onto Graphics Processing Units (GPUs) by designing a series of tools, and then explores opportunities to reduce the limitations imposed by the memory hierarchy spanning the CPU and GPU. First, a complete end-to-end compiler and runtime infrastructure, Red Fox, is proposed. An evaluation on the full set of industry-standard TPC-H queries on a single-node GPU shows that Red Fox is, on average, 11.20x faster than a commercial database system running on a state-of-the-art CPU machine. Second, a new compiler technique called kernel fusion is designed to fuse the code bodies of several relational operators in order to reduce data movement. Third, a multi-predicate join algorithm is designed for GPUs that provides much better performance and greater flexibility than kernel fusion. Fourth, the GPU-optimized multi-predicate join is integrated into a multi-threaded CPU database runtime system that supports out-of-core data sets, in order to solve real-world problems. This thesis presents key insights, lessons learned, measurements from the implementations, and opportunities for further improvement.

Congratulations to William Song on successfully defending his thesis “Managing Lifetime Reliability, Performance, and Power Tradeoffs in Multicore Microarchitectures”!

Posted on Nov 24, 2015 in News

William Song successfully defended his thesis “Managing Lifetime Reliability, Performance, and Power Tradeoffs in Multicore Microarchitectures” on Oct 29, 2015. Congratulations, Dr. Song!

The objective of this research is to characterize and manage lifetime reliability, microarchitectural performance, and power tradeoffs in multicore processors. This dissertation comprises three research themes: 1) modeling and simulation methods for interacting multicore processor physics, 2) characterization and management of the performance and lifetime reliability tradeoff, and 3) extending Amdahl’s Law to understand the lifetime reliability, performance, and energy efficiency of heterogeneous processors. With continued technology scaling, processor operations are increasingly dominated by multiple distinct physical phenomena and their coupled interactions. Understanding these behaviors requires modeling complex physical interactions. This dissertation first presents a novel simulation framework that orchestrates interactions between multiple physical models and microarchitecture simulators to enable research explorations at the intersection of application, microarchitecture, energy, power, thermal, and reliability. Using this framework, workload-induced variation of device degradation is characterized, and its impacts on processor lifetime and performance are analyzed. This research introduces a new metric to quantify the performance-reliability tradeoff. Lastly, theoretical models of heterogeneous multicore processors are proposed for understanding performance, energy efficiency, and lifetime reliability consequences. It is shown that these system metrics are governed by Amdahl’s Law and correlated as a function of processor composition, scheduling method, and Amdahl’s scaling factor. This dissertation highlights the importance of multidimensional analysis and extends the scope of microarchitectural studies by incorporating the physical aspects of processor operations and designs.

Congratulations to Naila Farooqui on successfully defending her thesis “Runtime Specialization for Heterogeneous CPU-GPU Platforms”!

Posted on Oct 23, 2015 in News

Naila Farooqui successfully defended her thesis on Oct 19, 2015. Congratulations, Dr. Farooqui!

The abstract of Naila’s thesis goes as follows:

Heterogeneous parallel architectures such as those composed of CPUs and GPUs are a tantalizing compute fabric for performance-hungry developers. While these platforms enable order-of-magnitude performance increases for many data-parallel application domains, several open challenges remain: (i) the distinct execution models inherent in the heterogeneous devices on such platforms drive the need to dynamically match workload characteristics to the underlying resources, (ii) the complex architecture and programming models of such systems require substantial application knowledge and effort-intensive program tuning to achieve high performance, and (iii) as such platforms become prevalent, there is a need to extend their utility from running known, regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings.

The key contribution of our research is to enable runtime specialization on such hybrid CPU-GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput. Towards this end, this research will: (a) enable dynamic instrumentation for GPU-based parallel architectures, specifically targeting the complex Single-Instruction Multiple-Data (SIMD) execution model, to gain real-time introspection into application behavior; (b) leverage such dynamic performance data to support novel online resource management methods that improve application performance and system throughput, particularly for irregular, input-dependent applications; (c) automate some of the programmer effort required to exercise specialized architectural features of such platforms via instrumentation-driven dynamic code optimizations; and (d) propose a specialized, affinity-aware work-stealing scheduler for integrated CPU-GPU processors that efficiently distributes work at runtime across all CPU and GPU cores for improved load balance, taking into account both application characteristics and architectural differences of the underlying devices.

Congratulations to Indrani Paul on successfully defending her thesis “Cooperative Power Management in Heterogeneous Processors”!

Posted on Apr 8, 2015 in News

Indrani Paul successfully defended her thesis on Mar 23, 2015. Congratulations, Dr. Paul!

The high-level contributions of Indrani’s thesis “Coordinated Power Management in Heterogeneous Processors” are i) in-depth examination of characteristics and performance demands of emerging applications using hardware measurements and analysis from state-of-the-art heterogeneous processors and high-performance GPUs, ii) analysis of the effects of processor physics such as power and thermals on system level performance, iii) identification of a key set of run-time metrics that can be used to manage these effects, and iv) development and detailed evaluation of online coordinated power management techniques to optimize system level global metrics in heterogeneous CPU-GPU-memory processors.