Prof. Yalamanchili gave a keynote “Implications of Memory-Centric Computing Architectures for Future NoCs” at the 9th International Symposium on Networks-on-Chip (NOCS).
The keynote “Implications of Memory-Centric Computing Architectures for Future NoCs” was presented by Prof. Yalamanchili at the 9th International Symposium on Networks-on-Chip (NOCS) in Vancouver, Canada on September 29.
Paper “General-Purpose Join Algorithms for Large Graph Triangle Listing on Heterogeneous Systems” accepted by GPGPU-9.
The paper “General-Purpose Join Algorithms for Large Graph Triangle Listing on Heterogeneous Systems”, co-authored by Daniel Zinn, Haicheng Wu, Jin Wang, Molham Aref and Sudhakar Yalamanchili, was accepted by GPGPU-9. This work was a collaboration with LogicBlox. Congratulations to the authors!
Congratulations to William Song for his paper “Amdahl’s Law for Lifetime Reliability Scaling in Heterogeneous Multicore Processors” accepted by HPCA 2016!
William’s paper “Amdahl’s Law for Lifetime Reliability Scaling in Heterogeneous Multicore Processors”, co-authored with Saibal Mukhopadhyay and Sudhakar Yalamanchili, was accepted by HPCA 2016. Congratulations!
Congratulations to Haicheng Wu on successfully defending his thesis “Acceleration and Execution of Relational Queries using General Purpose Graphics Processing Unit (GPGPU)”!
Haicheng Wu successfully defended his thesis on Nov 5, 2015. Congratulations, Dr. Wu!
The abstract of Haicheng’s thesis goes as follows:
This thesis first maps relational computation onto Graphics Processing Units (GPUs) by designing a series of tools, and then explores different opportunities for reducing the limitations imposed by the memory hierarchy across the CPU and GPU system. First, a complete end-to-end compiler and runtime infrastructure, Red Fox, is proposed. An evaluation on the full set of industry-standard TPC-H queries on a single-node GPU shows that Red Fox is on average 11.20x faster than a commercial database system on a state-of-the-art CPU machine. Second, a new compiler technique called kernel fusion is designed to fuse the code bodies of several relational operators to reduce data movement. Third, a multi-predicate join algorithm is designed for GPUs, which can provide much better performance and be used with more flexibility than kernel fusion. Fourth, the GPU-optimized multi-predicate join is integrated into a multi-threaded CPU database runtime system that supports out-of-core data sets to solve real-world problems. This thesis presents key insights, lessons learned, measurements from the implementations, and opportunities for further improvements.
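The idea behind kernel fusion can be illustrated with a minimal sketch. This is plain Python standing in for GPU kernels, not Red Fox code; the function names and the sample rows are hypothetical. Running a SELECT and a PROJECT back to back stores an intermediate result, while the fused version applies both operator bodies in a single pass and stores nothing in between.

```python
# Illustrative sketch only: plain Python standing in for GPU kernels.
# All names and data here are hypothetical and are not part of Red Fox.

def select_kernel(rows, predicate):
    """Operator 1: keep rows that satisfy the predicate (materializes a list)."""
    return [row for row in rows if predicate(row)]

def project_kernel(rows, columns):
    """Operator 2: keep only the requested columns (materializes a list)."""
    return [tuple(row[c] for c in columns) for row in rows]

def unfused(rows, predicate, columns):
    """Run the operators back to back; the intermediate result is stored."""
    intermediate = select_kernel(rows, predicate)
    return project_kernel(intermediate, columns), len(intermediate)

def fused(rows, predicate, columns):
    """Fuse both operator bodies into one pass; nothing is stored in between."""
    result = [tuple(row[c] for c in columns) for row in rows if predicate(row)]
    return result, 0

rows = [(1, "a", 10.0), (2, "b", 25.0), (3, "c", 40.0)]

def is_expensive(row):
    return row[2] > 20.0

unfused_result, unfused_intermediate_rows = unfused(rows, is_expensive, (0, 1))
fused_result, fused_intermediate_rows = fused(rows, is_expensive, (0, 1))
```

Both versions return the same rows; the difference is only in how many intermediate rows are written and read back, which is the data movement that fusion aims to avoid.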
Congratulations to William Song on successfully defending his thesis “Managing Lifetime Reliability, Performance, and Power Tradeoffs in Multicore Microarchitectures”!
William Song successfully defended his thesis “Managing Lifetime Reliability, Performance, and Power Tradeoffs in Multicore Microarchitectures” on Oct 29, 2015. Congratulations, Dr. Song!
The objective of this research is to characterize and manage lifetime reliability, microarchitectural performance, and power tradeoffs in multicore processors. This dissertation is comprised of three research themes: 1) modeling and simulation method of interacting multicore processor physics, 2) characterization and management of performance and lifetime reliability tradeoff, and 3) extending Amdahl’s Law for understanding lifetime reliability, performance, and energy efficiency of heterogeneous processors. With continued technology scaling, processor operations are increasingly dominated by multiple distinct physical phenomena and their coupled interactions. Understanding these behaviors requires the modeling of complex physical interactions. This dissertation first presents a novel simulation framework that orchestrates interactions between multiple physical models and microarchitecture simulators to enable research explorations at the intersection of application, microarchitecture, energy, power, thermal, and reliability. Using this framework, workload-induced variation of device degradation is characterized, and its impacts on processor lifetime and performance are analyzed. This research introduces a new metric to quantify performance-reliability tradeoff. Lastly, the theoretical models of heterogeneous multicore processors are proposed for understanding performance, energy efficiency, and lifetime reliability consequences. It is shown that these system metrics are governed by Amdahl’s Law and correlated as a function of processor composition, scheduling method, and Amdahl’s scaling factor. This dissertation highlights the importance of multidimensional analysis and extends the scope of microarchitectural studies by incorporating the physical aspects of processor operations and designs.
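For readers unfamiliar with Amdahl’s Law, the following is a minimal sketch of its classic form only: speedup = 1 / ((1 - f) + f / n) for a parallel fraction f running on n cores. It does not reproduce the extended reliability and energy models of the dissertation, and the function name is an assumption made for illustration.

```python
# Classic Amdahl's Law only; the dissertation's extended models for
# lifetime reliability and energy efficiency are NOT reproduced here.
# The function name is hypothetical.

def amdahl_speedup(parallel_fraction, num_cores):
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_cores)

# Half of the work parallelized over two cores.
speedup_small = amdahl_speedup(0.5, 2)
# 90% of the work parallelized over ten cores.
speedup_large = amdahl_speedup(0.9, 10)
```

As the core count grows, the speedup approaches 1 / (1 - f), so the serial fraction bounds the benefit of adding cores.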
Congratulations to Naila Farooqui on successfully defending her thesis “Runtime Specialization for Heterogeneous CPU-GPU Platforms”!
Naila Farooqui successfully defended her thesis on Oct 19, 2015. Congratulations, Dr. Farooqui!
The abstract of Naila’s thesis goes as follows:
Heterogeneous parallel architectures like those comprised of CPUs and GPUs are a tantalizing compute fabric for performance-hungry developers. While these platforms enable order-of-magnitude performance increases for many data-parallel application domains, there remain several open challenges: (i) the distinct execution models inherent in the heterogeneous devices present on such platforms drive the need to dynamically match workload characteristics to the underlying resources, (ii) the complex architecture and programming models of such systems require substantial application knowledge and effort-intensive program tuning to achieve high performance, and (iii) as such platforms become prevalent, there is a need to extend their utility from running known regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings.
The key contribution of our research is to enable runtime specialization on such hybrid CPU-GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput. Towards this end, this research will: (a) enable dynamic instrumentation for GPU-based parallel architectures, specifically targeting the complex Single-Instruction Multiple-Data (SIMD) execution model, to gain real-time introspection into application behavior; (b) leverage such dynamic performance data to support novel online resource management methods that improve application performance and system throughput, particularly for irregular, input-dependent applications; (c) automate some of the programmer effort required to exercise specialized architectural features of such platforms via instrumentation-driven dynamic code optimizations; and (d) propose a specialized, affinity-aware work-stealing scheduler for integrated CPU-GPU processors that efficiently distributes work at runtime across all CPU and GPU cores for improved load balance, taking into account both application characteristics and architectural differences of the underlying devices.
Congratulations to Indrani Paul on successfully defending her thesis “Cooperative Power Management in Heterogeneous Processors”!
Indrani Paul successfully defended her thesis on Mar 23, 2015. Congratulations, Dr. Paul!
The high-level contributions of Indrani’s thesis “Coordinated Power Management in Heterogeneous Processors” are i) in-depth examination of characteristics and performance demands of emerging applications using hardware measurements and analysis from state-of-the-art heterogeneous processors and high-performance GPUs, ii) analysis of the effects of processor physics such as power and thermals on system level performance, iii) identification of a key set of run-time metrics that can be used to manage these effects, and iv) development and detailed evaluation of online coordinated power management techniques to optimize system level global metrics in heterogeneous CPU-GPU-memory processors.
Congratulations to Indrani and Jin for their papers accepted by ISCA 2015!
Indrani’s paper “Harmonia: Balancing Compute and Memory Power in High Performance GPUs”, coauthored with Wei N. Huang, Manish Arora and Sudhakar Yalamanchili, addresses the problem of efficiently managing the relative power demands of a high performance GPU and its memory subsystem. This work was a collaboration with AMD Research.
Jin’s paper “Dynamic Thread Block Launch: A Lightweight Execution Mechanism to Support Irregular Applications on GPUs”, coauthored with Norm Rubin, Albert Sidelnik and Sudhakar Yalamanchili, proposes a new mechanism that extends the bulk synchronous parallel model underlying the current GPU execution model by supporting dynamic spawning of lightweight thread blocks. This work was a collaboration with NVIDIA Research.
The paper “A Scalable Design Methodology for Energy Minimization of STTRAM: A Circuit and Architecture Perspective” selected as the 2014 IEEE Circuits and Systems Society Very Large Scale Integrated Systems Best Paper
Professors Sudhakar Yalamanchili and Saibal Mukhopadhyay and their recently graduated students, Subho Chatterjee and Mitchelle Rasquinha, received the 2014 IEEE Circuits and Systems Society Very Large Scale Integrated Systems Best Paper Award for their paper “A Scalable Design Methodology for Energy Minimization of STTRAM: A Circuit and Architecture Perspective”.
Spin-Torque-Transfer RAM (STTRAM) is an emerging non-volatile memory technology that can retain information with practically no energy loss; it has the potential to dramatically transform the energy landscape of future computing systems. However, to realize the energy-efficiency potential of STTRAM in designing energy-efficient processing architectures, the interactions between the unique device physics of STTRAM, processor architecture, and the applications must be understood. This paper presents a modeling and analysis framework that can be used to understand these interactions, particularly from an energy perspective. The framework uses this understanding to explore the circuit-architecture design space of this emerging memory technology for the design of energy-efficient memory hierarchies in modern processors. The work presented in this paper was supported by the National Science Foundation and Intel Corporation.
Congratulations to all the authors!
Paper “Power Multiplexing for Thermal Field Management in Many-Core Processors” selected as the IEEE Transactions on Components, Packaging, and Manufacturing Technology 2013 Best Paper
The paper “Power Multiplexing for Thermal Field Management in Many-Core Processors” was awarded the best publication in the Components: Characterization and Modeling category. Coauthors are ECE Professors Saibal Mukhopadhyay and Sudhakar Yalamanchili; their graduated ECE Ph.D. students Minki Cho, Chad Kersey, and Nikhil Sathe; ME Assistant Professor Satish Kumar; and Man Prakash Gupta, Kumar’s current ME Ph.D. student.
This team’s paper presented a simple, yet effective approach known as power multiplexing, which periodically migrates the locations of active cores on a chip to redistribute the generated heat. Such spatiotemporal redistribution of power and heat reduces the peak temperature and produces a more uniform thermal field, thus mitigating the negative impact of peak temperatures and thermal gradients on performance and reliability.
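The migration idea can be sketched in a few lines. The round-robin rotation below is an assumed policy chosen for illustration; the paper’s actual migration schedule and thermal model are not described here, and all names and parameters are hypothetical. The sketch only shows how rotating the active set spreads activity (and hence heat generation) evenly across cores, compared with keeping the same cores active.

```python
# Illustrative sketch only: a simple round-robin rotation of the active cores.
# The policy, names, and parameters are assumptions, not the paper's method.

def rotate_active_cores(num_cores, num_active, num_slots, stride=1):
    """Return, for each time slot, the list of core indices that are active."""
    schedule = []
    for slot in range(num_slots):
        start = (slot * stride) % num_cores
        schedule.append([(start + i) % num_cores for i in range(num_active)])
    return schedule

def activity_counts(schedule, num_cores):
    """Count how many time slots each core spends active."""
    counts = [0] * num_cores
    for active in schedule:
        for core in active:
            counts[core] += 1
    return counts

# Eight cores, two active at a time, over eight time slots.
schedule = rotate_active_cores(num_cores=8, num_active=2, num_slots=8)
multiplexed_counts = activity_counts(schedule, 8)

# Baseline without migration: cores 0 and 1 stay active in every slot.
static_counts = activity_counts([[0, 1]] * 8, 8)
```

With rotation, every core is active for the same number of slots, whereas the static baseline concentrates all activity on two cores.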
Supported by the Semiconductor Research Corporation, Intel Corporation, and an IBM Faculty Award, this work reflects the multidisciplinary approach necessary for solving many critical problems facing the chip industry and demonstrates Georgia Tech’s longstanding commitment to collaborative research.
Congratulations to the team!