Congratulations to Karthik Rao for successfully defending his PhD thesis titled “Coordinated Management of the Processor and Memory for Optimizing Energy Efficiency”!
Karthik Rao successfully defended his PhD thesis titled “Coordinated Management of the Processor and Memory for Optimizing Energy Efficiency” on 14 May, 2018. We wish him the best of luck for his future! Congratulations Dr. Karthik Rao!
Abstract: Energy efficiency is a key design goal for future computing systems. With diverse components interacting with each other on the System-on-Chip (SoC), dynamically managing performance, energy and temperature is a challenge in 2D architectures and more so in a 3D stacked environment. In addition, temperature has emerged as the parameter of primary concern. Heuristics-based schemes have been employed so far to address these issues. Looking ahead, complex multiphysics interactions between performance, energy and temperature reveal the limitations of such approaches. Therefore, this thesis first carries out a comprehensive characterization of existing methods to identify the causes of their inefficiency. Managing different components in an independent and isolated fashion using heuristics is seen to be the primary drawback. Following this, techniques based on feedback control theory to optimize energy efficiency of the processor and memory in a coordinated fashion are developed. They are evaluated on a real physical system and a cycle-level simulator, demonstrating significant improvements over prior schemes. The two main messages of this thesis are: (i) coordination between multiple components is paramount for next generation computing systems, and (ii) temperature ought to be treated as a resource like compute or memory cycles.
Congratulations to Xinwei Chen for successfully defending her PhD thesis titled “Performance and Power Management for Multi-core Processors”!
Xinwei Chen successfully defended her PhD thesis titled “Performance and Power Management for Multi-core Processors” on 4 April, 2018. We wish her the best of luck for her future! Congratulations Dr. Xinwei Chen!
Abstract: This dissertation addresses the problem of power and performance management for various computing systems, from single voltage island multicore processors to power-constrained extreme scale cloud systems. Balancing power and performance in modern computing systems is a complex optimization problem. This challenge is addressed by the statement of this thesis: Improving performance and power consumption in modern computing systems will require new techniques, and the body of control theories can provide the basis for such solutions. This thesis addresses this problem through three main contributions: 1. Effective and efficient power & performance management techniques in a single voltage island multi-core processor. 2. Maximizing power efficiency under a power cap in a multi-core processor that is composed of several voltage islands. 3. A hierarchical power management technique to improve performance and energy efficiency under power budgets in a cloud system.
Congratulations to He Xiao for successfully defending his PhD thesis titled “A Multi-physics Approach to the Co-design of 3D Multi-core Processors”!
He Xiao successfully defended his PhD thesis titled “A Multi-physics Approach to the Co-design of 3D Multi-core Processors” on 11 January, 2018. We wish him the best of luck for his future! Congratulations Dr. He Xiao!
Abstract: The three-dimensional integrated circuit (3D IC) is a promising solution for processors in the post-Moore era. 3D integration stacks multiple dies vertically in a single package and enables high integration density. Compared to 2D planar design, 3D ICs shorten the die-to-die distance and substantially increase inter-die communication bandwidth, providing a potential performance boost for large-scale systems. However, the design of 3D multi-core processors poses great challenges to computer architects. First, 3D ICs create design bottlenecks in thermal management and power delivery. Second, the strong coupling between performance, power, and thermal behavior in 3D ICs adds design complexity. These factors require a holistic view to design high-performance, energy-efficient 3D processors.
The purpose of the dissertation is to promote a multi-physics co-design methodology in 3D processors that achieves performance gain and energy efficiency. Towards this goal, this dissertation explores the co-design opportunities in a 3D multi-core processor from three perspectives. The first looks into a thermal-architecture co-design, which co-optimizes advanced microfluidic cooling with processor floorplan and power map and develops two high-performance thermally-adaptive processor designs for a given thermal cap. The second focuses on a power-architecture co-design, which minimizes the voltage guardband based on the thermal characterization in SRAM cache and architectural-level prediction for power reduction. The last works on a package-architecture co-design, which designs a Short-Stack structure that mitigates the pin stress problem and proposes a thread scheduling policy for heterogeneous architecture in a 3D package to maximize system performance for a given thermal cap. By evaluating the effectiveness of these approaches, the dissertation underscores the value of the multi-physics co-design as an integral part of future processor design.
Congratulations to Chad for successfully defending his PhD proposal titled “Accelerator Architecture Modeling with a Pipeline-Oriented Hardware Description Language”!
By exploiting the equivalence between multithreaded software and pipelined hardware, we can quickly construct, model, and analyze a range of both fixed function and instruction set accelerators well-suited to the energy constraints of modern architectures. This is reached by (1) realization of a domain specific language that provides for the high-productivity, high-performance modeling of pipelined accelerators by exploiting the equivalence of these accelerators with multithreaded software execution, (2) implementation of a range of fixed-function and general purpose accelerators, (3) automatically-generated area, energy, and fault models of these accelerators, and (4) evaluation of these accelerators in the context of near-memory processing.
Congratulations to Karthik for successfully defending his PhD proposal titled “Control Theoretic Approaches for the Coordinated Management of Heterogeneous Components in IoT Devices”!
The objective of the proposed research is to apply control theoretic techniques to IoT devices with diverse heterogeneous components. The availability of smart mobile devices based on low-power System-on-Chips (SoC), together with cloud based services, has enabled the emergence of IoT as the next big technological revolution. This research work focuses on balancing performance and energy consumption of SoCs subject to thermal constraints. To that end, this thesis addresses the following: (1) Characterization of power, performance and energy consumption of different policies implemented at various levels of the hardware and software stack of the SoC and identifying areas of improvement, (2) A control theoretic solution to coordinated management of core and memory power consumption to minimize energy consumption for a target performance level, (3) A distributed feedback controller to regulate core temperatures in a multicore processor, (4) Extension of the distributed coordinated control framework for thermal management in 3D stacked memories, and (5) Extension of the coordinated control framework to optimize performance and energy consumption in processor-in-memory architectures. The techniques developed in this research are generic enough to accommodate a wide variety of IoT devices.
Congratulations to Eric Anger on successfully defending his thesis titled, “Application-level Modeling and Analysis of Time and Energy for Optimizing Power-constrained Extreme-scale Applications”.
Eric Anger successfully defended his PhD dissertation titled, “Application-level Modeling and Analysis of Time and Energy for Optimizing Power-constrained Extreme-scale Applications” on Nov 9, 2016. Congratulations Dr. Anger!
Abstract: The objective of the proposed research is to create a methodology for the modeling and characterization of extreme-scale applications operating within power limitations in order to guide optimization. It is likely that forthcoming high-performance machines will operate with stringent power caps, tying the performance of the systems to their energy-efficiency. Optimizing extreme-scale applications to operate within power limitations will require new techniques for understanding the relationships between application characterization, performance, and energy. The main contributions of this work are: 1) a methodology for the time and energy modeling of high-performance computing applications that can scale to a large number of nodes, 2) characterization of the different ways time and energy are affected by degree of parallelism and processor clock frequency, and 3) optimization of performance under a power cap when scheduling applications, both bulk-synchronous and data-parallel task-based application models.
Congratulations to Jin Wang on successfully defending her PhD thesis titled “Acceleration and Optimization of Dynamic Parallelism for Irregular Applications on GPUs”!
Jin Wang successfully defended her thesis titled “Acceleration and Optimization of Dynamic Parallelism for Irregular Applications on GPUs” on Nov 7, 2016. Congratulations Dr. Wang!!!
Abstract: The objective of this thesis is the development, implementation and optimization of a GPU execution model extension that efficiently supports time-varying, nested, fine-grained dynamic parallelism occurring in irregular data-intensive applications. These dynamically formed pockets of structured parallelism can utilize the recently introduced device-side nested kernel launch capabilities on GPUs. However, the low utilization of GPU resources and the high cost of the device kernel launch still make it difficult to harness dynamic parallelism on GPUs. This thesis then presents an extension to the common Bulk Synchronous Parallel (BSP) GPU execution model – Dynamic Thread Block Launch (DTBL), which provides the capability of spawning light-weight thread blocks from GPU threads on demand and coalescing them to existing native executing kernels. The finer granularity of a thread block provides effective and efficient control of smaller-scale, dynamically occurring nested pockets of structured parallelism during the computation. Evaluations of DTBL show an average of 1.21x speedup over the baseline implementations. The thesis proposes two classes of optimizations of this model. The first is a thread block scheduling strategy that exploits spatial and temporal reference locality between parent kernels and dynamically launched child kernels. The locality-aware thread block scheduler is able to achieve another 27% increase in the overall performance. The second is an energy efficiency optimization which utilizes the SMX occupancy bubbles during the execution of a DTBL application and converts them to SMX idle periods where a flexible DVFS technique can be applied to reduce the dynamic and leakage power to achieve better energy efficiency. By presenting the implementation, measurements and key insights, this thesis takes a step in addressing the challenges and issues in emerging irregular applications.
Congratulations to Eric Anger on getting his paper accepted in E2SC 2016!
Eric’s paper “Power-Constrained Performance Scheduling of Data Parallel Tasks,” co-authored with Jeremiah Wilke and Sudhakar Yalamanchili, was accepted at the Energy Efficient Supercomputing Workshop (E2SC), 2016. Congratulations!!!
Congratulations to Karthik on getting his paper accepted at HPCA 2017!
Karthik’s paper “Application-Specific Performance-Aware Energy Optimization on Android Mobile Devices”, co-authored with Jun Wang, Sudhakar Yalamanchili, Yorai Wardi and Handong Ye, was accepted at HPCA 2017. Congratulations!!!
Neurocube (ISCA 2016 paper) makes the news!
Congratulations to the GREEN lab team and Dr. Yalamanchili!
Find the news article here: http://www.nextplatform.com/2016/09/12/deep-learning-architectures-hinge-hybrid-memory-cube/
The ISCA paper abstract:
This paper presents a programmable and scalable digital neuromorphic architecture based on 3D high-density memory integrated with a logic tier for efficient neural computing. The proposed architecture consists of clusters of processing engines (PEs), connected by a 2D mesh network as a processing tier, which is integrated in 3D with multiple tiers of DRAM. The PE clusters access multiple memory channels (vaults) in parallel. The operating principle, referred to as memory-centric computing, embeds specialized state machines within the vault controllers of the HMC to drive data into the PE clusters. The paper presents the basic architecture of the Neurocube and an analysis of the logic tier synthesized in 28nm and 15nm process technologies. The performance of the Neurocube is evaluated and illustrated through the mapping of a Convolutional Neural Network and estimating the subsequent power and performance for both training and inference.