Congratulations to Chad for successfully defending his PhD proposal titled “Accelerator Architecture Modeling with a Pipeline-Oriented Hardware Description Language”!
By exploiting the equivalence between multithreaded software and pipelined hardware, we can quickly construct, model, and analyze a range of both fixed-function and instruction-set accelerators well suited to the energy constraints of modern architectures. This is achieved through (1) realization of a domain-specific language that provides for high-productivity, high-performance modeling of pipelined accelerators by exploiting their equivalence with multithreaded software execution, (2) implementation of a range of fixed-function and general-purpose accelerators, (3) automatically generated area, energy, and fault models of these accelerators, and (4) evaluation of these accelerators in the context of near-memory processing.
Congratulations to Karthik for successfully defending his PhD proposal titled “Control Theoretic Approaches for the Coordinated Management of Heterogeneous Components in IoT Devices”!
The objective of the proposed research is to apply control-theoretic techniques to IoT devices with diverse heterogeneous components. The availability of smart mobile devices based on low-power Systems-on-Chip (SoCs), together with cloud-based services, has enabled the emergence of IoT as the next big technological revolution. This research focuses on balancing the performance and energy consumption of SoCs subject to thermal constraints. To that end, this thesis addresses the following: (1) characterization of the power, performance, and energy consumption of different policies implemented at various levels of the hardware and software stack of the SoC, and identification of areas for improvement, (2) a control-theoretic solution for the coordinated management of core and memory power consumption to minimize energy consumption for a target performance level, (3) a distributed feedback controller to regulate core temperatures in a multicore processor, (4) extension of the distributed coordinated control framework to thermal management in 3D stacked memories, and (5) extension of the coordinated control framework to optimize performance and energy consumption in processor-in-memory architectures. The techniques developed in this research are generic enough to accommodate a wide variety of IoT devices.
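To give a feel for the kind of feedback loop that item (3) refers to, here is a minimal sketch of a single-core temperature regulator. It is not the controller from the proposal: the proportional-integral (PI) structure, the gains, the first-order thermal model, and every name and constant below are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Textbook PI controller with output clamping (illustrative only)."""
    kp: float          # proportional gain, GHz per degree C (assumed)
    ki: float          # integral gain, GHz per degree C per second (assumed)
    setpoint: float    # target temperature in degrees C
    out_min: float     # lowest allowed frequency in GHz
    out_max: float     # highest allowed frequency in GHz
    integral: float = 0.0

    def step(self, measurement: float, dt: float) -> float:
        # Positive error means the core is cooler than the target,
        # so the controller may raise the frequency.
        error = self.setpoint - measurement
        candidate = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate
        out = min(self.out_max, max(self.out_min, raw))
        # Simple anti-windup: keep the new integral only when not saturated.
        if out == raw:
            self.integral = candidate
        return out


def simulate(steps: int = 600, dt: float = 0.1):
    """Run the controller against a hypothetical first-order thermal model."""
    ambient = 40.0       # degrees C (assumed)
    heat_per_ghz = 20.0  # steady-state degrees C above ambient per GHz (assumed)
    tau = 2.0            # thermal time constant in seconds (assumed)

    ctrl = PIController(kp=0.05, ki=0.02, setpoint=80.0,
                        out_min=0.5, out_max=3.0)
    temp = ambient
    freq = ctrl.out_min
    for _ in range(steps):
        freq = ctrl.step(temp, dt)
        steady_state = ambient + heat_per_ghz * freq
        # Forward-Euler update of the toy thermal model.
        temp += dt * (steady_state - temp) / tau
    return temp, freq


if __name__ == "__main__":
    final_temp, final_freq = simulate()
    print(f"temperature: {final_temp:.2f} C, frequency: {final_freq:.2f} GHz")
```

In this toy model the loop settles at the frequency whose steady-state temperature equals the setpoint. A distributed version would presumably run one such loop per core with some coordination between them, but that coordination is not described in the abstract and is not modeled here.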
Congratulations to Eric Anger on successfully defending his thesis titled “Application-level Modeling and Analysis of Time and Energy for Optimizing Power-constrained Extreme-scale Applications”.
Eric Anger successfully defended his PhD dissertation titled “Application-level Modeling and Analysis of Time and Energy for Optimizing Power-constrained Extreme-scale Applications” on Nov 9, 2016. Congratulations, Dr. Anger!
Abstract: The objective of the proposed research is to create a methodology for the modeling and characterization of extreme-scale applications operating within power limitations in order to guide optimization. It is likely that forthcoming high-performance machines will operate with stringent power caps, tying the performance of the systems to their energy-efficiency. Optimizing extreme-scale applications to operate within power limitations will require new techniques for understanding the relationships between application characterization, performance, and energy. The main contributions of this work are: 1) a methodology for the time and energy modeling of high-performance computing applications that can scale to a large number of nodes, 2) characterization of the different ways time and energy are affected by degree of parallelism and processor clock frequency, and 3) optimization of performance under a power cap when scheduling applications, both bulk-synchronous and data-parallel task-based application models.
Congratulations to Jin Wang on successfully defending her PhD thesis titled “Acceleration and Optimization of Dynamic Parallelism for Irregular Applications on GPUs”!
Jin Wang successfully defended her thesis titled “Acceleration and Optimization of Dynamic Parallelism for Irregular Applications on GPUs” on Nov 7, 2016. Congratulations Dr. Wang!!!
Abstract: The objective of this thesis is the development, implementation, and optimization of a GPU execution model extension that efficiently supports the time-varying, nested, fine-grained dynamic parallelism occurring in irregular data-intensive applications. These dynamically formed pockets of structured parallelism can utilize the recently introduced device-side nested kernel launch capabilities on GPUs. However, the low utilization of GPU resources and the high cost of device kernel launches still make it difficult to harness dynamic parallelism on GPUs. This thesis presents an extension to the common Bulk Synchronous Parallel (BSP) GPU execution model, Dynamic Thread Block Launch (DTBL), which provides the capability of spawning light-weight thread blocks from GPU threads on demand and coalescing them to existing native executing kernels. The finer granularity of a thread block provides effective and efficient control of smaller-scale, dynamically occurring nested pockets of structured parallelism during the computation. Evaluations of DTBL show an average speedup of 1.21x over the baseline implementations. The thesis proposes two classes of optimizations of this model. The first is a thread block scheduling strategy that exploits spatial and temporal reference locality between parent kernels and dynamically launched child kernels. The locality-aware thread block scheduler achieves a further 27% increase in overall performance. The second is an energy efficiency optimization that utilizes the SMX occupancy bubbles during the execution of a DTBL application and converts them into SMX idle periods, where a flexible DVFS technique can be applied to reduce dynamic and leakage power and achieve better energy efficiency. By presenting the implementation, measurements, and key insights, this thesis takes a step toward addressing the challenges and issues in emerging irregular applications.
Eric’s paper “Power-Constrained Performance Scheduling of Data Parallel Tasks,” co-authored with Jeremiah Wilke and Sudhakar Yalamanchili, was accepted to the Energy Efficient Supercomputing Workshop (E2SC) 2016. Congratulations!!!
Karthik’s paper “Application-Specific Performance-Aware Energy Optimization on Android Mobile Devices”, co-authored with Jun Wang, Sudhakar Yalamanchili, Yorai Wardi, and Handong Ye, was accepted by HPCA 2017. Congratulations!!!
Congratulations to the GREEN lab team and Dr. Yalamanchili!
Find the news article here: http://www.nextplatform.com/2016/09/12/deep-learning-architectures-hinge-hybrid-memory-cube/
The ISCA paper abstract:
This paper presents a programmable and scalable digital neuromorphic architecture based on 3D high-density memory integrated with a logic tier for efficient neural computing. The proposed architecture consists of clusters of processing engines (PEs), connected by a 2D mesh network as a processing tier, which is integrated in 3D with multiple tiers of DRAM. The PE clusters access multiple memory channels (vaults) in parallel. The operating principle, referred to as memory-centric computing, embeds specialized state machines within the vault controllers of the HMC to drive data into the PE clusters. The paper presents the basic architecture of the Neurocube and an analysis of the logic tier synthesized in 28nm and 15nm process technologies. The performance of the Neurocube is evaluated and illustrated by mapping a Convolutional Neural Network onto it and estimating the resulting power and performance for both training and inference.
Congratulations to Eric Anger on successfully defending his PhD proposal “Application-level Modeling and Analysis of Time and Energy for Optimizing Power-constrained Extreme-scale Applications”
The objective of the proposed research is to create a methodology for the modeling and characterization of extreme-scale applications operating within power limitations in order to guide optimization. It is likely that forthcoming high-performance machines will operate with stringent power caps, tying the performance of the systems to their energy-efficiency. Optimizing extreme-scale applications to operate within power limitations will require new techniques for understanding the relationships between application characterization, performance, and energy. The main contributions of this work are: 1) a methodology for the time and energy modeling of high-performance computing applications that can scale to a large number of nodes, 2) characterization of the different ways time and energy are affected by degree of parallelism and processor clock frequency, and 3) optimization of performance under a power cap when scheduling applications, both bulk-synchronous and data-parallel task-based application models.
Congratulations to Minhaj Hassan on successfully defending his PhD thesis “Exploiting On-Chip Memory Concurrency in 3D Manycore Architectures”
Many congratulations Dr. Hassan!
The objective of this thesis is to optimize the uncore of 3D many-core architectures. More specifically, we note that technology trends point to large increases in memory-level concurrency. This in turn affects the design of the multi-core interconnect and organization of the memory hierarchy. The work addresses the need for re-optimization in the presence of this increase in concurrency of the memory system.
First, we observe that 2D network latency and inefficient parallelism management in current 3D designs are the main obstacles to fully exploiting the potential of 3D. To that end, we propose an extremely low-latency, low-power, high-radix router and present versions of it for different network topologies and configurations. We also explore optimizations and techniques to reduce the traffic in the network. Second, we propose a reorganization of the memory hierarchy and use simple address space translations to regulate locality, bandwidth, and energy trade-offs in highly concurrent 3D memory systems. Third, we analyze the rise in temperature of 3D memories and propose variable-rate per-bank refresh management that exploits variability in temperature to reduce 3D DRAM’s refresh power and extend its operating range to higher temperatures.
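The intuition behind temperature-aware per-bank refresh can be sketched in a few lines. This is not the policy from the thesis, whose details are not given in the abstract. The two-level rule below (a 64 ms refresh window up to 85 °C, halved above that) follows the common convention for commodity DDR DRAM, and the function names and example temperatures are hypothetical.

```python
def refresh_interval_ms(temp_c: float,
                        base_ms: float = 64.0,
                        threshold_c: float = 85.0) -> float:
    """Refresh window for one bank: halved once the bank runs hot (assumed rule)."""
    return base_ms if temp_c <= threshold_c else base_ms / 2.0


def refresh_rate(intervals_ms):
    """Relative refresh activity: the sum of refresh frequencies across banks."""
    return sum(1.0 / interval for interval in intervals_ms)


def per_bank_savings(bank_temps_c):
    """Fraction of refresh activity saved versus refreshing every bank
    at the rate required by the hottest bank."""
    per_bank = [refresh_interval_ms(t) for t in bank_temps_c]
    worst_case = min(per_bank)  # shortest interval = most frequent refresh
    uniform = [worst_case] * len(bank_temps_c)
    return 1.0 - refresh_rate(per_bank) / refresh_rate(uniform)


if __name__ == "__main__":
    # Hypothetical temperatures for four banks in a stack, in degrees C.
    temps = [70.0, 80.0, 88.0, 92.0]
    print(f"refresh activity saved: {per_bank_savings(temps):.0%}")
```

The saving comes entirely from the cooler banks, which no longer have to be refreshed at the rate dictated by the hottest bank in the stack.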
Congratulations to Si Li, William Song, and Minhaj Hassan on their papers accepted by the IEEE International Reliability Physics Symposium!
The paper “Software-based Dynamic Reliability Management for GPU Applications”, co-authored by Si Li, Vilas Sridharan, Sudhanva Gurumurthi, and Sudhakar Yalamanchili, has been accepted by the IEEE International Reliability Physics Symposium. Congratulations to Si!
The paper “Reliability-Performance Tradeoff between 2.5D and 3D-Stacked DRAM Processors”, co-authored by William J. Song, Syed Minhaj Hassan, Saibal Mukhopadhyay, and Sudhakar Yalamanchili, has been accepted by the IEEE International Reliability Physics Symposium as a short paper/poster. Congratulations to William and Minhaj!
The conference was held in April 2016.