Networks and Memory Systems


  • Minhaj Hassan
  • Andy Vanderhayden
  • Jeff Young
  • Sudhakar Yalamanchili




The broad focus of this project is to investigate energy- and performance-efficient integration of interconnection networks and memory systems. On-chip, this takes the form of understanding and exploiting the relationships between address spaces, topology, routing, and switch and memory controller scheduling policies. The end goal is to minimize latency and data movement.

At the system level, we advocate and explore the implications and implementation of a Global Address Space (GAS) model for scalable cluster systems. We proposed a specific variant – Dynamic PGAS (DPGAS). In particular, we are concerned with the portability of the model and its software implementations across future generations of processors with increasing physical address ranges. A prototype implementation based on HT-Over-Ethernet (HToE) has been synthesized to an FPGA, and we are active members of the HyperTransport Consortium. During the 2010 Server Design Summit, AMD released a HyperTransport over Ethernet specification as part of its HyperShare platform. In collaboration with the HyperTransport Consortium, CASL assisted in developing this specification, which supports encapsulation of noncoherent HyperTransport traffic in Ethernet packets sent over 10, 40, or 100 Gbps Ethernet networks. For more information, see the specification and the papers listed below that influenced its development. Another specification that CASL assisted with, HyperTransport over InfiniBand, was released at the beginning of 2011.

Our global address space work has now moved to GPU clusters and data centers in collaboration with the University of Heidelberg. For more current work on global address space support for data center applications, please see the Oncilla project. Oncilla uses hardware support for GAS developed at the University of Heidelberg.


The papers are provided for personal use and are subject to copyright of the publishers.

  • J. Young, S. Shon, S. Yalamanchili, A. Merritt, K. Schwan, H. Fröning, Oncilla: A GAS Runtime for Efficient Resource Allocation and Data Movement in Accelerated Clusters. IEEE Cluster, 2013. paper
  • J. Lee, S. Li, H. Kim, and S. Yalamanchili, “Adaptive Virtual Channel Partitioning for On-chip-Network in Heterogeneous Architectures”, to appear in ACM Transactions on Design Automation of Electronic Systems. 2013.
  • S. Hassan and S. Yalamanchili, “Centralized Buffer Router: A Low Latency, Low Power Router for High Radix NoCs.” IEEE/ACM International Symposium on Networks on Chip (NOCS-2013). April 2013. paper
  • S. Yalamanchili, “New Rules: Sustaining Performance Through Extreme Scale.” Special Session on Emerging Interconnects Technologies at IEEE/ACM International Symposium on Networks on Chip (NOCS-2013). April 2013. slides
  • J. Young, A. Merritt, H. Wu, S. Yalamanchili. “Oncilla – A GAS Run-time for Efficient Resource Partitioning in Data Centers (Poster).” 2nd Annual Intel Science and Technology Center for Cloud Computing (ISTC-CC) Retreat. December 2012. poster
  • S. Yalamanchili. “Keynote: Scalable Resource Composition in a Flat World.” First International Workshop on Unconventional Cluster Architectures and Applications (UCAA), held with 41st International Conference on Parallel Processing (ICPP-2012). September 2012. slides
  • J. Young, S. Yalamanchili. “Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds” The 14th IEEE Conference on High Performance Computing and Communications (HPCC). June 2012. paper and slides
  • S. Yalamanchili, “Switching Techniques,” (invited) Encyclopedia of Parallel Computing, 2011.
  • S. Yalamanchili, “Interconnection Networks,” (invited) Encyclopedia of Parallel Computing, 2011.
  • S. M. Hassan, D. Choudhary, M. Rasquinha, and S. Yalamanchili, “Regulating Locality vs Parallelism Tradeoffs in Multiple Memory Controller Environments,” IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques, (short paper, poster) October 2011.
  • J. Young, S. Yalamanchili, B. Holden, M. Cavalli, P. Miranda. “HyperTransport Over Ethernet – A Scalable, Commodity Standard for Resource Sharing in the Data Center” Second International Workshop on HyperTransport Research and Applications (WHTRA). February 2011. paper and slides
  • J. Young and S. Yalamanchili. “Dynamic Partitioned Global Address Spaces for Power Efficient DRAM Virtualization.” IEEE Workshop on Work in Progress in Green Computing. August 2010. paper and slides
  • D. Lewis, S. Yalamanchili, and Hsien Hsin Lee. “High Performance Non blocking Switch Design in 3D Die Stacking Technology.” Proceedings of the IEEE Annual Symposium on VLSI. May 2009.
  • S. Yalamanchili. “Keynote: System Impact of Integrated Interconnects.” Second Symposium of the HyperTransport Center of Excellence. University of Heidelberg, Mannheim, Germany, February 2009.
  • J. Duato, F. Silla, B. Holden, P. Miranda, J. Underhill, M. Cavalli, S. Yalamanchili and U. Bruning. “Extending HyperTransport Protocol for Improved Scalability.” First Workshop on HyperTransport Research and Applications. February 2009.
  • J. Young, S. Yalamanchili, J. Duato, and F. Silla. “A HyperTransport-Enabled Global Memory Model For Improved Memory Efficiency.” First Workshop on HyperTransport Research and Applications. February 2009. paper and slides
  • S. Ramaswamy and S. Yalamanchili, “A Utilization Driven Framework for Energy Efficient Caches,” IEEE International Conference on High Performance Computing, December 2008.
  • K. Chuang, S. Yalamanchili, A. Gavrilovska, K. Schwan, “Sharestreams-V: A Virtualized, QoS Packet Scheduling Accelerator”, IEEE Symposium on Custom Computing Machines, April 2008.
  • S. Ramaswamy and S. Yalamanchili, “Improving Cache Efficiency via Resizing + Remapping,” Proceedings of the IEEE International Conference on Computer Design, October 2007.
  • G. Diamos, S. Yalamanchili, J. Duato. “STARS: A System for Tuning and Actively Reconfiguring Links.” Workshop on Diagnostic Services in Network on Chips, held with Design and Test Europe (Poster). April 2007.
  • S. Ramaswamy and S. Yalamanchili, “Customized Placement for Embedded Processor Caches,” Proceedings of Architecture of Computing Systems, March 2007.
  • S. Ramaswamy and S. Yalamanchili, “Customizable Fault Tolerant Caches for Embedded Processors,” Proceedings of the IEEE International Conference on Computer Design, October 2006.
  • R. Krishnamurthy, S. Yalamanchili, K. Schwan, and R. West, “Sharestreams: A Scalable Architecture and Hardware Support for High Speed QoS Packet Schedulers,” Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines, April 2004.


Industry Specifications


Book Chapters

  • Sudhakar Yalamanchili, Jeffrey Young. System Impact of Integrated Interconnects from Attaining High Performance Communications: A Vertical Approach. CRC Press. 2009.