SHF: Small: Collaborative Research: Tailoring Memory Systems for Data-Intensive HPC Applications

Sponsored by NSF CCF

Abstract:

High-performance computing is of strategic importance for computational science and engineering in the United States, and it is on an accelerated path toward sustaining scientific discovery at much higher floating-point operation rates. To advance scientific discovery to the next level and allow new science missions to be accomplished in a timely manner, it is critical to address memory performance holistically in high-performance computing platforms. This project provides architectural and system support and optimizations for building memory systems tailored for data-intensive applications such as big data analytics. The project also offers research and educational opportunities for both undergraduate and graduate students, and trains a new generation of computer scientists and engineers in the area of high-performance computing.

The objective of this project is to address the research challenges in building an efficient memory system by designing new techniques along several dimensions. It develops a novel centralized memory refresh scheme at the cluster level to manage memory refresh overhead, which increasingly degrades performance and consumes energy. It designs a new memory scheduling policy that takes advantage of new memory characteristics. It exposes memory characteristics and peculiarities to the processor and operating system, so that they can make well-informed decisions and fully exploit the memory's performance potential. It leverages in-memory computing to enable efficient in-situ processing. Integrating all these techniques provides a holistic solution for building an efficient memory system tailored for data-intensive applications.

Personnel

- Principal Investigators

- Graduate Students

Recent Publications

  1. *Wenjie Liu, Ke Zhou, *Ping Huang, Tianming Yang, and Xubin He, “RBC: A Memory Architecture for Improved Performance and Energy Efficiency”, Tsinghua Science and Technology (TST), Vol. 26, No. 3, June 2021, https://doi.org/10.26599/TST.2019.9010077.
  2. *Wenjie Liu, *Ping Huang, and Xubin He, “StragglerHelper: Alleviating Straggling in Computing Clusters via Sharing Memory Access Patterns”, Proceedings of the 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, May 2020 (acceptance rate: 110/446=24.7%).
  3. J. Zhang, K. Zhou, *P. Huang, X. He, Z. Xiao, B. Cheng, Y. Ji, and Y. Wang, “Transfer Learning based Failure Prediction for Minority Disks in Large Data Centers of Heterogeneous Disk Systems,” Proceedings of the 48th International Conference on Parallel Processing (ICPP), Kyoto, Japan, August 5-8, 2019 (acceptance rate: 106/405=26.2%).
  4. Weichen Huang, Juntao Fang, Shenggang Wan, Changsheng Xie, and Xubin He, “Design and Evaluation of a Risk-Aware Failure Identification Scheme for Improved RAS in Erasure-coded Data Centers”, IEEE Transactions on Parallel and Distributed Systems (TPDS), July 2020, accepted.
  5. Jiangkun Hu, Youmin Chen, Youyou Lu, Xubin He, and Jiwu Shu, “Understanding and analysis of B+ trees on NVM towards consistency and efficiency”, CCF Transactions on High Performance Computing, Vol. 2, No. 1, March 2020, pp. 36-49, https://doi.org/10.1007/s42514-020-00022-z.
  6. *T. Yao, Z. Tan, J. Wan, *P. Huang, Y. Zhang, Z. Tan, C. Xie, and X. He, “A Set-aware Key-Value Store on Shingled Magnetic Recording Drives with Dynamic Band”, Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, Canada, May 2018 (acceptance rate: 113/461=24.5%).
  7. *Y. Guo, Q. Liu, W. Xiao, *P. Huang, N. Podhorszki, S. Klasky, and X. He, “SELF: A High Performance and Bandwidth Efficient Approach to Exploiting Die-stacked DRAM as Part of Memory”, Proceedings of the IEEE 25th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2017), Banff, Canada, September 20-22, 2017 (acceptance rate: 26/84=31%).
  8. *Tao Lu, Qing Liu, Xubin He, Huizhang Luo, Eric Suchyta, Norbert Podhorszki, Scott Klasky, Matthew Wolf, and *Tong Liu, “Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data”, Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, Canada, May 21-25, 2018.

Sponsor