SHF:Small: Collaborative Research: Understanding, Modeling, and System Support for HPC Data Reduction

Sponsored by NSF CCF

Abstract:

High-performance computing (HPC) enables high-fidelity science missions that capture microscopic phenomena that were impossible to study in the past. To allow science missions to be accomplished in a timely manner, it is critical to manage the massive datasets that HPC generates efficiently so that the time to knowledge can be shortened. This project aims to understand the role and usage of data reduction in large computational applications. Research and educational opportunities are provided to train a new generation of computer scientists and engineers, particularly those from under-represented groups, to ensure U.S. competitiveness in high-performance computing.

The goal of this project is to address a number of critical gaps in using data reduction for HPC-based science missions. In particular, 1) the impact of reduction error on scientific discovery is studied mathematically and experimentally; 2) analytical models are formulated to estimate reduction performance without forcing users to compress the full data. For data-intensive applications, this capability is important because it spares domain scientists the cumbersome trial-and-error process of determining what reduction can offer; 3) the project provides potentially more efficient data analysis and reduction capabilities for exascale computing. The integrated research activities in this NSF project will significantly improve the understanding and usage of data reduction on future systems.
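To illustrate the second goal, the sketch below shows the general idea of estimating a dataset's compression ratio from a small sample instead of compressing everything. This is only a simplified illustration, not the project's actual analytical model: it uses lossless `zlib` compression on synthetic data, whereas the project targets lossy compressors and error bounds.

```python
# Illustrative sketch only (NOT the project's model): estimate compression
# ratio by compressing a small, evenly spaced sample of the data.
import random
import zlib

random.seed(0)
# Hypothetical "dataset": 1 MB of mildly compressible bytes.
data = bytes(random.randrange(16) for _ in range(1_000_000))

def compression_ratio(buf: bytes) -> float:
    """Original size divided by compressed size (higher = more reduction)."""
    return len(buf) / len(zlib.compress(buf))

# Costly baseline: compress the full dataset.
full_ratio = compression_ratio(data)

# Cheap estimate: compress ~5% of the data, drawn as evenly spaced chunks.
chunk = 10_000
sample = b"".join(data[i:i + chunk] for i in range(0, len(data), chunk * 20))
estimated_ratio = compression_ratio(sample)

print(f"full ratio: {full_ratio:.2f}, sample-based estimate: {estimated_ratio:.2f}")
```

The sample-based estimate costs a small fraction of the full compression while tracking the true ratio closely when the data is statistically homogeneous; real scientific data requires the more careful modeling developed in this project.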

Personnel

- Principal Investigators

- Graduate Students

- Undergraduate Student

Recent Publications

  1. *Tong Liu, *Shakeel Alibhai, Jinzhen Wang, Qing Liu, and Xubin He, “Reducing the Training Overhead of the HPC Compression Autoencoder via Dataset Proportioning,” Proceedings of the 15th IEEE International Conference on Networking, Architecture, and Storage (NAS), Riverside, CA, October 24-26, 2021.
  2. *Tong Liu, Jinzhen Wang, Qing Liu, *Shakeel Alibhai, Tao Lu, and Xubin He, “High-Ratio Lossy Compression: Exploring the Autoencoder to Compress Scientific Data,” IEEE Transactions on Big Data, 2021. DOI: 10.1109/TBDATA.2021.3066151.
  3. *Tong Liu, *Shakeel Alibhai, and Xubin He, “A Rack-aware Pipeline Repair Scheme for Erasure-coded Distributed Storage Systems,” Proceedings of the 49th International Conference on Parallel Processing (ICPP), August 17-20, 2020.
  4. Han Qiu, *Chentao Wu, Jie Li, Minyi Guo, Tong Liu, Xubin He, Yuanyuan Dong, and Yafei Zhao, “EC-Fusion: An Efficient Hybrid Erasure Coding Framework to Improve Both Application and Recovery Performance in Cloud Storage Systems,” Proceedings of the 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, May 2020 (acceptance rate: 110/446 = 24.7%).
  5. *Tong Liu, *Shakeel Alibhai, Jinzhen Wang, Qing Liu, Xubin He, and *Chentao Wu, “Exploring Transfer Learning to Reduce Training Overhead of HPC Data in Machine Learning,” Proceedings of the 14th IEEE International Conference on Networking, Architecture, and Storage (NAS), August 15-17, 2019.
  6. Jinzhen Wang, *Tong Liu, Qing Liu, Xubin He, Huizhang Luo, and Weiming He, “Compression Ratio Modeling and Estimation Across Error Bounds for Lossy Compression,” IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 31, No. 7, 2020.
  7. *Tao Lu, Qing Liu, Xubin He, Huizhang Luo, Eric Suchyta, Norbert Podhorszki, Scott Klasky, Matthew Wolf, and *Tong Liu, “Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data,” Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, Canada, May 21-25, 2018.

Thesis/Dissertation

[PhD] Tong Liu, "Efficient Data Reduction in HPC and Distributed Storage Systems", Date Graduated: Summer 2021. First employment after graduation: Marvell Technology Group, Boston, MA.

Sponsor