- Title
OKCM: improving parallel task scheduling in high-performance computing systems using online learning.
- Authors
Li, Jingbo; Zhang, Xingjun; Han, Li; Ji, Zeyu; Dong, Xiaoshe; Hu, Chenglong
- Abstract
Task scheduling is becoming increasingly important in large-scale high-performance computing real-time systems as the parallel scale, number, and types of tasks continue to increase. Prioritizing policies and backfilling mechanisms are the most useful practices for improving scheduling performance, and both depend heavily on task running time prediction. Previous studies focused on improving running time prediction accuracy, at the cost of higher time overhead and deployment difficulties in real-time scheduling systems. In this paper, an efficient running time prediction model, referred to as online learning and K-nearest neighbors (KNN)-based predictor with correction mechanism (OKCM), is proposed. OKCM updates in real time through an online algorithm, and its KNN-based predictor suits users who have accumulated only a small amount of data. To evaluate the model, a trace-driven simulator, named HPCsim, is designed and implemented. The experimental results demonstrate that OKCM achieves higher prediction accuracy with low overhead. Furthermore, OKCM achieves significant scheduling performance improvement and can be used to enhance primary prioritizing and backfilling methods without being restricted to a specific scheduling method.
- Subjects
COMPUTER systems; ONLINE education; SCHEDULING; REAL-time computing; ONLINE algorithms
- Publication
Journal of Supercomputing, 2021, Vol 77, Issue 6, p5960
- ISSN
0920-8542
- Publication type
Article
- DOI
10.1007/s11227-020-03506-5