TY - GEN
T1 - Clustering-Based Numerosity Reduction for Cloud Workload Forecasting
AU - Rossi, Andrea
AU - Visentin, Andrea
AU - Prestwich, Steven
AU - Brown, Kenneth N.
N1 - Publisher Copyright:
© 2024, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2024
Y1 - 2024
AB - Finding smaller versions of large datasets that preserve the same characteristics as the original ones is becoming a central problem in Machine Learning, especially when computational resources are limited, and there is a need to reduce energy consumption. In this paper, we apply clustering techniques for wisely selecting a subset of datasets for training models for time series prediction of future workload in cloud computing. We train Bayesian Neural Networks (BNNs) and state-of-the-art probabilistic models to predict machine-level future resource demand distribution and evaluate them on unseen data from virtual machines in the Google Cloud data centre. Experiments show that selecting the training data via clustering approaches such as Self Organising Maps allows the model to achieve the same accuracy in less than half the time, requiring less than half the datasets rather than selecting more data at random. Moreover, BNNs can capture uncertainty aspects that can better inform scheduling decisions, which state-of-the-art time series forecasting methods cannot do. All the considered models achieve prediction time performance suitable for real-world scenarios.
KW - Bayesian Neural Network
KW - Cloud Computing
KW - Clustering
KW - Deep Learning
KW - Workload Prediction
UR - https://www.scopus.com/pages/publications/85180533215
U2 - 10.1007/978-3-031-49361-4_7
DO - 10.1007/978-3-031-49361-4_7
M3 - Conference proceeding
AN - SCOPUS:85180533215
SN - 9783031493607
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 115
EP - 132
BT - Algorithmic Aspects of Cloud Computing - 8th International Symposium, ALGOCLOUD 2023, Revised Selected Papers
A2 - Chatzigiannakis, Ioannis
A2 - Karydis, Ioannis
PB - Springer Science and Business Media Deutschland GmbH
T2 - 8th International Symposium on Algorithmic Aspects of Cloud Computing, ALGOCLOUD 2023
Y2 - 5 September 2023 through 5 September 2023
ER -