Estimating Efficient Sampling Rates of Metrics for Training Accurate Machine Learning Models - CODASSCA2020

Tigran Bunarjyan

T. A. Bunarjyan, A. N. Harutyunyan, A. V. Poghosyan, A. J. Han Vinck, Y. Chen, and N. A. Hovhannisyan

VMware, Inc.

Abstract:

Cloud management solutions provide full real-time visibility into modern software-defined data centers (SDDC) of high complexity and sophistication by measuring millions of indicators at increasingly high sampling rates. This high-frequency monitoring of metrics captures the ever-growing dynamism of business-critical applications, but it produces huge volumes of time series data that must be stored for analysis, pattern detection, and training of predictive and forecasting models. The result is high analytics overhead and product performance issues. Identifying optimal sampling rates for time series data, subject to preserving their main information content, could mitigate this issue. A particular use case is tuning the sampling rates so that they remain efficient for training ML models that are accurate enough for analytics tasks such as anomaly detection. In this paper, we analyze a large collection of cloud application metrics and show that the sampling rate can be substantially reduced with only a small information divergence. Moreover, we show that anomaly detection modules perform with tolerable accuracy on the reduced data sets.
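The idea of reducing the sampling rate while keeping the information divergence small can be illustrated with a minimal sketch. The abstract does not state which divergence measure or downsampling scheme the paper uses, so everything below is an assumption made for illustration: plain decimation (keeping every k-th sample), the Jensen-Shannon divergence between value histograms, and the hypothetical names `downsample`, `js_divergence`, and `coarsest_factor`, along with the `bins` and `tolerance` parameters.

```python
import numpy as np


def downsample(series, factor):
    """Keep every `factor`-th sample (plain decimation, no filtering)."""
    return np.asarray(series, dtype=float)[::factor]


def js_divergence(a, b, bins=32):
    """Jensen-Shannon divergence (base 2, bounded by 1) between value histograms.

    Assumption: the paper's divergence measure may differ from this one.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    if hi == lo:
        # Both series are the same constant: identical distributions.
        return 0.0
    # Shared bin edges so the two histograms are comparable.
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(a, bins=edges)[0].astype(float)
    q = np.histogram(b, bins=edges)[0].astype(float)
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        # Kullback-Leibler divergence restricted to bins where x > 0.
        mask = x > 0
        return float(np.sum(x[mask] * np.log2(x[mask] / y[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def coarsest_factor(series, factors, tolerance=0.05):
    """Largest candidate factor whose divergence stays within `tolerance`.

    Returns 1 (no reduction) if no candidate qualifies.
    """
    best = 1
    for f in sorted(factors):
        if js_divergence(series, downsample(series, f)) <= tolerance:
            best = f
    return best
```

For a metric held in an array `metric`, calling `coarsest_factor(metric, [2, 5, 10, 30])` would suggest how far the sampling interval could be stretched under this particular criterion. A histogram-based divergence ignores temporal ordering, so a check of this kind would still need to be paired with an evaluation of the downstream anomaly detection accuracy, as the abstract describes.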

