Topic

Estimating Efficient Sampling Rates of Metrics for Training Accurate Machine Learning Models

 Tigran Bunarjyan

 

T. A. Bunarjyan, A. N. Harutyunyan, A. V. Poghosyan, A. J. Han Vinck, Y. Chen, and N. A. Hovhannisyan

VMware, Inc.

 

  Abstract:

 

 Cloud management solutions provide full real-time visibility into modern, highly complex software-defined data centers (SDDCs) by measuring millions of indicators at increasingly high sampling rates. Such high-frequency monitoring captures the ever-growing dynamism of business-critical applications, but it also produces huge volumes of time series data that must be stored for analysis, pattern detection, and the training of predictive and forecasting models. The result is high analytics overhead and product performance issues. Identifying optimal sampling rates for time series data, subject to preserving their main information content, could mitigate this problem. A particular use case is tuning sampling rates so that they remain efficient for training ML models that are accurate enough for analytics tasks such as anomaly detection. In this paper, we analyze a large collection of cloud application metrics and show that the sampling rate can be substantially reduced at the cost of only a small information divergence. Moreover, we show that anomaly detection modules remain tolerably accurate on the reduced data sets.
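 The idea of trading sampling rate against information loss can be illustrated with a minimal sketch. The abstract does not say which divergence measure or downsampling scheme the paper uses, so everything below is an assumption made for illustration: the series is synthetic, downsampling is plain decimation (keeping every k-th point), and the divergence is the Jensen-Shannon divergence between value histograms. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def downsample(series, factor):
    """Keep every `factor`-th sample (plain decimation, an assumed scheme)."""
    return series[::factor]


def histogram(series, lo, hi, bins):
    """Normalized histogram over [lo, hi] with equal-width bins."""
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant series
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in series)
    total = len(series)
    return [counts.get(b, 0) / total for b in range(bins)]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return (kl(p, m) + kl(q, m)) / 2


def divergence_by_factor(series, factors, bins=20):
    """Divergence between the full series and each decimated copy."""
    lo, hi = min(series), max(series)
    base = histogram(series, lo, hi, bins)
    return {
        k: js_divergence(base, histogram(downsample(series, k), lo, hi, bins))
        for k in factors
    }


# Synthetic "metric": a periodic load pattern plus noise (illustrative data only).
rng = random.Random(0)
metric = [50 + 20 * math.sin(2 * math.pi * t / 288) + rng.gauss(0, 3)
          for t in range(10_000)]

result = divergence_by_factor(metric, factors=[1, 2, 5, 10, 30])
for k, d in result.items():
    print(f"keep every {k:>2}-th sample: JS divergence = {d:.4f} bits")
```

 A histogram-based divergence only compares value distributions and ignores temporal structure, so a low score here does not by itself show that an anomaly detector trained on the reduced data would stay accurate; that would have to be checked separately.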

 


 


 
