Fingerprinting Data Center Problems with Association Rules

Ashot Harutyunyan


A. N. Harutyunyan, N. M. Grigoryan, and A. V. Poghosyan





Cloud management technologies increasingly automate different aspects of data center administration, with the ultimate goal of self-driving solutions. Learning fingerprints of performance problems that impact KPIs or SLOs in IT infrastructures is a relevant step towards that vision. Defining problem types for data center components (resources and objects of various kinds) from domain knowledge is hard, and the results are unreliable given the complexity and sophistication of modern cloud systems; we instead propose an ML framework that detects those issue categories. Alerting engines can then run on top of the learned patterns to notify users of conditions that impact the system's KPIs, providing explainability for troubleshooting and long-term performance optimization of the infrastructure. We consider several scenarios for learning problem definitions in terms of constructs of vRealize Operations, one of the leading solutions in the cloud management market. Using concepts from association rule mining, we recommend problem patterns (fingerprints) in the form of minimum-size attribute combinations that constitute core structures highly associated with KPI degradation or SLO loss. We demonstrate experimental insights from applying our prototype algorithm to virtualized environments.
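The idea of minimum-size attribute combinations associated with KPI degradation can be illustrated with a small sketch. This is not the prototype algorithm from the paper. It is a brute-force illustration of standard association rule measures (support and confidence) for rules of the form "attribute combination => KPI degraded". The attribute names, the sample records, the thresholds, and the function name mine_fingerprints are all hypothetical and are not vRealize Operations constructs.

```python
from itertools import combinations


def mine_fingerprints(records, min_support=0.2, min_confidence=0.8, max_size=3):
    """Return minimal attribute sets strongly associated with KPI degradation.

    records: list of (attributes, degraded) pairs, where attributes is a set
    of active attribute names and degraded is True if the KPI was degraded.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(records)
    items = sorted({a for attrs, _ in records for a in attrs})
    found = []  # list of (frozenset, support, confidence)
    # Enumerate candidates by increasing size so minimal sets are found first.
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # Skip supersets of an accepted fingerprint: keep only minimal sets.
            if any(f <= candidate for f, _, _ in found):
                continue
            outcomes = [deg for attrs, deg in records if candidate <= attrs]
            if not outcomes:
                continue
            hits = sum(outcomes)              # records with candidate AND degradation
            support = hits / n                # share of all records
            confidence = hits / len(outcomes) # share of records matching the candidate
            if support >= min_support and confidence >= min_confidence:
                found.append((candidate, support, confidence))
    return found


# Hypothetical observations: active attributes and whether the KPI degraded.
records = [
    ({"cpu_high", "mem_high"}, True),
    ({"cpu_high", "mem_high", "disk_latency"}, True),
    ({"cpu_high"}, False),
    ({"mem_high"}, False),
    ({"disk_latency"}, False),
    ({"cpu_high", "mem_high"}, True),
    (set(), False),
    ({"cpu_high", "disk_latency"}, False),
]

fingerprints = mine_fingerprints(records)
for attrs, support, confidence in fingerprints:
    print(sorted(attrs), f"support={support:.3f}", f"confidence={confidence:.2f}")
```

On this toy data no single attribute reaches the confidence threshold, while the pair cpu_high and mem_high does, so the pair is reported and its three-attribute superset is skipped as non-minimal. Exhaustive enumeration is only practical for a handful of attributes; a realistic miner would prune candidates, for example in the Apriori style.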



