Ashot Harutyunyan A. N. Harutyunyan, N. M. Grigoryan, A. V. Poghosyan, S. Dua, H. Antonyan, K. Aghajanyan, and B. Zhang VMware Abstract: Identifying the actual root causes of a performance issue within a modern cloud infrastructure of high scale, sophistication, and complexity is a hard task. It is especially complicated to diagnose a service or infrastructure degradation of an unknown nature, when no active alert is sufficiently indicative of potential sources (be it an object, its metric, property, or an associated event) of the problem. In such a situation, the data center administration intuitively looks for changes in the system that might reveal the causative…
-
-
Fingerprinting Data Center Problems with Association Rules
Ashot Harutyunyan A. N. Harutyunyan, N. M. Grigoryan, and A. V. Poghosyan VMware Abstract: Cloud management technologies increasingly automate different aspects of data center administration, where the final goal is to make self-driving solutions. Learning fingerprints of KPI- or SLO-impacting performance problems in IT infrastructures is a relevant task towards such a vision. Instead of defining problem types for data center components (resources/objects of various kinds) using domain knowledge, which is hard to obtain and unreliable because of the complexity and sophistication of modern cloud systems, we propose an ML framework to detect those issue categories. Then alerting engines can run on top of those patterns to notify…
-
Estimating Efficient Sampling Rates of Metrics for Training Accurate Machine Learning Models
Tigran Bunarjyan T. A. Bunarjyan, A. N. Harutyunyan, A. V. Poghosyan, A. J. Han Vinck, Y. Chen, and N. A. Hovhannisyan VMware, Inc. Abstract: Cloud management solutions provide full real-time visibility into modern software-defined data centers (SDDC) of high complexity and sophistication through measuring millions of indicators at increasingly high sampling rates. This high-frequency monitoring of metrics allows capturing the expected ever-growing dynamism of business-critical applications, resulting in huge bases of time series data to be stored for analysis, pattern detection, and training predictive/forecasting models. That causes high analytics overhead and product performance issues. Therefore, identifying optimal sampling rates of time series data subject to preserving their…
-
On Machine Learning Powered Theorem Prover for Propositional Fragment of Minimal Logic
Ashot Baghdasaryan Ashot Baghdasaryan¹ and Hovhannes Bolibekyan² ¹Russian-Armenian University, ²Yerevan State University Abstract: There are three main problems for theorem proving with a standard cut-free system for the propositional fragment of minimal logic. The first problem is the possibility of looping. Secondly, it might generate proofs which are permutations of each other. Finally, during the proof some choice should be made to decide which rules to apply and where to use them. In order to solve the rule selection problem, recurrent neural networks are deployed and they are used to determine which formula from the context should be used in further steps. As a result, it…
-
A Survey on Deep Semi-Supervised Learning Algorithms
Ani Vanyan A. Vanyan and H. Khachatrian YerevaNN Research Lab Abstract: Semi-supervised learning is a branch of machine learning focused on improving the performance of models when labeled data is scarce but a large number of unlabeled examples is available. Recently, there has been remarkable progress in designing algorithms which are able to get reasonable image classification accuracy having access to labels for only 0.5% of the samples on relatively small datasets like CIFAR-10 and SVHN. The downside of these algorithms is that they require expensive tuning of hyperparameters for each dataset, and the hyperparameters tuned for one dataset do not generalize to others.…
-
Current Approaches and Challenges for the Two-Party Privacy-Preserving Record Linkage (PPRL)
Yanling Chen University of Duisburg-Essen Abstract: Integrating data from diverse sources with the aim of identifying similar records that refer to the same real-world entities without compromising the privacy of these entities is an emerging research problem in various domains. This problem is known as privacy-preserving record linkage (PPRL). Despite the abundant literature on PPRL, a commonly accepted formal framework is still missing. In this paper, we focus on the two-party PPRL and provide an overview of the currently existing approaches and related works. Several desired properties of two-party PPRL are discussed, which may lay the foundation for a formal description of the process…
-
Excess-Risk Consistency of Group-Hard Thresholding Estimator in Robust Estimation of Gaussian Mean
Arshak Minasyan Department of Mathematics – Yerevan State University, YerevaNN Research Lab Abstract: In this work we introduce the notion of the excess risk in the setup of estimation of the Gaussian mean when the observations are corrupted by outliers. It is known that the sample mean loses its good properties in the presence of outliers \cite{huber1,huber2}. In addition, even the sample median is not minimax-rate-optimal in the multivariate setting. The optimal rate of the minimax risk in this setting was established by \cite{chao_gao}. However, even these minimax-rate-optimality results do not quantify how fast the risk in the contaminated model approaches the risk in the uncontaminated model when the rate of contamination goes to zero. The…
-
On the Tradeoff Between Accuracy and Fairness in Representation Learning
Tigran Galstyan T. Galstyan and H. Khachatrian YerevaNN Research Lab Abstract: In many applications of machine learning, it is desirable to have models which not only have good accuracy on the prediction task but are also “fair” with respect to some protected variable. One approach to achieve fairness is to learn an invariant representation of the data with respect to that variable and then learn the predictor on top of the representation. Recently, an information-theoretic approach called DSF (Discovery and Separation of Features) was introduced, which demonstrated strong results in cases where the label and the protected variable are independent. In this paper we extend the model to…
-
The Role of Alignment of Multilingual Contextualized Embeddings in Zeroshot Cross-Lingual Transfer for Event Extraction
Karen Hambardzumyan K. Hambardzumyan, H. Khachatrian, and J. May Abstract: Contextualized word embeddings like BERT enabled significant advances in many natural language processing tasks. Recently, multilingual versions of such embeddings were trained on large text corpora of more than 100 languages. In this paper we investigate how well such embeddings perform in zero-shot cross-lingual transfer for an event extraction task. In particular, we analyze the impact of the alignment of contextualized word embeddings using a parallel corpus on the performance of the downstream task.
-
Enriching Word Vectors with Morphological Information
Martin Mirakyan M. Mirakyan and H. Khachatrian YerevaNN Research Lab Abstract: This paper presents an end-to-end approach for word representation learning which takes into account the morphology of the language. The system consists of three parts: semantic analysis of a sentence, morpheme extraction from each word, and word-vector learning. The novelty of our approach is the linguistically correct morphological word features and the end-to-end pipeline for learning word vectors. Our method achieves state-of-the-art performance on morpheme segmentation, while outperforming most of the solutions for lemmatization, part-of-speech (POS) tagging, and morphological feature extraction. Finally, we evaluate our approach on the obtained word embeddings…