Enriching Word Vectors with Morphological Information

 Martin Mirakyan


M. Mirakyan and H. Khachatrian

YerevaNN Research Lab



This paper presents an end-to-end approach to learning word representations that takes the morphology of the language into account. The system consists of three parts: semantic analysis of a sentence, morpheme extraction from each word, and word-vector learning. The novelty of our approach lies in its linguistically correct morphological word features and its end-to-end pipeline for learning word vectors. Our method achieves state-of-the-art performance on morpheme segmentation while outperforming most existing solutions for lemmatization, part-of-speech (POS) tagging, and morphological feature extraction. Finally, we evaluate the resulting word embeddings and demonstrate that linguistically correct word features can lead to better word representations, especially for rare words.
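As a rough illustration of why morpheme-level features can help with rare words, the sketch below builds a word vector by averaging the vectors of the word's morphemes. This is similar in spirit to subword models such as fastText, which combine character n-gram vectors. It is not the authors' pipeline. The segmentations in `SEGMENTS`, the toy 3-dimensional embeddings in `MORPHEME_VECS`, the helper `word_vector`, and the averaging rule are all hypothetical assumptions chosen for illustration.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical morpheme segmentations. In the described system these would
# come from the morpheme-extraction stage; here they are hard-coded.
SEGMENTS: Dict[str, List[str]] = {
    "unhappiness": ["un", "happi", "ness"],
    "happiness": ["happi", "ness"],
    "unkindness": ["un", "kind", "ness"],
}

# Hypothetical learned morpheme embeddings (toy 3-dimensional values).
MORPHEME_VECS: Dict[str, Vector] = {
    "un": [-1.0, 0.0, 0.5],
    "happi": [1.0, 1.0, 0.0],
    "kind": [0.5, 1.0, 0.0],
    "ness": [0.0, 0.0, 1.0],
}


def word_vector(word: str, dim: int = 3) -> Vector:
    """Compose a word vector as the mean of its known morpheme vectors.

    Words without a stored segmentation are treated as a single morpheme.
    Unknown morphemes are skipped; if none are known, the zero vector
    is returned.
    """
    morphemes = SEGMENTS.get(word, [word])
    known = [MORPHEME_VECS[m] for m in morphemes if m in MORPHEME_VECS]
    if not known:
        return [0.0] * dim
    # Average each dimension across the known morpheme vectors.
    return [sum(col) / len(known) for col in zip(*known)]


if __name__ == "__main__":
    # "unkindness" has no whole-word embedding, yet it still gets a vector
    # built from morphemes shared with other words.
    print(word_vector("unkindness"))
```

Because every morpheme vector is shared across all words that contain it, a rare word such as "unkindness" inherits information learned from more frequent words like "happiness" and "unhappiness".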


