Evgeny Matusov
Lead Research Scientist - Machine Translation
Biography

Evgeny received his diploma degree in Computer Science from RWTH Aachen University in 2003 and a PhD in Computer Science from the same university in 2009, where he was a research assistant from 2003 to 2009. His research focused on statistical machine translation, in particular on speech translation and the combination of multiple translation systems. Evgeny has authored almost 30 peer-reviewed publications on various aspects of statistical machine translation.


Starting in July 2009, Evgeny worked in the Aachen office of Apptek (Applications Technology, Inc.) as a Senior Machine Translation Researcher.
He initiated the effort to create Apptek’s own MT decoder and trained MT systems from scratch for more than 20 language pairs in less than two years. At Apptek, as well as at SAIC (which owned Apptek’s MT technology from 2010 until October 2013), Evgeny was responsible for domain adaptation and further translation quality improvements, which included both language-independent and language-specific features. For his achievements, he was awarded the title of SAIC Technical Fellow in 2012.
Evgeny joined eBay in June 2014; since April 2015, he has managed the applied science team focusing on machine translation. His research focuses on topic adaptation as well as translation into morphologically rich languages.

Publications
Association for Computational Linguistics (ACL), Sofia, Bulgaria, 4–9 August 2013

Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

An important challenge to statistical machine translation (SMT) is the lack of parallel data for many language pairs. One common solution is to pivot through a third language for which parallel corpora with both the source and the target language exist. Although pivoting is a robust technique, it introduces some low-quality translations. In this paper, we present two language-independent features to improve the quality of phrase-pivot based SMT. The features, source connectivity strength and target connectivity strength, reflect the quality of the projected alignments between the source and target phrases in the pivot phrase table. We show positive results (0.6 BLEU points) on Persian-Arabic SMT as a case study.
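
As an illustration, here is a minimal Python sketch of how connectivity strength features of this kind can be computed, assuming they are defined as the fraction of source (respectively target) words that keep at least one alignment link after the source-pivot and pivot-target alignments are projected through the pivot phrase. The function and variable names are illustrative, not the paper's implementation.

```python
def connectivity_strength(src_len, tgt_len, projected_alignment):
    """Compute source/target connectivity strength for one pivot phrase pair.

    projected_alignment: set of (src_pos, tgt_pos) links obtained by projecting
    the source-pivot and pivot-target alignments through the shared pivot phrase.
    """
    linked_src = {i for i, _ in projected_alignment}
    linked_tgt = {j for _, j in projected_alignment}
    # Fraction of words on each side that keep at least one alignment link.
    scs = len(linked_src) / src_len if src_len else 0.0
    tcs = len(linked_tgt) / tgt_len if tgt_len else 0.0
    return scs, tcs

# Example: 3-word source phrase, 4-word target phrase, two surviving links.
print(connectivity_strength(3, 4, {(0, 0), (2, 3)}))  # (0.666..., 0.5)
```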

 

Proceedings of the 6th International Joint Conference on Natural Language Processing

Selective Combination of Pivot and Direct Statistical Machine Translation Models

In this paper, we propose a selective combination approach for pivot and direct statistical machine translation (SMT) models to improve translation quality. We use Persian-Arabic SMT as a case study. We show positive results (from 0.4 to 3.1 BLEU points, depending on the size of the direct training corpus) in addition to a large reduction in the size of the pivot translation model.
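
The abstract does not spell out the selection criterion, but the general idea of a selective combination can be sketched as follows, under an assumed (and possibly different from the paper's) rule: keep the direct model's phrase entries and add pivot entries only for source phrases the direct model does not cover, which also shrinks the combined pivot table. All names and data below are hypothetical.

```python
def selectively_combine(direct_table, pivot_table):
    """Combine direct and pivot phrase tables (illustrative selection rule).

    direct_table, pivot_table: dicts mapping a source phrase to a list of
    (target_phrase, score) options. Pivot entries are kept only where the
    direct model has no coverage, reducing the pivot model size.
    """
    combined = dict(direct_table)
    for src, options in pivot_table.items():
        if src not in combined:  # direct model cannot translate this phrase
            combined[src] = options
    return combined

direct = {"ketab": [("kitab", -0.2)]}
pivot = {"ketab": [("kitab", -0.9)], "daneshgah": [("jamia", -1.1)]}
print(selectively_combine(direct, pivot))
# {'ketab': [('kitab', -0.2)], 'daneshgah': [('jamia', -1.1)]}
```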

Association for Machine Translation in the Americas (AMTA), October 2016

Guided Alignment Training for Topic-Aware Neural Machine Translation

Wenhu Chen, Evgeny Matusov, Shahram Khadivi, Jan-Thorsten Peter

In this paper, we propose an effective way of biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that meta-data associated with input texts, such as topic or category information, can significantly improve translation quality when used as an additional signal to the decoder part of the network. With both novel features, the BLEU score of the NMT system on a product title set improves from 18.6% to 21.3%. Even larger MT quality gains are obtained through domain adaptation of a general-domain NMT system to e-commerce data. The developed NMT system also performs well on the IWSLT speech translation task, where an ensemble of four variant systems outperforms the phrase-based baseline by 2.1% BLEU absolute.
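
A minimal numpy sketch of the general idea of guided alignment training: the standard NMT cross-entropy is augmented with a penalty that pulls the decoder's attention weights towards a reference alignment produced by a statistical word aligner. The cross-entropy form of the penalty, the weight value, and the row normalization are assumptions for illustration; the exact loss and weighting schedule used in the paper may differ.

```python
import numpy as np

def guided_alignment_loss(attention, ref_alignment, eps=1e-8):
    """Cross-entropy between NMT attention weights and a reference alignment.

    attention:     (tgt_len, src_len) attention probabilities from the decoder.
    ref_alignment: (tgt_len, src_len) reference matrix from a statistical word
                   aligner, row-normalized so each target position is a distribution.
    """
    return -np.mean(np.sum(ref_alignment * np.log(attention + eps), axis=1))

def total_loss(nmt_ce, attention, ref_alignment, w_align=0.5):
    """Standard NMT cross-entropy plus the weighted alignment term (w_align is assumed)."""
    return nmt_ce + w_align * guided_alignment_loss(attention, ref_alignment)

# Tiny usage example with a 2x2 attention matrix and a diagonal reference alignment.
attn = np.array([[0.7, 0.3], [0.2, 0.8]])
ref = np.array([[1.0, 0.0], [0.0, 1.0]])
print(total_loss(nmt_ce=2.5, attention=attn, ref_alignment=ref))
```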

Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, September 2017

Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search

Leonard Dahlmann, Evgeny Matusov, Pavel Petrushkov, Shahram Khadivi

In this paper, we introduce a hybrid search for attention-based neural machine translation (NMT). A target phrase learned with statistical MT models extends a hypothesis in the NMT beam search when the attention of the NMT model focuses on the source words translated by this phrase. Phrases added in this way are scored with the NMT model, but also with SMT features including phrase-level translation probabilities and a target language model. Experimental results on German->English news-domain and English->Russian e-commerce domain translation tasks show that using phrase-based models in NMT search improves MT quality by up to 2.3% BLEU absolute compared to a strong NMT baseline.
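
A rough Python sketch of the scoring step described above, under stated assumptions: an SMT phrase candidate is proposed only when the hypothesis's attention points into the source span the phrase covers, and the extended hypothesis is scored by the NMT log-probabilities of the phrase tokens plus weighted SMT features (phrase translation model and target language model). All class, function, parameter, and weight names are hypothetical stand-ins, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class PhraseCandidate:
    src_span: tuple    # (start, end) source positions covered by the SMT phrase
    tgt_tokens: list   # target words of the phrase (scored by the NMT model externally)
    tm_log_prob: float # phrase-level translation log-probability
    lm_log_prob: float # target language model log-probability

def extend_with_phrase(hyp_score, nmt_token_log_probs, cand, attn_src_pos,
                       w_tm=0.3, w_lm=0.2):
    """Score a beam hypothesis extended by an SMT phrase (illustrative weights).

    The phrase is proposed only when the hypothesis's attention focuses inside
    the phrase's source span; its target words are then scored both by the NMT
    model (sum of token log-probs) and by the SMT features.
    """
    if not (cand.src_span[0] <= attn_src_pos < cand.src_span[1]):
        return None  # attention does not focus on the words this phrase translates
    nmt_part = sum(nmt_token_log_probs)                      # NMT scores of phrase tokens
    smt_part = w_tm * cand.tm_log_prob + w_lm * cand.lm_log_prob
    return hyp_score + nmt_part + smt_part

cand = PhraseCandidate(src_span=(2, 4), tgt_tokens=["mobile", "phone"],
                       tm_log_prob=-0.7, lm_log_prob=-1.2)
print(extend_with_phrase(-3.5, [-0.4, -0.6], cand, attn_src_pos=2))  # -4.95
```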

Patents