Nicola Ueffing

Research Scientist
Biography

Nicola joined eBay's machine translation research team in May 2016. Prior to working for eBay, Nicola was a language modeling research scientist at Nuance Communications, leading the research and development for dictation products like Dragon NaturallySpeaking.


Nicola received a PhD in computer science from RWTH Aachen University, specializing in confidence estimation for machine translation. She then joined the Interactive Language Technologies team at the National Research Council Canada as a postdoctoral research associate. Her research interests include machine translation as well as most other areas of computational natural language processing.

Nicola has published at top international conferences such as ACL, Interspeech, COLING, and EMNLP, as well as in recognized journals such as Computational Linguistics and Discrete Mathematics.

Publications
MT Summit, Nagoya, Japan, September 2017

A detailed investigation of Bias Errors in Post-editing of MT output

Silvio Picinini, Nicola Ueffing

The use of post-editing of machine translation output is increasing throughout the language technology community. In this work, we investigate whether the MT system influences the human translator, thereby introducing "bias" and potentially leading to errors in the post-editing. We analyze how often a translator accepts an incorrect suggestion from the MT system and determine different types of bias errors. We carry out quantitative analysis on translations of eCommerce data from English into Portuguese, consisting of 713 segments with about 15k words. We observed a higher-than-expected number of bias errors, about 18 bias errors per 1,000 words. Among the most frequent types of bias error we observed ambiguous modifiers, terminology errors, polysemy, and omissions. The goal of this work is to provide quantitative data about bias errors in post-editing that help indicate the existence of bias. We explore some ideas on how to automate the finding of these error patterns and facilitate the quality assurance of post-editing.

International Conference on Natural Language Generation, Santiago de Compostela, Spain, September 2017

Generating titles for millions of browse pages on an e-Commerce site

We present three approaches to generate titles for browse pages in five different languages, namely English, German, French, Italian and Spanish. These browse pages are structured search pages in an e-commerce domain. We first present a rule-based approach to generate these browse page titles. In addition, we also present a hybrid approach which uses a phrase-based statistical machine translation engine on top of the rule-based system to assemble the best title. For the two languages English and German, we have access to a large amount of rule-based generated and human-curated titles. For these languages, we present an automatic post-editing approach which learns how to post-edit the rule-based titles into curated titles.

Patents