Authors
Aida Kostikova, Kristin Migdisi, Sara Szoc and Tom Vanallemeersch
Abstract
A well-known challenge of machine translation (MT) is accurately translating domain-specific terminology. While various methods have been suggested to address this challenge, they all come with limitations and increase the user's dependence on a specific MT engine. Recently, large language models (LLMs) have gained significant attention for various natural language processing tasks, including automated translation, making it necessary to investigate the potential of these models for terminology translation. We therefore compare ChatGPT, an LLM-based chatbot that converses with a user, to DeepL, a sequence-to-sequence MT system. We use both systems to perform translations with and without glossaries. We also combine both systems by post-editing MT output with the chatbot. Automated and manual evaluations indicate that the overall translation quality of MT is better than or on par with that of the chatbot with a glossary, but that the latter excels in terminological accuracy, both when used for translation and when used for post-editing. While such post-editing avoids user dependence on a specific MT engine, it sometimes introduces new translation issues, such as shifts in meaning, suggesting the need for future improvements. Our experiments cover two language pairs, English-Russian and English-French, and two domains (COVID-19 and legal documents).
This paper received the Best Paper Award at the TC45 conference.
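The abstract refers to two chatbot-based setups: translation with an injected glossary and post-editing of MT output. The paper's actual prompts, glossary data, and model settings are not reproduced here, so the sketch below is purely illustrative of how such prompts might be constructed with the OpenAI Python SDK; the glossary entries, model name, and prompt wording are all assumptions.

```python
# Illustrative sketch only: glossary contents, model name and prompt wording
# are assumptions, not the prompts used in the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical domain glossary (COVID-19, English-French).
GLOSSARY = {
    "herd immunity": "immunité collective",
    "contact tracing": "recherche des contacts",
}


def glossary_block(glossary: dict[str, str]) -> str:
    """Render term pairs as a constraint list for the prompt."""
    return "\n".join(f'- "{src}" -> "{tgt}"' for src, tgt in glossary.items())


def translate_with_glossary(text: str) -> str:
    """Ask the chatbot to translate while enforcing glossary terms."""
    prompt = (
        "Translate the following English text into French. "
        "Use exactly these term translations:\n"
        f"{glossary_block(GLOSSARY)}\n\nText: {text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def post_edit_mt(source: str, mt_output: str) -> str:
    """Ask the chatbot to post-edit MT output (e.g. from DeepL),
    correcting terminology while leaving the rest unchanged."""
    prompt = (
        "Post-edit this French machine translation of the English source. "
        "Only correct terminology according to the glossary below; "
        "leave everything else as is.\n"
        f"{glossary_block(GLOSSARY)}\n\n"
        f"Source: {source}\nMT output: {mt_output}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The post-editing function reflects the combined setup described in the abstract: the MT engine produces the draft translation, and the chatbot only enforces the glossary, which is why this approach does not require glossary support in the MT engine itself.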
