What This Means for the Future of LLMs and Machine Translation
As we look to a future where LLMs become increasingly sophisticated, it seems that for now, tuned NMT models produce more consistent results that are easier to post-edit to high quality, particularly with real-world content processed in classic translation management system workflows.
It’s also important to note that highly trained NMT models (using specific domain content and terminology) aren’t subject to some of the technical challenges and quirks that persist when using generative AI.
NMT offers higher predictability, especially over time and across tuned languages. We also previously compared generic NMT output with LLM output: while the quality is lower (increased post-edit distances, etc.), the predictability of the output remains consistent. With LLMs, quality tapers off quickly, notably so with non-English languages as a source and for less well-resourced languages in general, and output can vary quite materially over time.
One example relates to AI hallucinations – especially in lower-resourced languages – which can affect output to the point where the translation is simply not useful. This showed up as incorrect handling of technical content such as URLs, client- or domain-specific terminology, and short sentences, meaning LLMs don't yet produce reliably consistent results when processing content in large batches or at scale.
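To give a concrete sense of how such issues surface at scale, here is a minimal, purely illustrative sketch of an automated check that flags segments where URLs or approved client terminology were dropped or altered in the output; the `check_segment` helper and the glossary format are assumptions made for the example, not part of any specific workflow described here.

```python
# Illustrative QA check: flag segments where the MT/LLM output drops or alters
# URLs, or fails to use the approved target-language term for a source term.
# The glossary format and helper names are assumptions for this sketch.
import re

URL_PATTERN = re.compile(r"https?://\S+")

def check_segment(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    issues = []
    # URLs should be carried over verbatim into the translation.
    for url in URL_PATTERN.findall(source):
        if url not in target:
            issues.append(f"URL missing or altered: {url}")
    # Approved terminology: if the source term appears, the target term should too.
    for src_term, tgt_term in glossary.items():
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            issues.append(f"Expected term '{tgt_term}' for '{src_term}'")
    return issues

if __name__ == "__main__":
    glossary = {"release notes": "notes de version"}  # hypothetical client glossary
    src = "See the release notes at https://example.com/notes"
    tgt = "Consultez les notes de mise à jour sur https://example.com/note"
    for issue in check_segment(src, tgt, glossary):
        print(issue)
```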
Generally, Acolad's results, combined with expert human review, showed that while the LLM output did score relatively highly, it struggled with more complex content containing structural elements such as formatting and inline tagging.
Further, given the need to manage relatively complex prompts across languages and models, broader application of LLM technology in translation workflows will add to the total cost of translation, even though raw processing costs are dropping.
Effectively, if you need automated translation for large amounts of content without human input or post-editing, it's likely better to rely on a proven, quality machine translation solution – for now.
As we already noted, even when employing a human in the loop to edit the automatic translation output, it can still be more cost-effective to use machine translation over generative AI, simply because of the time otherwise spent iterating over prompts to refine the LLM's output. Additionally, NMT has a lower Post-Edit Distance (PED) and Translation Edit Rate (TER), meaning it requires less work to correct than starting from LLM output.
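Since both metrics are central to the comparison, here is a minimal sketch of how they might be computed; the helper names are hypothetical, PED is taken here as a normalized character-level edit distance, and the TER variant below ignores block shifts for brevity.

```python
# A simplified sketch of post-edit effort metrics, assuming PED is a normalized
# character-level edit distance and TER a word-level edit rate. Real TER also
# accounts for block shifts; this approximation does not.

def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution
            ))
        prev = curr
    return prev[-1]

def post_edit_distance(mt_output: str, post_edited: str) -> float:
    """Character-level edit distance, normalized by the longer of the two strings."""
    denom = max(len(mt_output), len(post_edited)) or 1
    return levenshtein(mt_output, post_edited) / denom

def translation_edit_rate(mt_output: str, post_edited: str) -> float:
    """Word-level edits divided by the length of the edited reference."""
    hyp, ref = mt_output.split(), post_edited.split()
    return levenshtein(hyp, ref) / (len(ref) or 1)

if __name__ == "__main__":
    mt = "The cat sit on the mat"
    pe = "The cat sits on the mat"
    print(f"PED: {post_edit_distance(mt, pe):.3f}")  # lower means less post-editing
    print(f"TER: {translation_edit_rate(mt, pe):.3f}")
```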
Despite these results, it's clear that generative AI LLMs will still have a large role to play in automating translation, especially as the models are refined. One exciting potential application is using them to stylistically rewrite MT output. There is also strong evidence that LLMs could play a pivotal role in quality evaluation, which may support capabilities in translation such as self-reflecting post-editing.
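As a rough illustration of what self-reflecting post-editing could look like in practice, the sketch below first asks an LLM to critique an MT segment against the source and then to revise it using its own critique; `call_llm` is a hypothetical placeholder for whatever model endpoint is actually used, not a reference to any specific provider.

```python
# A conceptual sketch of "self-reflecting post-editing": the LLM first reviews
# the MT output against the source, then revises it using its own critique.
# `call_llm` is a hypothetical stand-in for a real model API call.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Connect this to your LLM provider of choice.")

def self_reflecting_post_edit(source: str, mt_output: str, target_lang: str) -> str:
    # Step 1: ask the model to critique the machine translation.
    critique = call_llm(
        f"Source text: {source}\n"
        f"Machine translation ({target_lang}): {mt_output}\n"
        "List any accuracy, terminology, or fluency problems in the translation."
    )
    # Step 2: ask the model to revise the translation using its own critique.
    return call_llm(
        f"Source text: {source}\n"
        f"Draft translation ({target_lang}): {mt_output}\n"
        f"Reviewer notes: {critique}\n"
        "Return only the corrected translation, preserving formatting, tags, and URLs."
    )
```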
LLMs also clearly display exciting possibilities when dealing with ambiguities, idioms, cultural references, and even humor that some MT models have traditionally struggled with, given the more limited data sets used to build those models.