This is part three of our series on Neural Machine Translation (NMT) and its impact on Language Service Providers (LSPs) serving the life sciences industry. In ‘Everything You Need to Know about NMT’, we provided an overview of the NMT model, its development, as well as the most common misconceptions surrounding its usage. In ‘How to Make the Most of NMT’, we looked in greater detail at the model’s functionality and the applications to which it is best suited.
In this latest blog, we speak to Sonia Ribeiro Hill, Technology Director, about the potential NMT holds for LSPs and their customers, including the challenges still to overcome.
What Difference Will NMT Make to DWL’s Life Sciences Customers?
NMT could have a significant impact on translations in the clinical research and pharmacovigilance fields, as well as in regulatory affairs, says Ribeiro Hill, in particular when it comes to increasing speed and efficiency: “For instance, NMT tackles, in a very efficient manner, any type of document that comprises a high volume of repetition, such as lab reports, manuals and ethics committee and regulatory authority letters. It can also rapidly translate large volumes of adverse event data that would previously have been too large to handle quickly.”
In addition, NMT would be extremely useful for any documents that require urgent translation: “Deadlines during clinical studies can be extremely tight, sometimes requiring as many as 25,000 words to be translated within a few days. NMT will help us to meet those deadlines while maintaining our usual level of quality.”
How Can You Be Sure to Maintain Quality?
The quality of any system’s output is directly linked to the quality of the data that is put in, so it is no surprise that one of the biggest challenges for an LSP training an NMT engine is ensuring that it is trained on ‘clean’ data: in other words, highly accurate source and translation segments.
As Ribeiro Hill explains, “it is important that engines are not built with ‘noisy’ – or inaccurate – data that could negatively impact the translation output. While it is possible to remove noisy data from neural machine engines, it requires careful and consistent data maintenance.”
Clearly, it is far better to verify the quality of data before you enter it, but how easy is this to do in practice? It depends on the LSP and its quality assurance (QA) practices, says Ribeiro Hill. “For example, to eliminate noisy data, DWL runs extensive QA on all its translations and translation memories to identify and correct inconsistencies or segmentation errors.”
In addition, when uploading translations to its cloud-based machine translation platform, KantanMT, DWL first passes all data through an automated script. “It has been designed to remove all unnecessary information, such as email addresses, tags, numbers, proper names and so on, as well as reporting any additional errors.” This is particularly relevant in the life sciences space, where patients’ medical records, commercial information, and confidential intellectual property are present in documents sent for translation. “The key is to keep your data up to date and consistently clean. This is time-consuming but essential work.”
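To make the idea of a pre-upload cleaning pass more concrete, here is a minimal Python sketch. It is not DWL’s actual script or anything provided by KantanMT; the function names and regular expressions are hypothetical, and removing proper names is left out because it would normally need a named-entity recognition step rather than simple patterns.

```python
import re

# Hypothetical, deliberately simple patterns; real data would need broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
TAG_RE = re.compile(r"<[^<>]+>")
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
SPACE_RE = re.compile(r"\s+")


def scrub_segment(text: str) -> str:
    """Strip email addresses, inline tags, and numbers from one segment."""
    text = EMAIL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = NUMBER_RE.sub(" ", text)
    # Collapse the gaps left behind by the removals.
    return SPACE_RE.sub(" ", text).strip()


def scrub_pairs(pairs):
    """Clean (source, target) pairs; report pairs left empty on either side."""
    kept, rejected = [], []
    for source, target in pairs:
        clean = (scrub_segment(source), scrub_segment(target))
        if all(clean):
            kept.append(clean)
        else:
            rejected.append((source, target))
    return kept, rejected
```

Calling `scrub_pairs` on a list of source and target segments returns the cleaned pairs plus a list of rejected originals, which a person could then review instead of having them silently dropped.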
What Quality Control Measures Do You Use?
Many different quality management measures may be applied to the machine translation process to maintain high-quality translations:
- Pre-Machine Translation (MT): Initial document analysis removes elements that can negatively impact the MT engine, such as unclear sentences.
- MT: memoQ uses translation memories built from high-quality training data.
- memoQ QA: Computer-assisted translation (CAT) tools locate and correct spelling mistakes, tag errors, etc.
- Post-editing: Translations are polished by a professional translator/post-editor.
- Visual checks: Another layer of QA is carried out by a DWL linguist or project manager.
- Finalizing document: The DWL team ensures the translation has met the brief originally requested by the customer.
- Feedback: MT engines are tailored to the preferences of each customer.
- Review and approval: Human translators always have the final say on any changes to the MT output.
- Continuous improvements: Translation memories are updated according to feedback and corrections to promote quality in future translations.
- QA on Translation Memories: We undertake extremely thorough data maintenance checks, segment by segment, the results of which are then fed back into the MT engine.
Such checks are designed to ensure quality control not only in the final translation but also on the information being fed to the engine, as Ribeiro Hill explains: “All these checks also allow us to prevent future errors, as the engines are trained with accurate data. As a result, the translation will be constantly improving.”
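One of the checks mentioned above, spotting inconsistencies in a translation memory, can be illustrated with a short Python sketch. This is an assumption about how such a check might look, not a description of DWL’s or memoQ’s tooling: the function names are hypothetical, and the translation memory is modelled simply as a list of (source, target) pairs.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def find_inconsistencies(pairs):
    """Map each source segment to its targets when it has more than one."""
    targets_by_source = defaultdict(set)
    for source, target in pairs:
        targets_by_source[normalize(source)].add(normalize(target))
    return {
        source: sorted(targets)
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }
```

Any source segment that comes back from `find_inconsistencies` has been translated in more than one way, so a linguist can decide which rendering should stay in the memory before it is used for training.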
What Role Will Human Translators Play in NMT?
“No matter what, DWL-approved linguists are involved in producing the final output,” emphasizes Ribeiro Hill. “All our linguists, including those working on NMT output, are approved according to the ISO 9001 audited quality management system and many have been working with DWL for more than 15 years.”
NMT is a tool in the translator’s tool belt, rather than a replacement, she adds: “The translator is the judge and their verdict will be the final translation.” In other words, the translator’s choice will always outweigh the engine’s. “Even though NMT is intended to understand the context of the document, there will be occasions where this is overlooked, hence the engine output might not offer the best translation. It is critical, therefore, that a human translator checks the work of the machine.”
This might include, for example, the NMT engine selecting a technical word, such as ‘cephalea’, that is not appropriate for the intended lay reader of a translated text. “The translator, being aware of the context and target audience of the document, would correct that to ‘headache’,” says Ribeiro Hill.
Are All NMT Engines the Same?
As we covered in our previous blog (‘How to Make the Most of NMT’), the quality and extent of training data are key to the success of an NMT engine. How easy is it to find good quality training data?
“It can be difficult for specialist fields,” agrees Ribeiro Hill. “While large banks of clean data are available, that data tends to be very generic and so unlikely to be useful for a specialist life sciences NMT engine. Fortunately, DWL is an LSP that specializes in the life sciences sector, which means we have access to large volumes of anonymized, high-quality training data that we can use to clean and deploy our NMT engine.”
In practical terms, this means the difference between an NMT engine that can accurately translate day-to-day information – for instance, the most typical structure of a letter – and one that can correctly translate words and compounds that have different meanings depending on the relevant field. Ribeiro Hill gives the example of the term ‘patty’. In day-to-day language, this refers to the meat portion of a hamburger. In a surgical setting, however, it is an absorbent sponge used during invasive procedures. Unless it is trained with contextual data, the engine will mistranslate such a term.
“Training NMT engines on very generic, or non-specialist, data might bring ambiguity to the output of the engines. By using only highly specialized corpora, we ensure that the correct translation for each word or compound is used, avoid ambiguous translations, and, at the same time, ensure we are consistent with our translations.”
What Does the Future Hold?
NMT is a fast-evolving and exciting field, and so the process of training and expanding the use of NMT is constantly improving. “The more evaluation and training the NMT engine undergoes, the better it will be at translating specialized life sciences texts,” says Ribeiro Hill, “and the more benefits it will provide our customers, both in terms of quality and speed.”
This is important because significantly expediting the translation process might otherwise result in higher costs – or lower-quality output. “Thanks to this technology, our customers can benefit from fast turnarounds with the same high levels of translation quality.”
Please contact us for further information.