Facebook AI, the California giant’s artificial intelligence research lab, has unveiled its first multilingual model: a powerful tool that can quickly translate any content into a hundred languages, and without altering the meaning!
Every day, Facebook performs nearly 20 billion translations on its news feed, to provide information on Covid-19 in every language, relay reliable information and keep out “harmful” content. The feat is made possible by the American giant’s research efforts in low-resource machine translation and by progress in translation quality.
Facebook has run its artificial intelligence research laboratory, Facebook AI, for several years, and fast, reliable automated translation has become one of its most active areas of work. It must be said that, every day, the social network’s two billion users publish content in 160 different languages. So Facebook needed a tool capable of translating between as many languages as possible.
Capable of translating 100 languages into 100 languages
Here is M2M-100, an AI created by the research laboratory after several years of fundamental research in machine translation. It is the “first multilingual machine translation model” (MMT, for Multilingual Machine Translation). Its strength: it can translate 100 languages… into 100 languages.
This means that Facebook’s AI can translate from Chinese to French without going through English as an intermediary, as is often the case with multilingual models. Those AIs tend to learn to translate from Chinese to English first, then from English to French, to produce their final result. To preserve the meaning of the original wording, the Facebook AI model is trained directly on Chinese-to-French data.
“Typical MT systems require the creation of separate AI models for each language and each task, but this approach does not scale effectively on Facebook,” Facebook AI explains. “Advanced multilingual systems can process multiple languages at once, but compromise accuracy by relying on English data to bridge the gap between source and target languages. We need an MMT model that can translate any language in any direction.”
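The accuracy cost of bridging through English can be illustrated with a toy example. The sketch below is not how M2M-100 or any real system works: the three lookup tables are hypothetical stand-ins for translation models, chosen because Chinese and French both distinguish an informal and a polite "you" while English does not.

```python
# Toy lookup tables standing in for translation models (illustrative only).
ZH_TO_EN = {"你": "you", "您": "you"}   # English has a single word for both registers
EN_TO_FR = {"you": "vous"}              # the pivot step must pick one French form
ZH_TO_FR = {"你": "tu", "您": "vous"}   # a direct table can keep the distinction


def translate_via_pivot(word: str) -> str:
    """Chinese -> English -> French, the English-centric route."""
    return EN_TO_FR[ZH_TO_EN[word]]


def translate_direct(word: str) -> str:
    """Chinese -> French with no intermediate language."""
    return ZH_TO_FR[word]


for word in ("你", "您"):
    print(word, "| pivot:", translate_via_pivot(word), "| direct:", translate_direct(word))
```

Once both Chinese words have collapsed into "you", no English-to-French step can recover which one was meant. A model trained directly on the Chinese-French pair never passes through that bottleneck.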
7.5 billion sentences studied
M2M-100 was thus trained on a total of 2,200 language directions, ie 10 times more than the best English-centric multilingual models. In total, 7.5 billion sentences in 100 languages went into the AI’s dataset, to make sure the original meaning of a sentence does not suffer a cultural alteration by passing through English before landing in a third language. Facebook researchers combined complementary, open-source data mining resources (in particular ccAligned, ccMatrix and LASER).
And the resulting model is itself made available to as many people as possible as open source.
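To put the figure of 2,200 in perspective, it helps to count how many translation directions 100 languages allow, a direction being an ordered pair such as Chinese to French. The short sketch below does that arithmetic; the language codes are placeholders, and the English-centric count assumes a model that only translates into and out of English.

```python
from itertools import permutations

NUM_LANGUAGES = 100
TRAINED_DIRECTIONS = 2_200  # figure quoted in the article

# Placeholder language codes; only the count matters here.
languages = [f"lang_{i:03d}" for i in range(NUM_LANGUAGES)]

# Every ordered (source, target) pair with source != target.
all_directions = list(permutations(languages, 2))

# Assumption: an English-centric model covers X -> English and English -> X only.
english_centric = 2 * (NUM_LANGUAGES - 1)

share_trained = TRAINED_DIRECTIONS / len(all_directions)

print(len(all_directions))     # 9900
print(english_centric)         # 198
print(f"{share_trained:.0%}")  # 22%
```

So the 2,200 trained directions are roughly a fifth of the 9,900 that 100 languages allow, and about ten times the 198 directions of a purely English-centric setup.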
Developing the ultimate translation tool for everyone
But while M2M-100 is a technical feat of AI research, the point is above all that as many people as possible benefit from it. To that end, it will be deployed on Facebook for use by the social network, allowing even more interaction between users from all countries. In particular, it will let everyone instantly read any post published in a language other than their own (automatically, if the option to switch all messages to your language is enabled in your account settings). No need for a detour via Google Translate: Facebook will do it for you, right before your eyes.
This also serves the goal AI research has pursued for years: creating a single universal model capable of understanding all languages and all dialects across different types of tasks. “This work brings us closer,” says a pleased Facebook, adding that it will also make it possible to “serve more people, keep translations up to date and create new experiences for billions of people.”
If you want to know all the details of the M2M-100 project’s development and its different stages, Facebook AI has published a very detailed blog post on the subject.