Meta Unveils a New AI-Powered Hokkien Speech Translation System

As part of its Universal Speech Translator (UST) project, Meta has developed the first AI-powered translation system for Hokkien, a language that is primarily spoken rather than written.
The development is notable because machine translation was previously available only for written languages. Until recently, AI translation models were typically trained on large amounts of written text. Researchers can now build a translation system for an unwritten language by drawing on a variety of sources, including both written and spoken data.
Until now, AI translation has focused mainly on written languages. Yet of the more than 7,000 languages still in use today, about half are primarily oral and have no standard or widely used writing system. That makes it difficult to build machine translation systems for them with conventional methods, which need vast amounts of text to train an AI model.
Hokkien is widely spoken within the Chinese diaspora, and Meta has created the first AI-powered translation system for it. Hokkien has no standard written form, so Meta’s technology works on speech directly, allowing Hokkien speakers to hold conversations with English speakers.
Meta’s Universal Speech Translator (UST) project is developing new AI approaches with the aim of providing real-time speech-to-speech translation across all existing languages, including those that are primarily spoken. The open-sourced Hokkien translation system is one component of UST. According to the company, spoken communication can help break down barriers and bring people together wherever they are, including in the metaverse.
To build the new system, Meta’s AI researchers had to overcome many of the challenges that traditional machine translation systems face, including data collection, model design, and evaluation. Much work remains to extend UST to more languages, the blog post states. But the ability to communicate effectively in any language has long been a goal, and Meta says it is glad to be getting closer to achieving it. So that others can reproduce and build on its work, the company is also making its Hokkien translation models, evaluation datasets, and research papers publicly available.
According to Meta, the techniques can also be applied to a wide variety of other written and spoken languages. In addition, Meta is releasing SpeechMatrix, a large corpus of speech-to-speech translations mined with its LASER data mining technique. Researchers will then be able to build their own speech-to-speech translation (S2ST) systems on Meta’s work.