AI4Bharat is a non-profit, open-source community of engineers, domain experts, policy makers and academicians, all collaborating to build AI solutions for solving India’s critical socio-economic and environmental challenges. This initiative is helmed by IIT Madras faculty members Mitesh Khapra and Pratyush Kumar, as a part of their technology startup One Fourth Labs.
Language technology holds special relevance in a culturally diverse country like India, where many languages and dialects are spoken across its regions. Through allied abilities such as natural language processing (NLP) and speech recognition, it gives machines the ability to comprehend and respond to human text and speech. Innovations in the field have the potential to impact the lives of billions who still interact primarily in their vernacular language.
The founders began by creating large corpora of Indian language texts. “So as at the base level of doing all this huge stack of NLP, you simply need to have text in Hindi, Malayalam, Bengali, etc.,” says Pratyush. As a result of their efforts, they were able to release the largest corpus of Indian language texts and, for many languages, increase the size of the available corpus by an order of magnitude.
While mobile devices have spread widely even in the interiors of the country, the availability of digital information in regional languages is extremely limited, if not entirely negligible. The founders took it upon themselves to move the needle. According to them, there are two main ways to change this. First, aggregate all the available texts in the regional languages. Second, kickstart AI models that help in data creation. And that’s exactly what they are doing: creatively collecting data from different places, and then building and releasing translation models in Indian languages.
Next, they’re moving to the transliteration problem, which involves typing in a regional language using an English keyboard. This is achieved by predictively swapping the typed Latin-script text for its equivalent in the regional language’s script.
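To make the idea of predictive swapping concrete, here is a minimal sketch in Python. It is not AI4Bharat’s method: it assumes a tiny hand-written lexicon with made-up usage counts, and it ranks native-script candidates for whatever Latin-script prefix the user has typed. The names `LEXICON`, `build_index` and `suggest` are hypothetical, and a real system would more likely rely on a trained model than on a lookup table.

```python
from collections import defaultdict

# Hypothetical toy lexicon: (romanized spelling, native-script word, usage count).
# Entries and counts are illustrative only, not taken from any AI4Bharat dataset.
LEXICON = [
    ("namaste", "नमस्ते", 120),
    ("namak", "नमक", 45),
    ("naam", "नाम", 200),
    ("bharat", "भारत", 300),
    ("bhasha", "भाषा", 150),
]


def build_index(lexicon):
    """Map every prefix of each romanized spelling to its candidate words."""
    index = defaultdict(list)
    for roman, native, count in lexicon:
        for end in range(1, len(roman) + 1):
            index[roman[:end]].append((count, native))
    return index


def suggest(index, typed, k=3):
    """Return up to k native-script candidates for a typed prefix, most frequent first."""
    candidates = sorted(index.get(typed.lower(), []), key=lambda c: -c[0])
    return [native for _, native in candidates[:k]]


INDEX = build_index(LEXICON)

if __name__ == "__main__":
    # Show the suggestions a keyboard might offer as the user types.
    for typed in ("na", "nam", "bh"):
        print(typed, "->", suggest(INDEX, typed))
```

As the user types more letters, the candidate list narrows, and picking a candidate replaces the Latin-script text with the native-script word. Ranking by frequency is just one simple choice for ordering the suggestions.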
“The goal here is that we want to bring parity in AI technology for Indian languages with English,” Mitesh explains. He adds that, owing to the availability of open-source models and enough data, we take for granted many NLP tasks for English, such as sentiment analysis, named entity recognition and profanity detection. And although they’ve just made a start, the goals are ambitious. “We want to build these solutions for this long tail of NLP tasks for as many Indian languages as possible,” says Mitesh.
The progress is gradual but considerable. AI4Bharat was part of Google India’s first ‘AI For Social Good’ cohort, where the team worked with the NGO Pratham Books. Recently, they’ve worked with the Nandan Nilekani-backed ‘EkStep Foundation’. Further, they’ve launched a year-long AI residency program to mentor students who have completed their B.Tech. They’re also in talks with the government regarding the National Language Translation Mission, which was announced in this year’s budget.
The team at AI4Bharat wants to keep all their models open source with permissive licenses. “We would like to support startups, governments, or companies using these models,” says Pratyush.
Source: indiaai.gov.in