“The excitement, energy and passion for NLP come from linguistics,” remarks Dr Pushpak Bhattacharyya during his conversation with Samiksha Mehra of INDIAai. A veteran researcher in the field of language technology, he has made seminal contributions to the fields of Machine Learning (ML) and Natural Language Processing (NLP) in a career spanning over three decades.
Dr Bhattacharyya is the Major Bhagat Singh Rekhi Chair Professor at the Indian Institute of Technology Bombay. A former director of IIT Patna, he is also a former president of the Association for Computational Linguistics (ACL). He is an academician of global repute and has been instrumental in influencing AI policymaking in India through his participation in various governmental committees. His trailblazing research projects include Automatic Sarcasm Detection, Multilingual Computation, Indian Language Neural Machine Translation, and IndoWordNet.
Here are excerpts from our conversation with him.
The inherent love for languages
“Anybody who grows up in India has to be keenly aware of the linguistic diversity in the country,” he says. Reminiscing about his childhood days, Prof Bhattacharyya recalls how his love for linguistics ran in the family. His grandfather was a reputed Sanskrit scholar and his father was in love with the English language and its literature. Moreover, he learnt Bengali as a native speaker and German as an undergraduate student at IIT Kharagpur.
“Linguistic data is abundantly available in the country. So, linguistic work in India is always very relevant and full of interesting challenges,” he says.
“My master’s thesis was maybe the first one in machine translation amongst Indian languages,” he says, adding how he blended his prowess in two different disciplines to carve out a career in NLP.
“I had always been thinking about how to make computers understand and generate language. What is language? What is the linguistic faculty? What cognitive processes underlie language faculty? How to replicate them in the form of an algorithm on a computer? – These are the key questions that have always remained with me and have driven me. And I found that one can combine linguistics, probability and coding to create very effective and useful systems in Natural Language Processing,” he says, relating his journey.
CFILT, his lab at IIT-B
The Centre for Indian Language Technology (CFILT) was set up with a grant from the Department of Science and Technology in 2000. This was preceded by a grant from the United Nations University in Tokyo for linking Hindi with the Universal Networking Language (UNL) project.
“I would say that the main contribution of CFILT to NLP has been the creation of Indian language WordNets and that of neural machine translation systems involving English, Hindi, Marathi, and many other languages. And we also created a prototype Indian language search engine, which was released by the ministry in 2012.”
Listing out more contributions of his lab, he adds, “CFILT organised one of the largest NLP conferences – Computational Linguistics Conference – in 2012 at IIT Bombay. This was the first time that one of the two major NLP conferences (ACL being the other one) fully came to India. This was a huge exposure for students of AI, ML and NLP in the country.”
India’s ambitious National Language Translation Mission (NLTM)
In recognition of India's linguistic diversity, the government has backed the development of automatic machine translation for Indian languages. One such ambitious project is the National Language Translation Mission (NLTM), also known as Bahubashak. Its goal is to overcome the language barrier through speech-to-speech machine translation, enabling seamless communication across languages.
“One of the domains here is education. Professors in premier institutes deliver the lectures in English, which is a barrier for students who are not very comfortable with English. The goal is to deliver these lectures in real-time in the native language of students. Our part at IIT Bombay in the Bahubashak project is integrating Marathi,” he says. This also aligns with the aim of the National Education Policy 2020 to make education possible in one’s mother tongue.
“People are most comfortable with interacting in their mother tongue. Content, too, is best communicated and absorbed in one’s own mother tongue. So, first and foremost, the emphasis should be on integrating all Indian languages in one common machine translation framework so that education, business, economic and entertainment activities do not face any language barrier,” he says.
He explains how India’s ecosystem is exceptionally congenial to AI research. “While scale throws up very interesting and challenging problems, the solution is also there mainly because there is so much data.” So there’s a combination of exciting problems and the resources to solve them, all co-existing in this diverse country.
From model-driven AI to data-driven AI
AI has often seen winters and summers, characterised globally by falling and rising enthusiasm for the field, respectively. “These frequent ups and downs in AI coincide with what is called a periodic return of rationalism and empiricism. In other words, we see this periodic transfer of emphasis from model-driven AI to data-driven AI, or rational AI to empiricist AI,” he says as he recalls the evolution of the field.
Elaborating on model-driven AI, he says, “Researchers, academicians, industry, application-oriented engineers, scientists – they understand the problem, break it up into its elements and create a model about how the problem should be solved.”
But there are shortcomings to this approach. “This is a person’s view of the world and the problem at hand, and it is limited by the understanding of the person or the researcher himself/herself. But there are always surprises, corner cases and challenges beyond what an individual has been able to see. That gives rise to solving the problem with data,” he explains.
Speaking specifically in the Indian context, he comments, “Since 1990, I would say that India’s AI has definitely moved more and more towards machine learning and data. India is really the data heaven where all the three V’s of velocity, variety and volume are satisfied.”
“When we were students, we were doing model-driven AI. These days, the AI has become very, very data and machine learning-driven, and India is not an exception,” he concludes.
Source: indiaai.gov.in