Pavan is a researcher in artificial intelligence with ten years of experience in Speech & Audio Processing / Speech Recognition R&D. He is working to develop speech recognition systems designed for conversational AI.
INDIAai interviewed Dr S. Pavankumar Dubagunta, a staff speech scientist at Uniphore, to understand his perspective on artificial intelligence.
It’s good to see that you’ve researched speech technology. Can you tell us about your research findings and the importance of speech technology?
Speech is the primary and most natural mode of communication among people. When a person speaks to another person or a computer, many aspects are conveyed to the listener: the spoken message, the identity of the speaker, the spoken dialect, traits such as how fluent and proficient the speaker is, their emotional state (happy, sad, or angry), and whether the speaker is feeling sleepy or depressed, to name a few. Most existing speech technologies focus on the spoken message (speech recognition) and the speaker identity (voice biometrics). However, there has been a growing interest in the other aspects of speech communication mentioned above. My PhD thesis mainly focused on assessing several of these factors using a unified framework by modelling raw speech signals using convolutional neural networks and incorporating prior knowledge into them. The thesis is available online. Besides this, I worked on various aspects of improving speech recognition systems before my PhD.
By 2026, the global voice biometrics and speech recognition technology market is expected to reach $20.9 billion, expanding at a compound annual growth rate (CAGR) of 18.1 per cent. Will Indian research contribute much to this expansion?
Yes, several academic and industrial groups in India are researching speech technologies in local and global languages. We have no shortage of talented students and professionals, and we are not behind in the infrastructure needed to carry out research experiments and innovate. Owing to a scarcity of personnel, there is also a growing demand for speech technologies in several sectors in India, such as call centres, health care, primary education, language learning and home automation. All these factors are conducive to a greater contribution from India to the market growth in the coming years.
Can you tell us about your current position and responsibilities?
I currently work with Uniphore Software Systems on improving automatic speech recognition systems designed for conversational AI. My contribution lies in innovating on design choices and developing customized speech solutions for several markets across the globe. We also collaborate with academic partners to stay up to date with the latest advances in technology.
As an Indian graduate, what difficulties did you encounter while pursuing your PhD?
I did my PhD in Switzerland at Idiap Research Institute, a non-profit research organization, and was enrolled at the Swiss Federal Institute of Technology in Lausanne (EPFL). Idiap is located in a small French-speaking town in the middle of the Alps, where most locals neither spoke nor understood English. But most of us at the institute were international students from across the world, and we formed groups among ourselves to have a social life. Getting to know several cultures was the most exciting aspect of it. The local students were very friendly and helped us understand the cultural differences, the dos and don’ts and the appropriate etiquette. The food at the restaurants tasted quite bland to most Indians and was also expensive, so we cooked our own food on most days.
What are the specific areas in which Indian universities fall short when conducting research? What modifications do you recommend?
I feel a crucial aspect is the PhD stipend. In Switzerland, I could afford to raise a kid with a decent standard of living while pursuing my PhD – something challenging to pull off in India without compromises. India has a long way to go in this regard, but better stipends attract more talented people to take up a PhD as a career choice, which pays off for the country in terms of improved research quality, more successful start-ups, and better-skilled talent.
Another aspect is that more professors need to be approachable to students at Indian universities. During my PhD, my professor genuinely listened to my concerns and went the extra mile to work around obstacles and provide a conducive environment for me. I could freely approach him with any research ideas or technical questions without being ridiculed, no matter how simple or silly they were. I could use my leave as I chose, with no questions asked. I received support when things weren’t working, both technically and morally. These simple aspects, I feel, would make a great deal of difference in maintaining a positive headspace and creativity among research scholars at Indian universities.
Could you tell us about your research area and its exciting outcomes?
Speech recognition is a multi-disciplinary area of research involving signal processing and deep learning. Speech is typically sampled at 16,000 samples per second, and the recognizer has to reduce this stream to just a few words. Linguistic knowledge is used in conjunction with algorithmic optimization and search to form a ranked set of possible word sequences, out of which the best one is chosen. The variations in how people utter the same words differently every time, how different speakers speak differently, the presence of background noise and reverberation, accent, and several other factors make the problem complex. We simplify the problem by defining domains, i.e., the context in which the person is speaking, and use them to reduce the search space and tune the model parameters. Such tailored speech recognition solutions generally work better for specific use-cases than generic ones that use no prior knowledge. Tailoring to the use-case is a common challenge in the industry that varies with each customer’s requirements and keeps us excited.
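To make the idea of domain knowledge narrowing the search more concrete, here is a minimal toy sketch in Python. It is not how Uniphore's systems or any production recognizer works: the candidate sentences, the acoustic scores, the two unigram "domain" language models and the function names are all hypothetical and chosen only for illustration. It shows two things from the answer above: the number of raw samples behind a short utterance at 16,000 samples per second, and how the same pair of similar-sounding candidates is ranked differently once a domain is assumed.

```python
import math

SAMPLE_RATE_HZ = 16_000  # typical sampling rate mentioned in the interview


def num_samples(duration_seconds: float) -> int:
    """Number of raw audio samples in an utterance of the given length."""
    return int(round(duration_seconds * SAMPLE_RATE_HZ))


def lm_log_prob(words, unigram_probs, floor=1e-4):
    """Log-probability of a word sequence under a toy unigram language model.

    Words missing from the model get a small floor probability (an assumption).
    """
    return sum(math.log(unigram_probs.get(w, floor)) for w in words)


def rank_hypotheses(hypotheses, unigram_probs, lm_weight=1.0):
    """Sort (text, acoustic_log_score) pairs by combined score, best first."""
    scored = []
    for text, acoustic in hypotheses:
        total = acoustic + lm_weight * lm_log_prob(text.split(), unigram_probs)
        scored.append((total, text))
    scored.sort(reverse=True)
    return [text for _, text in scored]


# Hypothetical acoustic log-scores for two similar-sounding candidates.
hypotheses = [
    ("check my balance", -10.2),
    ("check my ballots", -10.0),
]

# Hypothetical domain language models: a banking call centre vs. an election helpline.
banking_lm = {"check": 0.2, "my": 0.3, "balance": 0.2, "ballots": 0.0001}
election_lm = {"check": 0.2, "my": 0.3, "balance": 0.0001, "ballots": 0.2}

print(num_samples(2.0))                             # samples in a 2-second utterance
print(rank_hypotheses(hypotheses, banking_lm)[0])   # best candidate in the banking domain
print(rank_hypotheses(hypotheses, election_lm)[0])  # best candidate in the election domain
```

In this sketch the acoustic scores alone slightly favour "check my ballots", but the banking language model pushes "check my balance" to the top, while the election model keeps "check my ballots" first. Real systems use far richer acoustic and language models, but the principle of letting the domain reweight the candidates is the same.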
How do you observe the growth of speech recognition from the time you were doing your PhD to the present?
The field has been growing rapidly. The focus of speech research has shifted from traditional hybrid speech recognition to end-to-end approaches. However, traditional systems still offer better support for quick domain adaptation, flexible language model tuning and low latency, as required by industry standards. So, the industry adoption of end-to-end systems will happen over a longer period and with more innovation. In addition, the availability of data and computing resources has become more critical than before. Even in academia, speech recognition datasets now run to thousands of hours, and one with only tens of hours is considered a toy task.
What are the differences you notice between conducting research at a university and in the industry?
The objective at a university is to learn the science behind how things work. There is ample room to explore ideas, tweak things and observe the changes. The datasets are well-designed, and the experiments are well-controlled. The main rewards are knowledge, publications, and citations.
In contrast, the objective in the industry is to generate business value and revenue out of research ideas. The datasets have multiple real-world issues, and the experiments are conducted on larger scales and optimized for the best performance. The main reward is the growth of the business, with the product reaching many end-users and making a large-scale impact.
What advice would you provide to those who aspire to pursue a career in artificial intelligence? How should they equip themselves?
To sustain oneself and excel in any field, one needs to nurture a genuine interest in the area and have an open mind to keep learning. Students go through a lot of forced reading for competitive exams and to score marks, which I feel hinders creativity. Reading books and articles out of choice, questioning and reasoning out why and how each module works, and implementing ideas out of curiosity make the mind creative. Specifically for AI, one needs to understand the mathematical basics of algebra, probability and statistics before exploring deep learning, and to practise translating simple ideas into programming instructions to develop modular thinking. It helps to get hands-on by participating in online machine learning challenges and contributing to open-source projects. The field is changing rapidly, so following the work of notable people in the area through social media helps one stay relevant.
Could you please tell us about some unique research articles and books on artificial intelligence that inspire you?
When I started working on speech processing, the works of Rabiner inspired me: the book on speech processing, its course material and the landmark tutorial on hidden Markov models. The paper on weighted finite-state transducers by Mohri et al. established several practical aspects of speech recognition. I benefitted greatly from Downey’s easy-to-read books, especially on how to think like a computer scientist.
Bourlard and Morgan’s book on hybrid speech recognition, sequence training by maximum mutual information by Povey et al., raw speech modelling by Palaz et al., self-supervised speech representation learning by Baevski et al., the attention mechanism by Vaswani et al. and the Adam optimizer by Kingma and Ba are some of the works that inspired me.
Source: indiaai.gov.in