Madhumita Dash holds a PhD in Horticulture and has extensive experience in AI across academia and industry.
She currently works as Principal AI Research Engineer at Fasal, Bengaluru, India.
INDIAai interviewed Madhumita to get her perspective on AI.
What sparked an AI interest in a biotechnology graduate? How did it all begin?
It started with my interest in genomics and genomic data science. A single human genome stores ~700 megabytes of information, and genomic data scientists must process at least 30 times this amount (~100 GB) to derive valuable insights. For plants, the genome can be ~50 times bigger than the human genome. My interest was in uncovering the hidden secrets in plant genomes, especially those of tree crops. Following that interest, I pursued my PhD in both horticulture and computational biology, where I worked towards identifying the science behind different fruit sizes in apples. Like most genomic data scientists, I started with advanced statistics to analyze the massive amount of data, but I soon realized that several machine learning algorithms could be better suited to the problem. And that was the beginning of my journey into AI.
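As a rough back-of-the-envelope check on the figures quoted above, the short Python sketch below estimates both numbers. It assumes a haploid human genome of about 3.1 billion base pairs, 2 bits per base for a packed genome, and 1 byte per base for raw sequencing reads at 30x coverage. These values and the function names are illustrative assumptions, not figures from the interview.

```python
# Back-of-the-envelope genome storage estimate (all constants are assumptions).
HUMAN_GENOME_BP = 3_100_000_000  # assumed haploid genome length in base pairs
BITS_PER_BASE = 2                # A, C, G, T can each be encoded in 2 bits
COVERAGE = 30                    # each position is sequenced ~30 times


def packed_genome_megabytes(bp=HUMAN_GENOME_BP, bits_per_base=BITS_PER_BASE):
    """Size of one genome when every base is packed into 2 bits, in MB."""
    return bp * bits_per_base / 8 / 1e6


def raw_read_gigabytes(bp=HUMAN_GENOME_BP, coverage=COVERAGE, bytes_per_base=1):
    """Size of the raw bases read at the given coverage, in GB."""
    return bp * coverage * bytes_per_base / 1e9


print(f"Packed genome: ~{packed_genome_megabytes():.0f} MB")
print(f"Reads at {COVERAGE}x coverage: ~{raw_read_gigabytes():.0f} GB")
```

Under these assumptions the packed genome comes to roughly 775 MB and the 30x reads to roughly 93 GB, in line with the approximate figures in the answer. Real sequencing files also carry quality scores and metadata, so actual sizes vary.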
Can you tell me about your PhD research problems and the solutions you found?
For my PhD, I worked towards identifying genetic and other physiological parameters that regulate fruit size in apples. I had some side projects, but my main focus was to understand the apple genome and identify potential genes that regulate or control fruit size. The main objective of my research was to help future apple breeding programs. The apple genome hadn’t been sequenced when I started (in 2008), so the genome data available was an expressed sequence tag (EST) database, which consists of short fragments of mRNA sequences. For any gene to be expressed, it is first transcribed into mRNA. So to identify potential genes regulating fruit size, I focused on mRNA sequences in the EST database that are present in higher or lower quantities in bigger-sized apple varieties or during apple fruit development. I ended up with ~20% of the sequences in the database, which was still a vast amount. As a next step, I developed a specific naive Bayes algorithm to shortlist the sequences more likely to be associated with fruit size regulation in apples. Finally, I got a list of 10 genes from this analysis, of which two showed a very high potential of being associated with fruit size regulation in apples. The same two genes have now been shown to regulate organ size in several other plant species. I would consider that a success because I began with few resources and could still solve the problem I set out to solve.
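The interview does not describe the features or the exact naive Bayes variant used in this work, so the following is only a generic sketch of how a naive Bayes classifier can rank candidate sequences. It uses a Bernoulli naive Bayes with Laplace smoothing over three hypothetical binary features per sequence (for example, "more abundant in large-fruited varieties"). The training data, feature meanings, sequence names and function names are all made up for illustration.

```python
import math


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate class priors and per-feature probabilities (Laplace-smoothed)."""
    n_features = len(rows[0])
    model = {}
    for cls in (0, 1):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        prior = (len(cls_rows) + alpha) / (len(rows) + 2 * alpha)
        probs = [
            (sum(r[j] for r in cls_rows) + alpha) / (len(cls_rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (prior, probs)
    return model


def log_odds(model, row):
    """Log posterior odds that a sequence belongs to class 1 rather than class 0."""
    score = 0.0
    for cls, sign in ((1, 1.0), (0, -1.0)):
        prior, probs = model[cls]
        log_likelihood = math.log(prior)
        for x, p in zip(row, probs):
            log_likelihood += math.log(p if x else 1.0 - p)
        score += sign * log_likelihood
    return score


# Hypothetical toy training set: three binary features per sequence.
train_rows = [
    (1, 1, 1), (1, 1, 0), (1, 0, 1),  # labelled as size-related (1)
    (0, 0, 0), (0, 1, 0), (0, 0, 1),  # labelled as unrelated (0)
]
train_labels = [1, 1, 1, 0, 0, 0]
model = train_bernoulli_nb(train_rows, train_labels)

# Hypothetical unlabelled candidates, ranked from most to least promising.
candidates = {"seq_A": (1, 1, 1), "seq_B": (0, 0, 0), "seq_C": (1, 0, 0)}
ranked = sorted(
    candidates, key=lambda name: log_odds(model, candidates[name]), reverse=True
)
print(ranked)
```

Ranking by log odds rather than taking a hard yes/no prediction suits a shortlisting task, since the top few candidates can then be examined by hand.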
What early challenges did you face as a bioscience graduate in AI research?
The major challenge was learning many new things, such as advanced statistics, machine learning and programming, alongside my regular coursework in genomics and horticulture. My research involved both computational analysis and fieldwork to collect data. I had to learn many things quickly so that I could focus on the actual research, and that learning was very time-consuming. Also, limited online resources were available then, so I had to enrol in many basic on-campus classes. I could only attend a limited number of such classes in a semester since I had my research work and was also a teaching assistant. Therefore, I learnt many things through textbooks and journal articles. I hardly had any social life during the first few semesters, but I improved my self-learning skills.
What are the most notable differences between your research abroad and in India? In what areas are we falling behind?
I would highlight two significant differences between academic research in India and the US.
1) Interdisciplinary research topics in India are still very limited. Indian academic research is much more oriented towards fundamental research and lacks collaboration among different labs. The kind of interdisciplinary, collaborative research I did during my PhD and postdoctoral work was tough to establish in India. Where collaborations existed, they were mostly with researchers outside India. I could hardly find much collaboration between labs within India.
2) Journal publications and impact factors carry more weight in India than in the US. When I was in the US, dissemination of research work was essential, and more importance was given to the quality of the research than to the impact factor of the journal in which it was published. In India, by contrast, for many academic hiring and promotion decisions, more than 75% of the credit comes from publications, measured as the total of your impact factors or some similarly rudimentary analysis. Most labs and graduate students face immense pressure to publish, which takes away innovation and creativity.
What do you do at Fasal as the Principal AI Research Engineer?
Simply put, we are trying to solve the problems of farmers and growers. Fasal is an AI-powered, IoT-based full-stack platform for horticultural farmers. We help farmers make data-driven decisions regarding crop production, selling their produce and other farm-related activities. Currently, my team at Fasal is focusing on microclimate forecasting, crop modelling and remote sensing. We have a lot of further research in the pipeline, and our main objective is to leverage data and technology to make agriculture reliable, profitable and environmentally sustainable.
What role does artificial intelligence play in commercial horticulture?
I believe AI will be the driving factor for the next green revolution. The 1960s green revolution introduced hybrid varieties, chemical fertilizers and pesticides to Indian agriculture. Now the usage of these chemicals has become one of the biggest problems globally. Indian agriculture is a significant contributor to greenhouse gas (GHG) emissions. To meet the agenda of our National Mission for Sustainable Agriculture (NMSA), we must focus on bringing data-driven decision-making to our farmers. With the help of the right data and AI, we can fulfil this mission and move towards net-zero emissions by 2070. In the last four years, Fasal has saved 9 billion litres of irrigation water and achieved up to a 60% reduction in pesticide cost and a 40% increase in yield. AI is also being applied in crop financing, supply chain management, the automation of farm activities and more.
It is excellent that you have created weather prediction models. However, are limited data a hindrance to accurate weather forecasting? What is your perspective?
Weather forecasting is a three-step process that includes observation, analysis and communication. We need to work on all three steps. The observation step involves working with atmospheric models to depict the state of the atmosphere and then combining this with information drawn from observational data collected using automated weather stations (AWS). And yes, we are limited here since our country’s observation network (both manual and automated stations) is still poor. We recently collaborated with the India Meteorological Department (IMD) to combine our observational data, with the aim of improving rainfall forecasting. However, we still have a long way to go.
Regarding analysis and communication, most forecasts today have a coarse resolution: they are provided at the city or block level. However, the climate has become more dynamic, and we observe variations in weather patterns even within a 1 km range. Therefore, more focus should be on hyperlocal or microclimate forecasting, which is our focus at Fasal.
What advice do you have for those who wish to pursue careers in AI research? What are the most effective ways to advance?
I believe the most important thing is to be clear on why one wants to pursue a career in AI. There is hype around the area, and many freshers believe AI is all about developing ML/DL models. The truth is that model development is maybe 20% of the work; most of the effort goes towards data analysis, system design, productizing the model, communication and monitoring. Domain understanding is equally crucial if you want to grow in your career as an AI researcher or engineer. Therefore, my advice is to start with the basics once you are sure you want to pursue a career in AI. Focus more on data analysis, the core optimization concepts behind different algorithms, programming and interpreting model results correctly. Work on some real projects outside Kaggle via hackathons, internships, capstone projects and the like, and build your experience across the end-to-end AI lifecycle.
Could you provide a list of notable academic books and journals in AI?
I will list some resources for beginner and advanced learning:
Beginner: I highly recommend Andrew Ng’s online lectures on Machine Learning and Deep Learning Specialization for beginners. A few books that I would recommend are:
1) David Spiegelhalter’s The Art of Statistics: Learning from Data
2) Artificial Intelligence – A Modern Approach by Stuart Russell & Peter Norvig
3) Andriy Burkov’s books titled “The Hundred-Page Machine Learning Book” and “Machine Learning Engineering.”
4) Aurélien Géron’s Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 2
Advanced: For advanced learning, I believe one needs to look at AI end to end, from programming to system design to effective communication. Below are a few books I would recommend:
- Martin Kleppmann’s “Designing Data-Intensive Applications”
- Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville
- Explainable AI: Interpreting, Explaining and Visualizing Deep Learning
- Building Successful Business Models based on Artificial Intelligence
Besides the above books, the major journals I would recommend are the NeurIPS proceedings, the Artificial Intelligence journal (AIJ) and Expert Systems With Applications.
I would also recommend several blogs like Machine Learning Mastery and KDnuggets (great for beginners), MIT News, OpenAI, AWS Machine Learning Blog, Google AI, and Towards AI, among others.
Source: indiaai.gov.in