Kaggle has published the results of its Machine Learning and Data Science Survey, which collected 23,997 responses from industry experts. According to the survey, India tops the list of respondents by country, with the United States in second place.
The survey estimates that India and Japan have seen strong annual growth in the number of data scientists over the last few years. However, the gender ratio remains skewed at 76 to 22 (men to women). 60% of data scientists are younger than 30, and 80% are younger than 40.
Learning Data Science and Programming
Online learning platforms are the most preferred way to learn data science, followed by video platforms such as YouTube. Among Kaggle respondents, Python is the most widely used language for machine learning, preferred by 88% of them, with SQL in second place. Respondents chose Jupyter Notebook as their top integrated development environment (IDE), followed by VS Code.
For visualization, Matplotlib and Seaborn lead the pack. Scikit-learn is the go-to ML framework, followed by TensorFlow, and linear or logistic regression remains the most used ML algorithm this year as well.
Year over year, PyTorch has seen a significant rise in usage. Another significant takeaway of the survey is that transformer models have gained a firm foothold in natural language processing (NLP).
Industry Outlook
According to the survey, 20% of respondents reported having machine learning models in production for more than two years. The most common data responsibility is analyzing and understanding data to inform business and product decisions. MLflow is the most commonly used model-serving tool.
Amazon SageMaker is the most popular machine learning service, followed by Databricks. For experiment tracking, TensorBoard is the most commonly used tool, followed by MLflow.
Source: indiaai.gov.in