Machine learning allows computers to mimic human behaviour by training them on historical data. This section examines three notable machine learning algorithms: state-action-reward-state-action (SARSA), lasso, and self-play.
State–action–reward–state–action
The state-action-reward-state-action (SARSA) algorithm is a reinforcement learning method for learning a Markov decision process policy. Rummery and Niranjan introduced the idea in a technical note under the name "Modified Connectionist Q-Learning" (MCQ-L); the alternative name SARSA, proposed by Rich Sutton, appeared only in a footnote. As an on-policy learning algorithm, a SARSA agent interacts with the environment and updates its policy based on the actions it actually takes. The Q value for a state-action pair is updated by an error term scaled by the learning rate alpha. A Q value represents the possible reward received in the next time step for taking action a in state s, plus the discounted future reward received from the next state-action observation.
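A minimal sketch of the tabular SARSA update is shown below, assuming a small discrete environment exposed through a hypothetical `env` object with `reset()`, `step()`, and an `actions` list (these names are illustrative, not from the source):

```python
import random
from collections import defaultdict

def sarsa(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular SARSA: on-policy TD control over a discrete state/action space."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return

    def choose(state):
        # Epsilon-greedy behaviour policy: explore with probability epsilon.
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        s = env.reset()
        a = choose(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)   # assumed to return (next state, reward, done flag)
            a_next = choose(s_next)
            # TD error: reward plus the discounted value of the *next* state-action
            # pair actually selected -- this is what makes SARSA on-policy.
            td_error = r + (0.0 if done else gamma * Q[(s_next, a_next)]) - Q[(s, a)]
            Q[(s, a)] += alpha * td_error
            s, a = s_next, a_next
    return Q
```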
Furthermore, because SARSA is an iterative algorithm, it implicitly assumes an initial condition before the first update occurs. Starting every Q value high (even infinitely high), known as "optimistic initial conditions," can encourage exploration: whatever action is taken, the update lowers its value relative to the untried alternatives, which increases the chance that those alternatives are chosen next.
In 2013, researchers proposed using the first reward r to reset the initial conditions: the first time an action is taken, the observed reward is used to set the value of Q. This enables immediate learning in the case of fixed deterministic rewards. In repeated binary-choice experiments, this resetting-of-initial-conditions (RIC) approach appears consistent with human behaviour.
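As a sketch of how these two initialization ideas differ in practice (the constant `OPTIMISTIC_VALUE` and the `visited` bookkeeping are illustrative assumptions, not from the source):

```python
from collections import defaultdict

OPTIMISTIC_VALUE = 10.0  # assumed to be larger than any achievable return

# Optimistic initial conditions: every unseen state-action pair starts high,
# so the first real update lowers the tried action while untried ones stay attractive.
Q_optimistic = defaultdict(lambda: OPTIMISTIC_VALUE)

# Resetting of initial conditions (RIC): the first observed reward for a
# state-action pair is used directly as its initial Q value.
def ric_init(Q, state, action, first_reward, visited):
    if (state, action) not in visited:
        visited.add((state, action))
        Q[(state, action)] = first_reward  # instant learning for fixed deterministic rewards
```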
Lasso
Least absolute shrinkage and selection operator (lasso, LASSO) is a regression analysis technique used in statistics and machine learning. It performs both variable selection and regularization, which improves the prediction accuracy and interpretability of the resulting statistical model. The method was originally introduced in geophysics and later popularized by Robert Tibshirani, who coined the term.
Lasso was originally formulated for linear regression models. This simple case reveals a good deal about the estimator, including its relationship to ridge regression and best subset selection and the connection between lasso coefficient estimates and so-called soft thresholding. It also shows that, as in standard linear regression, the coefficient estimates need not be unique when covariates are collinear. Furthermore, lasso is closely related to basis pursuit denoising.
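A minimal sketch of lasso on a linear regression problem, assuming scikit-learn's `Lasso` estimator and synthetic data (the specific parameter values and data shapes are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data: only the first 3 of 10 covariates actually matter.
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# The L1 penalty (weighted by alpha) shrinks coefficients and drives many of
# them exactly to zero, performing variable selection and regularization.
model = Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)  # irrelevant covariates typically come out as exactly 0
```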
Self-play
Self-play is a method for enhancing the performance of reinforcement learning agents: agents learn to perform better by competing against themselves. In multi-agent reinforcement learning experiments, researchers try to maximize a learning agent's performance on a task in collaboration or competition with one or more other agents. Because these agents learn through trial and error, researchers may let the learning algorithm take on the roles of two or more different agents. When used effectively, this technique offers two benefits (see the sketch after the list):
- It offers a straightforward way to determine what the other agents are doing, giving the learner a meaningful challenge.
- Since we can use the perspectives of the various agents for learning, it multiplies the amount of experience we can use to improve the policy by a factor of two or more.
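A minimal sketch of a self-play training loop for a two-player zero-sum game, assuming a hypothetical `game` environment and a single shared `policy` that is updated from both players' perspectives (all names and methods here are illustrative assumptions):

```python
def self_play_episode(game, policy):
    """Play one game in which the same policy controls both players,
    recording (state, action, player) so both perspectives yield experience."""
    trajectory = []
    state = game.reset()
    while not game.over():
        player = game.current_player()
        action = policy.select(state, player)
        trajectory.append((state, action, player))
        state = game.step(action)
    return trajectory, game.result()  # assumed result: +1 / -1 / 0 from player 0's view

def train(game, policy, episodes=1000):
    for _ in range(episodes):
        trajectory, result = self_play_episode(game, policy)
        for state, action, player in trajectory:
            # Each move is credited from the mover's own perspective,
            # doubling the usable experience per game played.
            reward = result if player == 0 else -result
            policy.update(state, action, reward)
```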
Furthermore, the AlphaZero program uses self-play to enhance its abilities in the games of Go, shogi, and chess. Self-play has also been compared to the epistemological concept of tabula rasa, which describes learning from a "blank slate."
Source: indiaai.gov.in