Machine learning trains computers to learn from historical data so they can make predictions about what might happen in the future. In this section, we'll look at machine learning algorithms such as iDistance, kernel methods for vector output, and the local outlier factor.
iDistance
In pattern recognition, iDistance is a method for answering k-nearest neighbor (kNN) queries on point data in multi-dimensional metric spaces. The kNN query is one of the hardest queries to answer efficiently on multi-dimensional data, especially when the number of dimensions is high. iDistance is designed to process kNN queries in high-dimensional spaces efficiently, and it works best with skewed data distributions, which are common in real-world data sets. Machine learning models can also be combined with iDistance to learn how the data is distributed, so that multi-dimensional data can be stored and searched more effectively.
The iDistance method can be seen as an accelerated sequential scan: instead of scanning records from the beginning, it starts the scan from places where the closest neighbors are most likely to be found early, as sketched below.
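As a rough illustration of the core idea, each multi-dimensional point is mapped to a single key based on its distance to the reference point of its partition, and the keys are kept in a one-dimensional ordered index (a B+-tree in the original method). The sketch below is a simplified Python illustration of that mapping, assuming Euclidean distance, hand-picked reference points, and a sorted array standing in for the B+-tree; the constant `c`, the helper names, and the single-radius probe are illustrative simplifications, not the full iterative kNN search described in the literature.

```python
import numpy as np
from bisect import bisect_left, bisect_right

# Minimal sketch of iDistance's key idea (illustrative, not the full method):
# map each point p in partition i to the 1-D key  i * c + dist(p, ref_i),
# keep the keys sorted, and probe only small key ranges for a query.

def build_index(X, refs, c):
    """Assign every point to its closest reference point and compute its key."""
    part = np.argmin(np.linalg.norm(X[:, None, :] - refs[None, :, :], axis=2), axis=1)
    dist = np.linalg.norm(X - refs[part], axis=1)
    keys = part * c + dist
    order = np.argsort(keys)
    return keys[order], order                      # sorted keys + original indices

def range_candidates(q, r, refs, c, sorted_keys, order):
    """Collect candidates whose key falls in the ranges a radius-r query around q
    can touch; exact distances are then checked only on this small set."""
    cands = []
    for i, ref in enumerate(refs):
        d = np.linalg.norm(q - ref)
        lo, hi = i * c + max(d - r, 0.0), i * c + d + r   # key interval for partition i
        a, b = bisect_left(sorted_keys, lo), bisect_right(sorted_keys, hi)
        cands.extend(order[a:b])
    return np.unique(cands)

# Toy usage: 2-D points, two reference points, one radius probe.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
refs = np.array([[-1.0, 0.0], [1.0, 0.0]])
c = 100.0                                          # chosen larger than any within-partition distance
sorted_keys, order = build_index(X, refs, c)
q = np.array([0.2, 0.1])
cands = range_candidates(q, 0.5, refs, c, sorted_keys, order)
dists = np.linalg.norm(X[cands] - q, axis=1)
print(cands[np.argsort(dists)][:5])                # 5 nearest points among the candidates
```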
iDistance has been used in several applications, such as:
- Image retrieval
- Video indexing
- Similarity search in P2P systems
- Mobile computing
- Recommender systems
iDistance was first proposed in 2001 by Cui Yu, Beng Chin Ooi, Kian-Lee Tan, and H. V. Jagadish. They later refined the method together with Rui Zhang and published a more in-depth study of it in 2005.
Kernel methods for vector output
Kernel methods have long been used to study how a function's output depends on its input. Kernels encapsulate the properties of functions in a computationally efficient way and make it easy to adapt algorithms to tasks of varying complexity.
In most machine learning algorithms, these functions produce a scalar output. The recent development of kernel methods for functions with vector-valued outputs is due, at least in part, to the desire to solve related problems simultaneously: kernels that capture how the problems are related allow them to borrow strength from each other. Algorithms of this type include multi-task learning (also called multi-output learning or vector-valued learning), transfer learning, and co-kriging. Multi-label classification can be viewed as mapping inputs to (binary) coding vectors whose length equals the number of classes.
In the Gaussian process literature, kernels are called covariance functions, and handling multiple outputs corresponds to considering multiple correlated processes. The connection between these two points of view is found in the Bayesian interpretation of regularization.
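As a hedged illustration of how a kernel can encode the relationship between outputs, the sketch below uses a separable construction K((x, t), (x', t')) = k(x, x') · B[t, t'], where k is a scalar RBF kernel on inputs and B is a task-similarity matrix, inside a multi-output kernel ridge regression. The length scale, the matrix B, the regularization strength, and the function names are illustrative assumptions, not anything prescribed by the text.

```python
import numpy as np

def rbf(X1, X2, length_scale=1.0):
    """Scalar RBF kernel k(x, x') on the inputs."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def fit_multi_output(X, Y, B, lam=1e-2, length_scale=1.0):
    """Multi-output kernel ridge regression with the separable kernel
    K((x, t), (x', t')) = k(x, x') * B[t, t'].
    Returns dual coefficients of shape (n_samples, n_outputs)."""
    n, T = Y.shape
    K = np.kron(rbf(X, X, length_scale), B)          # (n*T, n*T) joint Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(n * T), Y.reshape(-1))
    return alpha.reshape(n, T)

def predict_multi_output(X_new, X, alpha, B, length_scale=1.0):
    """Predict all outputs jointly for new inputs."""
    K_new = np.kron(rbf(X_new, X, length_scale), B)  # cross-kernel between new and training inputs
    return (K_new @ alpha.reshape(-1)).reshape(len(X_new), -1)

# Toy usage: two correlated outputs of a 1-D input.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
Y = np.column_stack([np.sin(X[:, 0]), np.sin(X[:, 0]) + 0.3 * np.cos(X[:, 0])])
B = np.array([[1.0, 0.9], [0.9, 1.0]])               # assumed task-similarity matrix (illustrative)
alpha = fit_multi_output(X, Y, B)
print(predict_multi_output(np.array([[0.5]]), X, alpha, B))
```

Because the kernel couples the two outputs through B, the prediction for one output is informed by training data for the other; setting B to the identity matrix would reduce the model to independent single-output regressions.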
Local outlier factor
In anomaly detection, the local outlier factor (LOF) is an algorithm that Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jorg Sander proposed in 2000. It measures the local density deviation of a given data point with respect to its neighbors in order to find data points that are out of the ordinary.
Some ideas in LOF are shared with DBSCAN and OPTICS, such as "core distance" and "reachability distance," which are used to estimate the density of a local region. The local outlier factor is based on the notion of local density: by comparing an object's local density with the local densities of its neighbors, one can identify regions of similar density as well as points with a substantially lower density than their neighbors. These points are considered outliers.
The local density is estimated from the typical distance at which a point can be "reached" from its neighbors. The "reachability distance" used in LOF is an additional measure that makes the results within clusters more stable. Unfortunately, the details of LOF's reachability distance are often reproduced incorrectly in secondary sources, such as Ethem Alpaydin's textbook.
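To make the definitions above concrete, here is a minimal NumPy sketch of the standard LOF computation (reachability distance, local reachability density, and the final factor). The O(n²) distance matrix, the choice of k, and the toy data are illustrative simplifications.

```python
import numpy as np

def lof_scores(X, k=3):
    """Compute Local Outlier Factor scores for each row of X (a minimal O(n^2)
    sketch): values near 1 indicate inliers, much larger values indicate outliers."""
    n = X.shape[0]
    # Pairwise Euclidean distances.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # For each point, indices of its k nearest neighbors (excluding itself).
    order = np.argsort(D, axis=1)
    knn = order[:, 1:k + 1]
    # k-distance of each point: distance to its k-th nearest neighbor.
    k_dist = D[np.arange(n), knn[:, -1]]
    # Local reachability density: inverse of the mean reachability distance
    # reach_k(A, B) = max(k_dist(B), d(A, B)) over A's neighbors B.
    def lrd(a):
        reach = np.maximum(k_dist[knn[a]], D[a, knn[a]])
        return 1.0 / reach.mean()
    lrds = np.array([lrd(a) for a in range(n)])
    # LOF(A) = mean lrd of A's neighbors divided by lrd(A).
    return np.array([lrds[knn[a]].mean() / lrds[a] for a in range(n)])

# Toy usage: the isolated point at (10, 10) should get a LOF well above 1.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [0.5, 0.5], [10., 10.]])
print(lof_scores(X, k=3).round(2))
```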
Source: indiaai.gov.in