Mathematicians are suckers for a challenging problem. Even something as seemingly intangible as multiplying matrices, which are just two-dimensional tables of numbers, can feel like a game when you are searching for the most efficient way to do it. It’s a bit like trying to solve a Rubik’s Cube in as few moves as possible: tough, but irresistible. Except that for the Rubik’s Cube, the number of possible moves at each step is always 18; for matrix multiplication, even in relatively simple cases, each step can present more than 10¹² possibilities.
Over the past half century, researchers have attacked this problem in many ways, all based on computer searches aided by human intuition. Last month, a team at the artificial intelligence company DeepMind showed how to tackle it from a new direction, reporting in a paper published in Nature that they had successfully trained a neural network to discover new fast algorithms for matrix multiplication. It was as if the AI had found an unknown strategy for solving a monstrously complex Rubik’s Cube.
Josh Alman, a computer scientist at Columbia University, called the result “a very neat conclusion.” But he and other specialists in matrix multiplication stressed that, at least in the near future, such AI assistance will supplement rather than replace existing methods. Alman described it as a “proof of concept” for something that could become a breakthrough; for now, the result will simply help researchers in their quest.
As if to prove the point, three days after the Nature paper appeared, a pair of Austrian researchers illustrated how new and old methods can complement each other: they used a conventional computer-aided search to further improve one of the algorithms that the neural network had discovered.
The findings suggest that, much like solving a Rubik’s Cube, the path to better algorithms will be fraught with obstacles.
The Process of Multiplying Matrices
Matrix multiplication is one of the most fundamental and important operations in mathematics. Multiplying two n-by-n matrices, each with n² elements, produces a third n-by-n matrix, built by multiplying and adding those elements in a specific pattern. The standard recipe requires n³ multiplication operations, so multiplying a pair of 2-by-2 matrices, for example, takes eight multiplications.
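For concreteness, here is that textbook recipe as a short Python sketch (the function name and the test matrices are purely illustrative):

```python
def matmul_naive(A, B):
    """Textbook matrix multiplication: n^3 scalar multiplications for n-by-n inputs."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]  # one scalar multiplication per (i, j, k) triple
    return C

# For n = 2 the triple loop performs 2**3 = 8 multiplications.
print(matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```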
This procedure quickly becomes laborious for larger matrices with thousands of rows and columns. But in 1969 the mathematician Volker Strassen devised a way to multiply a pair of 2-by-2 matrices using seven rather than eight multiplication steps, at the cost of introducing more addition steps.
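Strassen’s seven products are standard textbook material; a minimal Python sketch of the 2-by-2 scheme (with my own variable names) looks like this:

```python
def strassen_2x2(A, B):
    """Strassen's 2-by-2 scheme: 7 multiplications instead of 8, at the cost of extra additions."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]], same as above
```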
For multiplying two matrices that are each 2-by-2, Strassen’s algorithm is needlessly convoluted. But he realised that it would also work for bigger matrices, because the elements of a matrix can themselves be matrices. A matrix with 20,000 rows and 20,000 columns, for instance, can be reimagined as a 2-by-2 matrix whose four elements are each 10,000-by-10,000 matrices. Each of those can then be carved into four 5,000-by-5,000 blocks, and so on. At every level of this nested hierarchy, Strassen could apply his method to multiply the 2-by-2 arrangements of blocks. And the savings from performing fewer multiplications grow as the matrices get larger.
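In code, the block-recursive idea is a small change to the sketch above: treat the four quadrants as the “elements” and recurse. A rough version, assuming square matrices whose size is a power of two and using NumPy only for convenience:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Block-recursive Strassen; assumes square matrices whose size is a power of two."""
    n = A.shape[0]
    if n <= cutoff:                       # small blocks: fall back to ordinary multiplication
        return A @ B
    h = n // 2
    a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    m1 = strassen(a11 + a22, b11 + b22, cutoff)
    m2 = strassen(a21 + a22, b11, cutoff)
    m3 = strassen(a11, b12 - b22, cutoff)
    m4 = strassen(a22, b21 - b11, cutoff)
    m5 = strassen(a11 + a12, b22, cutoff)
    m6 = strassen(a21 - a11, b11 + b12, cutoff)
    m7 = strassen(a12 - a22, b21 + b22, cutoff)
    return np.block([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])
```

Each level of recursion replaces eight block multiplications with seven, which is why the savings compound as the matrices grow.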
Strassen’s discovery prompted researchers to search for ever more efficient methods of matrix multiplication, and the effort eventually split into two subfields. One focuses on a question of principle: if you multiply two n-by-n matrices and let n grow toward infinity, how does the number of multiplication steps scale with n in the fastest possible algorithm? The current record for the best scaling, n^2.3728596, belongs to Alman and Virginia Vassilevska Williams, a computer scientist at the Massachusetts Institute of Technology. (A recent unpublished preprint reported a tiny improvement using a new technique.) But these algorithms are of purely theoretical interest: they only beat methods such as Strassen’s for outrageously enormous matrices.
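For orientation, the exponents involved (standard values that predate the new work) are:

```latex
\text{standard algorithm: } O(n^{3}), \qquad
\text{recursive Strassen: } O(n^{\log_2 7}) \approx O(n^{2.807}), \qquad
\text{current record (Alman and Vassilevska Williams): } O(n^{2.3728596}).
```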
The second subfield thinks on a smaller scale. Soon after Strassen’s discovery, the Israeli-American computer scientist Shmuel Winograd showed that Strassen had reached a theoretical limit: it is not possible to multiply 2-by-2 matrices with fewer than seven multiplications. But for matrices of every other size, the minimal number of required multiplications remains an open question. And fast algorithms for small matrices could have an outsize impact, since repeated application of such an algorithm might beat Strassen’s approach when multiplying matrices of a reasonable size.
Unfortunately, the sheer number of possibilities is enormous. Even for 3-by-3 matrices, “the number of viable algorithms exceeds the number of atoms in the universe,” said Alhussein Fawzi, a DeepMind researcher and one of the leaders of the new work.
Faced with this dizzying range of options, researchers have made progress by recasting matrix multiplication as what appears to be an entirely different mathematical problem, one that is easier for computers to handle. Multiplying two matrices is an abstract task, but it can be represented as a specific kind of mathematical object: a three-dimensional array of numbers called a tensor. Researchers can then break this tensor into a sum of elementary components, known as “rank-1” tensors, each of which represents a single step in the corresponding matrix multiplication algorithm. Finding an efficient multiplication algorithm therefore amounts to minimising the number of terms in the tensor decomposition: the fewer the terms, the fewer the steps involved.
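In symbols (index conventions vary slightly between papers, so treat this as one standard way of writing it), a decomposition of the matrix multiplication tensor T_n into R rank-1 terms reads:

```latex
T_n \;=\; \sum_{r=1}^{R} u_r \otimes v_r \otimes w_r,
\qquad
m_r \;=\; \Big(\sum_{i,j} (u_r)_{ij}\, A_{ij}\Big)\Big(\sum_{k,l} (v_r)_{kl}\, B_{kl}\Big),
\qquad
C_{pq} \;=\; \sum_{r=1}^{R} (w_r)_{pq}\, m_r .
```

Each rank-1 term pays for exactly one scalar multiplication m_r, so the number of terms R is the multiplication count; Strassen’s algorithm corresponds to a rank-7 decomposition of the 2-by-2 tensor.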
Researchers have used this approach to discover new algorithms that multiply n-by-n matrices in fewer than the standard n³ multiplication steps for many small matrix sizes. But algorithms that outperform not only the standard approach but also Strassen’s algorithm for small matrices had remained out of reach, until now.
Prepare to Play!
The DeepMind team attacked the problem by turning tensor decomposition into a single-player game. They started with a deep learning algorithm descended from AlphaGo, another DeepMind AI that in 2016 learnt to play the board game Go well enough to beat the best human players.
All deep learning algorithms are built around neural networks: webs of artificial neurons arranged in layers, with connections whose strength can vary to represent how much each neuron influences those in the next layer. The strength of these connections is tweaked over many iterations of a training procedure, during which the neural network learns to transform each input it receives into an output that helps the algorithm accomplish its overall goal.
In DeepMind’s new algorithm, dubbed AlphaTensor, the inputs represent steps along the way to a valid matrix multiplication scheme. The first input to the neural network is the original matrix multiplication tensor, and its output is the rank-1 tensor that AlphaTensor has chosen for its first move. The algorithm subtracts this rank-1 tensor from the original input, yielding an updated tensor that is fed back into the network as a new input. The process repeats until eventually every element in the starting tensor has been reduced to zero, meaning there are no more rank-1 tensors left to take out.
At that point, the neural network has found a valid tensor decomposition, since it is mathematically guaranteed that all the rank-1 tensors add up to exactly the starting tensor. And the steps taken to get there can be translated back into the steps of the corresponding matrix multiplication algorithm.
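A toy sketch of that loop, with a placeholder policy standing in for the trained network (this illustrates the game’s structure only, not DeepMind’s code), might look like this:

```python
import numpy as np

def play_decomposition_game(T, choose_rank1_move, max_steps=200):
    """Single-player tensor game: subtract one rank-1 tensor per move until the residual is zero.
    `choose_rank1_move` is a stand-in for the trained network's policy."""
    residual = T.copy()
    moves = []
    for _ in range(max_steps):
        if not residual.any():                      # all zeros: valid decomposition found
            return moves                            # the list of moves is the decomposition
        u, v, w = choose_rank1_move(residual)       # pick factor vectors for the next term
        residual = residual - np.einsum('i,j,k->ijk', u, v, w)
        moves.append((u, v, w))
    return None                                     # gave up before reaching zero
```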
Here is the game: AlphaTensor repeatedly breaks a tensor down into a set of rank-1 components, and it is rewarded each time it finds a way to cut down the number of steps. But the shortcuts to victory are far from obvious, just as you sometimes have to scramble a perfectly ordered face of a Rubik’s Cube before you can solve the whole puzzle.
The team now had an algorithm that could, in theory, solve their problem. They just had to train it first.
New Ways Forward
AlphaTensor, like all neural networks, needs a lot of data to train on, but tensor decomposition is a notoriously hard problem, and there were few examples of efficient decompositions that the researchers could feed it. Instead, they helped the algorithm get started by training it on the much easier inverse problem: adding up a bunch of randomly generated rank-1 tensors.
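Generating such examples is cheap, because the answer is known by construction. A minimal sketch (the function name, factor ranges and sizes here are my own choices, not DeepMind’s):

```python
import numpy as np

def synthetic_example(dims, rank, rng):
    """One training example for the easier inverse task: sample `rank` random rank-1 tensors,
    sum them, and keep the sampled factors as a known-valid decomposition of the result."""
    factors = [tuple(rng.integers(-1, 2, size=d) for d in dims) for _ in range(rank)]
    T = sum(np.einsum('i,j,k->ijk', u, v, w) for u, v, w in factors)
    return T, factors

rng = np.random.default_rng(0)
T, factors = synthetic_example((4, 4, 4), rank=7, rng=rng)   # factor entries drawn from {-1, 0, 1}
```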
“They’re utilising the easy problem to produce more data for the hard problem,” said Michael Littman, a computer scientist at Brown University. Combining this backward training strategy with reinforcement learning, in which AlphaTensor generated its own training data as it blundered around looking for efficient decompositions, worked far better than either training method on its own.
The DeepMind team trained AlphaTensor to decompose tensors representing the multiplication of matrices up to 12-by-12. It looked for fast algorithms for multiplying matrices of ordinary real numbers, and also for algorithms specific to a more constrained setting known as modulo 2 arithmetic. (This is math based on only two numbers, so matrix entries can only be 0 or 1, and 1 + 1 = 0.) Researchers often start with this more restricted but still vast space, in the hope that algorithms discovered there can be adapted to work on matrices of real numbers.
After its training, AlphaTensor rediscovered Strassen’s algorithm within minutes. It then found anywhere from hundreds to thousands of new fast algorithms for each matrix size. These were distinct from the best-known algorithms but used the same number of multiplication steps.
In a few cases, AlphaTensor even broke existing records. Its most startling discovery came in modulo 2 arithmetic, where it found a new algorithm for multiplying 4-by-4 matrices in 47 multiplication steps, an improvement on the 49 steps required by two iterations of Strassen’s algorithm. It also beat the best-known algorithm for 5-by-5 modulo 2 matrices, reducing the number of required multiplications from the previous record of 98 to 96. (The new record still falls short of the 91 steps that would be needed to beat Strassen’s approach applied to 5-by-5 matrices.)
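The arithmetic behind the 4-by-4 comparison is straightforward: viewed as a 2-by-2 matrix of 2-by-2 blocks, the matrix can be multiplied with Strassen’s seven-multiplication scheme applied at both levels, which costs

```latex
\underbrace{7}_{\text{outer }2\times2\text{ level}} \times \underbrace{7}_{\text{inner }2\times2\text{ level}} \;=\; 49
\quad\text{(two iterations of Strassen)}
\qquad\text{vs.}\qquad
47 \quad\text{(AlphaTensor, modulo 2)}.
```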
The new high-profile result generated plenty of excitement, with some researchers heaping praise on the AI-based improvement over the status quo. But not everyone in the matrix multiplication community was so impressed. Vassilevska Williams said she felt there was too much hype surrounding it. “It’s yet another tool. It’s not as if you can say, ‘Oh, the computers beat the people,’ you know?”
The researchers also stressed that immediate applications of the record-breaking 4-by-4 algorithm will be limited: not only is it valid only in modulo 2 arithmetic, but in real life there are important considerations besides speed.
Fawzi agreed that this is only the beginning. “That there is a lot of opportunity for improvement and research is a wonderful thing,” he said.
One Last Turn of Events
Compared with well-established computer search methods, AlphaTensor’s greatest strength is also its greatest weakness: it is not constrained by human intuition about what makes a good algorithm, so it cannot explain its choices. That makes it difficult for researchers to learn from its achievements.
But this may not be as big a drawback as it seems. A few days after the AlphaTensor result was announced, the mathematician Manuel Kauers and his PhD student Jakob Moosbauer, both of Johannes Kepler University Linz in Austria, reported another step forward.
When the DeepMind paper came out, Kauers and Moosbauer were already searching for new multiplication algorithms using a conventional computer-aided search. But unlike most such searches, which start afresh with a new guiding principle, their method works by repeatedly tweaking an existing algorithm in the hope of squeezing more multiplication savings out of it. Taking AlphaTensor’s algorithm for 5-by-5 modulo 2 matrices as a starting point, they were surprised to find that their method reduced the number of multiplication steps from 96 to 95 after just a few seconds of computation.
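In spirit (this is a generic hill-climbing sketch, not Kauers and Moosbauer’s actual search procedure), such a refinement loop looks like:

```python
import numpy as np

def refine(factors, target, propose_tweak, steps=10_000, rng=None):
    """Start from a known decomposition of `target` and repeatedly apply small modifications,
    keeping any candidate that still sums to the target with no more terms than before."""
    rng = rng or np.random.default_rng()
    best = list(factors)
    for _ in range(steps):
        candidate = propose_tweak(best, rng)     # e.g. merge, drop, or adjust a few terms
        total = sum(np.einsum('i,j,k->ijk', u, v, w) for u, v, w in candidate)
        if np.array_equal(total, target) and len(candidate) <= len(best):
            best = candidate                     # accept: still valid, and no longer than before
    return best
```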
AlphaTensor also helped them make another improvement, indirectly. Previously, Kauers and Moosbauer had not bothered to explore the space of 4-by-4 matrices, believing it would not be possible to beat two iterations of Strassen’s algorithm. The AlphaTensor result prompted them to reconsider, and after a week of computation time starting from scratch, their method turned up another 47-step algorithm unrelated to the one AlphaTensor had discovered. “If anyone had told us that there is something to discover for 4-by-4, we could have done that earlier,” said Kauers. “But that’s just the way things are, so no big deal.”
Littman expects more surprises of this kind, likening the situation to the first time a runner completed a mile in under four minutes, a feat widely believed to be impossible. “People were like, ‘Oh wait, we can do this,’ and now lots of people can do it,” he said.
Fawzi’s long-term goal is to extend AlphaTensor so that it can tackle a wider variety of mathematical and computational problems, just as its progenitor, AlphaGo, later expanded its capabilities to include playing other games.
Kauers sees that as the true litmus test for machine learning in the discovery of new algorithms. He points out that the quest for fast matrix multiplication algorithms is a combinatorial problem that computer searches, with or without human help, are well suited to. But not all mathematical problems are so easy to pin down. If machine learning can discover a qualitatively new algorithmic idea, he said, “this would then be a game changer.”