But what about writing software? Artificial intelligence programs are already startlingly good at holding conversations, winning board games, and producing artwork. Now researchers at Google DeepMind say that in standardised programming competitions, their AlphaCode system can hold its own against the average human coder.
“This discovery represents the first time an artificial intelligence system has performed competitively in programming contests,” the researchers write in this week’s issue of the journal Science.
In simulated evaluations on recent contests hosted by the Codeforces platform, DeepMind’s code-generating system earned an average ranking in the top 54.3% of competitors. That is a fairly “average” average, so there’s no need to panic about Skynet just yet.
“Competitive programming is an extremely difficult challenge, and there’s a massive gap between where we are now (solving around 30% of problems in 10 submissions) and top programmers (solving more than 90% of problems in a single submission),” said Yujia Li, a research scientist at DeepMind and one of the paper’s lead authors. “The remaining problems are also much more challenging than the ones we’ve already solved.”
Still, the experiment points to a fresh frontier for AI applications. That frontier is also being explored by Copilot, a code-suggesting tool from Microsoft-owned GitHub, and by CodeWhisperer, a comparable offering from Amazon.
According to Oren Etzioni, technical director of the AI2 Incubator and founding CEO of Seattle’s Allen Institute for Artificial Intelligence, the newly published research emphasises DeepMind’s position as a leading player in the use of the AI tools known as large language models, or LLMs.
“This is an impressive reminder that OpenAI and Microsoft don’t have a monopoly on the great feats of LLMs,” Etzioni wrote in an email. “In fact, AlphaCode beats both Microsoft’s GitHub Copilot and GPT-3.”
AlphaCode is arguably notable not just for how well it programs, but for how it programs. What may surprise people most is what the system lacks: AlphaCode has no explicit built-in understanding of how computer code is structured. Instead, as J. Zico Kolter, a computer scientist at Carnegie Mellon University, wrote in a Science commentary on the work, “AlphaCode relies entirely on a ‘data-driven’ approach to producing code, learning the structure of computer programs by merely viewing heaps of existing code.”
Given a problem described in natural language, AlphaCode uses a large language model to generate code. The system draws on a sizable dataset of programming challenges and solutions, as well as a large collection of unstructured code from GitHub. To solve a given problem, AlphaCode generates hundreds of candidate solutions, filters out the invalid ones, clusters the viable candidates by behavior, and then picks one example from each cluster to submit.
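As a rough sketch of that generate-filter-cluster pipeline (this is not DeepMind’s actual implementation; the toy problem, the generate_candidates stand-in, and every function name below are hypothetical illustrations), the control flow might look something like this in Python:

import random
from collections import defaultdict

# Hypothetical sketch of an AlphaCode-style selection pipeline.
# A real system would sample candidate programs from a large
# language model conditioned on the problem statement.

def generate_candidates(problem_text, n=200):
    """Stand-in for sampling n candidate programs from an LLM."""
    templates = [
        "def solve(x):\n    return x * 2",   # correct for the toy problem
        "def solve(x):\n    return x + x",   # also correct, same behavior
        "def solve(x):\n    return x ** 2",  # wrong answer
        "def solve(x):\n    return x / 0",   # crashes when called
    ]
    return [random.choice(templates) for _ in range(n)]

def passes_examples(source, examples):
    """Filter step: run the candidate on the problem's example tests."""
    namespace = {}
    try:
        exec(source, namespace)
        return all(namespace["solve"](inp) == out for inp, out in examples)
    except Exception:
        return False  # crashing candidates are dropped here

def behavior_signature(source, probe_inputs):
    """Cluster key: candidates that behave identically on probe
    inputs are grouped, so near-duplicates share one submission."""
    namespace = {}
    exec(source, namespace)
    return tuple(namespace["solve"](x) for x in probe_inputs)

# Toy problem: given x, return its double.
examples = [(3, 6), (10, 20)]
probe_inputs = [1, 2, 5]

candidates = generate_candidates("double the input")
valid = [c for c in candidates if passes_examples(c, examples)]

clusters = defaultdict(list)
for c in valid:
    clusters[behavior_signature(c, probe_inputs)].append(c)

# Submit one representative per behavioral cluster, up to the limit.
submissions = [group[0] for group in clusters.values()][:10]
print(f"{len(candidates)} sampled -> {len(valid)} valid -> {len(submissions)} submitted")

The clustering step matters because contest rules cap the number of attempts (the paper’s figures assume 10 submissions per problem), so near-duplicate programs shouldn’t each burn a submission.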
It may come as a surprise that this process has any chance of producing correct code, Kolter added.
Kolter suggested that combining AlphaCode’s technique with more structured machine learning methods could boost the system’s performance.
“It remains to be seen,” he wrote, “whether ‘hybrid’ ML approaches that mix data-driven learning with designed knowledge may do better on these tasks. AlphaCode has cast the die.”
Li told GeekWire that DeepMind is continuing to improve AlphaCode. While going from solving 0% of problems to 30% represents a considerable advance, he acknowledged that much work remains.
Etzioni agreed that the effort to develop code-generating software “has plenty of headroom.” “I anticipate quick iterations and improvements,” he said.
“The generative AI ‘big bang’ is just 10 seconds away,” Etzioni said. “Soon there will be many more outstanding products across a broader range of textual and structured data. We are frantically attempting to determine the limits of this technology.”
As the project develops, AlphaCode may fuel the ongoing debate over the benefits and dangers of artificial intelligence, much as DeepMind’s AlphaGo program did when it demonstrated machine mastery of the age-old game of Go. Programming isn’t the only field where AI’s rapid progress is stirring debate, though.
An OpenAI program called ChatGPT has caused a flurry of interest in the tech community with its ability to respond to requests for information with in-depth explanations and documents ranging from term papers to fanciful resignation letters.
AI-based art generators such as Lensa, DALL-E, and Stable Diffusion have likewise spurred a debate over whether they are unfairly exploiting the millions of archived works of art made by human hands, and whether they will wipe out future markets for living, breathing artists.
AI programs have also recently taken on human players in strategy games that, unlike chess or checkers, hinge on judgments made with incomplete information about the other players. DeepMind’s DeepNash program tackles the board game Stratego, while Meta’s Cicero program plays the game of Diplomacy. Some wonder whether these advances will allow AI to be used to assist real-world policymakers (or scammers).
When we asked Li whether DeepMind had any reservations about the work it was doing, he responded thoughtfully:
“AI has the potential to help with some of humanity’s biggest problems, but it must be developed safely, ethically, and for everyone’s benefit. It will be helpful or damaging to us and society depending on how we deploy it, how we use it, and the kinds of things we choose to use it for.
“At DeepMind, we develop AI with care, encouraging scrutiny of our work and holding back the release of new technology until risks and their effects have been carefully assessed and mitigated. Our culture of responsible pioneering is driven by our values and focused on responsible governance, responsible research, and responsible impact (you can see our Operating Principles here).”
Update for December 8 @ noon PT: Sam Skjonsberg, a lead engineer at the Allen Institute for Artificial Intelligence and head of the team behind Beaker, AI2’s internal platform for AI experimentation, offered his thoughts on AlphaCode:
“It is hardly surprising to see LLMs used for code synthesis. The generalizability of these large-scale models is becoming increasingly obvious with initiatives like DALL-E, OpenAI Codex, Unified-IO, and of course ChatGPT.
“An intriguing feature of AlphaCode is the post-processing step that filters the solution space, weeding out candidates that crash or are blatantly erroneous. This highlights a crucial point: these models work best when they complement our strengths rather than attempt to replace them.
“I’m curious to see how AlphaCode stacks up against ChatGPT as a source of coding advice. AlphaCode was assessed on competitive coding exercises, which is an objective performance measure but says nothing about the readability of the generated code. I’ve been impressed by the results ChatGPT has achieved. They frequently include minor flaws and errors, but the code is understandable and simple to edit. That is a difficult but crucial aspect of these models that we will need to figure out how to quantify.
“Separately, I commend Google and the AlphaCode research team for making the paper’s dataset and energy requirements available. ChatGPT ought to do the same. These LLMs already favour large firms because of the high cost of training and operating them. Open publication counters that by promoting scientific collaboration and further evaluation, both of which are crucial for progress and equity.”