This morning, Meta revealed details of the in-house infrastructure it is developing for AI workloads, including generative AI, the technology that powers its recently released ad design and creation tools.
It was an attempt by Meta to project strength: the company has historically been slow to adopt AI-friendly hardware, hampering its ability to keep pace with rivals such as Google and Microsoft.
According to Alexis Bjorlin, VP of Infrastructure at Meta: “Building our own [hardware] capabilities gives us control at every layer of the stack, from datacenter design to training frameworks. This level of vertical integration is required to scale up AI research and push the envelope.”
Meta has spent the better part of a decade and billions of dollars building the AI that now powers the discovery engines, moderation filters, and ad recommenders found throughout its apps and services, investing both in recruiting top data scientists and in developing new types of AI. But the company has struggled to commercialise many of its more ambitious AI research advances, particularly in generative AI.
Until around 2022, Meta ran its AI workloads mainly on a combination of CPUs, which are typically less efficient than GPUs for those kinds of jobs, and a specialised chip designed to accelerate AI algorithms. A large-scale rollout of that bespoke processor, scheduled for 2022, was cancelled in favour of billions of dollars’ worth of Nvidia GPU orders, which required a significant redesign of several of Meta’s data centres.
In an effort to turn things around, Meta began work on a more ambitious internal chip, scheduled to arrive in 2025 and capable of both running and training AI models. That was the main subject of today’s presentation.
Meta calls the new chip the Meta Training and Inference Accelerator, or MTIA, and describes it as part of a “family” of chips for accelerating AI training and inference workloads. (“Inference” means running a trained model.) The MTIA is an ASIC, an application-specific integrated circuit: a chip that combines many circuits on a single die and can be tailored to carry out one or more functions in parallel.
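To make the training-versus-inference distinction concrete, here is a minimal PyTorch sketch of the two workloads. The toy model and data are purely illustrative assumptions, not Meta’s code or the MTIA’s programming interface.

```python
import torch
import torch.nn as nn

# A toy recommendation-style model: purely illustrative, not Meta's architecture.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

features = torch.randn(8, 16)   # dummy input batch
targets = torch.randn(8, 1)     # dummy labels

# Training: forward pass, loss, backward pass, weight update.
model.train()
loss = loss_fn(model(features), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Inference ("inferencing"): a forward pass through the trained model only,
# with no gradients and no weight updates -- the kind of workload the MTIA
# currently targets.
model.eval()
with torch.no_grad():
    predictions = model(features)
```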
“We needed a tailored solution that’s co-designed with the model, software stack, and system hardware to gain better levels of efficiency and performance across our important workloads,” Bjorlin stated. “This offers our users across a number of services a better experience.”
Custom AI chips are becoming more popular among major players in the technology industry. To train massive generative AI systems like PaLM-2 and Imagen, Google developed a processor known as the TPU (short for “tensor processing unit”). Customers of AWS can use Amazon’s own chips for both training (Trainium) and inferencing (Inferentia). And according to reports, Microsoft is collaborating with AMD to create the Athena internal AI chip.
According to Meta, the first generation of the chip, MTIA v1, was developed in 2020 on a 7-nanometer process. It has 128 MB of on-chip memory but can scale out to as much as 128 GB, and in a Meta-designed benchmark test, which should obviously be taken with a grain of salt, Meta claims the MTIA handled “low-complexity” and “medium-complexity” AI models more efficiently than a GPU.
Meta says there is still work to do on the chip’s memory and networking, which become bottlenecks as AI models grow in size and need to be split across multiple chips. (Relatedly, Meta recently picked up an Oslo-based team that had been building AI networking technology at British chip unicorn Graphcore.) For now, the MTIA’s focus is strictly on inference, not training, for “recommendation workloads” across Meta’s family of apps.
Still, Meta insists that the MTIA, which it continues to refine, “greatly” improves the company’s performance per watt on recommendation workloads, allowing it to run “more enhanced” and (ostensibly) “cutting-edge” AI workloads.
An AI supercomputer
Perhaps in the future, Meta will assign banks of MTIAs to handle the majority of its AI duties. But for the time being, the social network is dependent on the GPUs in its Research SuperCluster (RSC), a supercomputer that is geared towards research.
The RSC, built in partnership with Penguin Computing, Nvidia, and Pure Storage, was first unveiled in January 2022 and has now completed its second phase of construction. According to Meta, it contains a total of 2,000 Nvidia DGX A100 systems packing 16,000 Nvidia A100 GPUs.
So why build a supercomputer in-house? Peer pressure, for one. Several years ago, Microsoft made a splash about the AI supercomputer it built in partnership with OpenAI, and it more recently said it would work with Nvidia to build a new AI supercomputer in the Azure cloud. Elsewhere, Google has touted its own AI-focused supercomputer, which surpasses Meta’s thanks to its 26,000 Nvidia H100 GPUs.
Beyond keeping up with the Joneses, however, Meta asserts that the RSC offers the advantage of enabling its researchers to train models using actual data from Meta’s production systems. This contrasts with the organisation’s previous AI infrastructure, which primarily used open-source software and publicly accessible datasets.
A Meta representative said, “The RSC AI supercomputer is used for pushing the limits of AI research in several domains, including generative AI. Really, it’s about how productive AI research is. We intended to offer cutting-edge infrastructure to AI researchers so they could create models and equip them with a training platform to progress AI.”
The RSC, according to Meta, can deliver roughly 5 exaflops of computing power at peak, that is, about five quintillion floating-point operations per second. (Before that impresses you, keep in mind that some experts view the exaflops performance metric with scepticism, and that several of the world’s fastest supercomputers are significantly faster than the RSC.)
Meta says the RSC was used to train LLaMA (Large Language Model Meta AI), the large language model that the company released to researchers as a “gated release” earlier in the year (and which subsequently leaked in various online communities). The largest LLaMA model took 21 days to train on 2,048 A100 GPUs, according to Meta.
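As rough back-of-the-envelope arithmetic (not a figure Meta reported), those numbers work out to about 2,048 GPUs × 21 days × 24 hours ≈ 1.03 million A100 GPU-hours of compute for that single training run.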
The representative continued: “By developing our own supercomputing capabilities, we get control over every layer of the stack, from datacenter design to training frameworks. RSC will support Meta’s AI researchers in creating new and improved AI models that can learn from trillions of examples, operate across hundreds of different languages, and seamlessly analyse text, images, and video together, as well as create new augmented reality tools.”
A video encoder
At the same event, Meta said that, in addition to the MTIA, it is developing another chip to handle particular kinds of computing workloads. Called the Meta Scalable Video Processor, or MSVP, it is Meta’s first internally produced ASIC solution, designed to meet the processing demands of video on demand and live streaming.
Readers may recall that Meta began developing custom server-side video chips years ago, announcing an ASIC for video transcoding and inference work in 2019. The MSVP is the product of some of those efforts, as well as a renewed push for a competitive edge specifically in live video.
“On Facebook alone, people spend 50% of their time on the app watching video,” stated Meta technical lead managers Harikrishna Reddy and Yunqing Chen in a jointly written blog post released this morning. “Videos uploaded to Facebook or Instagram, for instance, are transcoded into several bitstreams, with varying encoding formats, resolutions, and quality, to service the large diversity of devices used throughout the world (mobile devices, computers, TVs, etc.). MSVP is programmable and scalable, and it can be set up to accommodate both the high-quality transcoding required for VOD and the low latency and quicker processing times that live streaming demands in an effective manner.”
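For a sense of what that multi-bitstream transcoding involves, here is a generic adaptive-bitrate sketch driving ffmpeg from Python. The renditions, bitrates, and file names are illustrative assumptions; this is not Meta’s pipeline or an interface to the MSVP.

```python
import subprocess

# A generic adaptive-bitrate "ladder": each uploaded video is transcoded into
# several renditions at different resolutions and bitrates so that phones,
# desktops, and TVs can each fetch a suitable stream.
ABR_LADDER = [
    {"name": "1080p", "scale": "1920:1080", "bitrate": "4500k"},
    {"name": "720p",  "scale": "1280:720",  "bitrate": "2500k"},
    {"name": "480p",  "scale": "854:480",   "bitrate": "1200k"},
    {"name": "360p",  "scale": "640:360",   "bitrate": "700k"},
]

def transcode(source: str) -> None:
    """Produce one H.264/AAC rendition per rung of the ladder."""
    base = source.rsplit(".", 1)[0]
    for rung in ABR_LADDER:
        subprocess.run(
            [
                "ffmpeg", "-y", "-i", source,
                "-vf", f"scale={rung['scale']}",             # resize to the rung's resolution
                "-c:v", "libx264", "-b:v", rung["bitrate"],  # video codec and target bitrate
                "-c:a", "aac",                               # re-encode audio
                f"{base}_{rung['name']}.mp4",
            ],
            check=True,
        )

transcode("upload.mp4")  # hypothetical input file
```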
The majority of Meta’s “stable and mature” video processing workloads will eventually be offloaded to the MSVP, while software video encoding will only be used for workloads that demand specialised customization and “significantly” greater quality. According to Meta, work is still being done to improve video quality with MSVP using post-processing techniques like artefact removal and super-resolution as well as pre-processing techniques like smart denoising and picture enhancement.
According to Reddy and Chen, the MSVP will also help Meta deliver generative AI, AR/VR, and other metaverse content efficiently. “In the future, MSVP will allow us to support even more of Meta’s most important use cases and needs, including short-form videos,” they stated.
Using AI
If there’s one thing that all of today’s hardware announcements have in common, it’s Meta’s frantic efforts to accelerate the development of generative AI.
The writing had been on the wall. In February, CEO Mark Zuckerberg announced a new high-level generative AI team to, in his words, “turbocharge” the company’s R&D, and he is said to have made expanding Meta’s AI compute capacity a top priority. CTO Andrew Bosworth recently said that generative AI was the area where he and Zuckerberg were spending the most time, and chief scientist Yann LeCun has said Meta plans to use generative AI tools to create items in virtual reality.
During Meta’s Q1 earnings call in April, Zuckerberg stated, “We’re exploring chat experiences in WhatsApp and Messenger, visual creation tools for posts in Facebook and Instagram and ads, and over time video and multi-modal experiences as well. I anticipate that these tools will be beneficial for everyone, including average people, artists, and corporations. For instance, after we perfect that experience, I anticipate that there will be a lot of demand for AI agents for corporate communications and customer service. This will eventually apply to our work on the metaverse as well, making it much simpler for individuals to develop avatars, objects, worlds, and the code that connects them all.”
Part of the pressure is coming from investors, who worry that Meta isn’t moving fast enough to capture the (potentially sizable) market for generative AI. The company doesn’t yet have an answer to chatbots like Bard, Bing Chat, or ChatGPT, and it has made little headway in image generation, another key segment that has seen explosive growth.
If the forecasts are accurate, the addressable market for generative AI software could reach $150 billion; Goldman Sachs predicts it will lift GDP by 7%.
Even a small slice of that could make up for the billions Meta has sunk into “metaverse” investments such as augmented reality headsets, meeting software, and virtual reality playgrounds like Horizon Worlds. Reality Labs, the Meta division responsible for its augmented reality technology, reported a $4 billion net loss in the most recent quarter, and during the Q1 call the company said it expects “operating losses to increase year over year in 2023.”