Meta’s Race to Catch Up on AI: An Insider’s Perspective

In September 2022, Meta CEO Mark Zuckerberg led a meeting with top executives to discuss the company’s computing capacity, specifically its ability to do cutting-edge artificial intelligence (AI) work. Despite making high-profile investments in AI research, Meta had been slow to adopt expensive AI-friendly hardware and software systems for its main business, hampering its ability to keep pace with innovation at scale. To support AI work, Meta would need to fundamentally shift its physical infrastructure design, software systems, and approach to providing a stable platform. Over the past year, Meta has been engaged in a massive project to whip its AI infrastructure into shape. The overhaul has raised Meta’s capital expenditures by about $4 billion a quarter, according to company disclosures, nearly double its spending as of 2021.

The overhaul coincided with a period of severe financial squeeze for Meta, which has been laying off employees since November at a scale not seen since the dotcom bust. Generative AI, which creates human-like written and visual content in response to prompts, requires massive computing power and has amplified the urgency of Meta’s capacity scramble. Meta’s lag on AI can be traced back to its belated embrace of the graphics processing unit (GPU) for AI work. Until last year, Meta largely ran AI workloads on its fleet of commodity central processing units (CPUs). In 2021, that approach proved slower and less efficient than one built around GPUs.

As Zuckerberg pivoted the company toward the metaverse, Meta’s capacity crunch was slowing its ability to deploy AI to respond to threats, like the rise of social media rival TikTok and Apple-led ad privacy changes. Peter Thiel resigned from Meta’s board in early 2022, telling Zuckerberg and his executives they were complacent about Meta’s core social media business and focused too much on the metaverse, which he said left the company vulnerable to the challenge from TikTok.

After pulling the plug on a large-scale rollout of Meta’s own custom inference chip, which was planned for 2022, executives reversed course and placed orders that year for billions of dollars’ worth of Nvidia GPUs. By then, Meta was several steps behind peers like Google, which had begun deploying its own custom-built version of GPUs, called the TPU, in 2015. That spring, executives also set about reorganizing Meta’s AI units, naming two new heads of engineering in the process, including Santosh Janardhan, the author of a September memo. More than a dozen executives left Meta during the months-long upheaval. Despite the challenges, Meta remains confident in its ability to continue expanding its infrastructure’s capabilities to meet its near-term and long-term needs as it brings new AI-powered experiences to its family of apps and consumer products.