Meta Engineer: Just Two Nuclear Power Plants Required for AI Operations by Next Year

Meta’s director of engineering for Generative AI, Sergey Edunov, shared an intriguing insight into the power required to meet the growing demand for AI applications next year: about the equivalent of two new nuclear power plants.

Edunov oversees Meta’s training efforts for its Llama 2 open-source foundation model, considered one of the leading AI models. Speaking at the Digital Workers Forum in Silicon Valley, he said that two power plants should be enough to meet humanity’s AI needs for a year. Responding to concerns about the power demands of AI, particularly generative AI, he stated confidently, “We can definitely solve this problem.”

He clarified that his estimate was based on rough calculations but offered a reasonable approximation of the power needed for AI “inference.” Inference is the stage at which an AI model is used in applications to answer questions or make recommendations, as distinct from “training,” the process in which the model learns from large datasets before it can be deployed.

Training of large language models (LLMs) has come under scrutiny recently because of the massive processing it requires up front. Once trained, however, a model is reused over and over for inference, which is where AI applications actually run.

Regarding inference, Edunov noted that Nvidia could ship between one and two million of its H100 GPUs next year. If all of those GPUs were dedicated to generating text, he estimated, they could produce about 100,000 tokens per person on Earth each day. Tokens, the fundamental units of text that LLMs process and generate, can range from whole words to single characters.

As for electricity, each H100 draws about 700 watts, but once data-center overhead and cooling are included, Edunov rounded that up to 1 kW per GPU. Summed over one to two million GPUs, that works out to roughly the output of two nuclear reactors. “At the scale of humanity, it’s not that much,” he said, adding that humanity could therefore afford up to 100,000 tokens per person per day.
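As a rough check on those figures, here is a minimal back-of-envelope sketch in Python. The world-population and per-reactor output numbers are assumptions of mine, not figures from the talk.

```python
# Back-of-envelope check of Edunov's estimate.
# Assumed (not from the talk): ~8 billion people, ~1 GW of output per nuclear reactor.

H100_COUNT = 2_000_000            # upper end of the 1-2 million H100s cited
WATTS_PER_GPU = 1_000             # ~700 W per H100, rounded up to 1 kW for cooling/overhead
REACTOR_OUTPUT_W = 1_000_000_000  # assumed ~1 GW per reactor
WORLD_POPULATION = 8_000_000_000  # assumed ~8 billion people
TOKENS_PER_PERSON_PER_DAY = 100_000

total_power_w = H100_COUNT * WATTS_PER_GPU
reactors = total_power_w / REACTOR_OUTPUT_W
print(f"Total draw: {total_power_w / 1e9:.1f} GW, roughly {reactors:.0f} reactors' worth")

# Implied per-GPU throughput if those GPUs served 100,000 tokens per person per day.
tokens_per_day = WORLD_POPULATION * TOKENS_PER_PERSON_PER_DAY
tokens_per_gpu_per_second = tokens_per_day / H100_COUNT / 86_400
print(f"Implied throughput: about {tokens_per_gpu_per_second:,.0f} tokens/sec per GPU")
```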

After the session, Edunov clarified that his comments referred to the power required for additional AI computation from new Nvidia H100s, which are designed specifically for AI workloads. Besides H100s, older Nvidia GPUs, CPUs from AMD and Intel, and specialized AI accelerators also contribute to AI inference.

Turning to the training of generative AI, Edunov said the main challenge is acquiring enough data. He noted that GPT-4 was reportedly trained on data from the entire internet, which, once refined and de-duplicated, amounts to roughly 10-20 trillion tokens; restricting to high-quality data would shrink that figure further. Next-generation models may require about 10 times more data, potentially up to 200 trillion tokens, which exceeds what is publicly available, pushing developers toward more efficient models and alternative data sources such as multimodal data (e.g., video).
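A quick sketch of that data-scaling arithmetic, using only the figures Edunov cited (applying the 10x multiplier to the full 10-20 trillion token range is an extrapolation, not his exact calculation):

```python
# Data-scaling arithmetic from Edunov's figures (all values in trillions of tokens).
current_public_text = (10, 20)  # refined, de-duplicated text from the public internet
scale_factor = 10               # rough multiplier Edunov cited for next-generation models

low, high = (t * scale_factor for t in current_public_text)
print(f"Next-generation models may need roughly {low}-{high} trillion tokens,")
print("more text than is publicly available on the internet today.")
```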

Sharing the panel with Edunov were Nik Spirin from Nvidia and Kevin Tsai from Google. They discussed data sources beyond the public internet, such as secure forums, which organizations could use to customize models. Spirin advocated rallying behind the best open-source foundation models to avoid redundant effort and save computing resources. Tsai highlighted technologies such as retrieval-augmented generation (RAG), which supplements a model’s prompt with material retrieved from large data sets, alongside innovations like sparse semantic vectors.
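For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the RAG idea; the toy documents, keyword-overlap retriever, and generate() stub are illustrative assumptions, not anything described on the panel.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# The documents, scoring method, and generate() stub are illustrative only.

def score(query: str, doc: str) -> int:
    """Score a document by how many query words it shares (toy retriever)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents that best match the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would query a model here."""
    return f"[model response to a {len(prompt)}-character prompt]"

def answer(query: str, docs: list[str]) -> str:
    """Retrieve relevant context, then ask the model to answer using it."""
    context = "\n".join(retrieve(query, docs))
    prompt = f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

# Example: private documents an organization might use to customize a model.
internal_docs = [
    "Q3 data-center capacity review: GPU cluster utilization reached 78 percent.",
    "Onboarding guide for the inference platform team.",
    "Vendor evaluation notes for high bandwidth memory suppliers.",
]
print(answer("What was GPU cluster utilization in Q3?", internal_docs))
```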

Looking ahead, the panelists predicted significant advances for LLMs over the next two to three years. While the full potential of LLMs remains uncertain, they agreed the models will deliver substantial value to enterprises soon. Whether improvements keep compounding or begin to level off should become clear within three to four years, they suggested, which would indicate whether artificial general intelligence (AGI) is attainable.

Nvidia’s Spirin anticipated that enterprise adoption would start slowly and then deliver substantial benefits within two years, drawing parallels to earlier waves of AI technology. Tsai pointed out that supply-chain constraints, particularly Nvidia’s dependence on high-bandwidth memory for its GPUs, are the current bottleneck, but he emphasized ongoing innovation efforts such as Salesforce’s BLIP-2 project to develop efficient, smaller models.
