Innovation in machine learning and AI training is accelerating, especially as generative AI workloads grow more complex.
Today, MLCommons released MLPerf 4.0 training benchmark results, showing record levels of performance. The MLPerf training benchmark is a vendor-neutral, industry-standard measure of how complete AI training systems perform across a range of workloads. Version 4.0 includes more than 205 results from 17 organizations, and it is the first MLPerf training results release since November 2023.
The MLPerf 4.0 benchmarks include results for image generation using Stable Diffusion and training large language models (LLMs) like GPT-3. The benchmarks also introduce several first-time results, including a new LoRA benchmark that fine-tunes the Llama 2 70B language model for document summarization using a parameter-efficient method.
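For readers unfamiliar with the technique, LoRA (Low-Rank Adaptation) freezes the pretrained weights and trains only a small pair of low-rank matrices added to each adapted layer. Below is a minimal NumPy sketch of the idea for a single linear layer; the dimensions and hyperparameters are illustrative, not the Llama 2 70B benchmark configuration:

```python
import numpy as np

# Frozen pretrained weight of a single linear layer (illustrative sizes).
d_in, d_out, rank, alpha = 512, 512, 8, 16
W = np.random.randn(d_out, d_in) * 0.02   # frozen during fine-tuning

# Trainable low-rank factors: only rank*d_in + d_out*rank parameters.
A = np.random.randn(rank, d_in) * 0.01    # trained
B = np.zeros((d_out, rank))               # trained; zero init => no change at start

def lora_forward(x):
    # Output = frozen path + scaled low-rank update: x @ (W + (alpha/rank) * B @ A).T
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

x = np.random.randn(4, d_in)              # a batch of 4 inputs
y = lora_forward(x)
print(y.shape)                            # (4, 512)

full, lora = W.size, A.size + B.size
print(f"trainable params: {lora} vs. full fine-tune: {full} ({lora / full:.1%})")
```

With rank 8, the trainable factors amount to roughly 3% of the layer's parameters in this toy setup, which is what makes the approach parameter-efficient.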
As is typical of MLPerf results, even a six-month comparison shows significant gains: Stable Diffusion training is 1.8x faster than in November 2023, and GPT-3 training is up to 1.2x faster.
Improving AI training performance takes more than better hardware. Training an AI model depends on many factors, including the software stack and the network connecting the cluster, and it is the combination of these elements that yields gains in performance and efficiency. Most of these systems use multiple processors or accelerators, so how the workload is divided and communicated across them is crucial. It's not just better silicon: better algorithms and better scaling deliver long-term performance gains.
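To make the workload-division point concrete, here is a minimal single-process simulation of data parallelism, the most common way training is split across accelerators: each worker computes gradients on its shard of the batch, then the gradients are averaged, standing in for the all-reduce communication step that cluster interconnects actually carry. The toy model, sizes, and learning rate are all illustrative:

```python
import numpy as np

# Toy model: linear regression with squared error; weights replicated per worker.
n_workers, batch, d = 4, 32, 16
w = np.zeros(d)
X = np.random.randn(n_workers * batch, d)
y = X @ np.random.randn(d)                 # synthetic targets

# Data parallelism: each worker gets an equal shard of the global batch.
shards = np.split(np.arange(len(X)), n_workers)

def local_gradient(idx):
    # Gradient of mean squared error on this worker's shard only.
    Xi, yi = X[idx], y[idx]
    return 2 * Xi.T @ (Xi @ w - yi) / len(idx)

# Each step: local compute, then an "all-reduce" (here just a mean) that
# averages gradients across workers -- the communication step whose cost
# fast interconnects and compute/communication overlap aim to hide.
for step in range(100):
    grads = [local_gradient(idx) for idx in shards]
    g = np.mean(grads, axis=0)             # stands in for all-reduce
    w -= 0.1 * g

print("final loss:", np.mean((X @ w - y) ** 2))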
Nvidia played a significant role in the MLPerf 4.0 benchmarks, setting new performance records in five of the nine tested workloads. Notably, those records were set on the same core hardware platforms Nvidia used a year earlier, in June 2023: the H100 Hopper architecture continues to deliver new value. Across Nvidia's history with deep learning, each product generation has typically gained 2x to 2.5x in performance over its lifecycle through software innovation alone.
For MLPerf 4.0 training, Nvidia employed a range of techniques to boost performance, including full-stack optimization, highly tuned FP8 kernels, an FP8-aware distributed optimizer, optimized cuDNN FlashAttention, better overlap of math and communication execution, and smarter GPU power allocation.
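Most of these optimizations live deep inside Nvidia's software stack, but the FP8 idea is easy to illustrate. Because FP8 formats such as E4M3 have a very narrow dynamic range (maximum normal value around 448), FP8 training depends on per-tensor scaling before the low-precision cast and rescaling after. The sketch below shows only that scaling concept, using float16 as a stand-in since NumPy has no FP8 dtype; it is not Nvidia's kernel implementation:

```python
import numpy as np

# E4M3's max normal value is ~448, so values must be scaled into range
# before casting down and rescaled after. float16 stands in for FP8 here;
# the per-tensor scaling logic is the part being illustrated.
FP8_E4M3_MAX = 448.0

def quantize_with_scale(t):
    # Choose a per-tensor scale so the largest magnitude maps to the format max.
    scale = FP8_E4M3_MAX / max(np.abs(t).max(), 1e-12)
    low_precision = (t * scale).astype(np.float16)   # stand-in for the FP8 cast
    return low_precision, scale

def dequantize(low_precision, scale):
    return low_precision.astype(np.float32) / scale

x = np.random.randn(1024).astype(np.float32) * 1e-3  # small activations
q, s = quantize_with_scale(x)
x_hat = dequantize(q, s)
print("max abs reconstruction error:", np.abs(x - x_hat).max())
```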
These benchmarks matter to enterprises for more than the raw numbers. They provide a standardized measure of training performance, and Nvidia's ability to extract new value from existing architectures demonstrates long-term return on investment. As organizations weigh new deployments, particularly on-premises, they need technology platforms whose benefits grow over time, which is why these performance improvements are so important for businesses.