Nvidia Introduces NeMo Retriever, Launches DGX Cloud, and Debuts Project Ceiba Supercomputer on AWS

Nvidia and Amazon Web Services (AWS) are strengthening their partnership with several major announcements at the AWS re:Invent conference. One significant reveal is Nvidia’s DGX Cloud, which will bring the Grace Hopper GH200 superchip to AWS for the first time. Additionally, Project Ceiba aims to create what could be the world’s largest public cloud supercomputing platform, powered by Nvidia and running on AWS, offering 64 exaflops of AI power. AWS is also introducing four new types of GPU-powered cloud instances to its EC2 service.

To assist organizations in developing better large language model (LLM) applications, Nvidia is launching its NeMo Retriever technology at AWS re:Invent. NeMo Retriever uses a Retrieval Augmented Generation (RAG) approach to connect enterprise data to generative AI.
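The RAG pattern itself is straightforward to sketch: embed a corpus of enterprise documents, retrieve the passages most similar to a user's query, and prepend them to the prompt sent to an LLM. The minimal illustration below uses the open-source sentence-transformers library for embeddings; the model name and documents are illustrative stand-ins, not part of the NeMo Retriever product.

```python
# Minimal sketch of the Retrieval Augmented Generation (RAG) pattern.
# Illustrative only -- this is not the NeMo Retriever API.
from sentence_transformers import SentenceTransformer
import numpy as np

# Hypothetical enterprise documents; in practice these would come from
# a document store or vector database.
documents = [
    "Q3 revenue grew 12% year over year, driven by cloud services.",
    "The VPN client must be updated to version 5.2 before March 1.",
    "Expense reports are due on the last business day of each month.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small open embedding model
doc_vectors = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # normalized vectors -> dot product = cosine
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

query = "When are expense reports due?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be sent to the LLM of choice.
print(prompt)
```

Production systems replace the in-memory list with a vector database; NeMo Retriever packages that retrieval step as managed microservices.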

Nvidia and AWS have collaborated for more than 13 years, dating back to 2010, when Nvidia GPUs were first integrated into AWS cloud computing instances. Ian Buck, Nvidia’s VP of Hyperscale and HPC, noted that the two companies have been working together to advance innovation and improve operations for AWS and their mutual customers, including Anthropic, Cohere, and Stability AI.

DGX Cloud, first announced at Nvidia’s GPU Technology Conference, is designed to deliver supercomputing capabilities for AI. The new iteration for AWS is built on Nvidia’s Grace Hopper GH200 superchip, which pairs an Arm-based Grace CPU with a Hopper GPU. The AWS version arranges the GH200 chips in a rack architecture called the GH200 NVL32, linking 32 superchips with Nvidia’s high-speed NVLink interconnect to deliver up to 128 petaflops of AI performance and 20 terabytes of fast memory.
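Those rack-level figures line up with the per-chip specs Nvidia has published. As a back-of-envelope check, assuming the HBM3e variant of the GH200 at roughly 4 petaflops of FP8 compute (with sparsity), 480 GB of LPDDR5X CPU memory, and 141 GB of HBM3e GPU memory per superchip:

```python
# Back-of-envelope check of the GH200 NVL32 rack figures, assuming
# published per-superchip specs (FP8 with sparsity, rounded to 4 PF;
# 480 GB LPDDR5X on Grace + 141 GB HBM3e on Hopper).
CHIPS = 32
FP8_PFLOPS_PER_CHIP = 4       # ~4 petaflops FP8 per Hopper GPU (rounded)
MEM_GB_PER_CHIP = 480 + 141   # Grace LPDDR5X + Hopper HBM3e

print(f"{CHIPS * FP8_PFLOPS_PER_CHIP} petaflops of AI compute")  # 128
print(f"{CHIPS * MEM_GB_PER_CHIP / 1000:.1f} TB of memory")      # ~19.9, marketed as 20 TB
```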

In a parallel effort, Project Ceiba will combine 16,000 Grace Hopper superchips with AWS technologies such as the Elastic Fabric Adapter (EFA), the AWS Nitro System, and Amazon EC2 UltraCluster scalability. The system, expected to deliver 64 exaflops of AI performance and up to 9.5 petabytes of total memory, will support Nvidia’s own research and development in areas including graphics, large language models, digital biology, and robotics.

Nvidia’s NeMo Retriever technology aims to improve enterprise chatbots by connecting LLMs with enterprise data, enabling more accurate and timely responses. It includes a suite of enterprise-grade models and retrieval microservices designed for seamless integration into existing workflows. Early adopters include Dropbox, SAP, and ServiceNow, which benefit from its state-of-the-art accuracy and low latency for retrieval-augmented generation.
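Because the retrieval step is exposed as a microservice, application code typically just calls an HTTP endpoint and splices the returned passages into the prompt. A sketch of that integration pattern follows; the URL, payload, and response fields are hypothetical placeholders, not the actual NeMo Retriever interface.

```python
# Hypothetical integration with a retrieval microservice over HTTP.
# The endpoint, payload, and response fields are placeholders for
# illustration; they are not the actual NeMo Retriever API.
import requests

RETRIEVER_URL = "http://retriever.internal:8000/query"  # placeholder endpoint

def build_prompt_with_retrieval(question: str) -> str:
    # Step 1: ask the retrieval service for relevant enterprise passages.
    resp = requests.post(RETRIEVER_URL, json={"query": question, "top_k": 3})
    resp.raise_for_status()
    passages = [hit["text"] for hit in resp.json()["results"]]

    # Step 2: build an augmented prompt for the chatbot's LLM backend.
    context = "\n---\n".join(passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt_with_retrieval("What is our travel reimbursement policy?")
# `prompt` would then be sent to the LLM serving the chatbot.
```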