Adobe Innovators Transform 2D Images into 3D Models in Just 5 Seconds with Cutting-Edge AI Technology

A team of researchers from Adobe Research and Australian National University has developed an impressive artificial intelligence (AI) model that can convert a single 2D image into a high-quality 3D model in just 5 seconds.

This innovation, detailed in their research paper “LRM: Large Reconstruction Model for Single Image to 3D,” has the potential to significantly impact industries like gaming, animation, industrial design, augmented reality (AR), and virtual reality (VR). The researchers highlighted that instantly creating a 3D shape from a single image is a long-standing goal with applications across a wide range of fields, and framed LRM as a highly efficient approach to that goal.

Training with Massive Datasets

Unlike previous methods, which relied on smaller, category-specific datasets, LRM employs a highly scalable transformer-based neural network architecture with more than 500 million parameters. It is trained end-to-end on approximately 1 million 3D objects from the Objaverse and MVImgNet datasets to predict a neural radiance field (NeRF) directly from a given image.
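To make the architecture concrete, here is a minimal, hypothetical PyTorch sketch of this kind of single-pass image-to-3D transformer. The patch encoder, token counts, and dimensions are illustrative assumptions rather than Adobe's implementation; the actual LRM uses a pre-trained image encoder and decodes into a triplane feature representation that a small MLP turns into a NeRF.

```python
# Illustrative sketch only: a transformer that maps one 2D image to a grid of
# 3D feature tokens in a single forward pass. Names and sizes are assumptions.
import torch
import torch.nn as nn

class ImageToTriplane(nn.Module):
    """Encode a single image, then decode learnable 3D tokens from it."""
    def __init__(self, dim=512, n_tokens=3 * 32 * 32, n_layers=6):
        super().__init__()
        # Stand-in patch encoder (LRM itself uses a pre-trained ViT encoder).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        # Learnable queries that become the triplane features after decoding.
        self.triplane_tokens = nn.Parameter(torch.randn(n_tokens, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)

    def forward(self, image):                      # image: (B, 3, 224, 224)
        feats = self.patch_embed(image)            # (B, dim, 14, 14)
        feats = feats.flatten(2).transpose(1, 2)   # (B, 196, dim)
        queries = self.triplane_tokens.expand(image.size(0), -1, -1)
        # Cross-attention: 3D tokens attend to the 2D image features.
        planes = self.decoder(queries, feats)      # (B, n_tokens, dim)
        # Reshape into three 32x32 feature planes (XY, XZ, YZ); in a full
        # system, a small MLP would query these planes to render a NeRF.
        return planes.view(-1, 3, 32, 32, planes.size(-1))

model = ImageToTriplane()
planes = model(torch.randn(1, 3, 224, 224))
print(planes.shape)  # torch.Size([1, 3, 32, 32, 512])
```

The point the sketch captures is that reconstruction is a single feed-forward pass: a fixed set of learnable 3D tokens cross-attends to the image features, so no per-object optimization is needed, which is what makes near-instant inference plausible.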

“This combination of a high-capacity model and extensive training data allows our model to generalize effectively and produce high-quality 3D reconstructions from various test inputs, including real-world captures and images from generative models,” the researchers stated.

Yicong Hong, the lead author, described LRM as a significant advancement in single-image 3D reconstruction. According to Hong, LRM is the first large-scale 3D reconstruction model, encompassing over 500 million learnable parameters and trained on about a million 3D shapes and video data across diverse categories.

Experiments demonstrated that LRM can create high-fidelity 3D models from real-world images, as well as from AI-generated images like those from DALL-E and Stable Diffusion. The system excels in producing detailed geometry and preserving intricate textures such as wood grains.

Potential to Transform Industries

LRM’s potential applications are extensive, ranging from industrial design to entertainment and gaming. It could simplify the creation of 3D models for video games or animations, reducing time and resource requirements.

In industrial design, the model could speed up prototyping by generating precise 3D models from 2D sketches. In AR/VR, LRM could enhance user experiences by creating detailed 3D environments from 2D images in real time.

Additionally, LRM’s ability to work with “in-the-wild” captures could enable user-generated content and democratize 3D modeling. People might create high-quality 3D models from photos taken with their smartphones, unlocking a wealth of creative and commercial opportunities.

Blurry Textures a Problem, but Method Advances Field

Despite its promise, the researchers acknowledged limitations, such as blurry textures in occluded regions. However, they expressed optimism that large transformer-based models trained on vast datasets can drive further progress toward generalized 3D reconstruction.

“In the era of large-scale learning, we hope our idea can inspire future research to explore data-driven 3D reconstruction models that generalize well to arbitrary, in-the-wild images,” they concluded.

You can see more of LRM’s impressive capabilities, including examples of high-fidelity 3D object meshes created from single images, on the team’s project page.