Apple’s Machine Learning team, working with researchers from Nanjing University and The Hong Kong University of Science and Technology, has introduced Matrix3D. This new AI model reconstructs 3D objects and scenes from only a few 2D photos, a marked departure from current multi-stage methods.
Matrix3D works in the field of photogrammetry, as 9to5Mac’s coverage highlights. Photogrammetry uses photographs to take measurements for building 3D models or maps, and it traditionally relies on multiple separate models for tasks such as pose estimation and depth prediction.
According to the researchers, this multi-stage process can introduce inefficiencies and errors. Matrix3D addresses this by performing all of these steps in a single, unified process: it takes images, camera parameters such as angle and focal length, and depth data, and processes them through one architecture. The researchers say this approach simplifies the workflow and improves accuracy.
New Approach to 3D Reconstruction
The training method for Matrix3D is particularly noteworthy. The researchers employed a masked learning strategy, which involves randomly hiding parts of the input data during training. The technique is similar to one used in early Transformer-based AI systems, which contributed to developments like the initial versions of ChatGPT. It forces Matrix3D to learn how to fill in the missing information.
The team states this method is crucial because it allows Matrix3D to train effectively even with smaller or incomplete datasets. The results demonstrate the model’s capability: with just three input images, Matrix3D can generate detailed 3D reconstructions of individual objects and entire environments.
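The masked learning idea described above can be illustrated with a short Python sketch. This is not the Matrix3D training code: the modality names, the `sample_mask` and `split_example` helpers, and the 50% masking probability are all assumptions chosen for illustration. It only shows the general pattern of hiding a random subset of inputs and treating the hidden parts as reconstruction targets.

```python
import random

# Hypothetical names for the three input types mentioned in the article.
MODALITIES = ("image", "pose", "depth")


def sample_mask(num_views, mask_prob=0.5, rng=None):
    """Randomly mark (view, modality) slots as hidden.

    Returns a dict mapping each slot to True (hidden) or False (visible),
    resampling until at least one slot is hidden and one is visible.
    """
    if not 0.0 < mask_prob < 1.0:
        raise ValueError("mask_prob must be strictly between 0 and 1")
    rng = rng or random.Random()
    slots = [(view, m) for view in range(num_views) for m in MODALITIES]
    while True:
        mask = {slot: rng.random() < mask_prob for slot in slots}
        hidden = sum(mask.values())
        if 0 < hidden < len(slots):
            return mask


def split_example(example, mask):
    """Split one training example into visible inputs and hidden targets."""
    visible = {slot: value for slot, value in example.items() if not mask[slot]}
    targets = {slot: value for slot, value in example.items() if mask[slot]}
    return visible, targets


# Toy example: three views, each with placeholder image, pose, and depth data.
rng = random.Random(0)
example = {(view, m): f"{m}_{view}" for view in range(3) for m in MODALITIES}
mask = sample_mask(num_views=3, rng=rng)
visible, targets = split_example(example, mask)
print(len(visible), "visible slots;", len(targets), "slots to reconstruct")
```

In a real training loop, a model would receive `visible` and be penalized for errors in predicting `targets`. Because any slot can be hidden, an example that lacks, say, depth data can still be used by treating that slot as permanently missing, which is one plausible reading of why masking helps with incomplete datasets.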
As reported by the research team, this unified diffusion transformer model features flexible input and output configurations, supports several core photogrammetry tasks, and is optimizable end-to-end. This eliminates the need for multiple task-specific models.
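One way to picture "flexible input and output configurations" is as a table that says, per task, which modalities are supplied and which are predicted by the same model. The sketch below is purely illustrative: the `TASKS` table, the choice of which modalities each task is given, and the `build_request` helper are hypothetical and do not come from the Matrix3D code or paper.

```python
# Hypothetical task table: which modalities are supplied ("given") and
# which the single unified model is asked to produce ("predict").
TASKS = {
    "pose_estimation": {"given": {"image"}, "predict": {"pose"}},
    "depth_prediction": {"given": {"image", "pose"}, "predict": {"depth"}},
}


def build_request(task, view_data):
    """Select the inputs a task needs and list the outputs it should produce."""
    spec = TASKS[task]
    missing = spec["given"] - view_data.keys()
    if missing:
        raise ValueError(f"{task} needs missing modalities: {sorted(missing)}")
    inputs = {m: view_data[m] for m in sorted(spec["given"])}
    return {"inputs": inputs, "outputs": sorted(spec["predict"])}


# Toy data for a single view; the values are placeholders, not real tensors.
view = {"image": "photo_0.png", "pose": "camera_0"}
pose_request = build_request("pose_estimation", view)
depth_request = build_request("depth_prediction", view)
print(pose_request["outputs"], depth_request["outputs"])
```

The point of the design is that switching tasks changes only the configuration row, not the model, which is what removes the need for separate task-specific models.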
Implications for Apple Vision Pro
The ability of Matrix3D to create detailed 3D environments from minimal input has clear applications. This technology could significantly enhance experiences on immersive headsets like the Apple Vision Pro. You could use it to quickly generate 3D content or bring real-world spaces into virtual environments with greater ease. The researchers highlight its potential as an innovative tool for 3D content creation, offering fine-grained control through multi-round interactions.
The team has made the source code for Matrix3D available on GitHub and published their research paper on arXiv. You can also visit their project website to view sample videos and interact with point cloud recreations.