IT Brief New Zealand - Technology news for CIOs & IT decision-makers

NVIDIA unveils new Cosmos models for physical AI control


NVIDIA has introduced a new release of its Cosmos world foundation models, which provide developers with enhanced control over world generation and introduce open, customisable reasoning models for physical AI development.

The announcement includes the launch of two new blueprints built on the NVIDIA Omniverse and Cosmos platforms. These blueprints are designed to let developers generate synthetic data at scale, which is essential for post-training robots and autonomous vehicles.

Among the early adopters of these new technologies are 1X, Agility Robotics, Figure AI, Skild AI, and Uber. These industry leaders have begun using Cosmos for the rapid, scalable generation of enriched training data for physical AI.

Jensen Huang, Founder and CEO of NVIDIA, remarked, "Just as large language models revolutionised generative and agentic AI, Cosmos world foundation models are a breakthrough for physical AI. Cosmos introduces an open and fully customisable reasoning model for physical AI and unlocks opportunities for step-function advances in robotics and the physical industries."

The Cosmos Transfer WFMs can process structured video inputs, including segmentation maps and lidar scans, and generate controllable photorealistic video outputs. This system facilitates the perception training process by transforming 3D simulations into realistic videos, thus supporting large-scale synthetic data generation.
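The article does not show Cosmos Transfer's actual API, but the idea of "structured video inputs" steering generation can be illustrated with a toy sketch. Everything below (the `build_conditioning` function and its tensor layout) is a hypothetical illustration, not NVIDIA's interface: it simply stacks per-frame segmentation labels and lidar-style depth into one conditioning tensor of the kind a world model could use to control photorealistic output.

```python
import numpy as np

def build_conditioning(seg_maps: np.ndarray, depth_maps: np.ndarray) -> np.ndarray:
    """Stack structured inputs into a (frames, H, W, channels) tensor.

    seg_maps:   (frames, H, W) integer class labels from a 3D simulation
    depth_maps: (frames, H, W) lidar-style depth values in metres
    """
    if seg_maps.shape != depth_maps.shape:
        raise ValueError("segmentation and depth streams must align per frame")
    # Normalise depth to [0, 1] so both channels share a comparable scale.
    depth_norm = depth_maps / max(float(depth_maps.max()), 1e-6)
    return np.stack([seg_maps.astype(np.float32), depth_norm], axis=-1)

# 8 frames of 64x64 simulated input.
seg = np.random.randint(0, 10, size=(8, 64, 64))
depth = np.random.rand(8, 64, 64) * 80.0
cond = build_conditioning(seg, depth)
print(cond.shape)  # (8, 64, 64, 2)
```

In a real pipeline, a tensor like this would condition a generative model so that the output video follows the geometry of the simulation while varying appearance.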

Pras Velagapudi, Chief Technology Officer of Agility Robotics, noted, "Cosmos offers us an opportunity to scale our photorealistic training data beyond what we can feasibly collect in the real world. We're excited to see what new performance we can unlock with the platform, while making the most use of the physics-based simulation data we already have."

The NVIDIA Omniverse Blueprint for autonomous vehicle simulation applies Cosmos Transfer to vary sensor data across conditions such as weather and lighting, producing a rich variety of driving datasets. Parallel Domain likewise employs the blueprint for its sensor simulation.

The NVIDIA GR00T Blueprint combines Omniverse and Cosmos Transfer to efficiently produce diverse datasets on a large scale, benefitting from OpenUSD-powered simulations which reduce data collection and augmentation time significantly.

Cosmos Predict WFMs, unveiled earlier this year, allow for virtual world generation from multimodal inputs and can predict actions or motion trajectories from image sequences. These models are customisable and designed to aid in the post-training phase.

The computing power of NVIDIA Grace Blackwell NVL72 systems facilitates real-time world generation, a feature being utilised by companies such as 1X and Skild AI for training their robots and augmenting datasets.

The Cosmos Reason model provides a spatiotemporally aware reasoning system that can interpret video data and predict outcomes in natural language, assisting developers in enhancing physical AI models or creating new ones.

Using NVIDIA platforms such as DGX Cloud and NeMo Curator, developers can optimise data processing for physical AI tasks. Enterprises including Linker Vision and Virtual Incision have already applied these tools to large-scale data curation projects.

In alignment with NVIDIA's principles on responsible AI, the company has partnered with Google DeepMind to employ SynthID for watermarking AI-generated content, promoting transparency.

These Cosmos WFMs are now available for preview in the NVIDIA API catalog as well as Google Cloud's Vertex AI Model Garden, with Cosmos Predict and Cosmos Transfer accessible on Hugging Face and GitHub. Cosmos Reason is available in early access.
