Open Source · Alibaba · Nvidia · Android · China · Decrypt
For these use cases, Alibaba announced a new AI suite made up of several components.
Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.
Qwen-RobotNav unifies five navigation tasks—instruction following, point-goal navigation, object search, target tracking, and autonomous driving—each demanding different visual memory strategies.
Key facts
- The Embodied World Knowledge corpus spans 8.6 million video-text pairs—200 million frames—across manipulation (5.9 million samples, 1,300+ skills, 20+ morphologies), autonomous driving (Waymo, NVIDIA
- Trained on 15.6 million samples with randomization across all parameters, it achieves 76.5% success on VLN-CE RxR, a benchmark for vision-and-language navigation in real-world environments, and 90%
- While Western labs (Google DeepMind, Nvidia, Figure, Physical Intelligence) pursue similar goals, most focus on either navigation or manipulation rather than a unified, composable suite.
Summary
Alibaba unveiled the Qwen-Robot Suite, a trio of AI models designed to handle robot navigation, manipulation, and physics-based world simulation through a unified software stack. The company says its models top multiple robotics benchmarks, using millions of training samples and tens of thousands of hours of open-source robot data. Alibaba's Qwen team released the suite on Tuesday: three foundation models forming what they call a "full stack for embodied intelligence." Together, they're the Android moment for robotics: the operating system, not the hardware. The team's announcement post reads: "Introducing the Qwen-Robot Suite, Qwen-RobotNav, Qwen-RobotManip, Qwen-RobotWorld, three foundation models, a full stack for embodied intelligence. Qwen-RobotNav, the gateway to mobility. • Unifies 5 navigation tasks in one model: instruction following, point-goal,…" (pic.twitter.com/noumjTtTeS).