Xiaomi has officially expanded its horizons into advanced robotics research. After MiMo, Lei Jun's company's answer to ChatGPT, Xiaomi has presented Xiaomi-Robotics-0, its first large-scale Vision-Language-Action (VLA) model, released as open source.
Built on a 4.7-billion-parameter architecture, the system is designed to integrate visual understanding, language reasoning, and real-time execution of physical actions in a single model.
The main innovation of Xiaomi-Robotics-0 lies in its Mixture-of-Transformers (MoT) architecture, which mirrors the collaboration between the human brain and cerebellum. The system assigns command understanding to a vision-language model (VLM) that acts as the "brain", capable of interpreting human instructions even when they are vague and of analyzing spatial relationships through high-definition video input.
In parallel, movement management is delegated to an Action Expert that uses a diffusion transformer (DiT) to generate fluid and precise movement sequences. This separation of tasks balances deep logical reasoning with extremely fine motor control, preventing the robot from losing its general understanding while learning new physical tasks.
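The brain/cerebellum split can be illustrated with a minimal sketch. Everything here is hypothetical: the class names (`VisionLanguageBrain`, `DiffusionActionExpert`) and the toy "denoising" loop are illustrative stand-ins, not Xiaomi's actual API.

```python
import random

# Hypothetical sketch of the "brain / cerebellum" split described above.
# Names and logic are illustrative, not Xiaomi's actual implementation.

class VisionLanguageBrain:
    """The 'brain': turns an instruction plus camera frames into a latent plan."""
    def plan(self, instruction: str, frames: list) -> list:
        # A real VLM would run a transformer over text and video;
        # here we just derive a deterministic fake latent vector.
        rng = random.Random(hash((instruction, len(frames))) % 1000)
        return [rng.uniform(-1, 1) for _ in range(8)]

class DiffusionActionExpert:
    """The 'cerebellum': iteratively refines a trajectory toward the plan,
    a toy stand-in for a diffusion transformer's denoising steps."""
    def act(self, latent: list, steps: int = 4) -> list:
        actions = [0.0] * len(latent)
        for step in range(steps, 0, -1):
            # Each denoising step pulls the trajectory closer to the plan.
            actions = [a + (l - a) / step for a, l in zip(actions, latent)]
        return actions

brain = VisionLanguageBrain()
expert = DiffusionActionExpert()
latent = brain.plan("pick up the red block", frames=[None] * 3)
chunk = expert.act(latent)
print(len(chunk))  # → 8 (one action value per latent dimension)
```

The point of the split is visible even in this toy: the expert only ever sees the brain's latent plan, so the motor side can be retrained on new physical tasks without touching the reasoning side.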
A key aspect that makes this model practical for developers is the solution adopted to eliminate micro-stutters and instability in movements, which are often caused by processing latency. Xiaomi has introduced asynchronous inference, a technique that decouples the model's reasoning process from the robot's physical execution, ensuring continuity of action even when the system needs more time to process a complex command.
Moreover, built-in safeguards let the robot prioritize immediate visual feedback over historical memory, so it can react instantly to sudden changes in its surroundings.
The performance of Xiaomi-Robotics-0 has already been validated on benchmarks such as LIBERO, CALVIN, and SimplerEnv, where the model outperformed dozens of competing systems. In real-world tests, dual-arm robots successfully completed long-horizon tasks, such as disassembling construction blocks and handling soft, flexible objects.
An important detail is hardware compatibility: the model supports real-time inference even on consumer-grade GPUs, drastically lowering the barrier to entry for robotics research and development.
The Chinese company has released the source code and model weights of Xiaomi-Robotics-0 on platforms such as GitHub and Hugging Face, where the project's main page can also be found.