Zhao Yang (杨昭)

I am pursuing my PhD in the School of Artificial Intelligence at Xi’an Jiaotong University (XJTU),
with research affiliations at the State Key Laboratory of Human-Machine Hybrid Augmented Intelligence,
the National Engineering Research Center for Visual Information and Applications,
and the Institute of Artificial Intelligence and Robotics.

My research lies at the intersection of world-model-centric embodied AI, autonomous driving, and vision-language-action (VLA) systems.
I’m particularly interested in how latent world models, long-horizon control, and diffusion-based planning can be combined to make robot policies reliable, temporally consistent, and deployable on real platforms.

Previously, I worked on world-model and perception systems at Baidu Apollo (ADFM), Alibaba DAMO Academy, and Huawei Noah’s Ark Lab, and I am now founding an embodied-intelligence startup, Cytoderm Intelligent Technology.



Research Overview

My long-term goal is to build world-model-first autonomous robotics capable of robust long-horizon reasoning, dexterous manipulation, and real-world deployment.

Methodologically, my work spans latent world models for prediction and planning, long-horizon control, diffusion-based generation, and vision-language-action policies.

On the application side, my work has been validated on large-scale robotic manipulation benchmarks such as CALVIN and LIBERO, on real-world robotic platforms, and on autonomous-driving datasets including nuScenes, KITTI-360, and Waymo.


Career & Education History

Role | Institution / Company | Period
Chief Algorithm Researcher | Cytoderm Intelligent Technology (Robotics & Embodied AI) | 2025 – Present
Senior Algorithm Engineer | Baidu · Apollo (Autonomous Driving & World Models) | 2023 – 2025
PhD Student | Xi’an Jiaotong University (Institute of Artificial Intelligence and Robotics) | 2023 – Present
Senior Algorithm Engineer | Alibaba · DAMO Academy (3D Perception & BEV Representation) | 2021 – 2023
Algorithm Engineer | Huawei · Noah’s Ark Lab and Cloud (Machine Learning & Computer Vision) | 2019 – 2022
MS Student | Huazhong University of Science and Technology (HUST) | 2017 – 2019

Publications

A more complete and up-to-date list is available on Google Scholar.

Recent works (2024–2025)

ChunkFlow Under review
ChunkFlow: Towards Continuity-Consistent Chunked Policy Learning
Zhao Yang, Yinan Shi, et al.
Under review, 2025.
VLA policy with chunked actions, overlap blending, and continuity-constrained RL to suppress boundary jitter in long-horizon control.
[PDF] [Code] [Project]
DriVerse ACM MM 2025
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment
Xiaofan Li, Chenming Wu, Zhao Yang*, et al.
ACM Multimedia (ACM MM), 2025.
A navigation-centric world model that conditions on multimodal trajectory prompts and enforces motion-aligned latent dynamics for driving simulation.
[PDF] [Code]
U-ViLAR ICCV 2025
U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
Xiaofan Li, Zhihao Xu, Chenming Wu, Zhao Yang, et al.
IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
Differentiable association and registration with explicit uncertainty modeling for robust large-scale localization in autonomous driving.
[PDF]
Causal-Planner IROS 2025
Causal-Planner: Causal Interaction Disentangling with Episodic Memory Gating for Autonomous Planning
Yibo Yuan, Jianwu Fang, Yang Zhou, Zhao Yang, et al.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025.
Causal interaction modeling and memory-gated policy for interpretable planning in dynamic driving environments.
[PDF] [Code]
Semi-supervised BEV 3D detection ICRA 2025
Towards Accurate Semi-Supervised BEV 3D Object Detection with Depth-Aware Refinement and Denoising-Aided Alignment
Zhao Yang, Yinan Shi, et al.
IEEE International Conference on Robotics and Automation (ICRA), 2025.
Semi-supervised BEV detector with depth-aware refinement, denoising-aided alignment, and robust pseudo-labeling on nuScenes.
[PDF] [Code]
DualDiff+ Under review
DualDiff+: Dual-branch Diffusion for High-fidelity Video Generation with Reward Guidance
Zhao Yang, Zezhong Qian, et al.
Under review, 2025.
Dual-branch diffusion with semantic and motion reward alignment for high-resolution, temporally coherent video synthesis.
[PDF] [Code] [Project]
DualDiff ICRA 2025
DualDiff: Dual-Branch Diffusion Model for Autonomous Driving with Semantic Fusion
Zhao Yang, Haoteng Li, et al.
IEEE International Conference on Robotics and Automation (ICRA), 2025.
Dual-branch diffusion model that fuses semantic BEV priors with image features for controllable driving video and occupancy generation.
[PDF] [Code] [Project]
Object-Guided Semi-Supervised BEV 3D Detection T-ITS 2025
Object-Guided Semi-Supervised Bird’s-Eye View 3D Object Detection with 3D Box Refinement
Zhao Yang, Yinan Shi, et al.
IEEE Transactions on Intelligent Transportation Systems (T-ITS), 2025.
Semi-supervised BEV framework with object-guided consistency and box refinement, improving label efficiency on large-scale driving datasets.
[PDF] [Code]
EquivFisheye Info Fusion 2025
EquivFisheye: A Spherical Fusion Framework for Panoramic 3D Perception with Surround-View Fisheye Cameras
Zhao Yang, Xinglin Pu, et al.
Information Fusion, 2025.
Spherical feature fusion and equivariant pooling for efficient panoramic 3D perception from multi-view fisheye cameras.
[PDF] [Code]
Cadkp CVPR 2024
Cadkp: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection
Haonan Zhang, Longjun Liu, Yuqi Huang, Zhao Yang, et al.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Category-aware distillation and structured pruning to obtain compact yet accurate 3D detectors for autonomous driving.
[PDF] [Code]
RAG-Guided LLMs ACM MM 2024
RAG-Guided Large Language Models for Visual Spatial Description with Adaptive Hallucination Correction
Jun Yu, Yunxiang Zhang, Zerui Zhang, Zhao Yang, et al.
ACM Multimedia (ACM MM), 2024.
Retrieval-augmented LLMs for spatial description, with adaptive hallucination correction in vision-language reasoning.
[PDF]
VideoMAE adapters ACM MM 2024
Temporal-Informative Adapters in VideoMAE V2 and Multi-Scale Feature Fusion for Micro-Expression Spotting-then-Recognize
Jun Yu, Guopeng Zhao, Yaohui Zhang, Peng He, Zerui Zhang, Zhao Yang, et al.
ACM Multimedia (ACM MM), 2024.
Temporal adapters and multi-scale fusion for fine-grained micro-expression analysis.
[PDF]
IC-Mapper ACM MM 2024
IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction
Jiangtong Zhu*, Zhao Yang*, et al.
ACM Multimedia (ACM MM), 2024.
Instance-centric spatio-temporal modeling for online HD map vectorization from autonomous driving logs.
[PDF] [Code]
CVPR 2024 Predictive World Model track CVPRW 2024 · 1st place
The 1st-Place Solution for CVPR 2024 Autonomous Driving Grand Challenge Track on Predictive World Model
Zhao Yang, et al.
CVPR 2024 Workshop on Autonomous Driving, 2024.
Large-scale world-model solution for multi-step predictive planning in autonomous driving, achieving 1st place in the CVPR 2024 challenge.
[Project]

Earlier works

Malware GAN Journal 2022
Flexible Android Malware Detection Model Based on Generative Adversarial Networks with Code Tensor
Zhao Yang, Fengyang Deng, et al.
IEEE Transactions on Cyber-Enabled Distributed Computing and Systems, 2022.
GAN-based code-tensor modeling for robust Android malware detection.
[Paper]
Secure tensor decomposition T-CSS 2020
Secure Tensor Decomposition for Heterogeneous Multimedia Data in Cloud Computing
Zhao Yang, Cai Fu, et al.
IEEE Transactions on Computational Social Systems (T-CSS), 2020.
Privacy-preserving tensor decomposition framework for heterogeneous multimedia data in cloud environments.
[Paper]
WebVision 2020 CVPRW 2020 · 1st place
The 1st-Place Solution for WebVision CVPR 2020 Virtual Challenge
Zhao Yang, et al.
CVPR WebVision Challenge Workshop, 2020.
Large-scale web-vision model with robust training under noisy labels, achieving 1st place in the WebVision 2020 challenge.
[Project]


Service & Misc.

If you are interested in collaborations on world models, VLA agents, or diffusion-based planning, feel free to contact me by email.