DualDiff is a dual-branch conditional diffusion framework for realistic, temporally consistent driving-scene generation; it integrates multi-view semantics and video dynamics to produce high-fidelity outputs.
Autonomous driving requires photorealistic, semantically consistent, and temporally coherent simulation of driving scenes. Existing diffusion-based generation models typically rely on coarse inputs such as 3D bounding boxes or BEV maps, which fail to capture fine-grained geometry and semantics, limiting controllability and realism.
We introduce DualDiff, a dual-branch conditional diffusion model designed to enhance both spatial and temporal fidelity across multiple camera views. Our framework makes the following key contributions:
Results: DualDiff outperforms prior methods on the nuScenes benchmark, improving both generation quality (FID) and downstream perception performance.
Generated scenes by DualDiff+:

DualDiff consists of a dual-branch conditional UNet in which foreground and background features are processed independently and merged through residual learning.
Key architectural components:
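The released code defines the actual modules; purely as an illustration of the dual-branch residual fusion described above, here is a minimal PyTorch sketch. All class and parameter names (ConditionBranch, DualBranchFusion, channel arguments) are ours for illustration, not the repository's.

# Minimal, illustrative sketch of dual-branch residual fusion (not the repository's code).
# Assumption: each branch encodes its own condition (foreground objects vs. background
# layout) into a feature map with the same shape as the UNet hidden states.
import torch
import torch.nn as nn

class ConditionBranch(nn.Module):
    """Encodes one condition stream into UNet-shaped features."""
    def __init__(self, in_channels: int, hidden_channels: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, hidden_channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden_channels, hidden_channels, 3, padding=1),
        )

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        return self.encoder(cond)

class DualBranchFusion(nn.Module):
    """Processes foreground and background conditions independently and adds them
    to the denoising UNet hidden states as residuals."""
    def __init__(self, fg_channels: int, bg_channels: int, hidden_channels: int):
        super().__init__()
        self.fg_branch = ConditionBranch(fg_channels, hidden_channels)
        self.bg_branch = ConditionBranch(bg_channels, hidden_channels)
        # Zero-initialized 1x1 projections so training starts from the unmodified UNet
        # and the branches are learned as residual corrections (our choice here).
        self.fg_proj = nn.Conv2d(hidden_channels, hidden_channels, 1)
        self.bg_proj = nn.Conv2d(hidden_channels, hidden_channels, 1)
        for proj in (self.fg_proj, self.bg_proj):
            nn.init.zeros_(proj.weight)
            nn.init.zeros_(proj.bias)

    def forward(self, unet_hidden, fg_cond, bg_cond):
        fg_feat = self.fg_proj(self.fg_branch(fg_cond))  # foreground stream
        bg_feat = self.bg_proj(self.bg_branch(bg_cond))  # background stream
        return unet_hidden + fg_feat + bg_feat           # residual merge

Zero-initializing the projection layers is a common trick in ControlNet-style conditioning so that the pretrained UNet's behavior is preserved at the start of training; the actual DualDiff modules may differ.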

git clone --recursive https://github.com/yangzhaojason/DualDiff.git
conda create -n dualdiff python=3.8
conda activate dualdiff
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements/dev.txt
Additional setup for the bundled third-party packages:
cd third_party/xformers && pip install -e .
cd ../diffusers && pip install -e .
cd ../bevfusion && python setup.py develop
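Optionally (our suggestion, not an official step), verify that the CUDA build of PyTorch and the editable installs are importable:

# Optional environment sanity check (not part of the official setup).
import torch, torchvision, xformers, diffusers
print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("xformers:", xformers.__version__, "| diffusers:", diffusers.__version__)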
data/nuscenes/
├── maps
├── mini
├── samples
├── sweeps
├── v1.0-mini
└── v1.0-trainval
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes_mmdet3d_2 --extra-tag nuscenes
Place the generated .pkl annotation files in the locations expected by the configs (see the data-preparation instructions in the repository).
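To confirm the annotation files are readable, a quick check like the following can help; the file name and path assume the standard mmdet3d nuScenes naming and the output directory used above, so adjust them if yours differ.

# Quick sanity check of a generated annotation file (path and file name are assumptions
# based on the standard mmdet3d nuScenes naming).
import pickle

with open("./data/nuscenes_mmdet3d_2/nuscenes_infos_train.pkl", "rb") as f:
    data = pickle.load(f)
print(type(data))
if isinstance(data, dict):
    print("keys:", list(data.keys()))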
# Training (multi-GPU via accelerate). Replace {num_gpu}, {exp_config_name}, and
# {train_batch_size} with your GPU count, experiment config name, and training batch size.
accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes {num_gpu} \
  tools/train.py +exp={exp_config_name} runner=8gpus \
  runner.train_batch_size={train_batch_size} \
  runner.checkpointing_steps=4000 runner.validation_steps=2000
# Generate the validation set with a trained checkpoint. Replace {generated_folder}
# with the training output folder under magicdrive-log/.
accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes {num_gpu} \
  perception/data_prepare/val_set_gen.py \
  resume_from_checkpoint=magicdrive-log/{generated_folder} \
  task_id=dualdiff_gen fid.img_gen_dir=./tmp/dualdiff_gen \
  +fid=data_gen +exp={exp_config_name} runner.validation_batch_size=8
# Compute FID between the generated images and the real validation images.
python tools/fid_score.py cfg \
  resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
  fid.rootb=tmp/dualdiff_gen
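tools/fid_score.py reports the Fréchet Inception Distance (FID) between generated and real images. Conceptually, once feature vectors have been extracted for both image sets, the score reduces to the Fréchet distance between their Gaussian statistics; the sketch below is illustrative only, not the repository's implementation.

# Illustrative Fréchet distance between two sets of feature vectors
# (what an FID score boils down to once Inception features are extracted).
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))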
Visual and numerical evaluation of DualDiff on nuScenes:

@article{yang2025dualdiff+,
title={DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance},
author={Yang, Zhao and Qian, Zezhong and Li, Xiaofan and Xu, Weixiang and Zhao, Gongpeng and Yu, Ruohong and Zhu, Lingsi and Liu, Longjun},
journal={arXiv preprint arXiv:2503.03689},
year={2025}
}
@inproceedings{li2025dualdiffdualbranchdiffusionmodel,
title={DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion},
author={Li, Haoteng and Yang, Zhao and Qian, Zezhong and Zhao, Gongpeng and Huang, Yuqi and Yu, Jun and Zhou, Huazheng and Liu, Longjun},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2025},
organization={IEEE},
url={https://arxiv.org/abs/2505.01857},
}
For full details, visit the project page or the GitHub repository.