DualDiff

DualDiff: A Dual-Branch Diffusion Model for Multi-View and Video-Level Driving Scene Generation

DualDiff is a dual-branch conditional diffusion framework for realistic and temporally consistent driving-scene generation. It integrates multi-view semantics and video dynamics to produce high-fidelity outputs.


🗞️ Project News


🧠 Abstract

Autonomous driving requires photorealistic, semantically consistent, and temporally coherent simulation of driving scenes. Existing diffusion-based generation models typically rely on coarse inputs such as 3D bounding boxes or BEV maps, which fail to capture fine-grained geometry and semantics, limiting controllability and realism.

We introduce DualDiff, a dual-branch conditional diffusion model designed to enhance both spatial and temporal fidelity across multiple camera views. Our framework is characterized by the following contributions:

Results: DualDiff outperforms prior methods on the nuScenes dataset, achieving:


🎬 Visual Examples

Generated scenes by DualDiff+:



🔧 Method Overview

DualDiff consists of a dual-stream conditional UNet where foreground and background features are processed independently and merged through residual learning.

Key architectural components:

(Framework overview diagram)


🚀 Quick Start

1. Clone Repository

git clone --recursive https://github.com/yangzhaojason/DualDiff.git
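
If the repository was cloned without the --recursive flag, the bundled third_party submodules can be fetched afterwards with a standard git command:

git submodule update --init --recursive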

2. Environment Setup

conda create -n dualdiff python=3.8
conda activate dualdiff
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements/dev.txt
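
As an optional sanity check, confirm that the CUDA build of PyTorch is importable and sees a GPU before continuing:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"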

Install the bundled third-party components (run from the repository root):

cd third_party/xformers && pip install -e .   # memory-efficient attention kernels
cd ../diffusers && pip install -e .           # bundled diffusers fork
cd ../bevfusion && python setup.py develop    # BEV perception toolkit
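
To verify the editable installs (a quick check, not part of the official instructions), the packages should import cleanly from any directory:

python -c "import xformers, diffusers; print(xformers.__version__, diffusers.__version__)"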

3. Data Preparation

4. Pretrained Weights


🏋️ Training & Evaluation

Training

accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes {num_gpu} \
  tools/train.py +exp={exp_config_name} runner=8gpus \
  runner.train_batch_size={train_batch_size} \
  runner.checkpointing_steps=4000 runner.validation_steps=2000
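
As an illustration, a hypothetical 8-GPU run might look as follows; the experiment config name (224x400) and batch size (4) are placeholders, so substitute the config actually shipped with this repository:

accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 8 \
  tools/train.py +exp=224x400 runner=8gpus \
  runner.train_batch_size=4 \
  runner.checkpointing_steps=4000 runner.validation_steps=2000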

Evaluation

accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes {num_gpu} \
  perception/data_prepare/val_set_gen.py resume_from_checkpoint=magicdrive-log/{generated_folder} \
  task_id=dualdiff_gen fid.img_gen_dir=./tmp/dualdiff_gen +fid=data_gen \
  +exp={exp_config_name} runner.validation_batch_size=8

FID / FVD Metrics

python tools/fid_score.py cfg \
  resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
  fid.rootb=tmp/dualdiff_gen

📈 Quantitative Results

Visual and numerical evaluation of DualDiff on nuScenes:

(Results figure)


📚 Citation

@article{yang2025dualdiff+,
  title={DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance},
  author={Yang, Zhao and Qian, Zezhong and Li, Xiaofan and Xu, Weixiang and Zhao, Gongpeng and Yu, Ruohong and Zhu, Lingsi and Liu, Longjun},
  journal={arXiv preprint arXiv:2503.03689},
  year={2025}
}

@inproceedings{li2025dualdiffdualbranchdiffusionmodel,
  title={DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion},
  author={Li, Haoteng and Yang, Zhao and Qian, Zezhong and Zhao, Gongpeng and Huang, Yuqi and Yu, Jun and Zhou, Huazheng and Liu, Longjun},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2025},
  organization={IEEE},
  url={https://arxiv.org/abs/2505.01857}
}

For full details, visit the project page or the GitHub repository.