ICRA 2025
AVD2: Accident Video Diffusion for Accident Video Description

Cheng Li1,2,*, Keyuan Zhou1,3,*, Tong Liu1,4,*, Yu Wang1,5,*, Mingqiao Zhuang6,
Huan-ang Gao1, Bu Jin1 and Hao Zhao1,7,8,†
1Institute for AI Industry Research (AIR), Tsinghua University.
2Academy of Interdisciplinary Studies, the Hong Kong University of Science and Technology (HKUST).
3College of Communication Engineering, Jilin University.
4School of Cyber Science and Engineering, Nanjing University of Science and Technology.
5School of Automation, Beijing Institute of Technology.
6College of Foreign Language and Literature, Fudan University.
7Beijing Academy of Artificial Intelligence (BAAI).
8Lightwheel AI.

†Corresponding author.

*Indicates Equal Contribution

Accident video frames from our contributed EMM-AU dataset. We are the first to generate realistic traffic accident scenarios, improving natural language descriptions and reasoning for autonomous driving.

Abstract

Traffic accidents present complex challenges for autonomous driving, often featuring unpredictable scenarios that hinder accurate system interpretation and responses. Nonetheless, prevailing methodologies fall short in elucidating the causes of accidents and proposing preventive measures due to the paucity of training data specific to accident scenarios. In this work, we introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding by generating accident videos that are aligned with detailed natural language descriptions and reasoning, resulting in the contributed EMM-AU (Enhanced Multi-Modal Accident Video Understanding) dataset. Empirical results reveal that the integration of the EMM-AU dataset establishes state-of-the-art performance across both automated metrics and human evaluations, markedly advancing the domains of accident analysis and prevention.

Accident Videos Generated by AVD2

Description: Highway driving. Our vehicle failed to notice the vehicle on the right and did not decelerate in advance, resulting in a rear-end collision with the vehicle ahead.
Description: Snowy conditions. Our vehicle was driving on an icy road at excessive speed, causing skidding and drifting.
Description: Snowy conditions. Our vehicle was driving on an icy road without proper attention to road conditions, leading to skidding and drifting.
Description: Highway scenario. Our vehicle lost control and drifted, colliding with an oncoming vehicle from the right.
Description: Highway driving. A multi-vehicle collision occurred ahead, and our vehicle was unable to avoid the crash.
Description: Highway scenario. Our vehicle lost control and collided head-on with a vehicle in the opposing lane.
Description: Narrow road driving. Our vehicle did not decelerate in advance, resulting in a rear-end collision with the vehicle ahead.
Description: Highway driving. Due to sudden braking by the vehicle ahead, our vehicle failed to decelerate promptly, causing a rear-end collision.

Video Comprehension (AVD2 vs. ChatGPT-4o)

Introduction Video

BibTeX

@article{arxiv:2502.14801,
  author = {Cheng Li and Keyuan Zhou and Tong Liu and Yu Wang and Mingqiao Zhuang and Huan-ang Gao and Bu Jin and Hao Zhao},
  title = {AVD2: Accident Video Diffusion for Accident Video Description},
  journal = {arXiv:2502.14801},
  year = {2025},
  url = {https://doi.org/10.48550/arXiv.2502.14801},
}