Xiaosong Jia (贾萧松)
jiaxiaosong1997 [At] gmail [dot] com or jiaxiaosong [At] fudan [dot] edu [dot] cn

I am a tenure-track Assistant Professor at the Institute of Trustworthy Embodied AI (TEAI, 可信具身智能研究院) at Fudan University, and a member of the Fudan Vision and Learning Laboratory.

Previously, I received my B.Eng. (IEEE Elite Program & Zhiyuan Honor Class) and Ph.D. (Wu Honor Class) in Computer Science from Shanghai Jiao Tong University, where I was fortunate to be advised by Prof. Junchi Yan.

Email  /  Google Scholar  /  Github

News

  • 2026.03 – 🎉 Congrats to undergraduate Junqi on receiving a PhD offer from Princeton (Field: Autonomous Driving)!
  • 2026.03 – 🎉 Congrats to undergraduate Cunxin on receiving a PhD offer from UIUC (Field: Embodied AI)!
  • 2026.02 – Five papers accepted to CVPR 2026.
Research

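I am interested in autonomous driving and dexterous hands. Currently, I am focusing on (i) building authentic world simulators with generative and reconstructive models and then (ii) combining imitation learning and reinforcement learning to train decision-making agents in an end-to-end way.
I am recruiting research assistants, master's students, and Ph.D. students. (Please apply to the College of Intelligent Robotics and Advanced Manufacturing (智能机器人与先进制造创新学院) or the Institute of Trustworthy Embodied AI (可信具身智能研究院). I can also take students who apply to and receive offers from 创智学院, 中关村学院, or 河套学院.)
For more information on admissions, collaboration, and research, please see this page.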

Selected Publications & Projects

* denotes co-first authors, ✉ denotes corresponding authors

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
Zhenjie Yang*, Yilin Chai*, Xiaosong Jia✉, Qifeng Li, Yuqian Shao, Xuekai Zhu, Haisheng Su, Junchi Yan✉
CVPR, 2026

Mixture-of-experts π0 for autonomous driving.

Spatial Retrieval Augmented Autonomous Driving
Xiaosong Jia*, Chenhe Zhang*, Yule Jiang*, Songbur Wong*, Zhiyuan Zhang, Chen Chen, Shaofeng Zhang, Xuanhe Zhou, Xue Yang, Junchi Yan, Yu-Gang Jiang
CVPR, 2026

Introduces memory retrieval for autonomous driving. A new paradigm.

TrajTok: What makes for a good trajectory tokenizer in behavior generation?
Zhiyuan Zhang, Xiaosong Jia✉, Guanyu Chen, Qifeng Li, Zuxuan Wu, Yu-Gang Jiang, Junchi Yan✉
ICLR, 2026

A tokenizer for trajectories. Champion of the Waymo Sim Agents Challenge 2026 ($10,000).

Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model via Decoupled Co-Refinement Attention
Xiaosong Jia*, Yihang Sun* (sophomore), Junqi You* (junior), Songbur Wong, Zichen Zou, Junchi Yan, Zuxuan Wu, Yu-Gang Jiang
ICLR, 2026

Efficiently traverse in the video.

Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan* (junior), Xiaosong Jia*, Yihang Sun, Yixiao Wang, Jianglan Wei, Ziyang Gong, Xiangyu Zhao, Masayoshi Tomizuka, Xue Yang, Junchi Yan, Mingyu Ding
ICLR, 2026

Interleaved inputs for VLA enabling zero-shot visual prompts.

Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model
Junqi You* (junior), Xiaosong Jia*, Zhiyuan Zhang, Yutao Zhu, Junchi Yan
arXiv, 2024

A closed-loop autonomous driving benchmark based on generative models that can replay real-world data in a reactive way.

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
Zhenjie Yang, Xiaosong Jia✉, Qifeng Li, Xue Yang, Maoqing Yao, Junchi Yan✉
NeurIPS, 2025

End-to-end reinforcement learning for autonomous driving.

DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
Xiaosong Jia*, Junqi You* (sophomore), Zhiyuan Zhang* (junior), Junchi Yan
ICLR, 2025

Unifying all tasks with queries and attention. Strong scalability.

FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving
Yutao Zhu*, Xiaosong Jia*, Xinyu Yang, Junchi Yan
ICRA, 2025

Transformer-based fusion with 73.7 NDS and 10.1 FPS on the nuScenes val set.

Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving
Xiaosong Jia*, Zhenjie Yang*, Qifeng Li*, Zhiyuan Zhang*, Junchi Yan
NeurIPS Datasets and Benchmarks Track, 2024

First benchmark for multi-ability end-to-end autonomous driving.

AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving
Xiaosong Jia, Shaoshuai Shi, Zijun Chen, Li Jiang, Wenlong Liao, Tao He, Junchi Yan
arXiv, 2024

GPT-style Motion Prediction. State-of-the-art performance on Waymo Motion.

Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving (in CARLA-v2)
Qifeng Li*, Xiaosong Jia*, Shaobo Wang, Junchi Yan
ECCV, 2024

World-model-based reinforcement learning for autonomous driving. The first and only learning-based model that can solve 39 complex scenarios in CARLA Leaderboard 2.0.

ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving
Han Lu, Xiaosong Jia✉, Yichen Xie, Wenlong Liao, Xiaokang Yang, Junchi Yan✉
CVPR, 2026

Planning-oriented data selection for end-to-end autonomous driving. Training with 30% of the data beats training with the full data.

LLM4Drive: A Survey of Large Language Models for Autonomous Driving
Zhenjie Yang*, Xiaosong Jia*, Hongyang Li, Junchi Yan
arXiv, 2023

The first survey of LLMs for autonomous driving. 800+ stars. Continuously updated.

DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
Xiaosong Jia, Yulu Gao, Li Chen, Junchi Yan, Patrick Langechuan Liu, Hongyang Li
ICCV, 2023 (Oral Presentation)

New paradigm for end-to-end autonomous driving without causal confusion.

Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
Xiaosong Jia, Penghao Wu, Li Chen, Jiangwei Xie, Conghui He, Junchi Yan, Hongyang Li
CVPR, 2023

BEV-based scalable end-to-end autonomous driving model.

Planning-oriented Autonomous Driving
Yihan Hu*, Jiazhi Yang*, Li Chen*, Keyu Li*, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li
CVPR, 2023 (Best Paper Award)

All modules in one Transformer-based end-to-end network for autonomous driving.

HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding
Xiaosong Jia, Penghao Wu, Li Chen, Hongyang Li, Yu Liu, Junchi Yan
TPAMI, 2023

Unified heterogeneous graph neural network for driving scene encoding. State-of-the-art method on the INTERACTION and Waymo challenges.

PPGeo: Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling
Penghao Wu, Li Chen, Hongyang Li, Xiaosong Jia, Junchi Yan, Yu Qiao
ICLR, 2023

Self-supervised pretraining for policy learning.

TCP: Trajectory-guided Control Prediction for Autonomous Driving
Penghao Wu*, Xiaosong Jia*, Li Chen*, Junchi Yan, Hongyang Li, Yu Qiao
NeurIPS, 2022

Trajectory-guided control paradigm for end-to-end autonomous driving. Ranked 1st on the CARLA Leaderboard with only a monocular camera, outperforming other methods with multiple cameras and LiDAR by a large margin.

Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach
Xiaosong Jia, Li Chen, Penghao Wu, Jia Zeng, Junchi Yan, Hongyang Li, Yu Qiao
CoRL, 2022

A plug-and-play module for trajectory prediction by enhancing the temporal correlation among the predicted time-steps.

Multi-Agent Trajectory Prediction by Combining Egocentric and Allocentric Views
Xiaosong Jia, Liting Sun, Hang Zhao, Masayoshi Tomizuka, Wei Zhan
CoRL, 2021
ICCV Mair2 Workshop, 2021 (Best Student Paper Award)

Rethinks the invariance property of the coordinate representation for trajectory prediction.

INTERPRET: INTERACTION-Dataset-Based PREdicTion Challenge
Wei Zhan, Liting Sun, Hengbo Ma, Chenran Li, Xiaosong Jia, Masayoshi Tomizuka

Co-organized the competition at ICCV 2021. I was responsible for the design and implementation of the Joint Prediction and Conditional Prediction Tracks.

IDE-Net: Interactive Driving Event and Pattern Extraction from Human Data
Xiaosong Jia, Liting Sun, Masayoshi Tomizuka, Wei Zhan
RA Letters, 2021
ICRA, 2021

Unsupervised extraction of interactive behaviors in a whether, when, and what hierarchy.


This website's source code is from Jon Barron