Shan Yang

Researcher · Multi-modal Reasoning

I work on multi-modal reasoning: making vision-language models reason about physics, geometry, and the world they actually see. My focus is the unsexy half of the stack — audited training data, hard evaluation benchmarks, and RL recipes that survive the jump from text-only chain-of-thought to multi-modal reasoning.

Currently Staff Applied Scientist at Adobe Foundry (Adobe's foundation-model research org), post-training foundation models with SFT, DPO, and RL. Previously Tech Lead at Amazon (GenAI Live Action Studio, Amazon Video Search) and Senior Research SDE at Google Research (multi-modal modeling, AIST++). PhD from UNC-Chapel Hill with Prof. Ming C. Lin on learning physical parameters from video.

Currently exploring: multi-modal GRPO and chain-of-thought recipes for VLMs (Physics-o1, in preparation).

Latest
Physics-R1

Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning

Visual physics reasoning for vision-language models: a 2,434-record audited training corpus, a 500-question novel-source olympiad benchmark (PhysOlym-A), and an RL recipe that pushes SOTA on visual physics reasoning at the 7B scale. All artifacts open.

Selected Publications

TTC-Net

Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control

Peihao Wang, Shan Yang, Xijun Wang, Tesi Xiao, Xin Liu, Changlong Yu, Yu Lou, Pan Li, Zhangyang Wang, Ming Lin, René Vidal

ICML 2026

VLAP

VLAP: Efficient Video-Language Alignment via Frame Prompting and Distilling for Video Question Answering

Xijun Wang, Junbang Liang, Chun-Kai Wang, Kenan Deng, Yu Lou, Ming Lin, Shan Yang

ECCV 2024

MBT

Attention Bottlenecks for Multimodal Fusion

Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun

NeurIPS 2021

AI Choreographer

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++

Shan Yang*, Ruilong Li*, David A. Ross, Angjoo Kanazawa

ICCV 2021

ICAR

ICAR: Image-based Complementary Auto Reasoning

Xijun Wang, Anqi Liang, Junbang Liang, Ming Lin, Yu Lou, Shan Yang

AAAI 2024

Cloth Material Recovery

Learning-based Cloth Material Recovery from Video

Shan Yang, Junbang Liang, Ming C. Lin

ICCV 2017

MeSa

MeSa: Masked, Geometric, and Supervised Pre-training for Monocular Depth Estimation

Muhammad Osama Khan, Junbang Liang, Chun-Kai Wang, Shan Yang, Yu Lou

NeurIPS 2023 Workshop on Self-Supervised Learning: Theory and Practice

Schema Perception

Schema Perception for Robust Video Question Answering

Xijun Wang, Shan Yang

Adobe Technical Report 2025

RoSI

RoSI: Recovering 3D Shape Interiors from Few Articulation Images

Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, Hao Zhang

arXiv 2023

Optical Mouse

Optical Mouse: 3D Mouse Pose From Single-View Video

Shan Yang*, Bo Hu*, David A. Ross, Avneesh Sud, Yi Liu, Graham Ruby, Bryan Seybold

CVPR 2021 (CV4Animals Workshop)

Garment Recovery

Physics-Inspired Garment Recovery from a Single-View Image

Shan Yang, Zherong Pan, Tanya Amert, Ke Wang, Licheng Yu, Tamara Berg, Ming C. Lin

ACM TOG 2018

Referring Expressions

Modeling Context in Referring Expressions

Licheng Yu, Patrick Poirson, Shan Yang, Alex Berg, Tamara Berg

ECCV 2016

Prostate Cancer Classification

Classification of Prostate Cancer Grades and T-Stages based on Tissue Elasticity Using Medical Image Analysis

Shan Yang, Vladimir Jojic, Jun Lian, Ronald Chen, Hongtu Zhu, Ming C. Lin

MICCAI 2016

Bayesian Estimation

Bayesian Estimation of Non-Rigid Mechanical Parameters Using Temporal Sequences of Deformation Samples

Shan Yang, Ming C. Lin

ICRA 2016

MaterialCloning

MaterialCloning: Acquiring Elasticity Parameters from Images for Medical Applications

Shan Yang, Ming C. Lin

IEEE TVCG 2016

Simultaneous Estimation

Simultaneous Estimation of Elasticity for Multiple Deformable Bodies

Shan Yang, Ming C. Lin

Computer Animation and Virtual Worlds 2015

Buried Suture

Real-time Simulation for Buried Suture

Shan Yang, Wenlong Lu, Lixu Gu

CARS 2012

Open Source & Releases

Dataset · Hugging Face
Physics-R1 Corpus

2,434-record audited training corpus for visual physics reasoning, with provenance and license audit.

Benchmark · Hugging Face
PhysOlym-A

500-question novel-source olympiad benchmark for evaluating visual physics reasoning in VLMs.

Dataset · Hugging Face
PhysDojo Annotations

11.6K episodes of physics-grounded video annotations for world-model training.

Code · GitHub
physics-r1-code

Training, evaluation, and reward code for Physics-R1.

Code · GitHub
AIST++ Dataset API

Loaders and tooling for the AIST++ 3D dance dataset (from AI Choreographer, ICCV 2021).

Project
Lumi Research Manager

AI-powered research project manager with pixel-art agent teams (Scout, Theorist, Architect, Coder).

Notes · GitHub
RL Learning Log

Personal log from learning reinforcement learning: notes, experiments, and insights.