VLAP: Efficient Video-Language Alignment via Frame Prompting and Distilling for Video Question Answering
ECCV 2024
Tech Lead Amazon AI Studio | Ex-Tech Lead LLM for Amazon Product Search | Ex-Tech Lead Amazon Video Search, Amazon/A9 | Xoogler
I currently drive the Amazon AI Studio initiative while continuing to advance multi-modal LLMs and video generation research. Previously I led Amazon Video Search and the LLM reranker for Amazon Product Search, and my broader interests span multi-modality perception, 3D computer vision, and physically based simulation.
Before Amazon, I was a Senior Research SDE at Google Research focusing on multi-modal content creation and 3D vision. I completed my PhD at UNC-CH under Prof. Ming C. Lin.
ECCV 2024
NeurIPS 2023 Workshop (Self-Supervised Learning: Theory and Practice)
CVPR 2021 (CV4Animal Workshop)
MICCAI 2016
CARS 2012
Supporting multimodal dance generation
Developer tools and loaders for the AIST++ dataset
Educational implementation of diffusion models
Interactive creation and editing app