Vision & Graphics
Vision foundation models & generative models
Vision foundation models, video diffusion transformers, and 4D scene representations that learn the geometry, appearance, and dynamics of the visual world, with work ranging from animatable portrait avatars to controllable text-to-4D generation and full 3D scene reconstruction.
- Lyra: Generative 3D Scene Reconstruction via Video Diffusion Self-Distillation
- Velox: Learning Representations of 4D Geometry and Appearance
- Efficient and Training-Free Single-Image Diffusion Models
- MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars
- Generating the Past, Present and Future from a Motion-Blurred Image (blur2vid)
- CAP4D: Animatable 4D Portrait Avatars with Multi-View Diffusion
- AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
- SG-I2V: Self-guided Trajectory Control in Image-to-Video Generation
- 4D-fy: Text-to-4D Generation via Hybrid Score Distillation
- TC4D: Trajectory-Conditioned Text-to-4D Generation