Vikram Voleti

I am a Research Scientist at Stability AI, where I work on generative modeling for image, video, and 3D.

I completed my PhD in Computer Science at Mila, University of Montreal, where I was supervised by Prof. Christopher Pal. My main research interests are at the intersection of computer vision and deep learning; my past research spans generative modeling and representation learning for image, video, and 3D.

During my PhD, I worked as a Research Intern at Meta, Unity Labs, and Google. Prior to that, I worked in the engineering industry for 4 years.

I graduated from the Indian Institute of Technology (IIT), Kharagpur, India, in 2014 with a Dual Degree (B.Tech. (H) + M.Tech.) in Electrical Engineering. My Master’s specialization is Instrumentation and Signal Processing.

PUBLICATIONS

For a more complete list, please check my Google Scholar page.

SV4D 2.0 - Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation

Chun-Han Yao, Yiming Xie, Vikram Voleti, Huaizu Jiang, Varun Jampani

Project | arXiv | Video | Stability | HuggingFace

Stable Virtual Camera - Generative View Synthesis with Diffusion Models

Jensen Zhou, Hang Gao, Vikram Voleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, Varun Jampani

Project | arXiv | Video | Stability | HuggingFace

SV4D - Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, Varun Jampani

ICLR 2025

HouseCrafter - Lifting Floorplans to 3D Scenes with 2D Diffusion Model

Hieu T Nguyen, Yiwen Chen, Vikram Voleti, Varun Jampani, Huaizu Jiang

arXiv

SV3D - Novel multi-view synthesis and 3D generation from a single image using latent video diffusion

Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitry Tochilkin, Christian Laforte, Robin Rombach, Varun Jampani

ECCV 2024 Oral!

Project | arXiv | Video | Stability | HuggingFace

(SVD) Stable video diffusion - Scaling latent video diffusion models to large datasets

Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach

arXiv | Stability | HuggingFace

Multi-Resolution Continuous Normalizing Flows

Vikram Voleti, Chris Finlay, Adam Oberman, Christopher Pal

Annals of Mathematics and Artificial Intelligence, 2024

arXiv | Springer

Objaverse-XL - A Universe of 10M+ 3D Objects

Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, Ali Farhadi

NeurIPS 2023

arXiv | Website | GitHub

Are Diffusion Models Vision-And-Language Reasoners?

Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy

NeurIPS 2023

arXiv

threestudio - a modular framework for diffusion-guided 3D generation

Ying-Tian Liu, Yuan-Chen Guo, Vikram Voleti, Ruizhi Shao, Chia-Hao Chen, Guan Luo, Zixin Zou, Chen Wang, Christian Laforte, Yan-Pei Cao, Song-Hai Zhang

ICCV 2023 workshop

GitHub | PDF | pdf

PhD Thesis - Conditional Generative Modeling for Images, 3D Animations, and Video

Vikram Voleti

slides | arXiv | University of Montreal

Score-based Diffusion Models in Function Space

Jae Hyun Lim, Nikola B. Kovachki, Ricardo Baptista, Christopher Beckham, Kamyar Azizzadenesheli, Jean Kossaifi, Vikram Voleti, Jiaming Song, Karsten Kreis, Jan Kautz, Christopher Pal, Arash Vahdat, Anima Anandkumar

arXiv

(MCVD) Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Vikram Voleti, Alexia Jolicoeur-Martineau, Christopher Pal

NeurIPS 2022

arXiv | Project | Code | Poster | slides

Score-based Denoising Diffusion with Non-Isotropic Gaussian Noise Models

Vikram Voleti, Christopher Pal, Adam Oberman

NeurIPS 2022 Workshop on Score-Based Methods

arXiv | Poster

SMPL-IK - Learned Morphology-Aware Inverse Kinematics for AI Driven Artistic Workflows

Vikram Voleti, Boris N. Oreshkin, Florent Bocquelet, Félix G. Harvey, Louis-Simon Ménard, Christopher Pal

SIGGRAPH Asia 2022

arXiv | slides | video

FairCal - Fairness Calibration for Face Verification

Tiago Salvador, Stephanie Cairns, Vikram Voleti, Noah Marshall, Adam Oberman

ICLR 2022

arXiv | poster | OpenReview

Plankton-FL - Exploration of Federated Learning for Privacy-Preserving Training of Deep Neural Networks for Phytoplankton Classification

Daniel Zhang, Vikram Voleti, Alexander Wong, Jason Deglint

CVIS 2022

arXiv

Generative Models of Brain Dynamics - A Review

Mahta Ramezanian Panahi, Germán Abrevaya, Jean-Christophe Gagnon-Audet, Vikram Voleti, Irina Rish, Guillaume Dumas

Frontiers in Artificial Intelligence

arXiv

Towards Generating Large Synthetic Phytoplankton Datasets for Efficient Monitoring of Harmful Algal Blooms

Nitpreet Bamra, Vikram Voleti, Alexander Wong, Jason Deglint

AAAI 2022 Fall Symposium

arXiv

SALT - Sea lice Adaptive Lattice Tracking - An Unsupervised Approach to Generate an Improved Ocean Model

Ju An Park, Vikram Voleti, Kathryn E Thomas, Alexander Wong, Jason L Deglint

Pre-print

arXiv

Improving Continuous Normalizing Flows using a Multi-Resolution Framework

Vikram Voleti, Chris Finlay, Adam Oberman, Christopher Pal

INNF+ @ ICML 2021

OpenReview

gradSim - Differentiable simulation for system identification and visuomotor control

Krishna Murthy Jatavallabhula, Miles Macklin, Florian Golemo, Vikram Voleti, Linda Petrini, Martin Weiss, Breandan Considine, Jérôme Parent-Lévesque, Kevin Xie, Kenny Erleben, Liam Paull, Florian Shkurti, Derek Nowrouzezahrai, Sanja Fidler

ICLR 2021

arXiv | website | OpenReview | pdf

Frustratingly Easy Uncertainty Estimation for Distribution Shift

Tiago Salvador, Vikram Voleti, Alexander Iannantuono, Adam Oberman

Pre-print

arXiv

Accounting for Variance in Machine Learning Benchmarks

Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent

MLSys 2021

arXiv | pdf

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules