I am a Research Scientist at Stability AI, where I work on generative modeling for image, video, and 3D.

I completed my PhD in Computer Science at Mila, University of Montreal, supervised by Prof. Christopher Pal. My main research interests are at the intersection of computer vision and deep learning; my past research spans generative modeling and representation learning for image, video, and 3D.

Prior to that, I was a Research Intern at Meta, where I worked on generating video, 3D objects, and 4D content from text. Before that, I was a MITACS Research Intern at Unity Labs, where I worked on 3D human pose estimation and inverse kinematics from video. In Fall 2019, I was a Research Intern at Google in the Google AI Perception team.

I graduated from the Indian Institute of Technology (IIT), Kharagpur, India, in 2014 with a Dual Degree (B.Tech. (H) + M.Tech.) in Electrical Engineering. My Master’s specialization is Instrumentation and Signal Processing.


SV4D - Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, Varun Jampani

Project | arXiv | Video | Stability | HuggingFace

HouseCrafter - Lifting Floorplans to 3D Scenes with 2D Diffusion Model

Hieu T Nguyen, Yiwen Chen, Vikram Voleti, Varun Jampani, Huaizu Jiang


SV3D - Novel multi-view synthesis and 3D generation from a single image using latent video diffusion

Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitry Tochilkin, Christian Laforte, Robin Rombach, Varun Jampani

ECCV 2024 Oral!

Project | arXiv | Video | Stability | HuggingFace

(SVD) Stable video diffusion - Scaling latent video diffusion models to large datasets

Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach

arXiv | Stability | HuggingFace

Multi-Resolution Continuous Normalizing Flows

Vikram Voleti, Chris Finlay, Adam Oberman, Christopher Pal

Annals of Mathematics and Artificial Intelligence, 2024

arXiv | Springer

Objaverse-XL - A Universe of 10M+ 3D Objects

Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, Ali Farhadi

NeurIPS 2023

arXiv | Website | GitHub

Are Diffusion Models Vision-And-Language Reasoners?

Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy

NeurIPS 2023


threestudio - a modular framework for diffusion-guided 3D generation

Ying-Tian Liu, Yuan-Chen Guo, Vikram Voleti, Ruizhi Shao, Chia-Hao Chen, Guan Luo, Zixin Zou, Chen Wang, Christian Laforte, Yan-Pei Cao, Song-Hai Zhang

ICCV 2023 workshop

GitHub | PDF | pdf

PhD Thesis - Conditional Generative Modeling for Images, 3D Animations, and Video

Vikram Voleti

slides | arXiv | University of Montreal

Score-based Diffusion Models in Function Space

Jae Hyun Lim, Nikola B. Kovachki, Ricardo Baptista, Christopher Beckham, Kamyar Azizzadenesheli, Jean Kossaifi, Vikram Voleti, Jiaming Song, Karsten Kreis, Jan Kautz, Christopher Pal, Arash Vahdat, Anima Anandkumar


(MCVD) Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Vikram Voleti, Alexia Jolicoeur-Martineau, Christopher Pal

NeurIPS 2022

arXiv | Project | Code | Poster | slides

Score-based Denoising Diffusion with Non-Isotropic Gaussian Noise Models

Vikram Voleti, Christopher Pal, Adam Oberman

NeurIPS 2022 Workshop on Score-Based Methods

arXiv | Poster

SMPL-IK - Learned Morphology-Aware Inverse Kinematics for AI Driven Artistic Workflows

Vikram Voleti, Boris N. Oreshkin, Florent Bocquelet, Félix G. Harvey, Louis-Simon Ménard, Christopher Pal

SIGGRAPH Asia 2022

arXiv | slides | video

FairCal - Fairness Calibration for Face Verification

Tiago Salvador, Stephanie Cairns, Vikram Voleti, Noah Marshall, Adam Oberman

ICLR 2022

arXiv | poster | OpenReview

Plankton-FL - Exploration of Federated Learning for Privacy-Preserving Training of Deep Neural Networks for Phytoplankton Classification

Daniel Zhang, Vikram Voleti, Alexander Wong, Jason Deglint

CVIS 2022


Generative Models of Brain Dynamics - A Review

Mahta Ramezanian Panahi, Germán Abrevaya, Jean-Christophe Gagnon-Audet, Vikram Voleti, Irina Rish, Guillaume Dumas

Frontiers in Artificial Intelligence


Towards Generating Large Synthetic Phytoplankton Datasets for Efficient Monitoring of Harmful Algal Blooms

Nitpreet Bamra, Vikram Voleti, Alexander Wong, Jason Deglint

AAAI 2022 Fall Symposium


SALT - Sea lice Adaptive Lattice Tracking--An Unsupervised Approach to Generate an Improved Ocean Model

Ju An Park, Vikram Voleti, Kathryn E Thomas, Alexander Wong, Jason L Deglint



Improving Continuous Normalizing Flows using a Multi-Resolution Framework

Vikram Voleti, Chris Finlay, Adam Oberman, Christopher Pal

INNF+ @ ICML 2021


gradSim - Differentiable simulation for system identification and visuomotor control

Krishna Murthy Jatavallabhula, Miles Macklin, Florian Golemo, Vikram Voleti, Linda Petrini, Martin Weiss, Breandan Considine, Jérôme Parent-Lévesque, Kevin Xie, Kenny Erleben, Liam Paull, Florian Shkurti, Derek Nowrouzezahrai, Sanja Fidler

ICLR 2021

arXiv | website | OpenReview | pdf

Frustratingly Easy Uncertainty Estimation for Distribution Shift

Tiago Salvador, Vikram Voleti, Alexander Iannantuono, Adam Oberman



Accounting for Variance in Machine Learning Benchmarks

Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent

MLSys 2021

arXiv | pdf

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

ICML 2020

arXiv | pdf

Simple Video Generation using Neural ODEs

Vikram Voleti, David Kanaa, Samira Kahou, Christopher Pal

NeurIPS 2019 Workshop on Learning with Rich Experience

arXiv | poster

Comparing Normalization in Conditional Computation Tasks

Vincent Michalski, Vikram Voleti, Samira Kahou, Anthony Ortiz, Pascal Vincent, Christopher Pal, Doina Precup

ICML 2019 Workshop on Understanding and Improving Generalization in Deep Learning

pdf | poster

Cross-Language Speech Dependent Lip-Synchronization

  • Video generation to morph lip movements of speakers to match new audio

Abhishek Jha, Vikram Voleti, Vinay P. Namboodiri, C. V. Jawahar


pdf | poster

Lip-Synchronization for Dubbed Instructional Videos

Vikram Voleti, Abhishek Jha, Vinay P. Namboodiri, C. V. Jawahar

CVPR 2018 Workshop (FIVER)

pdf | url

A Multimodal Approach for Image De-fencing and Depth Inpainting

S. Jonna, Vikram Voleti, R. R. Sahay

International Conference on Advances in Pattern Recognition, ICAPR 2015

pdf | IEEE page



October 20, 2022 Paper accepted at NeurIPS 2022 Workshop on Score-Based Methods! "Score-based Denoising Diffusion with Non-Isotropic Gaussian Noise Models" arXiv
September 15, 2022 Paper accepted at SIGGRAPH Asia 2022! "SMPL-IK - Learned Morphology-Aware Inverse Kinematics for AI Driven Artistic Workflows" arXiv
September 14, 2022 Paper accepted at NeurIPS 2022! "(MCVD) Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation" arXiv; Project page
August 29, 2022 - December 16, 2022 Research Intern at Meta, in the AI4AR team.
August 23, 2022 Presented "A tutorial on Score-based Denoising Diffusion Models" at Mila, Montreal, Canada [slides] [pdf]
August 16, 2022 Released "SMPL-IK - Learned Morphology-Aware Inverse Kinematics for AI Driven Artistic Workflows" on arXiv
August 12, 2022 Presented "Solving Video Tasks with Denoising Diffusion Models" at Samsung AI Center, Toronto, Canada [slides] [pdf]
July 15, 2022 Review paper "Generative Models of Brain Dynamics - A Review" published at Frontiers in Artificial Intelligence! [arXiv]
May 27, 2022 Released the code for "(MCVD) Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation" on arXiv; Project page
May 23, 2022 Released "(MCVD) Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation" on arXiv; Project page
February 15, 2022 Presented "Denoising Diffusion GANs" at Mila, Montreal, Canada [slides]
January 21, 2022 Paper accepted at ICLR 2022! "FairCal - Fairness Calibration for Face Verification" [arXiv] [OpenReview]
October 18, 2021 - August 18, 2022 MITACS Research Intern at Unity Technologies
June 22, 2021 Released "Multi-Resolution Continuous Normalizing Flows" on arXiv!
June 15, 2021 Short paper accepted at INNF+ workshop @ ICML 2021! "Improving Continuous Normalizing Flows using a Multi-Resolution Framework" OpenReview
June 7, 2021 Released "FairCal - Fairness Calibration for Face Verification" on arXiv!
June 7, 2021 Released "Frustratingly Easy Uncertainty Estimation for Distribution Shift" on arXiv!
May 19, 2021 Awarded as an "Outstanding Reviewer" at CVPR 2021!
April 7, 2021 Released gradSim on arXiv! Visit our website - gradsim.github.io! Released video for gradSim!
April 6, 2021 Presented "Training GANs by Solving ODEs" at Mila, Montreal, Canada [slides]
February 15, 2021 Presented "Score-based Generative Models using Neural SDEs" at Mila, Montreal, Canada [slides]
January 18, 2021 Paper accepted at MLSys 2021! "Accounting for Variance in Machine Learning Benchmarks" [MLSys 2021 Proceedings]
January 13, 2021 Paper accepted at ICLR 2021! "gradSim - Differentiable simulation for system identification and visuomotor control" [OpenReview]
December 21, 2020 Awarded the Microsoft Diversity Award for Doctoral Research!
December 10-11, 2020 Organized GRAPHQUON 2020 (formerly MOTOGRAPH)
December 6-12, 2020 Attended NeurIPS 2020 (virtually)
October 1, 2020 Started working as an AI Advisor to Blue Lion Labs
September 1, 2020 Presented "Continuous Normalizing Flows" at Mila, Montreal, Canada [slides]
September 11, 2020 Short paper "Introducing GradSim - Differentiable Simulation for Self-Supervised Parameter Estimation from Video" accepted at the Montreal AI Symposium 2020, Canada [pdf]
August 28, 2020 Passed my PhD qualifiers! Onto the thesis stage!
July 25, 2020 Presented "GANs - the story so far" at the Summer Symposium on AI Research, India [slides] [video]
July 10, 2020 Presented "A brief tutorial on Neural ODEs" at Mila, Montreal, Canada [slides] [video]
June 14-19, 2020 Attended CVPR 2020 (virtually)
June 2, 2020 Paper accepted at ICML 2020! Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio, "Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules" [arxiv]
April 26 - May 1, 2020 Attended ICLR 2020 (virtually)
April 2, 2020 Gave a tutorial on the "Mathematics of Neural ODEs" at the University of Guelph, Canada [slides]
March 24, 2020 - September 30, 2020 AI Scientist in Residence at NextAI, Toronto, Canada; consultant for multiple startups
January 7, 2020 Gave a talk on "Simple Video Generation using Neural ODEs" at IIIT Hyderabad, India [pdf]
December 15, 2019 Joined Prof. Graham Taylor's lab as a Visiting Researcher at the University of Guelph, Canada.
December 9-14, 2019 Attended NeurIPS 2019, Vancouver, Canada; presented "Simple Video Generation using Neural ODEs" in the Workshop on Learning with Rich Experience! [paper, poster]
December 9, 2019 Attended AICan 2019, the second annual meeting of the Pan-Canadian AI Strategy; presented "Simple Video Generation using Neural ODEs" [paper]
September 16, 2019 - December 6, 2019 Research Intern at Google, Mountain View in the Google AI Perception team, with Bryan Seybold and Sourish Chaudhuri. Worked on Semi-supervised Active Speaker Detection in videos.
September 9-13, 2019 Teaching Assistant at the 4th IVADO / Mila Deep Learning School, Montreal, Canada
September, 2019 Teaching Assistant for the Fundamentals of Machine Learning (IFT 6390) course by Ioannis Mitliagkas at the University of Montreal, Montreal, Canada
July 7-10, 2019 Attended RLDM 2019, Montreal, Canada
June 16-20, 2019 Attended CVPR 2019, Long Beach, California
June 9-15, 2019 Attended ICML 2019, Long Beach, California; presented "Comparing normalization in conditional computation tasks" in the Workshop on Understanding and Improving Generalization in Deep Learning! [paper, poster]
May 31, 2019 Paper accepted at ICML 2019 Workshop on Understanding and Improving Generalization in Deep Learning! Paper title - "Comparing normalization in conditional computation tasks" [pdf]
May 30, 2019 Gave a tutorial on "GANs" at the AI for Social Good Summer Lab (AI4Good)! Followed these excellent slides (not mine) - [pdf]
May 12-17, 2019 Attended ICASSP 2019, Brighton, UK; presented "Cross-Language Speech Dependent Lip-Synchronization" [IEEE, paper, poster]
April 2019 - Aug 2019 Worked as an AI Scientist in Residence at NextAI, Montreal, Canada; consulted for 6 startups at NextAI
February 01, 2019 Paper accepted at ICASSP 2019! "Cross-Language Speech Dependent Lip-Synchronization" - Abhishek Jha, Vikram Voleti, Vinay P. Namboodiri, C. V. Jawahar [pdf]
January 16, 2019 Released code of Self-Attention GAN in PyTorch on GitHub [url]
December 02-08, 2018 Attended NeurIPS 2018, Montreal, Canada
October 30, 2018 Presented "(BigGAN) Large Scale GAN Training for High Fidelity Natural Image Synthesis" at Mila, University of Montreal, Canada [pdf]
September 24, 2018 Presented "Visualizing the Loss Landscapes of Neural Nets" at Mila, University of Montreal, Canada [pdf]
September 4, 2018 Joined Mila as a PhD student at the Department of Computer Science and Operations Research, University of Montreal [url]
June 6, 2018 Presented "Linear Algebra - Groups, Vector Spaces, Matrix Transformations" at CVIT, IIIT Hyderabad, India, as part of the Linear Algebra course by Lovish Chum [pdf]
May 25, 2018 Short paper accepted at CVPR 2018 Workshop (FIVER)! "Lip-Synchronization for Dubbed Instructional Videos" [pdf] [url]
April 26, 2018 Presented "Lipreading in the Wild" at CVIT, IIIT Hyderabad, India [pdf]
February 26, 2018 Presented "Image de-fencing using RGB-D data" at Max Planck Institute for Informatics, Saarbrücken, Germany [pdf]
February 18, 2018 Presented "Mathematics in Sanskrit Poetry" at CVIT, IIIT Hyderabad, India [pdf]
February 2, 2018 Presented "Intuition behind LSTMs" at CVIT, IIIT Hyderabad, India [pdf]
January 10, 2018 Presented "A Multimodal Approach for Image De-fencing and Depth Inpainting" at CVIT, IIIT Hyderabad, India [url]
January 5, 2018 Started work as "Mentor" for Foundations of Artificial Intelligence and Machine Learning by TalentSprint
December 29-31, 2017 Attended conference ICIDE 2017, Dubai; presented "Simple Real-Time Pattern Recognition for Industrial Automation" [pdf]
November 11-12, 2017 Won the SMS Classification challenge and participated in the Video Action Recognition challenge at the Hack2Innovate hackathon in Bangalore, India
August 9, 2017 Presented "Mathematics behind Back-propagation" at CVIT, IIIT Hyderabad, India [url]
July 10-15, 2017 Attended IIIT Hyderabad's Summer School on Machine Learning; ranked 4th among all participants [notes]
July 3-8, 2017 Attended IIIT Hyderabad's Summer School on Computer Vision; ranked 3rd among all participants [notes]
May 15, 2017 Joined CVIT, IIIT Hyderabad, India as a researcher under Prof. C. V. Jawahar
April 9, 2017 Presented "Mathematics behind Back-propagation" at GreyOrange Robotics, India [url]