Carl Vondrick

Research Scientist, Google
Assistant Professor, Columbia University

cvondrick@gmail.com

Research Overview

My research focuses on computer vision and machine learning. My work often capitalizes on large amounts of raw data to efficiently teach machines, for example by learning without human supervision or by transferring knowledge between tasks and modalities. I am interested in learning rich models of events and scenes that generalize to new tasks and predict unseen outcomes. Other interests include interpretable models, high-level reasoning, and perception for robotics.

We are looking for Ph.D. students to join our group. Prospective students should apply to the computer science Ph.D. program and mention my name.

About Me

I am currently a research scientist at Google Research. In Fall 2018, I will be an assistant professor in the computer science department at Columbia University.

I received my Ph.D. from MIT, where I was advised by Antonio Torralba. My thesis was supported by fellowships from Google and the NSF. I obtained my bachelor's degree from UC Irvine, where I was advised by Deva Ramanan.

News

  • A paper accepted to PAMI, a paper at ICCV 2017, and a paper at CVPR 2017.
  • I will be an area chair for CVPR 2018.
  • Our sound recognition work is covered on NPR, New Scientist, and a children's magazine!
  • Our video generation work is covered on NBC and Scientific American.
  • Two papers at NIPS 2016.
  • Our action prediction work is covered on CNN, NPR, AP, Wired and Colbert!
  • Three papers at CVPR 2016.

Papers and Projects

Future Prediction

How do we capitalize on large amounts of raw data to learn to anticipate what events and actions may happen in the future? Our work develops methods for generating future video and for predicting what actions a person may perform next.

Generating the Future with Adversarial Transformers
Carl Vondrick, Antonio Torralba
CVPR 2017
Paper Project Page

Generating Videos with Scene Dynamics
Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
NIPS 2016
Paper Project Page Code NBC Scientific American New Scientist MIT News

Predicting Motivations of Actions by Leveraging Text
Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba
CVPR 2016
Paper Dataset

Cross-Modal Transfer

How do we transfer knowledge between different modalities and tasks? Our research develops large-scale models for sound recognition and learns aligned representations across images, sounds, text, sketches, and even cartoons.

Cross-Modal Scene Networks
Yusuf Aytar*, Lluis Castrejon*, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
PAMI 2017
Paper Project Page

See, Hear, and Read: Deep Aligned Representations
Yusuf Aytar, Carl Vondrick, Antonio Torralba
arXiv
Paper Project Page

Learning Aligned Cross-Modal Representations from Weakly Aligned Data
Lluis Castrejon*, Yusuf Aytar*, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
CVPR 2016
Paper Project Page Demo

Human Activity Understanding

What are people doing in images and videos? Our work creates models to understand what a person is looking at, or to assess how well they are performing an action.

Following Gaze in Video
Adria Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba
ICCV 2017
Paper

Who is Mistaken?
Benjamin Eysenbach, Carl Vondrick, Antonio Torralba
arXiv
Paper Project Page

Where are they looking?
Adria Recasens*, Aditya Khosla*, Carl Vondrick, Antonio Torralba
NIPS 2015
Paper Project Page Demo

Assessing the Quality of Actions
Hamed Pirsiavash, Carl Vondrick, Antonio Torralba
ECCV 2014
Paper Project Page

Model Visualization

What do black-box computer vision models learn? What happens if you scale up datasets by an order of magnitude? We are developing tools to understand and diagnose computer vision systems in order to improve them.

Visualizing Object Detection Features
Carl Vondrick, Aditya Khosla, Hamed Pirsiavash, Tomasz Malisiewicz, Antonio Torralba
IJCV 2016
Paper Project Page Slides MIT News

Do We Need More Training Data?
Xiangxin Zhu, Carl Vondrick, Charless C. Fowlkes, Deva Ramanan
IJCV 2015
Paper Dataset

Learning Visual Biases from Human Imagination
Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba
NIPS 2015
Paper Project Page Technology Review

HOGgles: Visualizing Object Detection Features
Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba
ICCV 2013
Paper Project Page Slides MIT News

Do We Need More Training Data or Better Models for Object Detection?
Xiangxin Zhu, Carl Vondrick, Deva Ramanan, Charless C. Fowlkes
BMVC 2012
Paper Dataset

Video Annotation

How can we efficiently collect huge datasets for training computer vision systems? Our research develops tools and methods for creating large datasets with crowdsourcing.

Efficiently Scaling Up Crowdsourced Video Annotation
Carl Vondrick, Donald Patterson, Deva Ramanan
IJCV 2012
Paper Project Page

Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces
Carl Vondrick, Deva Ramanan, Donald Patterson
ECCV 2010
Paper Project Page