Carl Vondrick

Research Scientist, Google
Assistant Professor, Columbia University

cvondrick at gmail dot com

About Me

I am currently a research scientist at Google Research. In Fall 2018, I will be an assistant professor in the computer science department at Columbia University.

I received my Ph.D. from MIT, where I was advised by Antonio Torralba. My thesis was supported by fellowships from Google and the NSF. I obtained my bachelor's degree from UC Irvine, where I was advised by Deva Ramanan.

Research Overview

My research studies computer vision and machine learning. My work often capitalizes on large amounts of raw data to teach machines efficiently, for example by learning without human supervision or by transferring knowledge between tasks and modalities. I am interested in learning rich models of events and scenes that generalize to new tasks and predict unseen outcomes. Other interests include interpretable models, high-level reasoning, and perception for robotics.

We are looking for Ph.D. students to join our group. Prospective students should apply to the computer science Ph.D. program at Columbia and mention my name.

News

  • Papers accepted to PAMI, ICCV 2017, and CVPR 2017.
  • I will be an area chair for CVPR 2018.
  • Our sound recognition work is covered on NPR, New Scientist, and a children's magazine!
  • Our video generation work is covered on NBC and Scientific American.
  • Two papers at NIPS 2016.
  • Our action prediction work is covered on CNN, NPR, AP, Wired, and Colbert!
  • Three papers at CVPR 2016.

Representative Publications

Predictive Models

How do we capitalize on large amounts of raw data to learn to anticipate what events and actions may happen in the future? Our work is developing methods for generating videos of the future and predicting what actions a person may perform next.

Generating Videos with Scene Dynamics
Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
NIPS 2016
Paper · Project Page · Code · NBC · Scientific American · New Scientist · MIT News

Cross-Modal Transfer

How do we transfer knowledge between different modalities and tasks? Our research is developing large-scale models for sound recognition and learning aligned representations across images, sounds, text, sketches, and even cartoons.

Cross-Modal Scene Networks
Yusuf Aytar*, Lluis Castrejon*, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
PAMI 2017
Paper · Project Page

Human Activity Understanding

What are people doing in images and videos? Our work is creating models to understand what a person is looking at or to assess how well they are performing an action.

Where are they looking?
Adria Recasens*, Aditya Khosla*, Carl Vondrick, Antonio Torralba
NIPS 2015
Paper · Project Page · Demo

Assessing the Quality of Actions
Hamed Pirsiavash, Carl Vondrick, Antonio Torralba
ECCV 2014
Paper · Project Page

Model Visualization

What do black-box computer vision models learn? What happens if you scale up datasets by an order of magnitude? We are developing tools to understand and diagnose computer vision systems in order to improve them.

Visualizing Object Detection Features
Carl Vondrick, Aditya Khosla, Hamed Pirsiavash, Tomasz Malisiewicz, Antonio Torralba
IJCV 2016
Paper · Project Page · Slides · MIT News

Do We Need More Training Data?
Xiangxin Zhu, Carl Vondrick, Charless C. Fowlkes, Deva Ramanan
IJCV 2015
Paper · Dataset

Learning Visual Biases from Human Imagination
Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba
NIPS 2015
Paper · Project Page · Technology Review

Video Annotation

How can we efficiently collect huge datasets for training computer vision systems? Our research develops tools and methods for creating large datasets with crowdsourcing.

Efficiently Scaling Up Crowdsourced Video Annotation
Carl Vondrick, Donald Patterson, Deva Ramanan
IJCV 2012
Paper · Project Page

See all publications.