CV Datasets on the web

Participate in Reproducible Research

Detection

PASCAL VOC 2009 dataset

Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets

LabelMe is a web-based image annotation tool that allows researchers to label images and share the annotations with the rest of the community. If you use the database, we only ask that you contribute to it, from time to time, by using the labeling tool.

BioID Face Detection Database

1521 images with human faces, recorded under natural conditions, i.e. varying illumination and complex background. The eye positions have been set manually.

CMU/VASC & PIE Face dataset

Yale Face dataset

Caltech

Cars, Motorcycles, Airplanes, Faces, Leaves, Backgrounds

Caltech 101

Pictures of objects belonging to 101 categories

Caltech 256

Pictures of objects belonging to 256 categories

Daimler Pedestrian Detection Benchmark

15,560 pedestrian and non-pedestrian samples (image cut-outs) and 6,744 additional full images not containing pedestrians for bootstrapping. The test set contains more than 21,790 images with 56,492 pedestrian labels (fully visible or partially occluded), captured from a vehicle in urban traffic.
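Bootstrapping here usually means running a trained classifier over the pedestrian-free images and keeping every window it wrongly accepts as an extra negative training sample. The sketch below illustrates that idea only; it is not code from the benchmark. The function name, the 96x48 window, the stride, the threshold and the toy scoring function are all assumptions made for illustration.

```python
def mine_hard_negatives(image, score_fn, window=(96, 48), stride=16, threshold=0.5):
    """Slide a window over a pedestrian-free image (a list of pixel rows) and
    return the (x, y, w, h) boxes that score_fn scores at or above threshold.
    Every returned box is a false positive by construction."""
    win_h, win_w = window          # hypothetical window size, not the benchmark's
    img_h, img_w = len(image), len(image[0])
    hard = []
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            patch = [row[x:x + win_w] for row in image[y:y + win_h]]
            if score_fn(patch) >= threshold:
                hard.append((x, y, win_w, win_h))
    return hard


def toy_score(patch):
    """Stand-in for a real classifier: mean brightness scaled to [0, 1]."""
    total = sum(sum(row) for row in patch)
    count = len(patch) * len(patch[0])
    return total / (255.0 * count)


# Toy image: black background with one bright region the toy classifier flags.
image = [[0] * 192 for _ in range(192)]
for y in range(96):
    for x in range(48):
        image[y][x] = 255

false_positives = mine_hard_negatives(image, toy_score)
```

In practice the collected windows would be added to the negative training set and the classifier retrained, possibly for several rounds.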

MIT Pedestrian dataset

CVC Pedestrian Datasets

CBCL Pedestrian Database

MIT Face dataset

CBCL Face Database

MIT Car dataset

CBCL Car Database

MIT Street dataset

CBCL Street Database

INRIA Person Data Set

A large set of marked up images of standing or walking people

INRIA car dataset

A set of car and non-car images taken in a parking lot near INRIA

INRIA horse dataset

A set of horse and non-horse images

H3D Dataset

3D skeletons and segmented regions for 1000 people in images

Classification

PASCAL VOC 2009 dataset

Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets

Caltech

Cars, Motorcycles, Airplanes, Faces, Leaves, Backgrounds

Caltech 101

Pictures of objects belonging to 101 categories

Caltech 256

Pictures of objects belonging to 256 categories

ETHZ Shape Classes

A dataset for testing object class detection algorithms. It contains 255 test images and features five diverse shape-based classes (apple logos, bottles, giraffes, mugs, and swans).

Flower classification data sets

17 Flower Category Dataset

Animals with attributes

A dataset for attribute-based classification. It consists of 30,475 images of 50 animal classes, with six pre-extracted feature representations for each image.

Recognition

Face and Gesture Recognition Working Group FGnet

Feret

Face and Gesture Recognition Working Group FGnet

PUT face

9971 images of 100 people

Labeled Faces in the Wild

A database of face photographs designed for studying the problem of unconstrained face recognition

Urban scene recognition

Traffic Lights Recognition, Lara's public benchmarks.

PubFig: Public Figures Face Database

The PubFig database is a large, real-world face dataset consisting of 58,797 images of 200 people collected from the internet. Unlike most other existing face datasets, these images are taken in completely uncontrolled situations with non-cooperative subjects.

YouTube Faces

The data set contains 3,425 videos of 1,595 different people. The shortest clip is 48 frames, the longest is 6,070 frames, and the average clip length is 181.3 frames.

Tracking

BIWI Walking Pedestrians dataset

Walking pedestrians in busy scenarios from a bird's-eye view

"Central" Pedestrian Crossing Sequences

Three pedestrian crossing sequences

Pedestrian Mobile Scene Analysis

The set was recorded in Zurich, using a pair of cameras mounted on a mobile platform. It contains 12,298 annotated pedestrians in roughly 2,000 frames.

Head tracking

BMP image sequences.

Datasets for Tracking People in Aerial Image Sequences

Each dataset comprises an aerial image sequence and an XML file with manually labeled trajectories of all visible persons.
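A trajectory file of this kind can be read with Python's standard XML parser. The sketch below is illustrative only: the element and attribute names (trajectories, person, point, id, frame, x, y) are a hypothetical schema invented for this example, not the datasets' documented format, so check the real files and adapt the names before use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real files may use a different layout.
SAMPLE_XML = """
<trajectories>
  <person id="1">
    <point frame="1" x="12.0" y="22.5"/>
    <point frame="0" x="10.5" y="20.0"/>
  </person>
  <person id="2">
    <point frame="0" x="100.0" y="50.0"/>
  </person>
</trajectories>
"""


def load_trajectories(xml_text):
    """Return {person_id: [(frame, x, y), ...]} with points sorted by frame."""
    root = ET.fromstring(xml_text)
    tracks = {}
    for person in root.iter("person"):        # assumed element name
        points = [
            (int(pt.get("frame")), float(pt.get("x")), float(pt.get("y")))
            for pt in person.iter("point")    # assumed element name
        ]
        tracks[person.get("id")] = sorted(points)
    return tracks


tracks = load_trajectories(SAMPLE_XML)
```

Keying the result by person identifier keeps each trajectory together, which is the shape most tracking evaluation code expects.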

MIT Traffic Data Set

The MIT traffic data set supports research on activity analysis and crowded scenes. It includes a 90-minute traffic video sequence recorded by a stationary camera.

Segmentation

Image Segmentation with A Bounding Box Prior dataset

Ground truth database of 50 images with: Data, Segmentation, Labelling - Lasso, Labelling - Rectangle

PASCAL VOC 2009 dataset

Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets

Motion Segmentation and OBJCUT data

Cows for object segmentation, Five video sequences for motion segmentation

Geometric Context Dataset

Pixel labels for seven geometric classes for 300 images

Crowd Segmentation Dataset

This dataset contains videos of crowds and other high-density moving objects. The videos were collected mainly from the BBC Motion Gallery and Getty Images websites. They are shared for research purposes only; consult the terms and conditions of use on the respective websites.

CMU-Cornell iCoseg Dataset

Contains hand-labelled pixel annotations for 38 groups of images, each group containing a common foreground. Approximately 17 images per group, 643 images total.

Segmentation evaluation database

200 gray level images along with ground truth segmentations

The Berkeley Segmentation Dataset and Benchmark

Image segmentation and boundary detection. Grayscale and color segmentations for 300 images, divided into a training set of 200 images and a test set of 100 images.

Weizmann horses

328 side-view color images of horses that were manually segmented. The images were randomly collected from the WWW.

Saliency-based video segmentation with sequentially updated priors

10 videos as inputs, and segmented image sequences as ground-truth

Foreground/Background

Wallflower Dataset

For evaluating background modelling algorithms

Foreground/Background Microsoft Cambridge Dataset

Foreground/Background segmentation and Stereo dataset from Microsoft Cambridge

Saliency Detection

AIM

120 Images / 20 Observers (Neil D. B. Bruce and John K. Tsotsos 2005).

LeMeur

27 Images / 40 Observers (O. Le Meur, P. Le Callet, D. Barba and D. Thoreau 2006).

Kootstra

100 Images / 31 Observers (Kootstra, G., Nederveen, A. and de Boer, B. 2008).

DOVES

101 Images / 29 Observers (van der Linde, I., Rajashekar, U., Bovik, A.C., Cormack, L.K. 2009).

Ehinger

912 Images / 14 Observers (Krista A. Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba and Aude Oliva 2009).

NUSEF

758 Images / 75 Observers (R. Subramanian, H. Katti, N. Sebe, M. Kankanhalli and T-S. Chua 2010).

JianLi

235 Images / 19 Observers (Jian Li, Martin D. Levine, Xiangjing An and Hangen He 2011).

Video Surveillance

CAVIAR

For the CAVIAR project, a number of video clips were recorded acting out the different scenarios of interest. These include people walking alone, meeting with others, window shopping, entering and exiting shops, fighting and passing out, and, last but not least, leaving a package in a public place.

ViSOR

ViSOR contains a large set of multimedia data and the corresponding annotations.

Multiview

3D Photography Dataset

Multiview stereo data sets: a set of images

Multi-view Visual Geometry group's data set

Dinosaur, Model House, Corridor, Aerial views, Valbonne Church, Raglan Castle, Kapel sequence

Oxford reconstruction data set (building reconstruction)

Oxford colleges

Multi-View Stereo dataset (Vision Middlebury)

Temple, Dino

Multi-View Stereo for Community Photo Collections

Venus de Milo, Duomo in Pisa, Notre Dame de Paris

IS-3D Data

Dataset provided by Center for Machine Perception

CVLab dataset

CVLab dense multi-view stereo image database

3D Objects on Turntable

Objects viewed from 144 calibrated viewpoints under 3 different lighting conditions

Object Recognition in Probabilistic 3D Scenes

Images from 19 sites collected from a helicopter flying around Providence, RI, USA. The imagery contains approximately a full circle around each site.

Action

UCF Sports Action Dataset

This dataset consists of a set of actions collected from various sports that are typically featured on broadcast television channels such as the BBC and ESPN. The video sequences were obtained from a wide range of stock footage websites, including BBC Motion Gallery and Getty Images.

UCF Aerial Action Dataset

This dataset features video sequences obtained using an R/C-controlled blimp equipped with an HD camera mounted on a gimbal. The collection represents a diverse pool of actions featured at different heights and aerial viewpoints. Multiple instances of each action were recorded at flying altitudes ranging from 400 to 450 feet and were performed by different actors.

UCF YouTube Action Dataset

It contains 11 action categories collected from YouTube.

Weizmann action recognition

Walk, Run, Jump, Gallop sideways, Bend, One-hand wave, Two-hands wave, Jump in place, Jumping Jack, Skip.

UCF50

UCF50 is an action recognition dataset with 50 action categories, consisting of realistic videos taken from YouTube.

ASLAN

The Action Similarity Labeling (ASLAN) Challenge.

Human pose

ETHZ CALVIN Dataset

Image stitching

IPM Vision Group Image Stitching datasets