
Challenge Problem #6: Image Labeling

Motivation

Multimedia data such as videos and pictures are being produced and shared at an unprecedented and accelerating pace. For example, video is currently being uploaded to YouTube at a rate of approximately 30 million hours a year. This drives a strong need for automated tools that help users understand, organize, and retrieve images and videos from very large collections.

The goal of the proposed challenge problem is to assign labels (e.g. "tree", "people", or "baby") to images based on relationships found in a rich social multimedia database containing visual features and metadata such as user information (e.g. username, location, network of contacts), comments, user image galleries, uploader-defined groups, and links between shared content. Used collectively in a suitable fashion, this information may advance the state of the art in image labeling.

Problem Overview

Multimedia retrieval is the problem of developing the scientific methodology to understand and discover images and videos with particular content in a complex, large, and growing collection of multimedia. Real-world multimedia, especially as shared on the Internet, can be challenging to retrieve using only visual information, due to complex content, partial occlusion, and diverse styles and quality. The most common solution is to annotate media with keywords that describe the content and then perform keyword search against these annotations. Annotating an image thus consists of inferring its content labels, L, conditioned on the image, I, and other related metadata, M, i.e. computing P(L|I,M).
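
To make the formulation concrete, the sketch below scores each label independently with a toy logistic model over concatenated visual and metadata features. This is only an illustration of the P(L|I,M) formulation, not the CP6 reference model; the label set, feature dimensions, and parameters are assumptions made for the example.

    import numpy as np

    # Toy approximation of P(L | I, M): independent per-label logistic
    # regression over concatenated visual features (I) and metadata
    # features (M). All names and dimensions here are illustrative.

    LABELS = ["tree", "people", "baby"]

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def label_probabilities(image_features, metadata_features, weights, biases):
        """Return P(label present | I, M) for each label under the toy model.

        image_features:    (d_i,) visual feature vector
        metadata_features: (d_m,) encoded metadata (tags, comments, ...)
        weights:           (n_labels, d_i + d_m) per-label weights
        biases:            (n_labels,) per-label bias terms
        """
        x = np.concatenate([image_features, metadata_features])
        return sigmoid(weights @ x + biases)

    # Usage with random parameters standing in for trained ones.
    rng = np.random.default_rng(0)
    d_i, d_m = 8, 4
    w = rng.normal(size=(len(LABELS), d_i + d_m))
    b = rng.normal(size=len(LABELS))
    probs = label_probabilities(rng.normal(size=d_i), rng.normal(size=d_m), w, b)
    for label, p in zip(LABELS, probs):
        print(f"P({label!r} | I, M) = {p:.3f}")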

In Challenge Problem 6 (CP6), we will solve the task of automatic image annotation, or labeling, by exploiting the metadata, M, in addition to the visual information, I. Some types of metadata (e.g. EXIF tags) are generated by the camera when the image is taken; others (e.g. user-provided tags and comments from viewers) are generated after the image is uploaded to an image-sharing service such as Flickr. We will use a subset of the MIRFLICKR [Huiskes2008] dataset to supply the ground-truth image labels, image features, and related metadata. The MIRFLICKR data is available under Creative Commons licenses. The subset is the one used by [McAuley2012], which we will henceforth refer to as the MIR14k dataset.

Within the PPAML taxonomy of challenge problems, this CP is related to the Intelligence Analysis domain; the data structures are a hybrid of discrete (categorical) and continuous (features and feature distances) values presented in both relational and vector forms. The basic parametric probabilistic model is an undirected graphical model over a fixed model structure with latent variables. Queries are formulated as marginal maximum a posteriori (MAP) queries for individual images, or joint MAP queries for the entire graph. The query timing is one-shot with slow tempo and stationary parameters.
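
To illustrate the distinction between the two query types, the brute-force sketch below builds a tiny undirected graphical model over three images with made-up unary and pairwise potentials, then computes both the joint MAP labeling of the whole graph and the marginal MAP label for one image. The graph, potentials, and weights are invented for the example and are not CP6's actual model.

    import itertools
    import numpy as np

    # Tiny undirected graphical model: one binary label per image
    # ("tree" absent/present), unary potentials from per-image evidence,
    # and pairwise potentials rewarding agreement between images linked
    # by metadata. All values below are illustrative.

    unary = np.array([[0.2, 0.8],   # image 0: evidence favors label = 1
                      [0.6, 0.4],   # image 1: evidence favors label = 0
                      [0.5, 0.5]])  # image 2: uninformative
    edges = [(0, 1), (1, 2)]        # metadata links between images
    agree = 2.0                     # weight rewarding label agreement

    def joint_potential(labels):
        """Unnormalized potential of a full labeling (product of factors)."""
        p = np.prod([unary[i, l] for i, l in enumerate(labels)])
        for i, j in edges:
            if labels[i] == labels[j]:
                p *= np.exp(agree)
        return p

    assignments = list(itertools.product([0, 1], repeat=3))

    # Joint MAP: the single most probable labeling of the entire graph.
    joint_map = max(assignments, key=joint_potential)

    # Marginal MAP for image 0: sum out the other images' labels,
    # then pick image 0's most probable label.
    marg = np.zeros(2)
    for labels in assignments:
        marg[labels[0]] += joint_potential(labels)
    marginal_map_0 = int(np.argmax(marg))

    print("joint MAP labeling:", joint_map)
    print("marginal MAP label for image 0:", marginal_map_0)

Exhaustive enumeration works here only because the toy graph has three binary nodes; at the scale of the MIR14k graph, approximate inference would be required.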

Evaluation Timeline

  • Round 1: Presented July 2015, evaluated January 2016:
    • Images that do not contain the tag “structure”: 5216 images for training, 2252 for testing
  • Round 2: Presented January 2016, evaluated July 2016:
    • Round 1 data plus images tagged “structure”: 9182 total for training, 3707 for testing
