Chang E.Y. Foundations of Large-Scale Multimedia Information Management and Retrieval

  • PDF file
  • 6.86 MB
Springer, 2011. — 291 p. — ISBN10: 3642204287, ISBN13: 978-3642204289
The volume and accessibility of images and videos are increasing exponentially, thanks to the sea change from film to digital imagery, the availability of electronic networking, and the ubiquity of high-speed network access. The tools for organizing and retrieving these multimedia data, however, are still quite primitive. One piece of evidence is the lack, to date, of effective tools for organizing personal images or videos. Another is that all Internet search engines today still rely on the keyword-search paradigm, which is known to suffer from the semantic aliasing problem. Existing organization and retrieval tools are ineffective partly because they fail to properly model and combine the "content" and "context" of multimedia data, and partly because they fail to effectively address scalability issues. For instance, a typical content-based retrieval prototype today extracts some signals from multimedia data instances to represent them, employs a poorly justified distance function to measure similarity between data instances, and relies on a costly sequential scan to find data instances similar to a query instance. From feature extraction, data representation, multimodal fusion, similarity measurement, and feature-to-semantic mapping to indexing, the design of each component has largely not been built on solid scientific foundations. Furthermore, most prior art focuses on improving a single component and demonstrates its effectiveness on small datasets. The problem of multimedia information management and retrieval, however, is inherently interdisciplinary, and tackling it requires synergistic collaboration among the fields of machine learning, multimedia computing, cognitive science, and large-scale computing, in addition to signal processing, computer vision, and databases.
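As a rough illustration of the prototype pipeline criticized above, the sketch below strings together a toy feature extractor, a distance function, and a sequential scan in Python. The histogram features, the Euclidean distance, and the retrieve helper are illustrative assumptions for this page, not the book's recommended design.

```python
import numpy as np

def extract_features(image: np.ndarray) -> np.ndarray:
    """Hypothetical feature extractor: a coarse grayscale histogram
    standing in for whatever signals a prototype might extract."""
    hist, _ = np.histogram(image, bins=32, range=(0, 256))
    return hist / max(hist.sum(), 1)

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """A commonly used, but often weakly justified, distance function."""
    return float(np.linalg.norm(a - b))

def retrieve(query_feat, corpus_feats, k=5, dist=euclidean):
    """Costly sequential scan: compare the query against every instance
    and return the indices of the k most similar ones."""
    scores = [dist(query_feat, f) for f in corpus_feats]
    return np.argsort(scores)[:k]

# Toy usage with random "images".
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 64)) for _ in range(100)]
feats = [extract_features(im) for im in images]
print(retrieve(extract_features(images[0]), feats, k=3))
```

Each step here is exactly what the book argues cannot be taken for granted: hand-picked signals, an unjustified similarity metric, and a scan whose cost grows linearly with the collection.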
This book presents an interdisciplinary approach: it first establishes scientific foundations for each component, and then addresses the interactions between components in a manner that scales in both data dimensionality and volume. The book is organized into 12 chapters in two parts. The first part depicts a multimedia system's key components, which together aim to comprehend the semantics of multimedia data instances. The second part presents methods for scaling these components up to high-dimensional data and very large datasets. In part one we start by providing an overview of the research and engineering challenges in Chap. 1. Chap. 2 presents feature extraction, which obtains useful signals from multimedia data instances; we discuss model-based and data-driven approaches, and then a hybrid of the two. In Chap. 3 we deal with the problem of formulating users' query concepts, which can be complex and subjective. We show how active learning and kernel methods can be used to work effectively with both keywords and perceptual features to understand a user's query concept with minimal user feedback. We argue that only after a user's query concept has been thoroughly comprehended is it possible to retrieve matching objects. Chaps. 4 and 5 address the problem of distance-function formulation, a core subroutine of information retrieval for measuring similarity between data instances. Chap. 4 presents the Dynamic Partial Function and its foundation in cognitive psychology. Chap. 5 shows how an effective function can also be learned from examples in a data-driven way. Chaps. 6–8 describe methods that fuse metadata of multiple modalities. Multimodal fusion is important for properly integrating perceptual features of various kinds (e.g., color, texture, shape; global, local; time-invariant, time-variant) and for properly combining metadata from multiple sources (e.g., from both content and context). We present three techniques: superkernel fusion in Chap. 6, fusion with causal strengths in Chap. 7, and combinational collaborative filtering in Chap. 8.
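To make the query-concept learning idea of Chap. 3 concrete, here is a minimal pool-based active-learning sketch in Python with scikit-learn: the classifier repeatedly asks the "user" to label the pool instances closest to its current decision boundary. The synthetic dataset, the RBF kernel, the batch size, and the labeling oracle are assumptions made for illustration; this is not the book's SVM-based active-learning implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic pool of instances with binary relevance labels (assumed setup).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Seed the labeled set with a few examples from each class.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
unlabeled = [i for i in range(len(X)) if i not in labeled]

clf = SVC(kernel="rbf", gamma="scale")
for _ in range(5):                                   # five feedback rounds
    clf.fit(X[labeled], y[labeled])
    # Uncertainty sampling: ask about the points nearest the hyperplane.
    margins = np.abs(clf.decision_function(X[unlabeled]))
    batch = [unlabeled[i] for i in np.argsort(margins)[:5]]
    labeled += batch                                 # the "user" labels them
    unlabeled = [i for i in unlabeled if i not in batch]

clf.fit(X[labeled], y[labeled])
print("pool accuracy after", len(labeled), "labels:", round(clf.score(X, y), 3))
```

A loop of this kind typically needs far fewer labels than passive random sampling to pin down the same concept, which is the comparison the book's active-versus-passive experiments examine.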
Part two of the book tackles scalability issues. Chap. 9 presents the problem of imbalanced-data learning, where the number of data instances in the target class is significantly outnumbered by those of the other classes. This challenge is typical in information retrieval, since the information relevant to a query is always the minority in the dataset. The chapter describes algorithms for dealing with the problem in vector and non-vector spaces, respectively. Chaps. 10 and 11 address the scalability of kernel methods. Kernel methods are a core machine-learning technique with strong theoretical foundations and excellent empirical success. One major shortcoming of kernel methods is their cubic computation time for training and linear computation time for classification. We present parallel algorithms to speed up training and fast indexing structures to speed up classification. Finally, in Chap. 12, we present our effort in speeding up Latent Dirichlet Allocation (LDA), a robust method for modeling text and images. Using distributed computing primitives together with data placement and pipelining techniques, we were able to speed up LDA 1,500 times when using 2,000 machines.
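As a point of reference for the imbalanced-learning setting of Chap. 9, the sketch below shows only the standard class-weighting baseline available in scikit-learn, not the book's kernel-boundary-alignment algorithm; the 95/5 class split and the SVC defaults are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.svm import SVC

# Skewed two-class problem: the "relevant" class is roughly 5% of the data.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

plain = SVC().fit(X_tr, y_tr)
weighted = SVC(class_weight="balanced").fit(X_tr, y_tr)  # up-weight the rare class

print("plain    minority-class F1:", round(f1_score(y_te, plain.predict(X_te)), 3))
print("weighted minority-class F1:", round(f1_score(y_te, weighted.predict(X_te)), 3))
```

Re-weighting only shifts the misclassification penalty; the chapter's kernel-boundary-alignment approach instead transforms the kernel itself, and its evaluation covers both vector and non-vector spaces.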
Although the target application of this book is multimedia information retrieval, the theories and algorithms developed here are applicable to analyzing data in other domains, such as text documents, biological data, and motion patterns. This book is designed for researchers and practitioners in the fields of multimedia, computer vision, machine learning, and large-scale data mining. We expect the reader to have some basic knowledge of statistics and algorithms. We recommend that the first part (Chaps. 1–8) be used in an upper-division undergraduate course and the second part (Chaps. 9–12) in a graduate-level course. Chaps. 1–6 should be read sequentially; Chaps. 7–12 can be read in any order. The Appendix lists our open-source sites.
Palo Alto, February 2011 Edward Y. Chang
Contents:
Key Subroutines of Multimedia Data Management
Overview
Feature Extraction
Similarity
Learning
Multimodal Fusion
Indexing
Scalability
Concluding Remarks
Perceptual Feature Extraction
DMD Algorithm
Model-Based Pipeline
Data-Driven Pipeline
Experiments
Dataset and Setup
Model-Based versus Data-Driven
DMD versus Individual Models
Regularization Tuning
Tough Categories
Related Reading
Concluding Remarks
Query Concept Learning
Support Vector Machines and Version Space
Active Learning and Batch Sampling Strategies
Theoretical Foundation
Sampling Strategies
Concept-Dependent Learning
Concept Complexity
Limitations of Active Learning
Concept-Dependent Active Learning Algorithms
Experiments and Discussion
Testbed and Setup
Active Versus Passive Learning
Against Traditional Relevance Feedback Schemes
Sampling Method Evaluation
Concept-Dependent Learning
Concept Diversity Evaluation
Evaluation Summary
Related Readings
Machine Learning
Relevance Feedback
Relation to Other Chapters
Similarity
Mining Image Feature Set
Image Testbed Setup
Feature Extraction
Feature Selection
Discovering the Dynamic Partial Distance Function
Minkowski Metric and its Limitations
Dynamic Partial Distance Function
Psychological Interpretation of Dynamic Partial Distance Function
Empirical Study
Image Retrieval
Video Shot-Transition Detection
Near Duplicated Articles
Weighted DPF Versus Weighted Euclidean
Observations
Related Reading
Concluding Remarks
Formulating Distance Functions
Illustrative Examples
DFA Algorithm
Transformation Model
Distance Metric Learning
Experimental Evaluation
Evaluation on Contextual Information
Evaluation on Effectiveness
Observation
Related Reading
Metric Learning
Kernel Learning
Concluding Remarks
Multimodal Fusion
Related Reading
Modality Identification
Modality Fusion
Independent Modality Analysis
PCA
ICA
IMG
Super-Kernel Fusion
Experiments
Evaluation of Modality Analysis
Evaluation of Multimodal Kernel Fusion
Observations
Concluding Remarks
Fusing Content and Context with Causality
Photo Annotation
Probabilistic Graphical Models
Multimodal Metadata
Contextual Information
Perceptual Content
Semantic Ontology
Influence Diagrams
Structure Learning
Causal Strength
Case Study
Dealing with Missing Attributes
Experiments
Experiment on Learning Structure
Experiment on Causal Strength Inference
Experiment on Semantic Fusion
Experiment on Missing Features
Concluding Remarks
Combinational Collaborative Filtering, Considering Personalization
Related Reading
Combinational Collaborative Filtering
C–U and C–D Baseline Models
CCF Model
Gibbs and EM Hybrid Training
Parallelization
Inference
Experiments
Gibbs + EM Versus EM
The Orkut Dataset
Runtime Speedup
Concluding Remarks
Imbalanced Data Learning
Related Reading
Kernel Boundary Alignment
Conformally Transforming Kernel K
Modifying Kernel Matrix K
Experimental Results
Vector-Space Evaluation
Non-Vector-Space Evaluation
Concluding Remarks
PSVM: Parallelizing Support Vector Machines on Distributed Computers
Interior Point Method with Incomplete Cholesky Factorization
PSVM Algorithm
Parallel ICF
Parallel IPM
Computing Parameter b and Writing Back
Experiments
Class-Prediction Accuracy
Scalability
Overheads
Concluding Remarks
Approximate High-Dimensional Indexing with Kernel
Related Reading
Algorithm SphereDex
Create: Building the Index
Search: Querying the Index
Update: Insertion and Deletion
Experiments
Setup
Performance with Disk IOs
Choice of Parameter g
Impact of Insertions
Sequential Versus Random
Percentage of Data Processed
Concluding Remarks
Range Queries
Farthest Neighbor Queries
Speeding Up Latent Dirichlet Allocation with Parallelization and Pipeline Strategies
Related Reading
LDA Performance Enhancement
Approximate Distributed LDA
Parallel Gibbs Sampling and AllReduce
MPI Implementation of AD-LDA
PLDA+
Reduce Bottleneck of AD-LDA
Framework of PLDA+
Algorithm for Pw Processors
Algorithm for Pd Processors
Straggler Handling
Parameters and Complexity
Experimental Results
Datasets and Experiment Environment
Perplexity
Speedups and Scalability
Large-Scale Applications
Mining Social-Network User Latent Behavior
Question Labeling
Concluding Remarks
Appendix: Open Source Software
Author: Edward Y. Chang, Ph.D., Director of Research, Google Inc.
E-mail: edchang@Google.com, http://infolab.stanford.edu/~echang/