Authors
Jan Rupnik, John Shawe-Taylor
Publication date
2010
Description
Matrix factorization is a popular approach in pattern analysis and is used to tackle many problems, such as collaborative filtering, imputing missing data, denoising data, dimensionality reduction, data visualization and exploratory analysis. This thesis focuses on factorization-based pattern analysis methods for multiview learning problems, that is, problems where each data instance is represented by multiple views of an underlying object, encoded by multiple feature sets. As an example of a multiview problem, consider a dataset where each instance has two representations: a visual image and a textual description. The patterns of interest are pairs of functions over images and texts that are strongly related over the observed data. Canonical Correlation Analysis (CCA) is designed to extract patterns from data sets with two views. This thesis focuses on two generalizations of CCA that were proposed in the literature: Sum of Correlations (SUMCOR) and Sum of Squared Correlations (SSCOR). The SUMCOR problem formulation is interesting from the optimization perspective in its own right, since it emerges in other problems as well. We study several aspects of the generalizations. We first present a provably convergent novel algorithm for finding non-linear higher order patterns, which is based on an iterative approach for solving multivariate eigenvalue problems. We show that SUMCOR in general is NP-hard and then study its reformulation as a computationally tractable Semidefinite Programming (SDP) problem. Based on the reformulation we derive several computationally feasible bounds on global optimality, which complement the …
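To make the SUMCOR objective and the idea of an iterative multivariate eigenvalue solver concrete, here is a minimal sketch of a Horst-style power iteration for the linear, first-order case. It is not the thesis's algorithm, which targets non-linear higher order patterns. The function names `whiten` and `sumcor_horst`, the regularization constant, and the fixed iteration count are all assumptions made for illustration. Each view is whitened so that every projection has unit variance, and the iteration then seeks one unit vector per view that maximizes the sum of pairwise correlations between the projections. It finds a local optimum only, which is consistent with the NP-hardness result mentioned above.

```python
import numpy as np


def whiten(X, reg=1e-8):
    """Center a view and whiten it so its covariance is (nearly) the identity."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1) + reg * np.eye(X.shape[1])  # regularized covariance
    evals, evecs = np.linalg.eigh(C)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T  # inverse matrix square root of C
    return Xc @ W


def sumcor_horst(views, n_iter=200, seed=0):
    """Horst-style iteration (illustrative): one unit vector per whitened view.

    Returns the vectors and the corresponding one-dimensional projections.
    """
    rng = np.random.default_rng(seed)
    Z = [whiten(X) for X in views]
    n = len(Z[0])
    w = [rng.standard_normal(z.shape[1]) for z in Z]
    w = [v / np.linalg.norm(v) for v in w]
    for _ in range(n_iter):
        # Sum of all current projections; Z_i^T s / (n - 1) equals
        # sum_j C_ij w_j, where C_ij is the cross-covariance of views i and j.
        s = sum(z @ v for z, v in zip(Z, w))
        y = [z.T @ s / (n - 1) for z in Z]
        w = [v / np.linalg.norm(v) for v in y]  # renormalize each block separately
    proj = [z @ v for z, v in zip(Z, w)]
    return w, proj
```

With three or more views that share a common latent signal, the returned projections should be strongly and positively correlated with one another. With exactly two views the objective reduces to ordinary CCA.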