## The 12th Statistical Machine Learning Seminar (2013.7.11)

Date/time: July 11th (Thu) 15:00-17:00

Place: Seminar Room 5 (3F, D313), Institute of Statistical Mathematics (Tachikawa, Tokyo)

http://www.ism.ac.jp/access/index_e.html

Speaker 1:

Vincent Q. Vu (Dept. of Statistics, Ohio State University)

http://www.vince.vu/

Title: Sparse Principal Components and Subspaces: Concepts, Theory, and Algorithms

Abstract:

Principal components analysis (PCA) is a popular technique for unsupervised dimension reduction with a wide range of applications in science, engineering, and anywhere multivariate data are abundant. Its main idea is to find the linear combinations of the variables with the largest variance; these linear combinations correspond to eigenvectors of the covariance matrix. However, in modern applications where the number of variables can be much larger than the number of samples, PCA suffers from two major weaknesses: 1) the interpretability and subsequent use of the principal directions are hindered by their dependence on all of the variables; 2) it is generally inconsistent in high dimensions, i.e., the estimated principal directions can be noisy and unreliable.

This has motivated much research over the past decade into a class of techniques called sparse PCA, which combine the essence of PCA with the assumption that the phenomena of interest depend mostly on a few variables.
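To make the PCA description above concrete, the sketch below computes the leading principal direction as the top eigenvector of a sample covariance matrix and contrasts it with a simple sparse heuristic. The sparse step is the truncated power method (power iteration that keeps only the k largest-magnitude entries at each step), started from diagonal thresholding; it is a swapped-in illustration, not the minimax-optimal estimator or the convex relaxation discussed in the talk. The spiked-covariance data, the sparsity level k, and the function names `leading_pc` and `truncated_power_method` are hypothetical.

```python
import numpy as np

def leading_pc(X):
    """Dense PCA: top eigenvector of the sample covariance of X (n samples x p variables)."""
    Xc = X - X.mean(axis=0)                  # center each variable
    S = Xc.T @ Xc / (X.shape[0] - 1)         # p x p sample covariance
    _, eigvecs = np.linalg.eigh(S)           # eigh sorts eigenvalues in ascending order
    return eigvecs[:, -1], S                 # last column = direction of largest variance

def truncated_power_method(S, k, n_iter=100):
    """Sparse heuristic: power iteration that zeroes all but the k largest |entries| each step."""
    v = np.zeros(S.shape[0])
    v[np.argsort(np.diag(S))[-k:]] = 1.0     # diagonal-thresholding start: k highest-variance variables
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = S @ v                            # ordinary power step
        keep = np.argsort(np.abs(w))[-k:]    # indices of the k largest magnitudes
        v = np.zeros_like(w)
        v[keep] = w[keep]                    # truncate to a k-sparse vector
        v /= np.linalg.norm(v)               # renormalize to unit length
    return v

# Hypothetical spiked model: covariance I + theta * u u^T with u supported on the first k variables.
rng = np.random.default_rng(0)
n, p, k, theta = 200, 50, 5, 5.0
u = np.zeros(p)
u[:k] = 1.0 / np.sqrt(k)
X = np.sqrt(theta) * rng.standard_normal((n, 1)) * u + rng.standard_normal((n, p))

v_pca, S = leading_pc(X)
v_sparse = truncated_power_method(S, k)
print("dense PCA nonzeros:", np.count_nonzero(v_pca))
print("sparse support:", np.flatnonzero(v_sparse))
```

The dense PCA direction puts nonzero weight on every variable, while the truncated estimate has at most k nonzero entries by construction, which is the interpretability issue raised in the abstract.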

In this talk, I will present some recent theoretical results on sparse PCA, including optimal minimax bounds for estimating both the principal eigenvector and the principal subspace spanned by the leading eigenvectors of a general covariance matrix. The optimal estimators turn out to be NP-hard to compute. However, I will also present a very recent result showing that a convex relaxation due to d’Aspremont et al. (2007) is a near-optimal estimator of the principal eigenvector under very general conditions.
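For reference, the cardinality-constrained problem and the semidefinite relaxation of d’Aspremont et al. (2007), written here in penalized form, are shown below. The notation is chosen for this note and is not taken from the talk: Σ̂ is the sample covariance, k the target sparsity, and ρ > 0 a tuning parameter. The relaxation follows by writing X = vvᵀ: such an X is positive semidefinite with trace 1, and if ‖v‖₀ ≤ k then Σᵢⱼ |Xᵢⱼ| = ‖v‖₁² ≤ k. Dropping the rank-one constraint and moving the ℓ₁ term into the objective gives a convex program; the eigenvector estimate is then the leading eigenvector of the solution X̂.

$$
\hat v \in \arg\max_{v \in \mathbb{R}^p} \; v^\top \hat\Sigma v
\quad \text{subject to} \quad \|v\|_2 = 1,\; \|v\|_0 \le k
$$

$$
\hat X \in \arg\max_{X \in \mathbb{R}^{p \times p}} \; \operatorname{tr}(\hat\Sigma X) - \rho \sum_{i,j} |X_{ij}|
\quad \text{subject to} \quad X \succeq 0,\; \operatorname{tr}(X) = 1
$$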

–

Speaker 2:

Mauricio Alvarez (Dept. of Electrical Engineering, Universidad Tecnológica de Pereira, Colombia)

https://sites.google.com/site/maalvarezl/

Title: Multi-output Gaussian Processes

Abstract:

In this talk, we will review the problem of modeling correlated outputs using Gaussian process priors. Applications of modeling correlated outputs include the joint prediction of pollutant metals in geostatistics and multitask learning in machine learning. Defining a Gaussian process prior for correlated outputs amounts to specifying a suitable covariance function that captures the dependencies between the different output variables. Classical models for obtaining such a covariance function include the linear model of coregionalization and process convolutions. We will describe a general framework for building multiple-output covariance functions by convolving smoothing kernels particular to each output with covariance functions common to all outputs. Both the linear model of coregionalization and process convolutions turn out to be special cases of this framework. Practical aspects of the methodology include the use of domain-specific knowledge to define relevant smoothing kernels, efficient approximations that reduce computational complexity, and a method for constructing a general class of nonstationary covariances, with applications in robotics and motion-capture data.
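The linear model of coregionalization named in the abstract can be sketched in a few lines. The sketch assumes D outputs observed at the same N one-dimensional inputs and squared-exponential latent covariances (an illustrative choice, not necessarily the one used in the talk). Under those assumptions the joint covariance is a sum of Kronecker products B_q ⊗ K_q, where each B_q is a positive semidefinite coregionalization matrix. The mixing weights, lengthscales, and the names `rbf` and `lmc_covariance` are hypothetical.

```python
import numpy as np

def rbf(x, y, lengthscale):
    """Squared-exponential kernel matrix between 1-D input arrays x and y."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def lmc_covariance(x, A_list, lengthscales):
    """LMC covariance for D outputs at the same N inputs.

    Block (d, d') equals sum_q B_q[d, d'] * k_q(x, x'), with B_q = A_q A_q^T.
    Rows/columns are ordered output-major: all N points of output 0, then output 1, ...
    """
    D, N = A_list[0].shape[0], x.shape[0]
    K = np.zeros((D * N, D * N))
    for A, ell in zip(A_list, lengthscales):
        B = A @ A.T                          # D x D coregionalization matrix, PSD by construction
        K += np.kron(B, rbf(x, x, ell))      # Kronecker product places B[d, d'] * K_q in block (d, d')
    return K

x = np.linspace(0.0, 5.0, 20)
A_list = [np.array([[1.0], [0.8]]), np.array([[0.3], [-0.5]])]   # hypothetical mixing weights
lengthscales = [1.0, 0.2]                                         # one smooth, one rough latent process
K = lmc_covariance(x, A_list, lengthscales)

# Draw one joint sample of both outputs; a small jitter keeps the Cholesky factorization stable.
rng = np.random.default_rng(1)
L = np.linalg.cholesky(K + 1e-6 * np.eye(K.shape[0]))
f = (L @ rng.standard_normal(K.shape[0])).reshape(2, -1)        # row d = output d at the 20 inputs
print(K.shape, f.shape)
```

Because each term is a Kronecker product of two positive semidefinite matrices, the sum is a valid covariance, and its off-diagonal blocks carry the cross-covariance between outputs. In the convolution framework described above, this model corresponds to the special case where each smoothing kernel is a scaled Dirac delta.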