The 2nd Workshop on Machine Learning and Optimization at the ISM

Chaired by the Multimodal Data Project of ROIS


Taking advantage of the simultaneous presence of several international researchers in Tokyo, an informal workshop on topics blending Statistical Modelling and Optimization approaches to Machine Learning will be held on October 12th 2007 at the Institute of Statistical Mathematics, Tokyo, Japan. Many thanks to all speakers and participants!

NEW: Some slides can be downloaded below!


10:00 - 10:10 Opening Remarks
10:10 - 11:05 Collaborative filtering with kernels and spectral regularization (Jean-Philippe Vert)
11:05 - 12:00 Bundle Methods for Machine Learning (Alexander J. Smola)
12:00 - 13:30 Lunch break
13:30 - 14:25 Hilbert Space Representations of Probability Distributions (Arthur Gretton)
14:25 - 15:20 Measuring conditional dependence with kernels (Kenji Fukumizu)
15:20 - 15:40 Break
15:40 - 16:35 Cluster Identification in Nearest-Neighbor Graphs (Markus Maier)
16:35 - 17:30 Epagogics: Beyond Newtonian deduction based paradigm towards 'universal' induction machines (Kunio Tanabe)


[ Access ]
Please follow this link for access information. The Workshop will be held in the conference room (2F).

[ Organizers ]
Please contact Tomoko Matsui for any questions regarding the workshop.

[ Detailed Program And Slides ]
slides: Collaborative filtering with kernels and spectral regularization (Jean-Philippe Vert)
I will present a general framework for Collaborative Filtering (CF), which is the task of learning preferences of users for products, such as books or movies, from a set of known preferences. A standard approach to CF is to find a low-rank, or low trace norm, approximation to a partially observed matrix of user preferences. We generalize this approach to the estimation of a compact operator, of which matrix estimation is a special case. We develop a notion of spectral regularization which captures both the rank constraint and trace norm regularization. The major advantage of this approach is that it provides a natural method of utilizing side information, such as age and gender, about the users (or objects) in question - formerly a challenging limitation of the low-rank approach. We provide a number of algorithms, and test them on a standard CF dataset with promising results. This is joint work with Jacob Abernethy, Francis Bach and Theodoros Evgeniou.
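To give a flavor of the trace-norm side of this framework (a generic sketch, not the speakers' actual algorithm, with illustrative function names and parameters): the proximal operator of the trace norm is singular-value soft-thresholding, which yields a simple iterative scheme for completing a partially observed preference matrix.

```python
import numpy as np

def svd_soft_threshold(X, tau):
    # Proximal operator of the trace norm: shrink each singular value by tau.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft_impute(M, observed, tau=1.0, n_iter=100):
    """Fill in the missing entries of M (boolean mask `observed`)
    with a low trace-norm estimate, by repeatedly imputing the
    unobserved entries from the current shrunken estimate."""
    Z = np.zeros_like(M)
    for _ in range(n_iter):
        Z = svd_soft_threshold(np.where(observed, M, Z), tau)
    return Z
```

On a low-rank matrix with a modest fraction of missing entries, the iteration recovers the unobserved entries to good accuracy; side information, as discussed in the talk, is not handled by this basic sketch.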
slides: Bundle Methods for Machine Learning (Alexander J. Smola)
We present a globally convergent method for regularized risk minimization problems. Our method applies to Support Vector estimation, regression, Gaussian Processes, and any other regularized risk minimization setting which leads to a convex optimization problem. SVMPerf can be shown to be a special case of our approach. In addition to the unified framework we present tight convergence bounds, which show that our algorithm converges in O(1/ε) steps to ε precision for general convex problems and in O(log(1/ε)) steps for continuously differentiable problems. We demonstrate the performance of our approach in experiments.
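The cutting-plane idea behind bundle methods can be sketched in one dimension (a toy illustration under assumed problem data, not the authors' solver): each step linearizes the empirical risk at the current iterate, and the regularized maximum over all planes collected so far is minimized exactly, here by grid search.

```python
import numpy as np

lam = 0.1                       # regularization constant

def risk(w):                    # toy "empirical risk": absolute loss around w = 3
    return abs(w - 3.0)

def subgrad(w):                 # one subgradient of the risk at w
    return float(np.sign(w - 3.0))

planes = []                     # cutting planes (a, b) with risk(w) >= a*w + b
grid = np.linspace(-10, 10, 2001)   # exact-enough 1-D subproblem solver
w = 0.0
for t in range(20):
    a = subgrad(w)
    planes.append((a, risk(w) - a * w))
    # Piecewise-linear lower bound on the risk from the bundle of planes.
    bundle = np.max([a * grid + b for a, b in planes], axis=0)
    w = grid[np.argmin(0.5 * lam * grid**2 + bundle)]
# w now approximates argmin_w lam/2 * w^2 + risk(w), which is w* = 3 here.
```

The quadratic regularizer is what makes the piecewise-linear subproblem well-posed; in more than one dimension it is solved as a small QP rather than by grid search.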
slides: Hilbert Space Representations of Probability Distributions (Arthur Gretton)
Many problems in unsupervised learning require the analysis of features of probability distributions. At the most fundamental level, we might wish to determine whether two distributions are the same, based on samples from each - this is known as the two-sample or homogeneity problem. We use kernel methods to address this problem, by mapping probability distributions to elements in a reproducing kernel Hilbert space (RKHS). Given a sufficiently rich RKHS, these representations are unique: thus comparing feature space representations allows us to compare distributions without ambiguity. Applications include testing whether cancer subtypes are distinguishable on the basis of DNA microarray data, and whether low frequency oscillations measured at an electrode in the cortex have a different distribution during a neural spike.
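The mean-embedding comparison behind the two-sample test fits in a few lines: with a Gaussian kernel, a (biased) estimate of the squared Maximum Mean Discrepancy is a difference of average kernel evaluations. This is a generic illustration; the kernel choice and the test threshold (e.g. from a permutation test) are omitted.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared MMD between samples X and Y."""
    return (gaussian_kernel(X, X, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean())
```

The statistic is close to zero when both samples come from the same distribution and grows as the distributions become distinguishable in the RKHS.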

A more difficult problem is to discover whether two random variables drawn from a joint distribution are independent. It turns out that any dependence between pairs of random variables can be encoded in a cross-covariance operator between appropriate RKHS representations of the variables, and we may test independence by looking at a norm of the operator. We demonstrate this independence test by establishing dependence between an English text and its French translation, as opposed to French text on the same topic but otherwise unrelated. Finally, we show that this operator norm is itself a difference in feature means.
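The operator-norm view of dependence admits an equally short sketch: the squared Hilbert-Schmidt norm of the empirical cross-covariance operator (HSIC) reduces to a trace of centered Gram matrices. A generic illustration of the quantity, not the talk's exact estimator or test:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(X, Y, sigma=1.0):
    """Biased HSIC estimate: (1/n^2) tr(K H L H), H the centering matrix."""
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n
    K = gaussian_kernel(X, X, sigma)   # Gram matrix of X
    L = gaussian_kernel(Y, Y, sigma)   # Gram matrix of Y
    return float(np.trace(K @ H @ L @ H)) / n**2
```

Dependent pairs (such as aligned text and its translation in the talk's example) yield a noticeably larger value than independent ones.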
slides: Measuring conditional dependence with kernels (Kenji Fukumizu)
We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. In the special case of unconditional dependence, the measure is exactly the same as the mean square contingency, which is one of the popular measures of dependence. We discuss the theoretical properties of the measure, and demonstrate its application in experiments.
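In the unconditional special case, the normalized cross-covariance operator has a compact empirical form: with centered Gram matrices G_X, G_Y and a regularizer ε, the measure is tr(R_Y R_X) with R = G(G + nεI)^{-1}. The sketch below shows only that special case (the conditional version involves a third Gram matrix); sigma and eps are illustrative choices:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def normalized_dependence(X, Y, eps=1e-3, sigma=1.0):
    """Normalized cross-covariance dependence measure (unconditional case)."""
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n
    Gx = H @ gaussian_kernel(X, X, sigma) @ H   # centered Gram matrices
    Gy = H @ gaussian_kernel(Y, Y, sigma) @ H
    Rx = Gx @ np.linalg.inv(Gx + n * eps * np.eye(n))
    Ry = Gy @ np.linalg.inv(Gy + n * eps * np.eye(n))
    return float(np.trace(Ry @ Rx))
```

The normalization by the regularized covariance operators is what removes the dependence on the kernel in the infinite-data limit, in contrast to the raw HSIC value.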
Cluster Identification in Nearest-Neighbor Graphs (Markus Maier)
Assume we are given a sample of points from some underlying distribution which contains several distinct clusters. Our goal is to construct a neighborhood graph on the sample points such that clusters are identified: that is, the subgraph induced by points from the same cluster is connected, while subgraphs corresponding to different clusters are not connected to each other. We derive bounds on the probability that cluster identification is successful, and use them to predict optimal values of k for the mutual and symmetric k-nearest-neighbor graphs. We point out different properties of the mutual and symmetric nearest-neighbor graphs related to the cluster identification problem.
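The two graph constructions compared in the talk are easy to state in code: the mutual graph keeps an edge only if each point is among the other's k nearest neighbors (AND), while the symmetric graph keeps it if either is (OR); cluster identification then amounts to counting connected components. A small sketch with illustrative helper names:

```python
import numpy as np
from collections import deque

def knn_graph(X, k, mutual=True):
    """Adjacency matrix of the mutual (AND) or symmetric (OR) kNN graph."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # no self-edges
    nn = np.argsort(d, axis=1)[:, :k]           # indices of the k nearest neighbors
    A = np.zeros(d.shape, dtype=bool)
    for i, js in enumerate(nn):
        A[i, js] = True
    return A & A.T if mutual else A | A.T

def n_components(A):
    """Number of connected components, by breadth-first search."""
    n, seen, comps = len(A), set(), 0
    for s in range(n):
        if s in seen:
            continue
        comps += 1
        queue = deque([s]); seen.add(s)
        while queue:
            i = queue.popleft()
            for j in np.flatnonzero(A[i]):
                if j not in seen:
                    seen.add(j); queue.append(j)
    return comps
```

On two well-separated blobs, neither construction creates cross-cluster edges, so both leave the clusters in different components; since the mutual graph's edges are a subset of the symmetric graph's, it can only split components further, which is the trade-off the bounds in the talk quantify.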
Epagogics: Beyond Newtonian deduction based paradigm towards 'universal' induction machines (Kunio Tanabe)