The 2nd Workshop on Machine Learning and Optimization at the ISM
Taking advantage of the simultaneous presence of several international
researchers in Tokyo, an informal workshop on topics blending Statistical
Modelling and Optimization approaches to Machine Learning will be held
on October 12th, 2007, at the Institute of Statistical Mathematics, Tokyo,
Japan. Many thanks to all speakers and participants!
NEW: Some slides can be downloaded below!
10:00 - 10:10  Opening Remarks
10:10 - 11:05  Collaborative filtering with kernels and spectral regularization (Jean-Philippe Vert)
11:05 - 12:00  Bundle Methods for Machine Learning (Alexander J. Smola)
12:00 - 13:30  Lunch break
13:30 - 14:25  Hilbert Space Representations of Probability Distributions (Arthur Gretton)
14:25 - 15:20  Measuring conditional dependence with kernels (Kenji Fukumizu)
15:20 - 15:40  Break
15:40 - 16:35  Cluster Identification in Nearest-Neighbor Graphs (Markus Maier)
16:35 - 17:30  Epagogics: Beyond the Newtonian deduction-based paradigm towards 'universal' induction machines (Kunio Tanabe)
[ Access ]
Please follow this link for access information. The Workshop will be held in the conference room
(2F).
[ Organizers ]
Please contact Tomoko Matsui for any questions
regarding the workshop.
[ Detailed Program and Slides ]
slides  Collaborative filtering with kernels and spectral regularization  Jean-Philippe Vert 
I will present a general framework for Collaborative Filtering (CF), which
is the task of learning preferences of users for products, such as books
or movies, from a set of known preferences. A standard approach to CF is
to find a low-rank, or low-trace-norm, approximation to a partially observed
matrix of user preferences. We generalize this approach to estimation of
a compact operator, of which matrix estimation is a special case. We develop
a notion of spectral regularization which captures both rank constraint
and trace norm regularization. The major advantage of this approach is
that it provides a natural method of utilizing side-information, such as
age and gender, about the users (or objects) in question, which was
previously a challenging limitation of the low-rank approach. We provide
a number of algorithms, and test them on a standard CF dataset with
promising results. This is joint work with Jacob Abernethy, Francis Bach
and Theodoros Evgeniou. 
slides  Bundle Methods for Machine Learning  Alexander J. Smola 
We present a globally convergent method for regularized risk minimization problems.
Our method applies to Support Vector estimation, regression, Gaussian Processes,
and any other regularized risk minimization setting which leads to a convex
optimization problem. SVMPerf can be shown to be a special case of our
approach. In addition to the unified framework we present tight convergence
bounds, which show that our algorithm converges in O(1/ε) steps to ε precision
for general convex problems, and in O(log(1/ε)) steps for continuously
differentiable problems. We demonstrate the performance of our approach in
experiments. 
slides  Hilbert Space Representations of Probability Distributions  Arthur Gretton 
Many problems in unsupervised learning require the analysis of features
of probability distributions. At the most fundamental level, we might wish
to determine whether two distributions are the same, based on samples from
each; this is known as the two-sample or homogeneity problem. We use kernel
methods to address this problem, by mapping probability distributions to
elements in a reproducing kernel Hilbert space (RKHS). Given a sufficiently
rich RKHS, these representations are unique: thus comparing feature space
representations allows us to compare distributions without ambiguity. Applications
include testing whether cancer subtypes are distinguishable on the basis
of DNA microarray data, and whether low frequency oscillations measured
at an electrode in the cortex have a different distribution during a neural
spike.
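The two-sample idea above can be sketched via the distance between kernel mean embeddings, i.e. a biased estimate of the squared maximum mean discrepancy. A minimal NumPy sketch with synthetic data; names are illustrative:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased estimate of the squared distance between the RKHS
    mean embeddings of the two samples."""
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(1)
same = mmd2_biased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_biased(rng.normal(size=(200, 2)),
                   rng.normal(2.0, 1.0, size=(200, 2)))
```

For samples from the same distribution the statistic is close to zero; for the shifted sample it is clearly larger.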
A more difficult problem is to discover whether two random variables drawn
from a joint distribution are independent. It turns out that any dependence
between pairs of random variables can be encoded in a cross-covariance
operator between appropriate RKHS representations of the variables, and
we may test independence by looking at a norm of the operator. We demonstrate
this independence test by establishing dependence between an English text
and its French translation, as opposed to French text on the same topic
but otherwise unrelated. Finally, we show that this operator norm is itself
a difference in feature means. 
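A minimal sketch of the cross-covariance-operator idea: the (biased) Hilbert-Schmidt independence statistic, whose value is the squared Hilbert-Schmidt norm of the empirical cross-covariance operator between RKHS features of the two variables. Names and data are illustrative, not from the talk:

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Biased estimate of the squared HS norm of the empirical
    cross-covariance operator (Gaussian kernels on both variables)."""
    n = X.shape[0]
    def gram(A):
        d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return np.trace(gram(X) @ H @ gram(Y) @ H) / (n - 1) ** 2

rng = np.random.default_rng(2)
x = rng.normal(size=(300, 1))
indep = hsic(x, rng.normal(size=(300, 1)))       # independent pair
dep = hsic(x, x + 0.1 * rng.normal(size=(300, 1)))  # dependent pair
```

The statistic is near zero for the independent pair and clearly larger for the dependent one, which is the basis of the independence test described above.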
slides  Measuring conditional dependence with kernels  Kenji Fukumizu 
We propose a new measure of conditional dependence of random variables,
based on normalized cross-covariance operators on reproducing kernel Hilbert
spaces. Unlike previous kernel dependence measures, the proposed criterion
does not depend on the choice of kernel in the limit of infinite data,
for a wide class of kernels. At the same time, it has a straightforward
empirical estimate with good convergence behaviour. In the special case
of unconditional dependence, the measure is exactly the same as the mean
square contingency, which is one of the popular measures of dependence.
We discuss the theoretical properties of the measure, and demonstrate its
application in experiments. 
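In the unconditional special case, the normalized cross-covariance idea can be sketched as the trace of a product of regularized, normalized centered Gram matrices. This is a rough illustrative sketch under assumed regularization, not the authors' reference implementation; all names and parameter values are hypothetical:

```python
import numpy as np

def normalized_dependence(X, Y, sigma=1.0, eps=1e-3):
    """Sketch of an unconditional normalized cross-covariance measure:
    trace of the product of regularized, normalized centered Grams."""
    n = X.shape[0]
    def centered_gram(A):
        d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2 * sigma ** 2))
        H = np.eye(n) - np.ones((n, n)) / n
        return H @ K @ H
    Gx, Gy = centered_gram(X), centered_gram(Y)
    Rx = Gx @ np.linalg.inv(Gx + n * eps * np.eye(n))  # normalization
    Ry = Gy @ np.linalg.inv(Gy + n * eps * np.eye(n))
    return np.trace(Ry @ Rx)

rng = np.random.default_rng(3)
x = rng.normal(size=(200, 1))
y = rng.normal(size=(200, 1))                 # independent of x
dep_val = normalized_dependence(x, x)
indep_val = normalized_dependence(x, y)
```

The normalization by the regularized Gram matrices is what makes the population quantity insensitive to the choice of kernel in the limit, as the abstract explains.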
 Cluster Identification in Nearest-Neighbor Graphs  Markus Maier 
Assume we are given a sample of points from some underlying distribution
which contains several distinct clusters. Our goal is to construct a neighborhood
graph on the sample points such that clusters are identified: that is,
the subgraph induced by points from the same cluster is connected, while
subgraphs corresponding to different clusters are not connected to each
other. We derive bounds on the probability that cluster identification
is successful, and use them to predict optimal values of k for the mutual
and symmetric k-nearest-neighbor graphs. We point out different properties
of the mutual and symmetric nearestneighbor graphs related to the cluster
identification problem. 
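The mutual k-nearest-neighbor construction above can be sketched directly: connect two points only if each is among the other's k nearest neighbors, then read off connected components. A minimal NumPy sketch on two well-separated synthetic clusters; all names are hypothetical:

```python
import numpy as np

def mutual_knn_graph(X, k):
    """Adjacency of the mutual k-NN graph: i ~ j iff each point is
    among the other's k nearest neighbors."""
    n = len(X)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        A[i, nbrs[i]] = True
    return A & A.T            # keep an edge only if it goes both ways

def components(A):
    """Label connected components by depth-first search."""
    n = len(A)
    label = -np.ones(n, dtype=int)
    c = 0
    for s in range(n):
        if label[s] >= 0:
            continue
        stack, label[s] = [s], c
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(A[u]):
                if label[v] < 0:
                    label[v] = c
                    stack.append(v)
        c += 1
    return label

# Two well-separated Gaussian clusters: no mutual k-NN edge crosses
# between them, so no component spans both clusters.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
lab = components(mutual_knn_graph(X, k=5))
```

The bounds in the talk concern how to choose k so that, with high probability, each cluster additionally forms a single connected component.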
 Epagogics: Beyond the Newtonian deduction-based paradigm towards 'universal' induction machines  Kunio Tanabe 

