Workshop on Machine Learning Methods in Statistical Science
October 23 (Fri.), 2009. 13:00--17:50
Sponsored by ISM Research Innovation Center and
Seminar room 5 (3F, D313, 314), The Institute of Statistical Mathematics
(Access: (Japanese), (English))
Organized by Kenji Fukumizu and Shiro Ikeda, ISM
13:00 - 14:00. Mining Frequent Subgraphs from Linear Graphs
Koji Tsuda (AIST Computational Biology Research Center)
14:10 - 15:10. Hilbert Space Embeddings and Metrics on Probability measures
Bharath K Sriperumbudur
(University of California, San Diego)
break 15:10 - 15:40
15:40 - 16:40. Partial Order Embedding with Multiple Kernels
Gert Lanckriet (University of California, San Diego)
16:50 - 17:50. SpicyMKL
Taiji Suzuki (University of Tokyo)
- Koji Tsuda: Mining Frequent Subgraphs from Linear Graphs.
A linear graph is a graph whose vertices are totally ordered.
Biological and linguistic sequences with interactions among symbols are
naturally represented as linear graphs. Examples include protein
contact maps, RNA secondary structures and predicate-argument
structures. Our algorithm, linear graph miner (LGM), leverages the
vertex order for efficient enumeration of frequent subgraphs. Based on
the reverse search principle, the pattern space is systematically
traversed without expensive duplication checking. Disconnected subgraph
patterns are particularly important in linear graphs due to their
sequential nature. Unlike conventional graph mining algorithms
detecting connected patterns only, LGM can detect disconnected patterns
as well. The utility and efficiency of LGM are demonstrated in
experiments on protein contact maps and natural language sentences.
- Bharath Sriperumbudur: Hilbert Space Embeddings and Metrics on Probability measures
this talk, we will present a novel measure of distance between
probability measures obtained by embedding them into a reproducing
kernel Hilbert space (RKHS), and computing the RKHS distance between
these embeddings. We will examine the conditions on RKHS under which
the embedding is injective so that it induces a metric on the space of
probability measures. We will discuss some advantages of this distance
measure over other well known distances like total variation distance,
the Hellinger distance, the Kullback-Leibler divergence etc.,
especially in the context of statistical inference applications.
Joint work with Arthur Gretton (Max Planck Institute for
Biological Cybernetics, Tubingen), Kenji Fukumizu (Institute of
Statistical Mathematics, Tokyo), Gert Lanckriet (UCSD) and Bernhard
Scholkopf (Max Planck Institute for Biological Cybernetics, Tubingen). ---
- Gert Lanckriet: Partial Order Embedding with Multiple Kernels.
We consider the problem of embedding arbitrary objects (e.g., images,
audio, documents) into a Euclidean space subject to a partial order
over pairwise distances. Partial order constraints arise naturally when
modeling human perception of similarity. Our partial order framework
enables the use of graph-theoretic tools to more efficiently produce
the embedding, and exploit global structure within the constraint set.
We present an embedding algorithm based on semidefinite programming,
which can be parameterized by multiple kernels to yield a unified space
from heterogeneous features. This framework also allows easy
- Taiji Suzuki: SpicyMKL.
We propose a new
optimization algorithm for Multiple Kernel Learning (MKL) with general
convex loss functions. The proposed algorithm is a proximal
minimization method that utilizes the ``smoothed'' dual objective
function and converges super-linearly. The sparsity of the intermediate
solution plays a crucial role for the efficiency of the proposed
algorithm. Consequently our algorithm scales well with increasing
number of kernels. Experimental results show that our algorithm is
favorable against existing methods especially when the number of
kernels is large (> 1000). Generalizations to a broader class of
sparse regularizations such as elastic net are also discussed.