Workshop on Machine Learning Methods in Statistical Science

October 23 (Fri.), 2009. 13:00--17:50
Seminar room 5 (3F, D313, 314), The Institute of Statistical Mathematics (Access: (Japanese), (English))

Sponsored by ISM Research Innovation Center and DEX-SMI
Organized by Kenji Fukumizu and Shiro Ikeda, ISM

Program

13:00 - 14:00. Mining Frequent Subgraphs from Linear Graphs
Koji Tsuda (AIST Computational Biology Research Center)

14:10 - 15:10. Hilbert Space Embeddings and Metrics on Probability measures
Bharath K Sriperumbudur (University of California, San Diego)

break 15:10 - 15:40

15:40 - 16:40. Partial Order Embedding with Multiple Kernels
Gert Lanckriet (University of California, San Diego)

16:50 - 17:50. SpicyMKL
Taiji Suzuki (University of Tokyo)

Abstracts

Koji Tsuda: Mining Frequent Subgraphs from Linear Graphs.
Abstract:
A linear graph is a graph whose vertices are totally ordered. Biological and linguistic sequences with interactions among symbols are naturally represented as linear graphs. Examples include protein contact maps, RNA secondary structures and predicate-argument structures. Our algorithm, linear graph miner (LGM), leverages the vertex order for efficient enumeration of frequent subgraphs. Based on the reverse search principle, the pattern space is systematically traversed without expensive duplication checking. Disconnected subgraph patterns are particularly important in linear graphs due to their sequential nature. Unlike conventional graph mining algorithms detecting connected patterns only, LGM can detect disconnected patterns as well. The utility and efficiency of LGM are demonstrated in experiments on protein contact maps and natural language sentences.
---
Bharath Sriperumbudur: Hilbert Space Embeddings and Metrics on Probability measures
Abstract:
In this talk, we will present a novel measure of distance between probability measures obtained by embedding them into a reproducing kernel Hilbert space (RKHS), and computing the RKHS distance between these embeddings. We will examine the conditions on RKHS under which the embedding is injective so that it induces a metric on the space of probability measures. We will discuss some advantages of this distance measure over other well known distances like total variation distance, the Hellinger distance, the Kullback-Leibler divergence etc., especially in the context of statistical inference applications.
Joint work with Arthur Gretton (Max Planck Institute for Biological Cybernetics, Tubingen), Kenji Fukumizu (Institute of Statistical Mathematics, Tokyo), Gert Lanckriet (UCSD) and Bernhard Scholkopf (Max Planck Institute for Biological Cybernetics, Tubingen).
---
Gert Lanckriet: Partial Order Embedding with Multiple Kernels.
Abstract:
We consider the problem of embedding arbitrary objects (e.g., images, audio, documents) into a Euclidean space subject to a partial order over pairwise distances. Partial order constraints arise naturally when modeling human perception of similarity. Our partial order framework enables the use of graph-theoretic tools to more efficiently produce the embedding, and exploit global structure within the constraint set. We present an embedding algorithm based on semidefinite programming, which can be parameterized by multiple kernels to yield a unified space from heterogeneous features. This framework also allows easy out-of-sample extensions.
---
Taiji Suzuki: SpicyMKL.
Abstract:
We propose a new optimization algorithm for Multiple Kernel Learning (MKL) with general convex loss functions. The proposed algorithm is a proximal minimization method that utilizes the ``smoothed'' dual objective function and converges super-linearly. The sparsity of the intermediate solution plays a crucial role for the efficiency of the proposed algorithm. Consequently our algorithm scales well with increasing number of kernels. Experimental results show that our algorithm is favorable against existing methods especially when the number of kernels is large (> 1000). Generalizations to a broader class of sparse regularizations such as elastic net are also discussed.