Title and Abstract of talks:
March 13th (Thu)
Michael Elad | (Technion, Israel) |
Sparse Modeling of Graph-Structured Data ... and ... Images. |
slides:
ppt,
pdf
Abstract: Images, video, audio, text documents, financial data, medical information, traffic info
- all these and many others are data sources that can be effectively processed. Why?
Is it obvious? In this talk we will start by discussing "modeling" of data as a way to enable
their actual processing, putting emphasis on sparsity-based models. We will turn our attention
to graph-structured data and propose a tailored sparsifying transform for its dimensionality reduction
and subsequent processing. We shall conclude by showing how this new transform becomes relevant and
powerful in revisiting ... classical image processing tasks.
(*) This talk covers portions from a joint work with Idan Ram and Israel Cohen.
Yoshiyuki Kabashima | (Tokyo Institute of Technology) |
Statistical mechanical approach to sample complexity of dictionary learning |
Abstract: Finding a basis matrix (dictionary) by which objective signals are represented sparsely
is of major relevance in various scientific and technological fields. We consider the problem
of learning a dictionary from a set of training signals. We employ techniques from the statistical
mechanics of disordered systems to evaluate the size of the training set necessary for dictionary
learning to typically succeed. The results indicate that the necessary size is much smaller
than previously estimated, which theoretically supports and encourages the use of dictionary learning
in practical situations.
#This talk is based on a collaboration with Ayaka Sakata.
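As a rough illustration of the problem being analyzed (not of the statistical-mechanics techniques themselves), here is a minimal alternating-minimization sketch of dictionary learning in Python/NumPy on synthetic sparse data; all sizes and algorithmic choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, m, s = 16, 32, 500, 3           # signal dim, atoms, training signals, sparsity

# synthetic ground truth: random unit-norm dictionary, s-sparse coefficients
D_true = rng.standard_normal((n, K))
D_true /= np.linalg.norm(D_true, axis=0)
X_true = np.zeros((K, m))
for j in range(m):
    idx = rng.choice(K, s, replace=False)
    X_true[idx, j] = rng.standard_normal(s)
Y = D_true @ X_true                    # noiseless training signals

# alternating minimization: sparse coding (keep s largest), then LS dictionary update
D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)
for it in range(30):
    C = D.T @ Y                        # correlations used to pick supports
    X = np.where(np.abs(C) >= np.sort(np.abs(C), axis=0)[-s], C, 0.0)
    for j in range(m):                 # refit kept coefficients by least squares
        supp = np.flatnonzero(X[:, j])
        X[supp, j] = np.linalg.lstsq(D[:, supp], Y[:, j], rcond=None)[0]
    err = np.linalg.norm(Y - D @ X) / np.linalg.norm(Y)
    D = Y @ np.linalg.pinv(X)          # dictionary update
    D /= np.linalg.norm(D, axis=0) + 1e-12

print(f"relative reconstruction error: {err:.3f}")
```

The quantity the talk analyzes is how large m must be, relative to n and K, for such a procedure to typically succeed.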
Toshiyuki Tanaka | (Kyoto University) |
TBA |
Abstract:
Shiro Ikeda | (Institute of Statistical Mathematics) |
Some applications of sparse modelling in physical measurements |
slides
Abstract: Sparsity-based methods are effective for many physical measurement
techniques. Here we show some applications:
two examples from astronomy (interferometry and Compton camera imaging),
and one from diffraction imaging. All the image reconstruction methods are
based on Bayesian methods.
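As a generic stand-in for sparsity-based measurement reconstruction (not the talk's Bayesian methods), a minimal ISTA sketch in Python/NumPy that recovers a sparse signal from underdetermined linear measurements; the sizes and the regularization level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, s = 256, 100, 8                  # signal length, measurements, nonzeros

x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement matrix
y = A @ x_true                                  # noiseless measurements

# ISTA: proximal gradient descent on 0.5*||y - Ax||^2 + lam*||x||_1
lam = 0.1 * np.max(np.abs(A.T @ y))             # illustrative regularization level
L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(300):
    g = x - (A.T @ (A @ x - y)) / L
    x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold

# debias: refit by least squares on the estimated support
supp = np.flatnonzero(np.abs(x) > 1e-3)
coef, *_ = np.linalg.lstsq(A[:, supp], y, rcond=None)
x = np.zeros(n)
x[supp] = coef

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative error: {rel_err:.3f}")
```

With many fewer measurements than unknowns (100 vs 256), the sparsity prior is what makes recovery possible at all, which is the common thread of the applications above.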
Kenji Nagata | (University of Tokyo) |
Sparse modeling and variable selection with the exchange Monte Carlo method |
Abstract: Feature selection is an important process for improving the generalization
capability and interpretability of learned models through the selection of
a relevant feature subset. In the last two decades, feature selection
methods such as L1 regularization and automatic relevance determination have been
intensively developed and used in a wide range of areas, making it possible
to select a relevant subset of features. In this talk,
we introduce an alternative: an exhaustive search method
based on the exchange Monte Carlo method. Moreover, we propose a new hypothesis
test, built on this exhaustive search, to judge whether the given data contain
the information needed for the desired binary classification.
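A minimal sketch of the exhaustive-search idea, assuming p is small enough that all 2^p subsets can be enumerated directly (the exchange Monte Carlo method is what makes larger p feasible); BIC is used here as an illustrative score, not necessarily the talk's criterion:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 8
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 3]] = [2.0, -1.5]             # only features 0 and 3 matter
y = X @ beta + 0.1 * rng.standard_normal(n)

def bic(subset):
    """BIC of an OLS fit restricted to the given feature subset."""
    if not subset:
        resid = y
    else:
        Xs = X[:, list(subset)]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ coef
    return n * np.log(np.mean(resid**2)) + len(subset) * np.log(n)

# exhaustive enumeration over all 2^p subsets (feasible only for small p;
# exchange Monte Carlo samples this space when 2^p is too large)
best = min((frozenset(c)
            for k in range(p + 1)
            for c in itertools.combinations(range(p), k)),
           key=bic)
print(sorted(best))   # -> [0, 3]
```

Unlike L1 or ARD, the exhaustive search examines every subset, so it cannot miss the best one for the chosen score.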
March 14th (Fri)
Noureddine El Karoui | (UC Berkeley, USA) |
On high-dimensional robust regression, penalized robust regression and GLMs |
Abstract: I will discuss the behavior of widely used statistical methods
in the high-dimensional setting where the number of observations, n,
and the number of predictors, p, are both large. I will present limit theorems
about the behavior of the corresponding estimators, their asymptotic risks, etc.
The results apply not only to robust regression estimators, but also to Lasso-type
estimators and to much more complicated problems. Some of the results answer a question raised by Huber in his seminal '73 paper
on robust regression.
Many surprising statistical phenomena occur: for instance, maximum likelihood
methods are shown to be (grossly) inefficient, and loss functions that should
be used in regression are shown to depend on the ratio p/n. This means that
dimensionality should be explicitly taken into account when performing simple
tasks such as regression.
More generally, we'll see that intuition based on results obtained in the
small p, large n setting leads to misconceptions and the use of suboptimal
procedures. It also turns out that inference is possible in this setting.
We'll also see that the geometry of the design matrix plays a key role
in these problems and use this fact to disprove claims of universality of
some of the results.
Mathematically, the tools needed mainly come from random matrix theory,
measure concentration and convex analysis.
Kei Kobayashi | (Institute of Statistical Mathematics) |
Curvature of empirical metrics on a data space and its deformation |
Abstract: The talk is largely about empirical metrics and empirical geodesics
and their application to data analysis. We introduce a novel deformation of
a metric, called the alpha-beta-gamma metric, and give a way of computing it via
metric graphs. This deformation relates to the curvature of
the data space and to the robustness of estimation based on it. The means induced by
the alpha and beta metrics correspond to the intrinsic and extrinsic means,
respectively, which are usually used when the data lie on a specific manifold, though
our method is not restricted to such cases. We show some theoretical results
on the properties of the metric and give some examples of how the metric
works for data analysis.
This is a collaboration with Prof. Henry P. Wynn (London School of Economics).
Kengo Kato | (University of Tokyo) |
Gaussian approximations and multiplier bootstrap for maxima of sums of
high-dimensional random vectors
(joint work with V. Chernozhukov and D. Chetverikov) |
slides
Abstract: We derive a Gaussian approximation result for the maximum of a
sum of high-dimensional random vectors. Specifically, we establish conditions
under which the distribution of the maximum is approximated by that of
the maximum of a sum of Gaussian random vectors with the same covariance
matrices as the original vectors. This result applies when the dimension p of the
random vectors is large compared to the sample size n; in fact,
p can be much larger than n, without restricting correlations of the coordinates
of these vectors. We also show that the distribution of the maximum of
a sum of the random vectors with unknown covariance matrices can be consistently
estimated by the distribution of the maximum of a sum of the conditional
Gaussian random vectors obtained by multiplying the original vectors with i.i.d.
Gaussian multipliers. This is the Gaussian multiplier (or wild) bootstrap
procedure. Here too, p can be large or even much larger than n.
These distributional approximations, either Gaussian or conditional Gaussian,
yield a high-quality approximation to the distribution of the original maximum,
often with approximation error decreasing polynomially in the sample size,
and hence are of interest in many applications. We demonstrate how our
Gaussian approximations and the multiplier bootstrap can be used for
modern high-dimensional estimation, multiple hypothesis testing,
and adaptive specification testing. All these results contain nonasymptotic
bounds on approximation errors.
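A minimal sketch of the multiplier bootstrap described above, for the maximum of normalized coordinate sums with p much larger than n; the sizes and the N(0,1) data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, B = 200, 1000, 500               # sample size, dimension, bootstrap draws

# data: n i.i.d. centered p-dimensional vectors (p >> n is allowed)
X = rng.standard_normal((n, p))
T = np.max(np.abs(X.sum(axis=0)) / np.sqrt(n))   # max of normalized sums

# multiplier bootstrap: reweight the same vectors with i.i.d. N(0,1) multipliers;
# conditionally on X, each reweighted sum is exactly Gaussian
e = rng.standard_normal((B, n))
T_boot = np.max(np.abs(e @ X) / np.sqrt(n), axis=1)
crit = np.quantile(T_boot, 0.95)       # bootstrap 95% critical value
print(f"T = {T:.2f}, 95% critical value = {crit:.2f}")
```

The point of the theory is that the conditional distribution of `T_boot` approximates that of `T` even with p = 1000 and n = 200, without estimating the p x p covariance matrix.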
Arthur Gretton | (University College London) |
Kernel Distribution Embeddings and Tests for Three-Variable Interactions |
slides
Abstract: We introduce nonparametric tests for Lancaster three-variable interaction and for total independence,
using embeddings of signed measures into a reproducing kernel Hilbert space (RKHS). Such embeddings
have been applied successfully in two-sample testing (of whether two distributions are the same),
and independence testing (of whether two random variables are dependent). We show that test statistics
for these simpler cases can be constructed from RKHS embeddings of measures, and generalize these ideas
to test statistics for interactions among three variables - the two statistics obtained are
the Lancaster interaction statistic, and the total independence statistic.
The Lancaster and total independence statistics are straightforward to compute, and the resulting tests
are consistent against all alternatives for a large family of reproducing kernels. We show
the Lancaster test to be sensitive to cases where two independent causes individually have weak influence
on a third dependent variable, but their combined effect has a strong influence. This makes
the Lancaster test especially suited to finding structure in directed graphical models,
where it outperforms competing nonparametric tests in detecting such V-structures.
Relevant paper: Sejdinovic, D., Gretton, A., and Bergsma, W., A Kernel Test for Three-Variable Interactions, NIPS, 2013
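As a sketch of the simpler two-sample case mentioned above (not the Lancaster statistic itself), here is a biased squared-MMD estimate built from Gaussian-kernel RKHS embeddings; the bandwidth and sample sizes are illustrative:

```python
import numpy as np

def mmd2_biased(X, Y, sigma=1.0):
    """Biased squared MMD between samples X and Y under a Gaussian RKHS kernel."""
    Z = np.vstack([X, Y])
    sq = np.sum(Z**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * Z @ Z.T) / (2 * sigma**2))
    n = len(X)
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(3)
same = mmd2_biased(rng.standard_normal((300, 2)),
                   rng.standard_normal((300, 2)))
diff = mmd2_biased(rng.standard_normal((300, 2)),
                   1.0 + rng.standard_normal((300, 2)))
print(f"same distributions: {same:.4f}, shifted distributions: {diff:.4f}")
```

The statistic is the RKHS distance between the two embedded empirical measures; the Lancaster statistic generalizes this construction to a signed measure over three variables.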
Shotaro Akaho |
(Advanced Institute of Science and Technology) |
On the robust nonlinear curve fitting |
slides
Abstract: Curve and surface fitting is a basic procedure in computer vision and machine learning.
To fit nonlinear curves efficiently, a feature space mapping is often used, in which each point
is mapped into a high-dimensional space by a nonlinear transformation. Such a mapping makes it possible to solve
the fitting problem by linear computation. However, this simplification causes several problems, and I will pick out such problems and talk about some attempts to solve them.
The first is the difference between the metrics of the input space and the feature space.
The second is robustness against outliers, which is also a problem in linear fitting.
The third arises when the feature space is extended to an infinite-dimensional space such as a reproducing kernel Hilbert space.
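A minimal example of the feature-space-mapping idea: fitting a circle by purely linear computation after mapping each point (x, y) to (x^2 + y^2, x, y, 1). The synthetic data and the solver choice (smallest right singular vector) are illustrative, not the talk's method:

```python
import numpy as np

rng = np.random.default_rng(7)
t = rng.uniform(0, 2 * np.pi, 100)
# noisy points on a circle of center (2, -1), radius 3
pts = np.c_[2 + 3 * np.cos(t), -1 + 3 * np.sin(t)]
pts += 0.05 * rng.standard_normal((100, 2))

# feature map phi(x, y) = (x^2 + y^2, x, y, 1); the circle is {p : phi(p).w = 0}
x, y = pts[:, 0], pts[:, 1]
Phi = np.c_[x**2 + y**2, x, y, np.ones_like(x)]

# minimize ||Phi w|| subject to ||w|| = 1  ->  smallest right singular vector
w = np.linalg.svd(Phi)[2][-1]
a, b, c, d0 = w
cx, cy = -b / (2 * a), -c / (2 * a)
rad = np.sqrt(cx**2 + cy**2 - d0 / a)
print(f"center ({cx:.2f}, {cy:.2f}), radius {rad:.2f}")
```

The algebraic residual `Phi w` minimized here is not the geometric distance in input space, which is exactly the first problem (metric mismatch) the talk addresses.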
Kenji Fukumizu |
(Institute of Statistical Mathematics) |
Nonparametric Bayesian Inference with Positive Definite Kernels |
slides
Abstract: Kernel methods have been successfully used in many machine learning problems,
with favorable performance in extracting nonlinear structure from high-dimensional
data. Recently, nonparametric inference methods with positive definite kernels
have been developed, employing the kernel mean expression of distributions.
In this approach, the distribution of a variable is represented by the mean
element of the random feature vector defined by mapping the variable
with the kernel function, and the relation between variables is expressed by
a covariance operator. In this talk, I introduce Bayesian inference with
this kernel mean expression; namely, Bayes' rule is realized with the kernel mean and
covariance expressions to compute the kernel mean of the posterior.
This approach provides a novel nonparametric way of doing Bayesian inference,
expressing a distribution with a weighted sample and computing the posterior
with matrix calculations. Some theoretical results on the convergence of the
estimators are also shown, and the usefulness is demonstrated
with the example of nonparametric hidden Markov models.
March 15th (Sat)
Milos Radovanovic |
(Univ. Novi Sad, Serbia) |
Hubs in Nearest-Neighbor Graphs: Origins, Applications and Challenges |
slides
Abstract: The tendency of k-nearest neighbor graphs constructed from tabular data using some distance measure
to contain hubs, i.e. points with in-degree much higher than expected, has drawn a fair amount of attention
in recent years due to the observed impact on techniques used in many application domains.
This talk will be organized into three parts: (1) Origins, which will discuss the causes of the emergence
of hubs (and their low in-degree counterparts, the anti-hubs), and their relationships with dimensionality,
neighborhood size, distance concentration, and the notion of centrality; (2) Applications, where we will
present some notable effects of (anti-)hubs on techniques for machine learning, data mining and information
retrieval, identify two different approaches to handling hubs adopted by researchers - through fighting
or embracing their existence - and review techniques and applications belonging to the two groups; and
(3) Challenges, which will discuss work in progress, open problems, and areas with significant opportunities
for hub-related research.
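A small experiment, with illustrative sizes, showing how hubs emerge in the kNN digraph as dimensionality grows: every point has out-degree k, but in high dimensions the in-degree distribution becomes strongly skewed.

```python
import numpy as np

def knn_indegrees(X, k=10):
    """In-degree of each point in the Euclidean k-nearest-neighbor digraph."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared pairwise distances
    np.fill_diagonal(D, np.inf)                   # exclude self-neighbors
    nn = np.argsort(D, axis=1)[:, :k]             # each row: its k nearest neighbors
    return np.bincount(nn.ravel(), minlength=len(X))

rng = np.random.default_rng(4)
n = 1000
deg3 = knn_indegrees(rng.standard_normal((n, 3)))
deg100 = knn_indegrees(rng.standard_normal((n, 100)))
print(f"max in-degree: d=3 -> {deg3.max()}, d=100 -> {deg100.max()} "
      f"(mean is k=10 in both)")
```

Points with in-degree far above k in the d=100 run are the hubs; points with in-degree near zero are the anti-hubs.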
Ikumi Suzuki |
(National Institute of Genetics) |
The effect of data centering for k-nearest neighbor |
slides
Abstract: The k-nearest neighbor (kNN) method is simple and popular in various machine learning tasks.
However, when the data dimension is high, kNN often performs poorly.
Recently, the hubness phenomenon has been reported as one aspect of the curse of dimensionality affecting kNN.
A "hub" is a sample which is similar to many other samples in a dataset.
While the spectral properties of Laplacian-based kernels are useful for dimensionality reduction,
spectral clustering, etc., these properties also affect kNN, namely by reducing hub samples.
On further study, simply centering similarity measures, which can be seen as shifting the data origin
to the data centroid in a feature space (data centering), also reduces hubs.
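A minimal sketch of data centering applied to an inner-product similarity matrix; the off-center synthetic data and sizes are illustrative. Double-centering the Gram matrix is algebraically the same as shifting the origin to the data centroid:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 1000, 100
X = rng.standard_normal((n, d)) + 2.0   # off-center data accentuates hubness

K = X @ X.T                              # inner-product similarity
# (x_i - mu) . (x_j - mu)  =  K - row means - column means + grand mean
Kc = K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()

def max_indegree(S, k=10):
    """Max in-degree of the kNN digraph built from a similarity matrix."""
    S = S.copy()
    np.fill_diagonal(S, -np.inf)         # exclude self-neighbors
    nn = np.argsort(-S, axis=1)[:, :k]   # k most similar points per row
    return np.bincount(nn.ravel(), minlength=len(S)).max()

hub_before, hub_after = max_indegree(K), max_indegree(Kc)
print(f"max in-degree before centering: {hub_before}, after: {hub_after}")
```

Before centering, the few points with the largest projection on the mean direction appear in almost everyone's neighbor list; centering removes that bias.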
Ryota Tomioka |
(Toyota Technological Institute, Chicago) |
Towards better computation-statistics trade-off in tensor decomposition |
slides
Abstract: New approaches to tensor decomposition with worst-case performance guarantees have recently emerged.
For one of these methods, the overlapped trace norm, O(rn^{d-1}) samples are proved sufficient for recovering
an n x n x ... x n d-way tensor that admits a Tucker decomposition with an r x r x ... x r core.
Although this method is computationally very efficient, the rate is rather disappointing. A newer approach,
called the square norm, achieves a lower O(r^{d/2}n^{d/2}) sample requirement at the cost of higher computational demand.
There is also a more theoretical approach that could achieve the optimal O(rnd) but seems to be computationally
intractable. I will overview these recent developments and discuss how we could achieve a better trade-off
between what we can provably achieve in terms of statistical accuracy and how much computation we need to spend.
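Plugging illustrative values into the three sample-complexity rates quoted above (constants ignored) makes the gap behind the trade-off concrete:

```python
# sample counts (up to constants) for recovering an n x ... x n, d-way tensor
# with an r x r x ... x r Tucker core; n, r, d are illustrative
n, r, d = 100, 5, 4
rates = {
    "overlapped trace norm": r * n ** (d - 1),      # efficient, loose rate
    "square norm":           r ** (d / 2) * n ** (d / 2),  # costlier, tighter
    "optimal (intractable)": r * n * d,             # best known rate
}
for name, cnt in rates.items():
    print(f"{name:>22}: {cnt:,.0f} samples")
```

The three methods sit at different points on the computation-statistics trade-off curve, which is exactly the gap the talk discusses.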
Taiji Suzuki |
(Tokyo Institute of Technology) |
PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive Model |
slides
Abstract: We develop a PAC-Bayesian bound for the convergence rate of a Bayesian variant of
Multiple Kernel Learning (MKL), which is an estimation method for the sparse additive model.
Standard analyses of MKL require a strong condition on the design analogous to the restricted eigenvalue
condition in the analysis of the Lasso and the Dantzig selector. In this talk, we show a PAC-Bayesian bound
which states that the Bayesian variant of MKL achieves the optimal convergence rate
without such strong conditions on the design. Our approach is basically a combination of PAC-Bayes theory
and the theory of nonparametric Gaussian process regression. Our analysis includes the existing result
for Gaussian process regression as a special case, and also yields the convergence rate of the Bayesian
variant of the Group Lasso as a finite-dimensional special case.