Mini-Workshop: Inference on Probabilities and Causality
September 2 (Tue.), 2008. 13:30--17:10
Seminar room 253 (2F), The Institute of Statistical Mathematics
Organized by Kenji Fukumizu (ISM).
Program
13:30 - 14:30. Non-Gaussian models for causal discovery: combining instantaneous and lagged effects
Aapo Hyvarinen (University of Helsinki)
14:50 - 15:50. Painless function space embeddings of distributions: theory and applications
Arthur Gretton (Max-Planck Institute for Biological Cybernetics)
16:10 - 17:10. Assessing conditional independence with kernels
Xiaohai Sun (Max-Planck Institute for Biological Cybernetics)
Abstracts
- Aapo Hyvarinen: Non-Gaussian models for causal discovery: combining instantaneous and lagged effects
Abstract:
Causal analysis of continuous-valued variables typically uses either autoregressive
models for lagged effects or linear Bayesian networks for instantaneous effects.
Estimation of Bayesian networks poses serious identifiability problems
if the classic assumption of Gaussianity is made, which is why we recently
proposed to use non-Gaussian models. In this talk, we first introduce
our approach based on linear non-Gaussian Bayesian networks (or structural equation models).
Next, we propose to combine the non-Gaussian instantaneous model with
autoregressive models, leading to a new variant of what is called
"structural vector autoregressive" models in econometrics.
We show that such a combined non-Gaussian model is identifiable without
prior knowledge of network structure, and propose a computationally simple
estimation method that is shown to be consistent. The analysis also shows
how neglecting instantaneous effects can lead to completely wrong estimates
of the autoregressive coefficients. This is joint work with Shohei Shimizu and Patrik O. Hoyer.
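As a rough illustration of the combined model (a sketch only, not the speakers' code: the dimensions, coefficient matrices, and Laplace innovations below are assumptions made for the example), the following Python snippet simulates x(t) = B0 x(t) + B1 x(t-1) + e(t) and shows how a naive VAR fit that ignores the instantaneous matrix B0 recovers inv(I - B0) @ B1 instead of B1:

import numpy as np

# Hypothetical bivariate instance of the combined model
#   x(t) = B0 x(t) + B1 x(t-1) + e(t),  with non-Gaussian e(t).
rng = np.random.default_rng(0)
T = 100_000
B0 = np.array([[0.0, 0.0],
               [0.8, 0.0]])    # instantaneous effect x1 -> x2
B1 = np.array([[0.9, 0.0],
               [0.0, 0.5]])    # lagged (autoregressive) effects
A = np.linalg.inv(np.eye(2) - B0)   # reduced form: x(t) = A B1 x(t-1) + A e(t)

x = np.zeros((T, 2))
for t in range(1, T):
    e = rng.laplace(size=2)         # non-Gaussian innovations
    x[t] = A @ (B1 @ x[t - 1] + e)

# Ordinary least-squares VAR(1) fit, ignoring instantaneous effects:
X, Y = x[:-1], x[1:]
M_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
print("true B1:\n", B1)
print("naive VAR estimate (converges to inv(I - B0) @ B1):\n", M_hat)

Here the naive estimate mixes the instantaneous effect into the lagged coefficients, which is the kind of distortion the abstract warns about; the non-Gaussianity of e(t) is what makes B0 itself identifiable.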
---
- Arthur Gretton: Painless function space embeddings of distributions: theory and applications
Abstract:
In the early days of kernel machines research, the "kernel trick" was considered
a useful way of constructing nonlinear algorithms from linear ones.
More recently, however, it has become clear that a potentially more
far-reaching use of kernels is as a linear way of dealing with higher-order
statistics, by embedding distributions in a suitable reproducing
kernel Hilbert space (RKHS). Notably, unlike the straightforward expansion
into higher-order moments or a conventional characteristic function approach,
RKHS embeddings provide a painless, tractable way of representing distributions.
This line of reasoning leads naturally to the questions: what does it mean
to embed a distribution in an RKHS? When is this embedding injective
(and thus, when do different distributions have unique mappings)?
What implications are there for learning algorithms that make use of these embeddings?
This talk aims at answering these questions.
Topics will include:
- Introduction to distribution embeddings; Maximum Mean
Discrepancy (MMD) as a metric on distributions (see the sketch after this list)
- Characteristic kernels and injective embeddings in reproducing
kernel Hilbert spaces
- MMD as a measure of statistical dependence, and the Hilbert-Schmidt
Independence Criterion (HSIC)
- Applications of MMD to feature selection and unsupervised taxonomy
discovery
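For concreteness, here is a minimal Python sketch of the (biased) empirical MMD^2 between two samples; the Gaussian kernel and the fixed bandwidth sigma are assumptions made for the example, not details taken from the abstract:

import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased empirical MMD^2 between samples X ~ P and Y ~ Q."""
    return (rbf_kernel(X, X, sigma).mean()
            - 2 * rbf_kernel(X, Y, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 1))   # sample from P = N(0, 1)
Y = rng.normal(0.5, 1.0, size=(500, 1))   # sample from Q = N(0.5, 1)
Z = rng.normal(0.0, 1.0, size=(500, 1))   # second sample from P
print(mmd2_biased(X, Y))   # clearly positive: P != Q
print(mmd2_biased(X, Z))   # near zero: same distribution

With a characteristic kernel such as the Gaussian, the population MMD vanishes only when the two distributions coincide, which is the injectivity property the talk addresses.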
---
- Xiaohai Sun: Assessing conditional independence with kernels
(joint work with Kenji Fukumizu)
Abstract:
We present a kernel-based approach to capturing conditional independence
relationships among a set of random variables on arbitrary domains.
Our approach embeds probability distributions into a so-called
reproducing kernel Hilbert space (RKHS). Dependences among variables can
then be measured by the distance between the RKHS elements representing
the corresponding distributions. Independence tests based on such a measure
can be readily used, e.g., by any existing independence-based algorithm
for learning Bayesian network (BN) structure, without making any assumptions
about the probability distributions on the various domains. This is particularly
useful for continuous or continuous-categorical (mixed) domains. In addition,
structures containing vectorial variables can be handled straightforwardly.
Another application of our kernel-based approach is a straightforward
nonlinear extension of Granger's concept of causality.
Our method is able to assess nonlinear Granger causality from multivariate
time series in the kernel framework and to determine, in a model-free way,
whether a causal relationship between two time series is present and
whether the relationship is direct or mediated by a third process.
Encouraging results from various experiments with simulated and real-world
data show that our method is not only a good alternative to other
state-of-the-art approaches to the same problem, but also a novel tool for
structure learning in a more general setting.
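As a self-contained sketch of the "distance in RKHS as a dependence measure" idea, here is the biased empirical Hilbert-Schmidt Independence Criterion for the unconditional case (the Gaussian kernel and bandwidth are assumptions for the example; the conditional variant discussed in the talk is more involved):

import numpy as np

def rbf_gram(X, sigma=1.0):
    """Gaussian Gram matrix k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-sq / (2 * sigma**2))

def hsic_biased(X, Y, sigma=1.0):
    """Biased empirical HSIC: (1/n^2) tr(K H L H), with H the centering matrix."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(X, sigma), rbf_gram(Y, sigma)
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1))
Y_dep = X**2 + 0.1 * rng.normal(size=(300, 1))   # nonlinear dependence on X
Y_ind = rng.normal(size=(300, 1))                # independent of X
print(hsic_biased(X, Y_dep))   # substantially larger
print(hsic_biased(X, Y_ind))   # near zero

Because the dependence of Y_dep on X is purely nonlinear (their linear correlation is near zero), this is the kind of relationship a kernel measure detects but a covariance-based test misses; a practical test would calibrate the statistic, e.g., against a permutation null.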