The 22nd Statistical Machine Learning Seminar

Date & Time
27 November 2014 (Thu), 15:00-

Admission free, no booking necessary

Registration is not required, but seating is limited, so if you are attending from outside the institute, please email Fukumizu (fukumizu [at] ism.ac.jp) in advance if possible.
Place
Seminar Room 3 (D312A), The Institute of Statistical Mathematics
Speaker
Aaditya Ramdas (Carnegie Mellon University)
Title
On the Power of a Nonparametric Two Sample Test in High Dimensions
Abstract

Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests: those that are consistent without any assumptions about how the distributions may differ (general alternatives), and those designed specifically to test easier alternatives, such as a difference in means (mean-difference alternatives).

In this talk, I will summarize some recent results on the power of some popular nonparametric two sample tests that are designed for general alternatives, under a mean-difference alternative in the high-dimensional setting, subject to different computational constraints. Specifically, we explicitly characterize the power of the linear-time, sub-quadratic-time and quadratic-time versions of the Maximum Mean Discrepancy statistic using the Gaussian kernel (G-MMD), and the Energy Distance (ED) statistic using the Euclidean norm, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. Some surprising and interesting findings include: a) there is a clear and smooth computation-power tradeoff for these tests, and expending more computation yields direct statistical benefit; b) the power is independent of the kernel bandwidth, as long as it is equal to or larger than the choice made by the median heuristic; c) ED and G-MMD have the same power; and d) general tests like ED and G-MMD enjoy a free lunch, since they have the same power as specialized tests for detecting mean differences. This is the first explicit power derivation for any general nonparametric test in the high-dimensional setting, and also the first analysis of how tests designed for general alternatives perform when faced with easier ones.
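For readers unfamiliar with the statistics named above, the following is a minimal Python sketch of the quadratic-time and linear-time G-MMD estimates, the median-heuristic bandwidth, and the empirical Energy Distance. It is an illustration under stated assumptions, not the speaker's code: the kernel parameterization exp(-||a - b||^2 / bandwidth^2) is one common convention, the estimators assume equal sample sizes, and all function names are hypothetical. The sub-quadratic-time variant is not shown.

```python
import math
import random
from statistics import median


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel; this parameterization is an assumption of the sketch."""
    return math.exp(-sq_dist(a, b) / bandwidth ** 2)


def median_heuristic(xs, ys):
    """Bandwidth = median pairwise Euclidean distance in the pooled sample."""
    pooled = xs + ys
    dists = [math.sqrt(sq_dist(pooled[i], pooled[j]))
             for i in range(len(pooled))
             for j in range(i + 1, len(pooled))]
    return median(dists)


def _h(xs, ys, i, j, bandwidth):
    """Core term k(x_i,x_j) + k(y_i,y_j) - k(x_i,y_j) - k(x_j,y_i)."""
    return (gaussian_kernel(xs[i], xs[j], bandwidth)
            + gaussian_kernel(ys[i], ys[j], bandwidth)
            - gaussian_kernel(xs[i], ys[j], bandwidth)
            - gaussian_kernel(xs[j], ys[i], bandwidth))


def mmd2_quadratic(xs, ys, bandwidth):
    """Unbiased U-statistic estimate of MMD^2: averages over all i != j."""
    n = len(xs)
    assert len(ys) == n and n >= 2
    total = sum(_h(xs, ys, i, j, bandwidth)
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def mmd2_linear(xs, ys, bandwidth):
    """Linear-time estimate: averages over disjoint pairs (0,1), (2,3), ..."""
    pairs = min(len(xs), len(ys)) // 2
    assert pairs >= 1
    total = sum(_h(xs, ys, 2 * t, 2 * t + 1, bandwidth) for t in range(pairs))
    return total / pairs


def mean_dist(us, vs):
    """Average Euclidean distance over all pairs (u, v)."""
    return (sum(math.sqrt(sq_dist(u, v)) for u in us for v in vs)
            / (len(us) * len(vs)))


def energy_distance(xs, ys):
    """Empirical Energy Distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return 2.0 * mean_dist(xs, ys) - mean_dist(xs, xs) - mean_dist(ys, ys)


def sample_gaussian(n, dim, mean, rng):
    """Draw n points from N(mean * 1, I) in `dim` dimensions."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = sample_gaussian(50, 10, 0.0, rng)
    ys = sample_gaussian(50, 10, 0.5, rng)  # mean-difference alternative
    bw = median_heuristic(xs, ys)
    print("bandwidth:", bw)
    print("quadratic-time MMD^2:", mmd2_quadratic(xs, ys, bw))
    print("linear-time MMD^2:", mmd2_linear(xs, ys, bw))
    print("energy distance:", energy_distance(xs, ys))
```

The quadratic-time estimate uses every pair of indices, while the linear-time estimate touches each point once, which is the computational side of the tradeoff described in the abstract. Turning either statistic into a test additionally requires a rejection threshold (for example, from a permutation procedure), which this sketch omits.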

Some initial work is published at AAAI'15 (http://arxiv.org/pdf/1406.2083v2.pdf); details on linear tests have been submitted to AISTATS'15 (http://arxiv.org/pdf/1411.6314.pdf); and further details are still in preparation.