Proceedings of the Institute of Statistical Mathematics Vol. 56, No. 2, 169-184

Extracting Pseudo-biclusters from Gene Expression Data Based on Suffix Tree

Tetsuro Namba
(Division of Computer Science, IST, Hokkaido University)
Makoto Haraguchi
(Division of Computer Science, IST, Hokkaido University)
Yoshiaki Okubo
(Division of Computer Science, IST, Hokkaido University)

This paper describes a method for finding pseudo-biclusters in gene expression data. For time series data, a linear time algorithm based on a suffix tree has been proposed. Although this algorithm can efficiently enumerate all maximal biclusters, many of the resulting clusters overlap. Combining such clusters often reveals that all genes in the combined cluster behave quite similarly within a common time span but differently afterward. This observation is expected to provide valuable suggestions to experts. We therefore introduce the notion of a pseudo-bicluster, which consists of several maximal biclusters with some overlap, and design a polynomial time algorithm that finds pseudo-biclusters using a suffix tree. Experimental results for gene expression data of an ascidian (Hoya) are also presented, including an interesting cluster that was actually extracted.

Key words: Biclustering, pseudo-bicluster, suffix tree, gene expression data, time series data.
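
The idea of genes that agree over a common time span and diverge afterward can be illustrated with a small sketch. It is not the authors' algorithm: it replaces the suffix tree with a naive enumeration of contiguous windows, it does not filter for maximality, and the up/down/no-change discretization (`discretize`, `threshold`) and all function names are assumptions introduced only for illustration.

```python
from collections import defaultdict


def discretize(series, threshold):
    """Encode each step between consecutive time points as U (up), D (down) or N (no change).

    The three-symbol alphabet and the threshold are illustrative assumptions.
    """
    symbols = []
    for prev, cur in zip(series, series[1:]):
        diff = cur - prev
        if diff > threshold:
            symbols.append("U")
        elif diff < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def contiguous_biclusters(patterns, min_genes=2, min_len=2):
    """Group genes that share the same symbol pattern over the same contiguous time span.

    patterns: dict mapping gene name -> symbol string (all of equal length).
    Returns a dict mapping (start, end, pattern) -> sorted list of genes.
    Naive enumeration over all windows; a suffix tree would avoid this cost.
    """
    if not patterns:
        return {}
    length = len(next(iter(patterns.values())))
    groups = defaultdict(list)
    for gene, pattern in patterns.items():
        for start in range(length):
            for end in range(start + min_len, length + 1):
                groups[(start, end, pattern[start:end])].append(gene)
    return {key: sorted(genes) for key, genes in groups.items() if len(genes) >= min_genes}


# Hypothetical toy data: three genes agree on the first two steps, then g2 diverges.
toy = {"g1": "UUD", "g2": "UUN", "g3": "UUD"}
clusters = contiguous_biclusters(toy)
```

In this toy input, all three genes share the window covering the first two steps, while only `g1` and `g3` share the full span. Overlapping groups of this kind are what a pseudo-bicluster would combine.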

Proceedings of the Institute of Statistical Mathematics Vol. 56, No. 2, 185-198

Feature Extraction by Geometric Algebra from Geometric Data

Minh Tuan Pham
(School of Engineering, Nagoya University)
Kanta Tachibana
(School of Engineering, Nagoya University)
Eckhard Hitzer
(School of Engineering, University of Fukui)
Sven Buchholz
(Institute for Informatics, University of Kiel)
Tomohiro Yoshikawa
(School of Engineering, Nagoya University)
Takeshi Furuhashi
(School of Engineering, Nagoya University)

Most conventional methods of feature extraction for pattern recognition do not pay sufficient attention to inherent geometric properties of data, even where the data have characteristic spatial features. In this study, we introduce geometric algebra to systematically extract invariant geometric features from spatial data given in a vector space. Geometric algebra is a multidimensional generalization of complex numbers and of quaternions, and can accurately describe oriented spatial objects and relations between them. We further propose a combination of several geometric features using Gaussian mixture models. We demonstrate our new method by classification of hand-written digits and alphabetic characters.

Key words: Geometric algebra, feature extraction, Gaussian mixture model, pattern recognition, mixture of experts.
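
As a minimal illustration of an invariant geometric feature, the sketch below uses the geometric product of two 2D vectors, which splits into a scalar part (the inner product) and a bivector part (the oriented area given by the outer product). Both parts are unchanged when the two vectors are rotated together about the origin. The feature set actually used in the paper is not given in the abstract, so `stroke_features` and the sample points are hypothetical.

```python
import math


def geometric_product_2d(a, b):
    """Return the (scalar, bivector) parts of the geometric product ab of two 2D vectors."""
    scalar = a[0] * b[0] + a[1] * b[1]    # inner product a . b
    bivector = a[0] * b[1] - a[1] * b[0]  # coefficient of e1^e2 in the outer product a ^ b
    return scalar, bivector


def rotate(v, angle):
    """Rotate a 2D vector counterclockwise about the origin by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def stroke_features(points):
    """Concatenate scalar and bivector parts for consecutive point pairs (hypothetical feature)."""
    features = []
    for a, b in zip(points, points[1:]):
        features.extend(geometric_product_2d(a, b))
    return features


# Hypothetical sample points standing in for a hand-written stroke.
points = [(1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
rotated = [rotate(p, 0.7) for p in points]
original_features = stroke_features(points)
rotated_features = stroke_features(rotated)
```

Comparing `original_features` with `rotated_features` shows the rotation invariance up to floating-point error.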

Proceedings of the Institute of Statistical Mathematics Vol. 56, No. 2, 199-213

Path Analysis in a Supermarket and String Analysis Technique

Katsutoshi Yada
(Faculty of Commerce, Kansai University)

This paper demonstrates the applicability and usefulness of a string analysis technique for deriving rules that characterize customers' visiting patterns across sales areas. It focuses on the stationary states of customers in particular sales areas of a store. We apply a string analysis technique, EBONSAI, to sales area visiting patterns in order to deal effectively with a huge stream of data. Experiments were conducted to extract useful rules and findings about the characteristics of sales area visiting patterns. We also discuss the problems that remain in existing string analysis techniques.

Key words: Supermarket, marketing, RFID, string analysis technique, EBONSAI.

Proceedings of the Institute of Statistical Mathematics Vol. 56, No. 2, 215-224

Channel Estimation and Code Word Inference for Mobile Digital Satellite Broadcasting Reception

Masatoshi Hamada
(The Graduate University for Advanced Studies)
Shiro Ikeda
(The Institute of Statistical Mathematics)

This paper proposes a method to improve the mobile reception quality of digital satellite broadcasting. The method describes the channel with a regression model and applies stochastic inference of the code words based on that channel model. It therefore consists of two parts, parameter estimation for the channel model and stochastic inference, and two methods are proposed for each part. For channel estimation, we propose maximum likelihood estimation (MLE) and higher order statistics (HOS) matching. For stochastic inference, we propose marginal and joint probability inference. The improvements are confirmed through experiments with measured data. The computational costs are also discussed with a view to future implementation.

Key words: Channel model, channel estimation, stochastic inference of code word.

Proceedings of the Institute of Statistical Mathematics Vol. 56, No. 2, 225-234

Merging Particle Filter and Its Characteristics

Shin'ya Nakano
(The Institute of Statistical Mathematics)
(Japan Science and Technology Agency)
Genta Ueno
(The Institute of Statistical Mathematics)
(Japan Science and Technology Agency)
Kazuyuki Nakamura
(The Institute of Statistical Mathematics)
(Japan Science and Technology Agency)
Tomoyuki Higuchi
(The Institute of Statistical Mathematics)
(Japan Science and Technology Agency)

A significant problem with the basic particle filter algorithm is degeneration. The merging particle filter (MPF) algorithm has recently been proposed to overcome this problem at a reasonable computational cost. In an MPF, each member of a filtered ensemble is generated from a weighted sum of multiple samples from the forecast ensemble such that the mean and covariance of the filtered distribution are approximately preserved. In this study, we performed data assimilation experiments using an MPF with two different sets of merging weights. When one merging weight is set close to 1 and the other weights are set small, better estimates are obtained for data assimilation into a low-dimensional model. For data assimilation into a relatively high-dimensional model, such a weight set requires a large ensemble size.

Key words: Data assimilation, particle filter, merging particle filter.
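
The merging step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that mean and covariance are approximately preserved by choosing merging weights whose sum and whose sum of squares both equal 1; the abstract does not state this condition, and the three-weight set below is just one set that satisfies it. The function names and the pure-Python list representation are also assumptions.

```python
import math
import random


def merging_weights_n3():
    """One weight set with sum = 1 and sum of squares = 1 (assumed merging condition)."""
    s = math.sqrt(13.0)
    return (0.75, (s + 1.0) / 8.0, -(s - 1.0) / 8.0)


def mpf_filter_step(forecast, likelihoods, alphas, rng):
    """Build a filtered ensemble by merging resampled forecast particles.

    forecast:    list of state vectors (lists of floats), one per particle
    likelihoods: non-negative likelihood of the observation for each particle
    alphas:      merging weights
    rng:         random.Random instance used for resampling
    """
    n_particles = len(forecast)
    # Draw one likelihood-weighted resampled ensemble per merging weight.
    draws = [rng.choices(forecast, weights=likelihoods, k=n_particles) for _ in alphas]
    merged = []
    for i in range(n_particles):
        dim = len(forecast[i])
        # Each filtered particle is a weighted sum of one sample from each draw.
        merged.append(
            [sum(a * draws[j][i][d] for j, a in enumerate(alphas)) for d in range(dim)]
        )
    return merged


alphas = merging_weights_n3()
rng = random.Random(0)
# Hypothetical forecast ensemble of four 2D states with made-up likelihoods.
forecast = [[0.0, 1.0], [1.0, 0.5], [2.0, -0.5], [3.0, 0.0]]
filtered = mpf_filter_step(forecast, [0.1, 0.4, 0.4, 0.1], alphas, rng)
```

A weight set with one weight near 1 and the rest small, as discussed in the abstract, would simply be passed as a different `alphas` tuple.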

Proceedings of the Institute of Statistical Mathematics Vol. 56, No. 2, 235-252

Quantifying Smallpox Transmission Using Historical Data: A Database for Statistical Modeling

Hiroshi Nishiura
(Theoretical Epidemiology, University of Utrecht)

Although smallpox is the only disease to have been eradicated worldwide, the threat of bioterrorism has led to debate on potential countermeasures in the event of an attack. Because of its global eradication, we have to use historical records to quantify important biological and epidemiologic characteristics in order to optimize intervention. This article reports quantitative modeling of the transmission and spread of smallpox using historical data. In particular, technical issues on database construction specifically aimed at statistical modeling are summarized. As typical examples, I briefly discuss how smallpox spreads in the absence of intervention, focusing on the optimization of quarantine and isolation and on disaster size estimation in the event of an attack. Critically important aspects in extracting key information from the database of smallpox epidemics are summarized, and I mention potential pitfalls in utilizing the historical data.

Key words: Historical data, database, smallpox, epidemiology, model, bioterrorism.

Proceedings of the Institute of Statistical Mathematics Vol. 56, No. 2, 253-258

On the Use of the Concept of Plausibility

Hirotugu Akaike
(Emeritus Professor, The Institute of Statistical Mathematics)

The comparison and selection of models has been enhanced by the use of log likelihood as a measure of the goodness of a statistical model based on an observation. However, conventional use of statistical models is limited to applying known structures to the given data, and the discussion of how to compose new models has not yet been developed. For statistical reasoning to develop as a method of general scientific reasoning, it is necessary to discuss the methodology of developing verbally defined models.

This paper illustrates the necessity of constructing models through language, and the use of the concept of plausibility to evaluate the resulting models, with a concrete example: the analysis of golf swing motion.

Key words: Likelihood, plausibility, golf swing motion.