## Extracting Pseudo-biclusters from Gene Expression Data Based on Suffix Tree

This paper describes a method for finding *pseudo-biclusters* in gene expression data. For time series data, a linear-time algorithm based on a *suffix tree* has been proposed. Although this algorithm efficiently enumerates all maximal biclusters, many of the resulting clusters often overlap. When such clusters are combined, we observe, interestingly, that all genes in the combined cluster behave quite similarly within a common time span but differently after it. This observation is expected to provide valuable suggestions to experts. We therefore introduce the notion of a *pseudo-bicluster*, which consists of several maximal biclusters with some overlap, and design a polynomial-time algorithm that finds pseudo-biclusters using a suffix tree. Experimental results on gene expression data of an ascidian (Hoya) are also presented, including an interesting cluster that was actually extracted.

Key words: Biclustering, pseudo-bicluster, suffix tree, gene expression data, time series data.
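
The observation that genes agree over a common time span and then diverge can be illustrated with a small sketch. It is not the suffix-tree algorithm of the paper: it uses a plain dictionary in place of a suffix tree, and the discretization into `U`/`D`/`N` symbols, the threshold `eps`, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

def discretize(series, eps=0.5):
    """Encode consecutive changes as 'U' (up), 'D' (down) or 'N' (no change).

    The symbols and the threshold eps are illustrative assumptions.
    """
    symbols = []
    for prev, curr in zip(series, series[1:]):
        diff = curr - prev
        symbols.append('U' if diff > eps else 'D' if diff < -eps else 'N')
    return ''.join(symbols)

def shared_patterns(expr, start, length, eps=0.5):
    """Group genes whose discretized pattern agrees on [start, start + length).

    A naive dictionary grouping stands in for the suffix tree here.
    Only groups with at least two genes are returned.
    """
    groups = defaultdict(list)
    for gene, series in expr.items():
        pattern = discretize(series, eps)[start:start + length]
        groups[pattern].append(gene)
    return {p: sorted(g) for p, g in groups.items() if len(g) > 1}

# Hypothetical toy data: three genes measured at five time points.
toy_expr = {
    "g1": [0, 1, 2, 1, 0],
    "g2": [0, 1, 2, 1, 2],
    "g3": [0, 1, 2, 3, 3],
}

print(shared_patterns(toy_expr, 0, 2))  # {'UU': ['g1', 'g2', 'g3']}
print(shared_patterns(toy_expr, 0, 3))  # {'UUD': ['g1', 'g2']}
print(shared_patterns(toy_expr, 0, 4))  # {}
```

In the toy data all three genes share the first two steps, two of them share a third, and none agree over all four, which mirrors the "similar within a common time span, different afterwards" behavior described in the abstract.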

## Feature Extraction by Geometric Algebra from Geometric Data

Most conventional methods of feature extraction for pattern recognition do not pay sufficient attention to inherent geometric properties of data, even where the data have characteristic spatial features. In this study, we introduce geometric algebra to systematically extract invariant geometric features from spatial data given in a vector space. Geometric algebra is a multidimensional generalization of complex numbers and of quaternions, and can accurately describe oriented spatial objects and relations between them. We further propose a combination of several geometric features using Gaussian mixture models. We demonstrate our new method by classification of hand-written digits and alphabetic characters.

Key words: Geometric algebra, feature extraction, Gaussian mixture model, pattern recognition, mixture of experts.
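
As a minimal illustration of how geometric algebra yields invariant features, the sketch below computes the geometric product of two vectors in the plane. The product splits into a scalar part (the inner product) and a bivector part (the outer product), and both are unchanged when the two vectors are rotated together. This is a generic textbook example, not the feature set used in the study; the function names are assumptions.

```python
import math

def geometric_product_2d(a, b):
    """Return the (scalar, bivector) parts of the geometric product ab in 2D.

    scalar   = a . b            (inner product)
    bivector = coefficient of e1^e2 in a ^ b (signed parallelogram area)
    """
    scalar = a[0] * b[0] + a[1] * b[1]
    bivector = a[0] * b[1] - a[1] * b[0]
    return scalar, bivector

def rotate(v, theta):
    """Rotate a 2D vector counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

a, b = (1.0, 2.0), (3.0, -1.0)
before = geometric_product_2d(a, b)
after = geometric_product_2d(rotate(a, 0.7), rotate(b, 0.7))
# Both parts agree up to floating-point rounding.
print(before, after)
```

Because neither part depends on the orientation of the coordinate frame, features of this kind can describe the relation between two points of a shape independently of how the shape is rotated.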

## Path Analysis in a Supermarket and String Analysis Technique

This paper demonstrates the applicability and usefulness of a string analysis technique for deriving rules that characterize customers' visiting patterns in sales areas. It focuses on the stationary states of customers in particular sales areas of a store. We apply a string analysis technique, EBONSAI, to sales area visiting patterns in order to deal effectively with a huge stream of data. Experiments were conducted to extract useful rules and findings about the characteristics of these visiting patterns, and we discuss the problems that remain in existing string analysis techniques.

Key words: Supermarket, marketing, RFID, string analysis technique, EBONSAI.

## Channel Estimation and Code Word Inference for Mobile Digital Satellite Broadcasting Reception

This paper proposes a method for improving the mobile reception quality of digital satellite broadcasting. The method describes the channel with a regression model and applies stochastic inference of the code words based on that channel model. It therefore consists of two parts, parameter estimation for the channel model and stochastic inference, and two methods are proposed for each part. For parameter estimation, we propose maximum likelihood estimation (MLE) and higher order statistics (HOS) matching. For stochastic inference, we propose marginal and joint probability inference. The improvements are confirmed through experiments with measured data. The computational costs are also discussed with a view to future implementation.

Key words: Channel model, channel estimation, stochastic inference of code word.

## Merging Particle Filter and Its Characteristics

A significant problem with the basic particle filter algorithm is degeneration. The merging particle filter (MPF) algorithm has recently been proposed to overcome this problem at a reasonable computational cost. In an MPF, each member of the filtered ensemble is generated as a weighted sum of multiple samples from the forecast ensemble, such that the mean and covariance of the filtered distribution are approximately preserved. In this study, we performed data assimilation experiments using an MPF with two different sets of merging weights. When one merging weight is set close to 1 and the other weights are set small, better estimates are obtained for data assimilation into a low-dimensional model. For data assimilation into a relatively high-dimensional model, however, such a weight set requires a large ensemble size.

Key words: Data assimilation, particle filter, merging particle filter.
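
The merging step can be sketched as follows. If the samples being merged are drawn independently from the same distribution, a weighted sum with weights α_i has the same mean when Σα_i = 1 and the same covariance when Σα_i² = 1. The abstract does not state these conditions, so they are an assumption here, as are the use of three samples per merged member, the function names, and the example value 0.75 for the first weight.

```python
import math
import random

def merging_weights(a1):
    """Return (a1, a2, a3) with sum equal to 1 and sum of squares equal to 1.

    a2 and a3 are the two roots of a quadratic fixed by a1.
    A real solution exists only for -1/3 <= a1 <= 1.
    """
    s = 1.0 - a1           # required a2 + a3
    q = 1.0 - a1 * a1      # required a2^2 + a3^2
    disc = 2.0 * q - s * s
    if disc < 0:
        raise ValueError("no real merging weights for this a1")
    root = math.sqrt(disc)
    return (a1, (s + root) / 2.0, (s - root) / 2.0)

def mpf_update(particles, likelihoods, alphas, rng):
    """One illustrative merging step.

    Draw len(alphas) * N samples from the forecast ensemble with probability
    proportional to the likelihoods, then form each filtered member as the
    weighted sum of one consecutive group of len(alphas) samples.
    """
    n, size = len(alphas), len(particles)
    draws = rng.choices(particles, weights=likelihoods, k=n * size)
    merged = []
    for j in range(size):
        group = draws[j * n:(j + 1) * n]
        dim = len(group[0])
        merged.append([sum(a * x[d] for a, x in zip(alphas, group))
                       for d in range(dim)])
    return merged

alphas = merging_weights(0.75)
forecast = [[0.0, 1.0], [1.0, 0.5], [2.0, -1.0], [0.5, 0.0]]
filtered = mpf_update(forecast, [0.1, 0.4, 0.3, 0.2], alphas, random.Random(0))
```

Choosing the first weight close to 1 makes the other two weights small, which corresponds to the weight set that the abstract reports as working well for a low-dimensional model.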

## Quantifying Smallpox Transmission Using Historical Data: A Database for Statistical Modeling

Although smallpox is the only disease to have been eradicated worldwide, the threat of bioterrorism has led to debate on potential countermeasures in the event of an attack. Because the disease has been eradicated globally, we have to rely on historical records to quantify its important biological and epidemiologic characteristics in order to optimize intervention. This article reports quantitative modeling of the transmission and spread of smallpox using historical data. In particular, technical issues in constructing a database specifically aimed at statistical modeling are summarized. As typical examples, I briefly discuss how smallpox spreads in the absence of intervention, focusing on the optimization of quarantine and isolation and on disaster size estimation in the event of an attack. Critically important aspects of extracting key information from the database of smallpox epidemics are summarized, and potential pitfalls in utilizing the historical data are noted.

Key words: Historical data, database, smallpox, epidemiology, model, bioterrorism.

## On the Use of the Concept of Plausibility

The comparison and selection of models has been advanced by the use of log likelihood as a measure of the goodness of a statistical model based on an observation. However, the conventional use of statistical models is limited to applying known structures to given data, and the discussion of how to compose new models has not yet been developed. To develop statistical reasoning as a method of general scientific reasoning, it is necessary to discuss the methodology of developing verbally defined models.

In this paper, the necessity of constructing models by language, and the application of the concept of plausibility to the evaluation of the resulting models, are illustrated by a concrete example: the analysis of golf swing motion.

Key words: Likelihood, plausibility, golf swing motion.