Overview
Microarray dataset is a series of microarray experiments, in which each
experiment represents a gene expression signature of a tissue sample. Typical
dataset has a fairly small sample size, usually less than one hundred,
whereas the number of genes involved is more than several thousands. Cluster
analysis of microarray gene expression data is aimed at finding subclasses
of disease at the molecular level. In view of statistics, one major difficulty
in this problem is that the number of samples to be clustered is much smaller
than the dimension of data which is equal to the number of genes involved
in an experiment. Under such a situation, model-based clustering according
to a conventional finite mixture model might fail due to the occurrence
of overfitting during the density estimation process. The mixed factors
analysis was originally developed to overcome such difficulty in microarray
gene expression profilings.
The mixed factors model presents a parsimonious parameterization of Gaussian
mixture model. Consequently, the method enables us to avoid the occurrence
of overfitting even when the dimension of data is more than several thousands!
The method contains the following applications:
- Clustering microarray experiments
- Data visualization via the built-in dimension reduction system
- Identification of module transcriptional genes that are relevant to the
calibrated clusters
- Determination of an appropriate number of clusters
- Determination of the number of module transcriptionals
- Missing data imputation
For a belief introduction of the mixed factors analysis, see What's the Mixed Factors Analysis ?. For more details, see our papers, e.g. Yoshida et al. (2004).
Figure 1: Snapshot of graphical user interface of ArrayCluster
This work was carried out in the laboratory of Tomoyuki Higuchi, The Institute of Statistical Mathematics, Research Organization of Information and Systems, and the laboratory of
Satoru Miyano, DNA Information Analysis, Human Genome Center, Institute
of Medical Science, University of Tokyo.
Developers; Tomoyuki Higuchi, Ryo Yoshida, Seiya Imoto, Satoru Miyano
Copyright; (C) 2005- Tomoyuki Higuchi (C) 2005- Ryo Yoshida (C) 2005- Seiya Imoto (C) 2005-
Satoru Miyano (C) 2005- The Institute of Statistical Mathematics, Research Organization of Information and Systems (C) 2005- Human Genome
Center, Institute of Medical Science, University of Tokyo