Prediction and Knowledge Discovery
06 Statistical methods for finding disease-related genes
Project Leader: Hironori Fujisawa

Genome data have been accumulating for years. We expect that relationships exist between genome data and diseases (or drug effects), and that information about these relationships will be useful for improving treatment. In this field, many statistical methods have been used for predicting drug effects, finding disease-related genes, validating treatments, and so on. At the same time, many problems with existing statistical methods have come to light. Our group is developing new statistical methods to overcome these problems. Some of the new methods we have constructed are introduced below.

[Haplotype block partitioning]
The left-hand side of Fig. 1 shows five haplotypes. The first two haplotypes have large frequencies and the last two have small frequencies. On the right-hand side of Fig. 1, the last two haplotypes can be regarded as recombinants of the first three. Such a structure is called a haplotype block structure. Using the haplotype block structure may increase the power to find a disease-related gene. Our group has developed a statistical method for identifying a haplotype block structure, and the software will be distributed in the future. The method was applied to SNP data gathered at JFCR. The results provided a clearer explanation of the view held by JFCR (Fig. 2).
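To make the idea of a "recombinant" concrete, the following is a minimal sketch, not the group's method. It assumes haplotypes are written as strings of 0/1 alleles and that a recombinant arises from a single crossover: a prefix of one common haplotype joined to the suffix of another. The function name `is_single_recombinant` and the example haplotypes are hypothetical and are not the data shown in Fig. 1.

```python
from itertools import permutations


def is_single_recombinant(target, parents):
    """Return (left_parent, right_parent, breakpoint) if `target` equals a
    prefix of one parent joined to the suffix of another parent at the same
    position; return None if no such single crossover exists."""
    n = len(target)
    for left, right in permutations(parents, 2):
        for k in range(1, n):  # breakpoint strictly inside the haplotype
            if target == left[:k] + right[k:]:
                return left, right, k
    return None


# Hypothetical haplotypes over six SNPs (assumption, for illustration only).
common = ["000000", "111111", "001100"]  # frequent haplotypes
rare = ["000111", "111100"]              # infrequent haplotypes

for h in rare:
    print(h, is_single_recombinant(h, common))
```

A breakpoint found this way suggests a boundary between two blocks. A real block-partitioning method would also have to account for haplotype frequencies and statistical uncertainty, which this sketch ignores.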

[GroupAdaBoost]
Many studies aim to find disease-related genes from microarray data. A characteristic of such data is that the number of genes is much larger than the number of patients. As a consequence, standard statistical methods often fail to analyze the data correctly; this situation is the well-known p >> n problem. Note that some genes behave similarly in microarray data, so there are gene groups consisting of similar genes. Our group has developed a modification of AdaBoost that automatically identifies gene groups (Fig. 3).
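For orientation, here is a minimal sketch of standard AdaBoost with one-gene threshold classifiers (decision stumps), run on a toy data set with more genes than patients. It is plain AdaBoost only: the grouping step of GroupAdaBoost is not described here and is not implemented. The data values and the helper names `fit_stump`, `adaboost`, and `predict` are hypothetical.

```python
import math


def stump_output(row, j, t, s):
    """Predict s if gene j exceeds threshold t, otherwise -s."""
    return s if row[j] > t else -s


def fit_stump(X, y, w):
    """Search all genes, thresholds, and signs for the smallest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for s in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_output(row, j, t, s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best


def adaboost(X, y, rounds=3):
    """Standard AdaBoost: reweight patients so later stumps focus on errors."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, s))
        w = [wi * math.exp(-alpha * yi * stump_output(row, j, t, s))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_output(row, j, t, s) for alpha, j, t, s in model)
    return 1 if score >= 0 else -1


# Toy data (assumption): 4 patients, 6 genes, so p > n. Labels are +1 / -1.
X = [
    [0.1, 0.9, 2.0, 0.3, 0.5, 0.7],
    [0.4, 0.2, 1.8, 0.8, 0.1, 0.6],
    [0.3, 0.8, 0.2, 0.4, 0.9, 0.2],
    [0.2, 0.1, 0.1, 0.7, 0.3, 0.9],
]
y = [1, 1, -1, -1]

model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
print("gene chosen in round 1:", model[0][1])
```

Because each round selects a single gene, boosting acts as a form of gene selection. With so few patients, however, several genes can fit the training labels equally well, which is one way the p >> n problem shows up in practice.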

[Protein detection and BridgeBoost]
There are two further promising studies. One concerns detecting proteins from TOF/MF data (Fig. 4). The other concerns unifying microarray data gathered from several laboratories.


Members
Hironori Fujisawa (ISM)
Shinto Eguchi (ISM)
Takashi Takenouchi (ISM)
Satoshi Kuriki (ISM)
Mihoko Minami (ISM)
Tadayoshi Fushiki (ISM)
Masanori Kawakita (ISM)
Masaaki Matsuura (JFCR)
Satoshi Miyata (JFCR)
Masaru Ushijima (JFCR)
Minoru Isomura (JFCR)
Yumi Enomoto (JFCR)

Fig.1


Five haplotypes before and after analysis. On the right-hand side, the last two haplotypes are regarded as recombinants of the first three haplotypes. (Data: Genome Center, Japanese Foundation for Cancer Research)

Fig.2

Haplotype block structure on three gene regions. Solid lines drawn on the dashed lines indicate haplotype blocks. (Data: Genome Center, Japanese Foundation for Cancer Research)
Fig.3

Flow chart of AdaBoost and GroupAdaBoost. Black shows the flow chart of AdaBoost, and red shows the parts modified in GroupAdaBoost.
Fig.4

Protein detection from TOF/MF data. (Data: Genome Center, Japanese Foundation for Cancer Research)