Data Preprocessing

ArrayCluster provides several options for data preprocessing, including gene filtering and data normalization. The details are described below.

Gene Filtering

Although a microarray dataset contains a large number of genes, some of them are typically excluded during expression profiling. This process, called gene filtering, aims to remove undesirable genes: those that contain outliers, those with too many missing expression values, and those that do not exhibit variability across tissue samples. To this end, ArrayCluster offers a number of options. Selecting "Gene Filtering" from the "Preprocessing" menu opens a wizard that asks a series of questions about these options (Figure 1).

After setting the options in the dialogs and clicking the "Finish" button, gene_filtering.exe runs. When the computation completes, messages to that effect appear in the DOS prompt window; otherwise, error messages appear there instead.

The selected gene filtering options can be reviewed in the process viewer. In this example, a missing-value cutoff of 1% and selection of the 500 genes with the largest deviance were chosen. The number of selected genes, here 500, is also shown. To reduce or increase the number of genes used in the subsequent analyses, return to the data loading step, reload the data file, and run the gene filtering process again.
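The two filters in this example can be illustrated with a minimal Python sketch. It is not the code of gene_filtering.exe: the function name filter_genes, the dictionary input, the use of None or NaN for missing values, and the use of sample variance as a stand-in for "deviance" are all assumptions made for illustration.

```python
import math
import statistics

def filter_genes(expr, max_missing_frac=0.01, top_n=500):
    """Keep genes with few missing values, then the top_n most variable ones.

    expr maps a gene name to its expression values across samples.
    Missing values are assumed to be None or NaN (hypothetical convention).
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    scored = []
    for gene, values in expr.items():
        if not values:
            continue
        observed = [v for v in values if not is_missing(v)]
        missing_frac = 1 - len(observed) / len(values)
        # Drop genes with too many missing values, or too few
        # observed points to compute a sample variance.
        if missing_frac > max_missing_frac or len(observed) < 2:
            continue
        scored.append((statistics.variance(observed), gene))

    # Rank by variance, largest first, and keep the top_n genes.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [gene for _, gene in scored[:top_n]]
```

With the defaults, a gene is dropped when more than 1% of its values are missing, and at most 500 genes are returned, mirroring the settings described above.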

The filtered data file is written to C:\ArrayCluster\extents\LunamacPlugin\data\data_read.txt. By inspecting this file, the user can check whether the filtering process completed correctly. To close the DOS window, press the Enter key.


Figure 1: Gene filtering wizard



Normalization


DNA microarray experiments often carry systematic biases, for instance heterogeneity in scale and origin, that arise from the experimental design. These biases should be adjusted as far as possible before proceeding to downstream analysis. In microarray studies, this step is referred to as normalization.
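As a minimal illustration of removing differences in origin and scale, the Python sketch below centers one array's values on their median and rescales them to unit standard deviation. This is a generic technique, not necessarily one of the methods ArrayCluster implements, and the function name center_and_scale is hypothetical.

```python
import statistics

def center_and_scale(sample):
    """Shift one array's values to median 0 and rescale to unit standard deviation."""
    center = statistics.median(sample)   # estimate of the origin
    spread = statistics.stdev(sample)    # estimate of the scale
    if spread == 0:
        raise ValueError("cannot scale a constant array")
    return [(v - center) / spread for v in sample]
```

Two arrays that differ only by a shift and a positive scale factor, such as [1, 2, 3, 4, 5] and [110, 120, 130, 140, 150], map to the same values after this adjustment.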

ArrayCluster 1.0 offers several normalization methods. Selecting the "Normalization" menu opens the normalization wizard (Figure 2), which lists the available options.

After setting the options in the dialogs and clicking "Finish", normalization.exe runs. When the process completes, messages to that effect appear in the DOS prompt window.


Otherwise, error messages appear in this window. For example, the argument of the logarithmic function must be a positive real value. If the user selects the Log Transformation option although the loaded data contain negative gene expression values, error messages appear in the DOS window.

The normalized values can also be checked at C:\ArrayCluster\extents\LunamacPlugin\data\data_read.txt. If the user is not satisfied with the applied normalization methods, return to Load Data and then proceed to the normalization step again.
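The positivity requirement of the log transformation mentioned above can be checked before transforming, as in the following Python sketch. The function name log2_transform and the choice of base 2 are assumptions; the source does not state which base ArrayCluster uses or how normalization.exe reports the error.

```python
import math

def log2_transform(values):
    """Return log2 of each value, refusing non-positive input."""
    bad = [v for v in values if v <= 0]
    if bad:
        # The logarithm is undefined for zero and negative values.
        raise ValueError(
            f"log transform needs positive values; found {len(bad)} non-positive"
        )
    return [math.log2(v) for v in values]
```

Checking the whole input first means either every value is transformed or none is, so a partially transformed dataset is never produced.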

Figure 2: Normalization wizard