Read Data.
File Format

The first step in using ArrayCluster is to load data. The current version (1.0) accepts a tab-delimited or space-delimited text file in the particular format shown in Figure 1. Any standard spreadsheet software, such as Microsoft Excel, can be used to create a data file in this format.

The current version requires the user to count the number of genes and the number of samples beforehand and to enter them in the first row. For demo.txt, shown in Figure 1, the number of genes is 2308 and the number of samples is 63. The current version also imposes upper limits on the number of genes and samples that it can read.

The user must specify a data name in the first cell of the second row, such as "Demo_data" in Figure 1. Each row of the data matrix holds the expression values of one gene, whose identifier is given in the first column, such as "G21652", "G25725" and so on. Each column holds the expression values of one tissue sample. The sample names are always given in the second row, e.g. "ex1", "ex2" and so on.
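The layout described above can also be produced by a short script instead of a spreadsheet. The following Python sketch is only an illustration: the function name write_arraycluster_file is hypothetical, the expression values are made-up toy numbers, and the order of the two counts in the first row (genes first, then samples) is an assumption that should be checked against Figure 1.

```python
import csv
import io


def write_arraycluster_file(out, data_name, sample_names, genes):
    """Write a tab-delimited table in the layout described above.

    genes: list of (gene_id, [values...]) pairs, one value per sample.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Row 1: number of genes, then number of samples (order is an assumption).
    writer.writerow([len(genes), len(sample_names)])
    # Row 2: data name in the first cell, followed by the sample names.
    writer.writerow([data_name] + list(sample_names))
    # Remaining rows: gene identifier, then one expression value per sample.
    for gene_id, values in genes:
        if len(values) != len(sample_names):
            raise ValueError(f"{gene_id}: expected {len(sample_names)} values")
        writer.writerow([gene_id] + list(values))


# Toy example with two genes and three samples (values are invented).
buf = io.StringIO()
write_arraycluster_file(
    buf,
    "Demo_data",
    ["ex1", "ex2", "ex3"],
    [("G21652", [0.12, -1.5, 2.0]), ("G25725", [1.1, 0.0, -0.3])],
)
text = buf.getvalue()
print(text)
```

Because the counts in the first row are computed from the data itself, they cannot drift out of step with the matrix. To write to disk, pass an open text file instead of the in-memory buffer.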


Figure 1: File format of data file (demo.txt)



Usable Characters


ArrayCluster 1.0 imposes upper limits on the number of characters in sample names and gene identifiers.
In addition, identifiers must not contain spaces or certain special characters such as *, /, ( and ). If an identifier contains spaces or any of these characters, remove them or replace them with harmless characters such as -, _ or $ before loading the file.
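For files with many identifiers, the replacement can be automated. The Python sketch below is one possible approach, not part of ArrayCluster: the function name sanitize_identifier is hypothetical, and the set of unsafe characters covers only whitespace and the examples named above, since the full list is not given here.

```python
import re

# Characters treated as unsafe here: whitespace plus the examples named in the
# text (*, /, (, )). The full list used by ArrayCluster is not given, so this
# set is an assumption and may need extending.
UNSAFE = re.compile(r"[\s*/()]")


def sanitize_identifier(name, replacement="_"):
    """Replace each unsafe character in an identifier with `replacement`."""
    return UNSAFE.sub(replacement, name.strip())


print(sanitize_identifier("IL-2 (human)"))  # IL-2__human_
print(sanitize_identifier("ratio A/B*"))    # ratio_A_B_
```

The sketch does not enforce the length limits, and two different names can map to the same sanitized string, so the result should be checked for duplicates before loading.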

(Example)

Missing Data


A dataset to be analyzed may contain missing values. ArrayCluster allows missing observations and imputes them automatically within the mixed factors analysis. The current version treats a loaded gene expression value as missing when it is less than -10000. Empty cells are ignored during loading, so users must replace every cell that corresponds to a missing value with some value below the threshold of -10000.
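Filling empty cells by hand is error-prone for large files, so the step can be scripted. The Python sketch below illustrates the idea on a single tab-delimited row; the function name encode_missing and the code value -99999 are arbitrary choices for illustration, and any value below -10000 would serve.

```python
MISSING_CODE = -99999  # any value below -10000 would do; this one is arbitrary


def encode_missing(cells, missing_code=MISSING_CODE):
    """Return the cells of one data row with empty entries replaced by a code."""
    if missing_code >= -10000:
        raise ValueError("missing_code must be less than -10000")
    return [str(missing_code) if cell.strip() == "" else cell for cell in cells]


# One gene row with two empty cells (second value and last value).
row = "G21652\t0.12\t\t2.0\t".split("\t")
print("\t".join(encode_missing(row)))
```

Applied to every data row before the file is loaded, this keeps each row at its full width, so no value shifts into the wrong sample column.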


Implementation

After you click the "Load / Reload Data" button in the menu bar and select an appropriately formatted data file, the following messages will appear in the DOS prompt:
If they do not appear, error messages may be shown in the window instead.

The loaded data are written to C:\ArrayCluster\extents\lunamacplugin\data\data_read.txt. This folder can also be opened by clicking "Outputs" in "Menu" on the graphical interface. By inspecting data_read.txt, users can check whether the loading process completed correctly. To close the DOS window, press Enter.