FABIA

Factor Analysis for Bicluster Acquisition (FABIA) is a model-based technique for biclustering, that is, clustering rows and columns simultaneously. FABIA is a multiplicative model that assumes realistic non-Gaussian signal distributions with heavy tails. FABIA uses well-understood model selection techniques, such as variational approaches, and applies the Bayesian framework. The generative framework allows FABIA to determine the information content of each bicluster in order to separate spurious biclusters from true biclusters. On 100 simulated data sets with known true, artificially implanted biclusters, FABIA clearly outperformed all 11 competitors. FABIA was also tested on microarray data sets with known, biologically verified subclusters and performed best on average among 11 biclustering approaches.
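To make the idea of a multiplicative model concrete, the following minimal sketch simulates a data matrix as a sum of outer products of sparse row loadings and sparse column factors, plus additive noise. It is an illustration only and not the FABIA implementation: the Laplace distribution as the heavy-tailed choice, the sparsity level, the Gaussian noise, and all function and parameter names are assumptions made for this example.

```python
import random


def laplace(rng, scale=1.0):
    """Draw one zero-mean Laplace sample (heavier tails than a Gaussian)."""
    # An exponential magnitude with a random sign follows a Laplace distribution.
    magnitude = rng.expovariate(1.0 / scale)
    return magnitude if rng.random() < 0.5 else -magnitude


def sparse_vector(rng, length, sparsity):
    """Return a vector whose entries are zero with probability `sparsity`."""
    return [0.0 if rng.random() < sparsity else laplace(rng) for _ in range(length)]


def simulate_biclusters(n_rows, n_cols, n_biclusters,
                        sparsity=0.8, noise_sd=0.1, seed=0):
    """Simulate X = sum_k outer(loadings[k], factors[k]) + noise.

    All names and defaults are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    # One sparse loading vector (over rows) and one sparse factor vector
    # (over columns) per bicluster.
    loadings = [sparse_vector(rng, n_rows, sparsity) for _ in range(n_biclusters)]
    factors = [sparse_vector(rng, n_cols, sparsity) for _ in range(n_biclusters)]
    data = [
        [
            sum(loadings[k][i] * factors[k][j] for k in range(n_biclusters))
            + rng.gauss(0.0, noise_sd)
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]
    return data, loadings, factors


if __name__ == "__main__":
    data, loadings, factors = simulate_biclusters(n_rows=20, n_cols=10, n_biclusters=3)
    print(len(data), "rows x", len(data[0]), "columns")
```

In this sketch, bicluster k consists of the rows with non-zero entries in loadings[k] and the columns with non-zero entries in factors[k]; because the two vectors are multiplied, only entries where both are non-zero carry signal.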

Please cite:

Sepp Hochreiter, Ulrich Bodenhofer, Martin Heusel, Andreas Mayr, Andreas Mitterecker, Adetayo Kasim, Tatsiana Khamiakova, Suzy Van Sanden, Dan Lin, Willem Talloen, Luc Bijnens, Hinrich W.H. Göhlmann, Ziv Shkedy, and Djork-Arné Clevert. FABIA: Factor Analysis for Bicluster Acquisition, Bioinformatics 2010, 26(12):1520-1527, doi:10.1093/bioinformatics/btq227

Download the R-package for Unix (source package):

Download the R-package for Windows 32-bit (i386):

Download the R-package for Windows 64-bit (x64):

Paper, manuals, examples, and documents:

Sources used in our experiments:

R packages that we used:

Data sets and results:




Copy Number Variation (CNV) Analysis with FABIA

We used FABIA to analyze copy number variations (CNVs) of HapMap individuals on the data set of McCarroll et al. (2008), who determined CNVs of the HapMap phase II individuals with Affymetrix Genome-Wide Human SNP 6.0 arrays.

HapMap:
The international HapMap project is a multi-country effort to identify and catalog genetic similarities and differences in human beings. The goal of the international HapMap project is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared (see http://hapmap.ncbi.nlm.nih.gov/index.html.en).

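The four populations that are considered in HapMap phase II:
Yoruba in Ibadan, Nigeria (YRI)
Utah residents with ancestry from Europe (CEU)
Japanese in Tokyo, Japan (JPT)
Han Chinese in Beijing, China (CHB)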

Copy Number Variations:
A copy number variation (CNV) is a segment of DNA, ranging from one kilobase to several megabases, in which copy-number differences have been found by comparing two or more genomes.

CNVs may be either inherited or caused by de novo mutation. They are assumed to be associated with diseases such as lung cancer, HIV infection, inflammatory autoimmune disorders, autism, schizophrenia, and idiopathic learning disability (see also http://en.wikipedia.org/wiki/Copy_number_variation).

Data preparation:
We first preprocessed the raw data (CEL files) with cn.FARMS, using 100 kb windows that contained at least two probe set locations. After excluding the X and Y chromosomes, 2548 locations obtained an INI call (informative call). The cn.FARMS-curated values of these 2548 locations across 269 HapMap individuals served as input to FABIA.
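The window criterion above can be sketched as follows. This is a simplified illustration and not the cn.FARMS procedure: it assumes fixed, non-overlapping 100 kb windows, filters the sex chromosomes before windowing rather than afterwards, and uses a hypothetical function name and input format (pairs of chromosome name and base-pair position).

```python
from collections import defaultdict

WINDOW_SIZE = 100_000  # 100 kb, as stated in the text
EXCLUDED_CHROMOSOMES = {"X", "Y"}


def select_windows(probe_sets, window_size=WINDOW_SIZE, min_probe_sets=2):
    """Group (chromosome, position) pairs into windows and keep the dense ones.

    Returns a dict mapping (chromosome, window_start) to the sorted positions
    of the probe sets that fall into that window. Hypothetical helper.
    """
    windows = defaultdict(list)
    for chromosome, position in probe_sets:
        if chromosome in EXCLUDED_CHROMOSOMES:
            continue  # the X and Y chromosomes are left out of the analysis
        # Assumption: windows are aligned to multiples of window_size.
        window_start = (position // window_size) * window_size
        windows[(chromosome, window_start)].append(position)
    # Keep only windows with at least `min_probe_sets` probe set locations.
    return {
        key: sorted(positions)
        for key, positions in windows.items()
        if len(positions) >= min_probe_sets
    }


if __name__ == "__main__":
    # Made-up positions, used only to exercise the function.
    example = [("1", 5_000), ("1", 99_999), ("1", 100_000), ("X", 10), ("X", 20)]
    print(select_windows(example))
```

The INI call filter that reduces the windows to 2548 informative locations is a separate step and is not modeled here.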

Visualization of 3 biclusters that were found by FABIA:



Population groups:
Blue spheres: Yoruba in Ibadan, Nigeria (YRI)
Green spheres: Utah residents with ancestry from Europe (CEU)
Light blue spheres: Japanese in Tokyo, Japan (JPT)
Turquoise spheres: Han Chinese in Beijing, China (CHB)
Labels of population groups: "hapmapID_PopulationGroup"

Copy number variations (CNVs):
Small red spheres: CNVs that are most indicative in biclusters
Light transparent speckles: CNVs that are indicative
Labels of copy number variations: "Chromosome_Position"
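For readers who want to process the plot labels programmatically, the two label formats described above can be split as in the following sketch. The class and function names are hypothetical, and treating the position as an integer base-pair coordinate is an assumption; the example identifiers are made up and serve only to illustrate the format.

```python
from typing import NamedTuple

# Population codes taken from the legend above.
POPULATIONS = {"YRI", "CEU", "JPT", "CHB"}


class IndividualLabel(NamedTuple):
    hapmap_id: str
    population: str


class CnvLabel(NamedTuple):
    chromosome: str
    position: int


def parse_individual_label(label):
    """Split a "hapmapID_PopulationGroup" label at its last underscore."""
    hapmap_id, population = label.rsplit("_", 1)
    if population not in POPULATIONS:
        raise ValueError(f"unknown population group: {population!r}")
    return IndividualLabel(hapmap_id, population)


def parse_cnv_label(label):
    """Split a "Chromosome_Position" label at its last underscore."""
    chromosome, position = label.rsplit("_", 1)
    # Assumption: the position is an integer base-pair coordinate.
    return CnvLabel(chromosome, int(position))


if __name__ == "__main__":
    print(parse_individual_label("NA00000_CEU"))  # made-up ID
    print(parse_cnv_label("1_1500000"))           # made-up location
```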



Publication:
McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, de Bakker PI, Maller JB, Kirby A, Elliott AL, Parkin M, Hubbell E, Webster T, Mei R, Veitch J, Collins PJ, Handsaker R, Lincoln S, Nizzari M, Blume J, Jones KW, Rava R, Daly MJ, Gabriel SB, Altshuler D. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nature Genetics 2008 Oct; 40(10):1166-74.

HapMap data sources:
Phase III data (including phase II data)
http://hapmap.ncbi.nlm.nih.gov/downloads/raw_data/hapmap3_affy6.0/

Phase II data
http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000049.v1.p1

The data from the individual "NA12236" (female, CEU) was missing.