extract_plot {fabia} | R Documentation |
extract_plot: R implementation of extract_plot.
extract_plot(X,L,Z,thresZ=0.5,ti,thresL=NULL,Y=NULL,x11b=TRUE,norm=1)
X | original data matrix. |
L | loading, left matrix. |
Z | factor, right matrix. |
thresZ | threshold for sample belonging to bicluster (default 0.5). |
thresL | threshold for loading belonging to bicluster (estimated if not given). |
ti | plot title. |
Y | noise free data matrix. |
x11b | plot on screen. |
norm | should the data be standardized, default = 1 (yes, using mean), 2 (yes, using median). |
Essentially the model is the sum of outer products of vectors. The number of summands p is the number of biclusters.
X = L Z + U
X = sum_{i=1}^{p} L_i (Z_i)^T + U
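As a small numerical illustration of the two equivalent forms above, the following sketch builds L and Z from random numbers and checks that the matrix product L Z equals the sum of the p outer products. The dimensions, the seed, and the variable name S are assumptions chosen for illustration, and the noise term U is left out.

```r
# Illustration only: sizes and random values are arbitrary assumptions.
set.seed(1)
n <- 6; l <- 4; p <- 3            # observations, samples, biclusters
L <- matrix(rnorm(n * p), n, p)   # loadings: column i is L_i
Z <- matrix(rnorm(p * l), p, l)   # factors: row i is Z_i

# Sum of the outer products L_i (Z_i)^T over the p biclusters
S <- matrix(0, n, l)
for (i in seq_len(p)) {
  S <- S + outer(L[, i], Z[i, ])  # n x l outer product of one summand
}

# Both forms give the same noise-free matrix
isTRUE(all.equal(S, L %*% Z))
```

Each summand is a rank-one matrix, so a bicluster corresponds to the observations with large entries in L_i combined with the samples with large entries in Z_i.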
The hidden dimension p is used for kmeans clustering of L_i and Z_i.
The L_i and Z_i are used to extract the bicluster i, where a threshold determines which observations and which samples belong to the bicluster.
The method produces the plots listed below.
Plots:
“Y”: noise free data (if available),
“X”: data,
“LZ”: reconstructed data,
“LZ-X”: error,
“abs(Z)”: absolute factors,
“abs(L)”: absolute loadings,
“abs(nL)”: absolute loadings normalized,
“abs(nZ)”: absolute factors normalized,
“nZ*pmZ”: factors sorted,
“pmL*nL”: loadings sorted,
“pmL*L*z*pmZ”: reconstructed matrix sorted,
“pmL*X*pmZ”: original matrix sorted.
In the above plots the matrix L and the matrix Z are sorted. For sorting, kmeans is first performed on the p dimensional space, and then the vectors which belong to the same cluster are put together in the sorting. This sorting is made for visualization, but in general it is not possible to visualize all biclusters as blocks if they overlap.
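The sorting idea described above can be sketched with base R as follows. This is not the code of extract_plot itself: the toy loadings, the seed, and the use of order() on the cluster labels are assumptions made for illustration, and pmL here is only meant to mirror the role of the returned permutation matrix.

```r
# Illustration only: toy loadings with three well-separated groups (assumed data).
set.seed(1)
p <- 3
grp <- rep(1:3, each = 10)
L <- matrix(rnorm(30 * p, sd = 0.1), 30, p)
idx <- cbind(seq_len(30), grp)
L[idx] <- L[idx] + 5                        # group g loads mainly on column g
L <- L[sample(nrow(L)), ]                   # shuffle rows so the groups are mixed

km  <- kmeans(L, centers = p, nstart = 10)  # cluster rows in the p dimensional space
ord <- order(km$cluster)                    # rows of the same cluster become adjacent
pmL <- diag(nrow(L))[ord, ]                 # permutation matrix for this row sorting

Lsorted <- pmL %*% L                        # same rows as L[ord, ]
```

Multiplying by the permutation matrix from the left reorders rows; the analogous matrix for Z would be applied from the right to reorder columns (samples).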
In bic the biclusters are extracted according to the largest absolute values of the component i, i.e. the largest values of L_i and the largest values of Z_i. The factors Z_i are normalized to variance 1.
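One way to normalize the factors to variance 1 without changing the reconstruction L Z is to move the scale of each Z_i into the matching L_i. The sketch below shows this under that assumption; whether extract_plot rescales L in exactly this way is not stated here, and the names s, Zn, and Ln are hypothetical.

```r
# Illustration only: random L and Z; the rescaling rule is an assumption.
set.seed(1)
n <- 8; l <- 20; p <- 2
L <- matrix(rnorm(n * p), n, p)
Z <- matrix(rnorm(p * l, sd = 3), p, l)

s  <- apply(Z, 1, sd)        # standard deviation of each factor Z_i
Zn <- Z / s                  # row i divided by s[i]: every row now has variance 1
Ln <- sweep(L, 2, s, `*`)    # column i multiplied by s[i], so Ln %*% Zn equals L %*% Z

apply(Zn, 1, var)            # each entry is 1 up to rounding
```

With unit-variance factors a single threshold such as thresZ has the same meaning for every bicluster.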
The components of bic are bin, bixv, bixn, biypv, biypn, biynv, and biynn.
bin gives the size of the bicluster: number of observations, number of positive samples, number of negative samples.
bixv gives the values of the observations that have absolute values above a threshold. They are sorted and bixn gives their names (e.g. gene names).
biypv gives the values of the samples that have values above a threshold. They are sorted and biypn gives their names (e.g. sample names).
biynv gives the values of the samples that have values below this threshold. They are sorted and biynn gives their names (e.g. sample names).
That means the samples are divided into two groups, where one group shows large positive values and the other group has negative values with large absolute values. Thus an observation pattern can be switched on or switched off relative to the average value.
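The division of samples into a positive and a negative group can be sketched as below for a single bicluster. The factor values and sample names are made up, and two details are assumptions rather than documented behavior: the negative group is taken as values below -thresZ, and both groups are sorted by decreasing absolute value.

```r
# Illustration only: a hand-made factor vector for one bicluster (assumed values).
z <- c(s1 = 1.2, s2 = -0.1, s3 = -0.9, s4 = 0.7, s5 = 0.3, s6 = -1.5)
thresZ <- 0.5                                    # default from the usage line

pos <- sort(z[z >  thresZ], decreasing = TRUE)   # samples where the pattern is switched on
neg <- sort(z[z < -thresZ])                      # samples where the pattern is switched off

names(pos)   # "s1" "s4"
names(neg)   # "s6" "s3"
```

Here pos and neg play the roles of biypv and biynv, and their names the roles of biypn and biynn.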
numn gives the indexes of bic with components: numn1 = bix, numn2 = biyp, and numn3 = biyn.
The kmeans clusters are given by biclust with components biclustx (the clustered observations) and biclusty (the clustered samples).
Implementation in R.
bic | extracted biclusters. |
numn | indexes for the extracted biclusters. |
biclust | clusters of kmeans clustering. |
pmZ | permutation matrix of z from kmeans clustering. |
pmL | permutation matrix of Lambda from kmeans clustering. |
nL | normalized loadings (left matrix). |
nZ | normalized factors (right matrix). |
Xord | sorted original matrix according to kmeans on Z and kmeans on Lambda. |
Sepp Hochreiter
fabi, fabia, fabiap, fabias, fabiasp, mfsc, nmfdiv, nmfeu, nmfsc, nprojfunc, projfunc, make_fabi_data, make_fabi_data_blocks, make_fabi_data_pos, make_fabi_data_blocks_pos, extract_bic, myImagePlot, PlotBicluster, Breast_A, DLBCL_B, Multi_A, fabiaDemo, fabiaVersion
#---------------
# TEST
#---------------

dat <- make_fabi_data_blocks(n = 100,l= 50,p = 3,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
Y <- dat[[2]]

X <- X- rowMeans(X)
XX <- (1/ncol(X))*tcrossprod(X)
dXX <- 1/sqrt(diag(XX)+0.001*as.vector(rep(1,nrow(X))))
X <- dXX*X

resEx <- fabia(X,20,0.3,1.0,1.0,3)

rEx <- extract_plot(X,resEx$L,resEx$Z,ti="FABIA",Y=Y,x11b=FALSE)

rEx$bic[1,]
rEx$bic[2,]
rEx$bic[3,]

rEx$biclust[1,]
rEx$biclust[2,]
rEx$biclust[3,]

## Not run: 
#---------------
# DEMO1
#---------------

dat <- make_fabi_data_blocks(n = 1000,l= 100,p = 10,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
Y <- dat[[2]]

resToy <- fabia(X,200,0.4,1.0,1.0,13)

rToy <- extract_plot(X,resToy$L,resToy$Z,ti="FABIA",Y=Y)

#---------------
# DEMO2
#---------------

data(Breast_A)

X <- as.matrix(XBreast)

resBreast <- fabia(X,200,0.1,1.0,1.0,5)

rBreast <- extract_plot(X,resBreast$L,resBreast$Z,ti="FABIA Breast cancer(Veer)")

#sorting of predefined labels CBreast

## End(Not run)
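
The preprocessing at the start of the TEST example centers each row of X and rescales it to roughly unit variance before fabia is called. The sketch below repeats those steps on a random stand-in matrix (an assumption, so that it runs without the fabia package) and shows their effect; the scalar 0.001 replaces the equivalent vector form used in the example.

```r
# Illustration only: random matrix standing in for the data (assumed values).
set.seed(1)
X <- matrix(rnorm(5 * 40, mean = 2, sd = 3), 5, 40)

X   <- X - rowMeans(X)                  # center each row
XX  <- (1 / ncol(X)) * tcrossprod(X)    # second-moment matrix of the centered rows
dXX <- 1 / sqrt(diag(XX) + 0.001)       # inverse row standard deviations, regularized
X   <- dXX * X                          # scale row i by dXX[i]

rowMeans(X)      # all close to 0
rowMeans(X^2)    # all slightly below 1 because of the 0.001 term
```

The small constant guards against division by zero for rows that are constant after centering.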