Data Repository

Re: Data Repository

Postby meninonas » Wed Dec 03, 2014 2:22 pm

Dr. Yoo,

The goal of the in silico challenges is to reverse engineer gene networks from steady-state and time-series data, i.e., to predict the directed, unsigned network topology from the in silico generated gene expression datasets. There are three in silico challenges, corresponding to gene networks with 10, 50, and 100 genes.

For every network, the following experiments are simulated:

Heterozygous knock-down. The files ending in -heterozygous contain the steady-state levels of the wild type and of the heterozygous knock-down strain for each gene.

Null-mutants. The files ending in -null-mutants contain the steady-state levels of the wild type and of the null-mutant strain for each gene.

Trajectories. The files ending in -trajectories contain time courses of the network recovering from several external perturbations. For the 100-gene networks, 46 perturbations are given, each with 21 time points.
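One simple way to read the null-mutant data (an illustration, not the challenge's prescribed method): if knocking out gene i moves gene j far from its wild-type level, that is evidence for a directed edge i -> j. The sketch below assumes a hypothetical layout, a named wild-type vector `wt` and a square matrix `ko` whose row i is the steady state after knocking out gene i; the function name and numbers are made up.

```r
# Score candidate directed edges from null-mutant steady states.
# Assumed (hypothetical) layout: `wt` is a named vector of wild-type levels,
# row i of `ko` is the steady state after knocking out gene i.
knockout_scores <- function(wt, ko) {
  stopifnot(ncol(ko) == length(wt), nrow(ko) == length(wt))
  # Absolute deviation of every gene from its own wild-type level
  scores <- abs(sweep(ko, 2, wt, "-"))
  # A gene's own knockout says nothing about an edge to itself
  diag(scores) <- 0
  dimnames(scores) <- list(from = names(wt), to = names(wt))
  scores
}

# Toy 3-gene example with made-up numbers
wt <- c(G1 = 0.50, G2 = 0.40, G3 = 0.80)
ko <- rbind(c(0.00, 0.10, 0.80),
            c(0.50, 0.00, 0.75),
            c(0.50, 0.40, 0.00))
s <- knockout_scores(wt, ko)
```

Larger entries of `s[from, to]` suggest stronger evidence for an edge from the knocked-out gene to the affected gene; ranking them gives a candidate edge list.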

The gold standard, in the second tab of the Excel workbook, only states the presence of a relationship, not its direction.
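Because the gold standard is undirected, a directed prediction has to be collapsed to unordered gene pairs before it can be scored against it. A minimal base R sketch, assuming edges are available as `from`/`to` vectors (the function name and edge lists are hypothetical):

```r
# Collapse directed edges (from -> to) into unordered pairs such as "G1--G2",
# so a directed prediction can be scored against an undirected gold standard.
undirected_pairs <- function(from, to) {
  from <- as.character(from)
  to <- as.character(to)
  unique(paste(pmin(from, to), pmax(from, to), sep = "--"))
}

# Hypothetical edge lists, for illustration only
pred_pairs <- undirected_pairs(c("G1", "G2", "G3", "G2"),
                               c("G2", "G1", "G4", "G5"))
gold_pairs <- undirected_pairs(c("G1", "G4"), c("G2", "G3"))

tp <- sum(pred_pairs %in% gold_pairs)  # predicted pairs found in the gold standard
precision <- tp / length(pred_pairs)
recall <- tp / length(gold_pairs)
```

Here G1 -> G2 and G2 -> G1 count as a single pair, which matches a gold standard that records only the presence of a relationship.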
Attachments
InSilicoSize100-Yeast2-trajectories.xlsx
(1.78 MiB)
InSilicoSize100-Yeast2-null-mutants.xlsx
(322.28 KiB)
InSilicoSize100-Yeast2-heterozygous.xlsx
(322.57 KiB)
meninonas
 
Posts: 137
Joined: Tue Jun 24, 2014 3:25 pm

Postby meninonas » Wed Dec 03, 2014 2:23 pm

Dr. Yoo,

The goal of the in silico challenges is to reverse engineer gene networks from steady-state and time-series data, i.e., to predict the directed, unsigned network topology from the in silico generated gene expression datasets. There are three in silico challenges, corresponding to gene networks with 10, 50, and 100 genes.

For every network, the following experiments are simulated:

Heterozygous knock-down. The files ending in -heterozygous contain the steady-state levels of the wild type and of the heterozygous knock-down strain for each gene.

Null-mutants. The files ending in -null-mutants contain the steady-state levels of the wild type and of the null-mutant strain for each gene.

Trajectories. The files ending in -trajectories contain time courses of the network recovering from several external perturbations. For the 100-gene networks, 46 perturbations are given, each with 21 time points.

The gold standard only states the presence of a relationship, not its direction.
Attachments
InSilicoSize100-Yeast3-trajectories.xlsx
(1.78 MiB)
InSilicoSize100-Yeast3-null-mutants.xlsx
(321.9 KiB)
InSilicoSize100-Yeast3-heterozygous.xlsx
(322.45 KiB)

Postby meninonas » Fri Dec 19, 2014 12:36 pm

Professor,

I have run the naive Bayes classifier on the FiveGene_qPCR dataset; the results are below:

 > bytimepoint<- naiveBayes(X._X1 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
        0         1
0.5384615 0.4615385

Conditional probabilities:
   X._X2
Y   [,1] [,2]
  0    0    0
  1    1    0

   X._X3
Y   [,1] [,2]
  0    0    0
  1    1    0

   X._X4
Y   [,1] [,2]
  0    0    0
  1    1    0

   X._X5
Y   [,1] [,2]
  0    0    0
  1    1    0


When I change the outcome to @_X2 or @_X3, I get the same results. I don't believe these results are right, given how the probabilities are distributed within each contingency table. Consequently, I recoded the Excel sheet using the mean of each of the 5 genes: any value larger than the mean was coded as 1 and any value smaller than the mean was coded as 0. The first run below has X1 as the outcome variable:

 > bymeans <- naiveBayes(X._X1 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
  0   1
0.5 0.5

Conditional probabilities:
   X._X2
Y        [,1]      [,2]
  0 0.2307692 0.4385290
  1 0.6923077 0.4803845

   X._X3
Y        [,1]      [,2]
  0 0.3846154 0.5063697
  1 0.7692308 0.4385290

   X._X4
Y        [,1]      [,2]
  0 0.3076923 0.4803845
  1 0.6153846 0.5063697

   X._X5
Y         [,1]      [,2]
  0 0.07692308 0.2773501
  1 0.69230769 0.4803845
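
A note on reading these tables: e1071's naiveBayes treats numeric predictors as Gaussian, so for 0/1 columns stored as numbers the two columns under "Conditional probabilities" are the per-class mean ([,1]) and standard deviation ([,2]), not a contingency table. For a 0/1 column the mean is simply P(X = 1 | Y). This also explains the first (by time point) run: means of exactly 0 and 1 with standard deviation 0 indicate that every predictor column was identical to the outcome column. Converting the columns to factors before calling naiveBayes should make it print conditional probability tables instead. (The bare subset argument in these calls is passed through ..., which is why the printed call ends in ..1; it can be dropped.) The base R sketch below computes such a table directly, with optional add-laplace smoothing; the function name and toy data are hypothetical.

```r
# Conditional probability table P(x = level | y = class) for one categorical
# predictor, with optional Laplace (add-k) smoothing.
cond_prob_table <- function(x, y, laplace = 0) {
  tab <- table(y = factor(y), x = factor(x))
  # Row i (class i) is divided by its own smoothed total
  (tab + laplace) / (rowSums(tab) + laplace * ncol(tab))
}

# Toy 0/1 data, made up for illustration
y  <- c(0, 0, 0, 1, 1, 1)
x2 <- c(0, 0, 1, 1, 1, 0)
cpt <- cond_prob_table(x2, y)
# For 0/1 data, cpt["1", "1"] equals mean(x2[y == 1]), the [,1] value that
# naiveBayes prints when x2 is numeric.
```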


The code below has x2 as the outcome variable:

 > bymeans <- naiveBayes(X._X2 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
        0         1
0.5384615 0.4615385

Conditional probabilities:
   X._X1
Y        [,1]      [,2]
  0 0.2857143 0.4688072
  1 0.7500000 0.4522670

   X._X3
Y        [,1]      [,2]
  0 0.5714286 0.5135526
  1 0.5833333 0.5149287

   X._X4
Y        [,1]      [,2]
  0 0.4285714 0.5135526
  1 0.5000000 0.5222330

   X._X5
Y        [,1]      [,2]
  0 0.1428571 0.3631365
  1 0.6666667 0.4923660


The code below has x3 as the outcome variable:

 > bymeans <- naiveBayes(X._X3 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
        0         1
0.4230769 0.5769231

Conditional probabilities:
   X._X1
Y        [,1]      [,2]
  0 0.2727273 0.4670994
  1 0.6666667 0.4879500

   X._X2
Y        [,1]      [,2]
  0 0.4545455 0.5222330
  1 0.4666667 0.5163978

   X._X4
Y        [,1]      [,2]
  0 0.1818182 0.4045199
  1 0.6666667 0.4879500

   X._X5
Y        [,1]      [,2]
  0 0.1818182 0.4045199
  1 0.5333333 0.5163978


The code below has x4 as the outcome variable:

 > bymeans <- naiveBayes(X._X4 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
        0         1
0.5384615 0.4615385

Conditional probabilities:
   X._X1
Y        [,1]      [,2]
  0 0.3571429 0.4972452
  1 0.6666667 0.4923660

   X._X2
Y        [,1]      [,2]
  0 0.4285714 0.5135526
  1 0.5000000 0.5222330

   X._X3
Y        [,1]      [,2]
  0 0.3571429 0.4972452
  1 0.8333333 0.3892495

   X._X5
Y        [,1]      [,2]
  0 0.3571429 0.4972452
  1 0.4166667 0.5149287


The code below has x5 as the outcome variable:

 > bymeans <- naiveBayes(X._X5 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
        0         1
0.6153846 0.3846154

Conditional probabilities:
   X._X1
Y   [,1]      [,2]
  0 0.25 0.4472136
  1 0.90 0.3162278

   X._X2
Y   [,1]      [,2]
  0 0.25 0.4472136
  1 0.80 0.4216370

   X._X3
Y     [,1]      [,2]
  0 0.4375 0.5123475
  1 0.8000 0.4216370

   X._X4
Y     [,1]      [,2]
  0 0.4375 0.5123475
  1 0.5000 0.5270463
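
To show how the printed priors and conditional probabilities combine into a prediction, here is a from-scratch sketch of the naive Bayes posterior for 0/1 predictors, using the [,1] values from the X1-outcome run above as P(X_j = 1 | Y). This relies on the predictors being coded 0/1 and is an illustration, not e1071's own predict code; the function name is hypothetical.

```r
# Posterior P(y = 1 | x) for 0/1 predictors under the naive Bayes assumption
# that predictors are conditionally independent given the class.
#   x      : 0/1 vector of predictor values for one observation
#   prior1 : P(y = 1)
#   p1, p0 : P(x_j = 1 | y = 1) and P(x_j = 1 | y = 0), one entry per predictor
nb_posterior <- function(x, prior1, p1, p0) {
  lik1 <- prod(ifelse(x == 1, p1, 1 - p1))  # P(x | y = 1)
  lik0 <- prod(ifelse(x == 1, p0, 1 - p0))  # P(x | y = 0)
  num <- prior1 * lik1
  num / (num + (1 - prior1) * lik0)         # Bayes' rule
}

# [,1] columns from the X1-outcome run (predictors X2, X3, X4, X5)
p1 <- c(0.6923077, 0.7692308, 0.6153846, 0.69230769)
p0 <- c(0.2307692, 0.3846154, 0.3076923, 0.07692308)
all_on  <- nb_posterior(c(1, 1, 1, 1), prior1 = 0.5, p1 = p1, p0 = p0)
all_off <- nb_posterior(c(0, 0, 0, 0), prior1 = 0.5, p1 = p1, p0 = p0)
```

With laplace = 0, any conditional probability of exactly 0 or 1 forces the product to 0 for some inputs, which is the usual reason to set laplace > 0.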


P.S. I have attached the Excel sheet and the two text files with the two types of coding (by time point and by means).
Attachments
FiveGene_qPCR.xls
(55.5 KiB)
FiveGene_qPCR by Means.txt
(284 Bytes)
FiveGene_qPCR by Time Point.txt
(284 Bytes)

Postby meninonas » Mon Dec 22, 2014 5:27 pm

Dr. Yoo,

I have generated the ROC curve for the FiveGene_qPCR dataset; please find the result attached. Would you like me to do this for all 58 datasets?
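For reference, an ROC curve and its area can be computed in base R from predicted scores and 0/1 gold-standard labels. The layout of the attached prediction file is not described here, so the sketch simply assumes two vectors, `scores` and `labels`; the function names and toy values are hypothetical. Tied scores are handled by stepping threshold by threshold, which gives them a diagonal segment.

```r
# ROC curve points from predicted scores and 0/1 labels: one point per
# distinct score threshold, plus the (0, 0) starting point.
roc_points <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  tpr <- vapply(thresholds, function(t) sum(scores >= t & labels == 1) / n_pos, numeric(1))
  fpr <- vapply(thresholds, function(t) sum(scores >= t & labels == 0) / n_neg, numeric(1))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Area under the curve by the trapezoidal rule
auc_trapezoid <- function(roc) {
  n <- nrow(roc)
  sum(diff(roc$fpr) * (roc$tpr[-1] + roc$tpr[-n]) / 2)
}

scores <- c(0.9, 0.8, 0.7, 0.6)
labels <- c(1, 0, 1, 0)
roc <- roc_points(scores, labels)
auc_trapezoid(roc)  # 0.75
# plot(roc$fpr, roc$tpr, type = "l")  # draws the curve
```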

NOTE: files with the .xdsl extension cannot be uploaded.
Attachments
ROC Curve.pdf
(97.7 KiB)
FiveGene.csv.pred.csv
(520 Bytes)

Postby cwyoo » Mon Dec 22, 2014 10:32 pm

meninonas wrote:
Dr. Yoo,

I have done the ROC Curve for the FiveGene_qPCR dataset. Please find the result attached. Would you like for me to do this for all 58 datasets?

NOTE: .xdsl extensions are not allowed to be uploaded.


Now .xdsl files can be uploaded. Please run Banjo and, if needed, use the dot-to-xdsl program to create the .xdsl file, then draw the ROC curve. BTW, you need to use the actual variables (x1, x2, ..., x5), not the meta variables (@_x1, @_x2, ..., @_x5), for the analysis.
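If a learned structure is exported as a Graphviz DOT file, its directed edges can be pulled into an edge list for scoring. This sketch assumes plain DOT edge statements of the form x1 -> x2; (optionally quoted node names); it does not reproduce the dot-to-xdsl program, and the function name is hypothetical.

```r
# Extract directed edges "a -> b" from the lines of a DOT file.
dot_edges <- function(dot_lines) {
  pattern <- "^\\s*\"?([A-Za-z0-9_.@]+)\"?\\s*->\\s*\"?([A-Za-z0-9_.@]+)\"?"
  hits <- regmatches(dot_lines, regexec(pattern, dot_lines))
  hits <- Filter(length, hits)  # drop lines with no edge statement
  data.frame(from = vapply(hits, `[`, "", 2),
             to   = vapply(hits, `[`, "", 3),
             stringsAsFactors = FALSE)
}

dot <- c("digraph G {", "  x1 -> x2;", "  \"x3\" -> \"x5\";", "  x4;", "}")
edges <- dot_edges(dot)
# In practice: dot_edges(readLines("network.dot"))  (file name hypothetical)
```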
cwyoo
Site Admin
 
Posts: 378
Joined: Sun Jun 22, 2014 2:38 pm

Postby meninonas » Fri Dec 26, 2014 5:23 pm

cwyoo wrote:
Now xdsl can be uploaded. Please run banjo and use dot to xdsl program if needed to create xdsl file and draw ROC curve. BTW, you need to use the actual variables (x1, x2, ..., x5) not the meta variables (@_x1, @_x2, ..., @x_5) for the analysis.


Dr. Yoo,

I've been working through the analysis, but I haven't found where the actual variables fit in. For which part of the analysis do I need to use the continuous variables (x1, ..., x5)? I know that to create the Bayesian network and the prediction files I need the meta variables, which are dichotomous.

Postby cwyoo » Sat Dec 27, 2014 8:38 pm

meninonas wrote:
Dr. Yoo,

i've been playing around with the analysis but I haven't found a place where the actual variable fit in. For what part of the analysis do I need to use the continuous variables (x1,...,x5)? I know that for the creation of the Bayesian Network and for the prediction files, I need the meta variables, which are dichotomous.


You can categorize the continuous variables.
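
One way to categorize a continuous expression variable is to bin it at its quantiles. The sketch below uses tertiles by default; the number of bins, the quantile rule, and the function name are assumptions for illustration, not a recommendation from the thread.

```r
# Categorize a continuous variable into k roughly equal-frequency bins,
# returning integer codes 1..k.
categorize <- function(x, k = 3) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1),
                     na.rm = TRUE, names = FALSE)
  # unique() guards against repeated break points when many values tie
  cut(x, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

codes <- categorize(1:9)  # tertile codes for 1..9
```

Applied column by column (for example with lapply over x1, ..., x5), this yields categorical versions of the actual variables.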

Postby meninonas » Tue Jan 06, 2015 1:31 pm

cwyoo wrote:
You can categorize the continuous variables.


Dr. Yoo,

What I did was dichotomize the data: I calculated the mean of each gene and coded any value larger than the mean as 1 and any value smaller than the mean as 0. I based the prediction file and the ROC curve on this coding.
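
For reference, this mean split can be sketched in base R as follows. The function name is hypothetical, and values exactly equal to the mean are coded 0, an assumption since the post does not say how ties were handled.

```r
# Code each gene (column) 1 if above its own column mean, else 0.
# Ties with the mean are coded 0 (assumption); NA values stay NA.
dichotomize_by_mean <- function(df) {
  as.data.frame(lapply(df, function(x) as.integer(x > mean(x, na.rm = TRUE))))
}

# Toy expression values, made up for illustration
expr <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(10, 10, 10, 50))
coded <- dichotomize_by_mean(expr)
```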

Would you like me to do this for all 58 datasets?