
Re: Data Repository

Posted: Wed Dec 03, 2014 2:22 pm
by meninonas
Dr. Yoo,

The goal of the in silico challenges is to reverse-engineer gene networks from steady-state and time-series data, that is, to predict the directed, unsigned network topology from the given in silico generated gene expression datasets. There are three in silico challenges, corresponding to gene networks with 10, 50, and 100 genes.

For every network, the following experiments are simulated:

Heterozygous knock-down. The -heterozygous files contain the steady-state levels for the wild type and for the heterozygous knock-down strain of each gene.

Null-mutants. The -null-mutants files contain the steady-state levels for the wild type and for the null-mutant strain of each gene.

Trajectories. The -trajectories files contain time courses of the network recovering from several external perturbations. For the networks of size 100, there are 46 perturbations, each with 21 time points.

The Gold Standard, which is in the second tab of the Excel file, states only the presence of a relationship, not its direction.
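
Because the Gold Standard records only whether a relationship is present, a predicted directed network has to be collapsed to an undirected one before the two can be compared. A minimal sketch in base R, assuming both the prediction and the Gold Standard are available as 0/1 adjacency matrices (the gene names and matrices below are hypothetical, not taken from the challenge files):

```r
# Hypothetical 3-gene example; names and edges are made up for illustration.
genes <- c("G1", "G2", "G3")

# Predicted directed network: rows are parents, columns are children.
pred_directed <- matrix(0L, nrow = 3, ncol = 3, dimnames = list(genes, genes))
pred_directed["G1", "G2"] <- 1L   # G1 -> G2
pred_directed["G3", "G2"] <- 1L   # G3 -> G2

# Drop the direction: an undirected edge exists if either direction was predicted.
pred_undirected <- (pred_directed | t(pred_directed)) * 1L

# Hypothetical undirected gold standard (symmetric matrix).
gold <- matrix(0L, nrow = 3, ncol = 3, dimnames = list(genes, genes))
gold["G1", "G2"] <- gold["G2", "G1"] <- 1L
gold["G1", "G3"] <- gold["G3", "G1"] <- 1L

# Count each unordered gene pair once by looking at the upper triangle only.
upper <- upper.tri(gold)
tp <- sum(pred_undirected[upper] == 1 & gold[upper] == 1)
fp <- sum(pred_undirected[upper] == 1 & gold[upper] == 0)
fn <- sum(pred_undirected[upper] == 0 & gold[upper] == 1)

print(c(TP = tp, FP = fp, FN = fn))
```

Looking only at the upper triangle avoids counting the same gene pair twice once direction has been dropped.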

Re: Data Repository

Posted: Fri Dec 19, 2014 12:36 pm
by meninonas
Professor,

I have run the Naive Bayes Classifier on the FiveGene_qPCR and below are the results:

Code:
 > bytimepoint<- naiveBayes(X._X1 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
        0         1
0.5384615 0.4615385

Conditional probabilities:
   X._X2
Y   [,1] [,2]
  0    0    0
  1    1    0

   X._X3
Y   [,1] [,2]
  0    0    0
  1    1    0

   X._X4
Y   [,1] [,2]
  0    0    0
  1    1    0

   X._X5
Y   [,1] [,2]
  0    0    0
  1    1    0


When I changed the outcome to @_X2 or @_X3, I got the same results. I don't believe these results are right, given how the probabilities are distributed within each contingency table. Consequently, I recoded the Excel sheet: I took the mean of each of the 5 genes and coded anything larger than the mean as 1 and anything less than the mean as 0. The first run has X1 as the outcome variable. See the results below:

Code:
 > bymeans <- naiveBayes(X._X1 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
  0   1
0.5 0.5

Conditional probabilities:
   X._X2
Y        [,1]      [,2]
  0 0.2307692 0.4385290
  1 0.6923077 0.4803845

   X._X3
Y        [,1]      [,2]
  0 0.3846154 0.5063697
  1 0.7692308 0.4385290

   X._X4
Y        [,1]      [,2]
  0 0.3076923 0.4803845
  1 0.6153846 0.5063697

   X._X5
Y         [,1]      [,2]
  0 0.07692308 0.2773501
  1 0.69230769 0.4803845


The code below has x2 as the outcome variable:

Code:
 > bymeans <- naiveBayes(X._X2 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
        0         1
0.5384615 0.4615385

Conditional probabilities:
   X._X1
Y        [,1]      [,2]
  0 0.2857143 0.4688072
  1 0.7500000 0.4522670

   X._X3
Y        [,1]      [,2]
  0 0.5714286 0.5135526
  1 0.5833333 0.5149287

   X._X4
Y        [,1]      [,2]
  0 0.4285714 0.5135526
  1 0.5000000 0.5222330

   X._X5
Y        [,1]      [,2]
  0 0.1428571 0.3631365
  1 0.6666667 0.4923660


The code below has x3 as the outcome variable:

Code:
 > bymeans <- naiveBayes(X._X3 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
        0         1
0.4230769 0.5769231

Conditional probabilities:
   X._X1
Y        [,1]      [,2]
  0 0.2727273 0.4670994
  1 0.6666667 0.4879500

   X._X2
Y        [,1]      [,2]
  0 0.4545455 0.5222330
  1 0.4666667 0.5163978

   X._X4
Y        [,1]      [,2]
  0 0.1818182 0.4045199
  1 0.6666667 0.4879500

   X._X5
Y        [,1]      [,2]
  0 0.1818182 0.4045199
  1 0.5333333 0.5163978


The code below has x4 as the outcome variable:

Code:
 > bymeans <- naiveBayes(X._X4 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
        0         1
0.5384615 0.4615385

Conditional probabilities:
   X._X1
Y        [,1]      [,2]
  0 0.3571429 0.4972452
  1 0.6666667 0.4923660

   X._X2
Y        [,1]      [,2]
  0 0.4285714 0.5135526
  1 0.5000000 0.5222330

   X._X3
Y        [,1]      [,2]
  0 0.3571429 0.4972452
  1 0.8333333 0.3892495

   X._X5
Y        [,1]      [,2]
  0 0.3571429 0.4972452
  1 0.4166667 0.5149287


The code below has x5 as the outcome variable:

Code:
 > bymeans <- naiveBayes(X._X5 ~., mydata, laplace = 0, subset, na.action = na.pass)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

A-priori probabilities:
Y
        0         1
0.6153846 0.3846154

Conditional probabilities:
   X._X1
Y   [,1]      [,2]
  0 0.25 0.4472136
  1 0.90 0.3162278

   X._X2
Y   [,1]      [,2]
  0 0.25 0.4472136
  1 0.80 0.4216370

   X._X3
Y     [,1]      [,2]
  0 0.4375 0.5123475
  1 0.8000 0.4216370

   X._X4
Y     [,1]      [,2]
  0 0.4375 0.5123475
  1 0.5000 0.5270463


P.S. I have attached the Excel sheet and the two text files with the two types of coding.
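
A note on reading the tables above: for numeric predictors, e1071's naiveBayes reports the per-class mean in column [,1] and the per-class standard deviation in column [,2], not conditional probabilities. The base R sketch below uses hypothetical 0/1 data (13 samples per class, chosen only so that the summaries match the X._X2 table from the first bymeans run; it is not the attached dataset) to show where such numbers come from, and what the table looks like when the predictor is treated as categorical instead.

```r
# Hypothetical 0/1 data, NOT the attached dataset: 13 samples per class,
# chosen only so the summaries match the reported X._X2 table.
y  <- rep(c(0, 1), each = 13)
x2 <- c(rep(1, 3), rep(0, 10),   # class 0: 3 of 13 coded as 1
        rep(1, 9), rep(0, 4))    # class 1: 9 of 13 coded as 1

# Per-class mean and standard deviation of the numeric 0/1 predictor.
classes    <- sort(unique(y))
class_mean <- sapply(classes, function(k) mean(x2[y == k]))
class_sd   <- sapply(classes, function(k) sd(x2[y == k]))
gaussian_summary <- cbind(mean = class_mean, sd = class_sd)
rownames(gaussian_summary) <- classes
print(round(gaussian_summary, 7))

# Treating the predictor as categorical gives a row-normalised
# contingency table, i.e. P(X2 = level | Y = class).
x2_factor <- factor(x2, levels = c(0, 1))
cond_prob <- prop.table(table(Y = y, X2 = x2_factor), margin = 1)
print(round(cond_prob, 7))
```

Read this way, the first run (means of exactly 0 and 1 with standard deviation 0) says that every predictor was constant within each class. If contingency tables are wanted from naiveBayes itself, converting the 0/1 columns to factors before fitting should produce them.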

Re: Data Repository

Posted: Mon Dec 22, 2014 5:27 pm
by meninonas
Dr. Yoo,

I have done the ROC Curve for the FiveGene_qPCR dataset. Please find the result attached. Would you like for me to do this for all 58 datasets?

NOTE: .xdsl extensions are not allowed to be uploaded.

Re: Data Repository

Posted: Mon Dec 22, 2014 10:32 pm
by cwyoo
meninonas wrote:Dr. Yoo,

I have done the ROC Curve for the FiveGene_qPCR dataset. Please find the result attached. Would you like for me to do this for all 58 datasets?

NOTE: .xdsl extensions are not allowed to be uploaded.


Now xdsl files can be uploaded. Please run Banjo, use the dot-to-xdsl program if needed to create the xdsl file, and draw the ROC curve. BTW, you need to use the actual variables (x1, x2, ..., x5), not the meta variables (@_x1, @_x2, ..., @_x5), for the analysis.
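
For reference, the ROC curve itself needs only a score per candidate edge and the Gold Standard label for that edge. A minimal base R sketch, assuming distinct scores (ties are not handled) and using made-up scores and labels rather than Banjo output; the function names are hypothetical:

```r
# Build ROC points by sweeping the threshold from the highest score down.
# Assumes all scores are distinct; tied scores would need to be grouped.
roc_points <- function(scores, labels) {
  ord    <- order(scores, decreasing = TRUE)
  labels <- labels[ord]
  tpr <- c(0, cumsum(labels == 1) / sum(labels == 1))
  fpr <- c(0, cumsum(labels == 0) / sum(labels == 0))
  data.frame(fpr = fpr, tpr = tpr)
}

# Area under the curve by the trapezoid rule.
auc_trapezoid <- function(roc) {
  n <- nrow(roc)
  sum(diff(roc$fpr) * (roc$tpr[-1] + roc$tpr[-n]) / 2)
}

# Made-up edge scores and gold-standard labels (1 = edge present).
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
labels <- c(1,   1,   0,   1,   0,   0)

roc <- roc_points(scores, labels)
auc <- auc_trapezoid(roc)
print(roc)
print(auc)
```

Plotting roc$fpr against roc$tpr with type = "s" in plot() would draw the usual step curve.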

Re: Data Repository

Posted: Fri Dec 26, 2014 5:23 pm
by meninonas
cwyoo wrote:
meninonas wrote:Dr. Yoo,

I have done the ROC Curve for the FiveGene_qPCR dataset. Please find the result attached. Would you like for me to do this for all 58 datasets?

NOTE: .xdsl extensions are not allowed to be uploaded.


Now xdsl files can be uploaded. Please run Banjo, use the dot-to-xdsl program if needed to create the xdsl file, and draw the ROC curve. BTW, you need to use the actual variables (x1, x2, ..., x5), not the meta variables (@_x1, @_x2, ..., @_x5), for the analysis.


Dr. Yoo,

I've been experimenting with the analysis, but I haven't found where the actual variables fit in. For which part of the analysis do I need to use the continuous variables (x1, ..., x5)? I know that for the creation of the Bayesian network and for the prediction files, I need the meta variables, which are dichotomous.

Re: Data Repository

Posted: Sat Dec 27, 2014 8:38 pm
by cwyoo
meninonas wrote:
cwyoo wrote:
meninonas wrote:Dr. Yoo,

I have done the ROC Curve for the FiveGene_qPCR dataset. Please find the result attached. Would you like for me to do this for all 58 datasets?

NOTE: .xdsl extensions are not allowed to be uploaded.


Now xdsl files can be uploaded. Please run Banjo, use the dot-to-xdsl program if needed to create the xdsl file, and draw the ROC curve. BTW, you need to use the actual variables (x1, x2, ..., x5), not the meta variables (@_x1, @_x2, ..., @_x5), for the analysis.


Dr. Yoo,

I've been experimenting with the analysis, but I haven't found where the actual variables fit in. For which part of the analysis do I need to use the continuous variables (x1, ..., x5)? I know that for the creation of the Bayesian network and for the prediction files, I need the meta variables, which are dichotomous.


You can categorize the continuous variables.
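
One possible way to categorize a continuous variable into more than two levels is to cut it at its quantiles. A small base R sketch with made-up expression values (the three-level coding and the labels are assumptions for illustration, not necessarily what is intended here):

```r
# Made-up continuous expression values for one gene.
x <- c(0.12, 0.85, 0.40, 1.30, 0.95, 0.33, 0.71, 1.10, 0.05)

# Break points at the tertiles of the observed values.
breaks <- quantile(x, probs = c(0, 1 / 3, 2 / 3, 1))

# include.lowest = TRUE keeps the minimum value inside the first bin.
x_cat <- cut(x, breaks = breaks, include.lowest = TRUE,
             labels = c("low", "mid", "high"))

print(table(x_cat))
```

Quantile-based breaks fail if heavy ties make two break points equal, in which case fixed, hand-chosen cut points are the simpler option.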

Re: Data Repository

Posted: Tue Jan 06, 2015 1:31 pm
by meninonas
cwyoo wrote:
You can categorize the continuous variables.


Dr. Yoo,

I dichotomized the data by calculating the mean for each gene and coding any value larger than the mean as 1 and any value smaller than the mean as 0. The prediction file and the ROC curve are based on that coding.

Would you like for me to do this on all 58 datasets?
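
The mean-based coding described above can be sketched in base R as follows. The data frame and its values are hypothetical, and values exactly equal to the mean are coded 0 here, which the description above leaves open:

```r
# Code a numeric vector as 1 where it exceeds its own mean, else 0.
# Assumption: values exactly equal to the mean are coded 0.
dichotomize_by_mean <- function(x) {
  as.integer(x > mean(x, na.rm = TRUE))
}

# Hypothetical expression values for two genes (not the FiveGene_qPCR data).
expr <- data.frame(
  x1 = c(0.2, 1.4, 0.9, 2.1),
  x2 = c(5.0, 3.0, 4.0, 8.0)
)

# Apply the coding column by column, one mean per gene.
binary <- as.data.frame(lapply(expr, dichotomize_by_mean))
print(binary)
```

Wrapping the coding in a function like this makes it straightforward to apply the same rule to each of the remaining datasets in a loop.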