SMLG (Statistical Machine Learning Group) Discussion Forum

by **meninonas** » Wed Dec 03, 2014 2:22 pm

Dr. Yoo,

The goal of the in silico challenges is to reverse-engineer gene networks from steady-state and time-series data, that is, to predict the directed, unsigned network topology from the given in silico generated gene expression datasets. There are three in silico challenges, corresponding to gene networks with 10, 50, and 100 genes.

For every network, the following experiments are simulated:

Heterozygous knock-down. The -heterozygous files contain the steady-state levels for the wild-type and the heterozygous knock-down strains for each gene.

Null-mutants. The -null-mutants files contain the steady-state levels for the wild-type and the null-mutant strains for each gene.

Trajectories. The -trajectories files contain time courses of the network recovering from several external perturbations. For the networks of size 100, there are 46 perturbations, each with 21 time points.

The Gold Standard, which is in the second tab of the Excel file, states only whether a relationship is present, not its direction.
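
Because the Gold Standard records only whether a relationship exists, one way to score a predicted directed network against it is to drop edge direction before comparing. The following is a minimal base-R sketch under that assumption; the gene names and both adjacency matrices are made-up toy values, not challenge data.

```r
# Toy 3-gene example; both matrices are hypothetical illustrations.
genes <- c("G1", "G2", "G3")

# Predicted directed network: pred[i, j] == 1 means an edge i -> j.
pred <- matrix(c(0, 1, 0,
                 0, 0, 0,
                 1, 0, 0),
               nrow = 3, byrow = TRUE, dimnames = list(genes, genes))

# Undirected gold standard: symmetric 0/1 matrix.
gold <- matrix(c(0, 1, 0,
                 1, 0, 1,
                 0, 1, 0),
               nrow = 3, byrow = TRUE, dimnames = list(genes, genes))

# Drop direction: an undirected edge exists if either direction was predicted.
pred_undirected <- (pred + t(pred)) > 0

# Compare each unordered gene pair once (upper triangle only).
pairs <- upper.tri(gold)
tp <- sum(pred_undirected[pairs] & gold[pairs] == 1)   # edges found
fp <- sum(pred_undirected[pairs] & gold[pairs] == 0)   # spurious edges
fn <- sum(!pred_undirected[pairs] & gold[pairs] == 1)  # missed edges

print(c(tp = tp, fp = fp, fn = fn))
```

Counting each pair once avoids double-counting an edge that the symmetric gold standard lists in both directions.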

by **meninonas** » Fri Dec 19, 2014 12:36 pm

Professor,

I ran the naive Bayes classifier on the FiveGene_qPCR dataset; the results are below:

Code:

    > bytimepoint <- naiveBayes(X._X1 ~ ., mydata, laplace = 0, subset, na.action = na.pass)

    Naive Bayes Classifier for Discrete Predictors

    Call:
    naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

    A-priori probabilities:
    Y
            0         1
    0.5384615 0.4615385

    Conditional probabilities:
       X._X2
    Y   [,1] [,2]
      0    0    0
      1    1    0

       X._X3
    Y   [,1] [,2]
      0    0    0
      1    1    0

       X._X4
    Y   [,1] [,2]
      0    0    0
      1    1    0

       X._X5
    Y   [,1] [,2]
      0    0    0
      1    1    0

When I changed the outcome to @_X2 or @_X3, I got the same results. I don't believe these results are right, given how the probabilities are distributed within each contingency table. Consequently, I recoded the Excel sheet: I took the mean of each of the 5 genes and coded anything larger than the mean as 1 and anything smaller than the mean as 0. The first run has X1 as the outcome variable. See results below:

Code:

    > bymeans <- naiveBayes(X._X1 ~ ., mydata, laplace = 0, subset, na.action = na.pass)

    Naive Bayes Classifier for Discrete Predictors

    Call:
    naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

    A-priori probabilities:
    Y
      0   1
    0.5 0.5

    Conditional probabilities:
       X._X2
    Y        [,1]      [,2]
      0 0.2307692 0.4385290
      1 0.6923077 0.4803845

       X._X3
    Y        [,1]      [,2]
      0 0.3846154 0.5063697
      1 0.7692308 0.4385290

       X._X4
    Y        [,1]      [,2]
      0 0.3076923 0.4803845
      1 0.6153846 0.5063697

       X._X5
    Y         [,1]      [,2]
      0 0.07692308 0.2773501
      1 0.69230769 0.4803845

The code below has x2 as the outcome variable:

Code:

    > bymeans <- naiveBayes(X._X2 ~ ., mydata, laplace = 0, subset, na.action = na.pass)

    Naive Bayes Classifier for Discrete Predictors

    Call:
    naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

    A-priori probabilities:
    Y
            0         1
    0.5384615 0.4615385

    Conditional probabilities:
       X._X1
    Y        [,1]      [,2]
      0 0.2857143 0.4688072
      1 0.7500000 0.4522670

       X._X3
    Y        [,1]      [,2]
      0 0.5714286 0.5135526
      1 0.5833333 0.5149287

       X._X4
    Y        [,1]      [,2]
      0 0.4285714 0.5135526
      1 0.5000000 0.5222330

       X._X5
    Y        [,1]      [,2]
      0 0.1428571 0.3631365
      1 0.6666667 0.4923660

The code below has x3 as the outcome variable:

Code:

    > bymeans <- naiveBayes(X._X3 ~ ., mydata, laplace = 0, subset, na.action = na.pass)

    Naive Bayes Classifier for Discrete Predictors

    Call:
    naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

    A-priori probabilities:
    Y
            0         1
    0.4230769 0.5769231

    Conditional probabilities:
       X._X1
    Y        [,1]      [,2]
      0 0.2727273 0.4670994
      1 0.6666667 0.4879500

       X._X2
    Y        [,1]      [,2]
      0 0.4545455 0.5222330
      1 0.4666667 0.5163978

       X._X4
    Y        [,1]      [,2]
      0 0.1818182 0.4045199
      1 0.6666667 0.4879500

       X._X5
    Y        [,1]      [,2]
      0 0.1818182 0.4045199
      1 0.5333333 0.5163978

The code below has x4 as the outcome variable:

Code:

    > bymeans <- naiveBayes(X._X4 ~ ., mydata, laplace = 0, subset, na.action = na.pass)

    Naive Bayes Classifier for Discrete Predictors

    Call:
    naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

    A-priori probabilities:
    Y
            0         1
    0.5384615 0.4615385

    Conditional probabilities:
       X._X1
    Y        [,1]      [,2]
      0 0.3571429 0.4972452
      1 0.6666667 0.4923660

       X._X2
    Y        [,1]      [,2]
      0 0.4285714 0.5135526
      1 0.5000000 0.5222330

       X._X3
    Y        [,1]      [,2]
      0 0.3571429 0.4972452
      1 0.8333333 0.3892495

       X._X5
    Y        [,1]      [,2]
      0 0.3571429 0.4972452
      1 0.4166667 0.5149287

The code below has x5 as the outcome variable:

Code:

    > bymeans <- naiveBayes(X._X5 ~ ., mydata, laplace = 0, subset, na.action = na.pass)

    Naive Bayes Classifier for Discrete Predictors

    Call:
    naiveBayes.default(x = X, y = Y, laplace = laplace, ..1)

    A-priori probabilities:
    Y
            0         1
    0.6153846 0.3846154

    Conditional probabilities:
       X._X1
    Y   [,1]      [,2]
      0 0.25 0.4472136
      1 0.90 0.3162278

       X._X2
    Y   [,1]      [,2]
      0 0.25 0.4472136
      1 0.80 0.4216370

       X._X3
    Y     [,1]      [,2]
      0 0.4375 0.5123475
      1 0.8000 0.4216370

       X._X4
    Y     [,1]      [,2]
      0 0.4375 0.5123475
      1 0.5000 0.5270463

P.S. I have attached the Excel sheet and the two text files with the two types of coding.
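
A note on reading the tables above: when a predictor is stored as a numeric column, e1071's naiveBayes reports the per-class mean in [,1] and the per-class standard deviation in [,2], not conditional probabilities. The base-R sketch below uses a made-up vector (13 observations in one outcome class, 3 of them coded 1; an assumption for illustration, not the attached data) to show that such a mean/sd pair matches the 0.2307692 and 0.4385290 reported for X._X2 when X._X1 is 0.

```r
# Hypothetical 0/1-coded predictor within one outcome class:
# 13 observations, 3 of which are coded 1 (not the attached data).
x <- c(rep(1, 3), rep(0, 10))

class_mean <- mean(x)  # proportion of 1s in this class
class_sd   <- sd(x)    # sample standard deviation in this class

print(c(mean = class_mean, sd = class_sd))

# Treating the same values as a factor gives a proportion table instead,
# which is the shape a conditional probability table takes.
prop <- prop.table(table(factor(x, levels = c(0, 1))))
print(prop)
```

If that reading is right, converting the 0/1 columns to factors before fitting (for example with `as.factor`) should make naiveBayes print conditional probability tables whose rows sum to 1.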

by **meninonas** » Mon Dec 22, 2014 5:27 pm

Dr. Yoo,

I have produced the ROC curve for the FiveGene_qPCR dataset. Please find the result attached. Would you like me to do this for all 58 datasets?

NOTE: Files with the .xdsl extension cannot be uploaded.
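
For reference, an ROC curve can be traced by sweeping a threshold over the predicted probabilities and recording the true-positive and false-positive rates at each cut. The following is a minimal base-R sketch, not the procedure used for the attached result; the labels and scores are made-up values rather than FiveGene_qPCR predictions, and `roc_points` and `auc_trapezoid` are hypothetical helper names.

```r
# Hypothetical helper: ROC points from true 0/1 labels and predicted scores.
roc_points <- function(labels, scores) {
  # Use each distinct score as a threshold, from strictest to loosest.
  thresholds <- sort(unique(scores), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(scores[labels == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(scores[labels == 0] >= t))
  # Prepend the (0, 0) corner so the curve starts at the origin.
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Hypothetical helper: area under the curve by the trapezoidal rule.
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

labels <- c(1, 1, 0, 1, 0, 0)              # made-up true classes
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)  # made-up predicted probabilities

roc <- roc_points(labels, scores)
auc <- auc_trapezoid(roc)
print(roc)
print(auc)
```

Calling `plot(roc$fpr, roc$tpr, type = "b")` on the result would draw the curve.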

by **cwyoo** » Mon Dec 22, 2014 10:32 pm

Now .xdsl files can be uploaded. Please run Banjo and, if needed, use the dot-to-xdsl program to create the .xdsl file, then draw the ROC curve. BTW, you need to use the actual variables (x1, x2, ..., x5), not the meta variables (@_x1, @_x2, ..., @_x5), for the analysis.

by **meninonas** » Fri Dec 26, 2014 5:23 pm

Dr. Yoo,

I've been experimenting with the analysis, but I haven't found where the actual variables fit in. For which part of the analysis do I need to use the continuous variables (x1, ..., x5)? I know that for creating the Bayesian network and the prediction files I need the meta variables, which are dichotomous.

by **cwyoo** » Sat Dec 27, 2014 8:38 pm

You can categorize the continuous variables.

by **meninonas** » Tue Jan 06, 2015 1:31 pm

Dr. Yoo,

I dichotomized the data by calculating the mean for each gene and coding any value larger than the mean as 1 and any value smaller than the mean as 0. I based the prediction file and the ROC curve on that coding.

Would you like me to do this for all 58 datasets?
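
The mean-based coding described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used for the attached files: the data frame holds made-up values, `dichotomize_by_mean` is a hypothetical helper name, and values exactly equal to the mean are coded 0 here because the description does not say how ties were handled.

```r
# Made-up expression values for two genes (not the FiveGene_qPCR data).
expr <- data.frame(x1 = c(0.2, 0.8, 0.5, 0.9),
                   x2 = c(1.0, 3.0, 2.5, 1.5))

# Code each value as 1 if it exceeds its gene's mean, otherwise 0.
# Ties with the mean fall into the 0 group (an assumption).
dichotomize_by_mean <- function(column) as.integer(column > mean(column))

# Apply the rule column by column, keeping the gene names.
coded <- as.data.frame(lapply(expr, dichotomize_by_mean))
print(coded)
```

The same function could be applied to each of the 58 datasets in a loop over the file names.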
