Data Repository

Postby cwyoo » Thu Sep 25, 2014 8:32 am

Luis, please access the data using the following credentials:

Team Name: PathOne
Your password is : 86Kw1ZRN.
You will need this password to access the datasets provided for the DREAM challenges found here:
http://wiki.c2b2.columbia.edu/dream/ind ... Challenges.

Download the Dream 2 through Dream 5 datasets, gold standards, and related documents. Study the datasets and create spreadsheets that have variables in the columns and cases in the rows.
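Some downloaded files store one gene per row and one experiment per column. That is the transpose of the requested layout. Below is a minimal sketch of the flip, assuming the matrix is already loaded as a list of rows. It uses only the Python standard library, and the toy matrix is made up for illustration.

```python
import csv
import io

def to_cases_by_variables(matrix, variable_names):
    """Transpose a variables-by-cases matrix (one row per gene) into
    cases-by-variables rows (one row per case, one column per gene)."""
    # zip(*matrix) turns column j of the input into row j of the output
    cases = [list(col) for col in zip(*matrix)]
    return [list(variable_names)] + cases

def write_csv(rows):
    """Serialize rows as CSV text, ready to open in a spreadsheet."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

# Hypothetical toy input: 2 genes (rows) measured in 3 experiments (columns)
genes = ["G1", "G2"]
expression = [
    [0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2],
]
table = to_cases_by_variables(expression, genes)
print(write_csv(table))
```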
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm

DREAM 2 Challenge 3

Postby meninonas » Tue Oct 07, 2014 8:57 pm

Dr. Yoo,

For Challenge 3, there are two datasets. The first one, FiveGene_qPCR.xls, is described below:

This dataset contains the 5 genes (x1, x2, x3, x4, and x5) that form the network. The 5-gene network was subjected to an initial perturbation, and qPCR measurements were then collected. The genes were measured in two time series. In the first (@_X=0), the genes were measured every three hours, as shown in the time column (0-42). In the second (@_X=1), they were measured every five hours, as shown in the time column (0-50).
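The sketch below assumes that sampling was perfectly regular and included both end points. Under that assumption, the two series should contribute 15 rows (0 to 42 in steps of 3) and 11 rows (0 to 50 in steps of 5). The code builds that expected (@_X, time) grid so it can be checked against the time column of the actual file. These row counts are derived from the stated intervals, not read from FiveGene_qPCR.xls.

```python
def qpcr_time_grid():
    """Build the expected (@_X, time) pairs for the two qPCR series:
    series 0 every 3 h from 0 to 42, series 1 every 5 h from 0 to 50.
    Assumption: sampling is regular and includes both end points."""
    series = {0: (3, 42), 1: (5, 50)}  # @_X value -> (step, last time)
    rows = []
    for x, (step, end) in series.items():
        for t in range(0, end + 1, step):
            rows.append({"@_X": x, "time": t})
    return rows

rows = qpcr_time_grid()
print(len(rows))  # 15 + 11 = 26
```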

The second dataset of Challenge 3, FiveGene_chip.xlsx, is described below:

This dataset contains two time series corresponding to two different treatments. The treatments are coded as 0 and 1 in the @_gene columns. 588 genes were measured, including the 5 genes in the synthetic network plus genes known from the literature to be regulated by some of these 5 genes.

The Gold Standard, located in the second tab of each Excel workbook, consists of two tables, both of which are directed. Directed means that the gene in the first column regulates the gene in the second column.

The Gold Standard occupies the second tab of every workbook, under the same heading.
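A directed edge list like this one can be turned into a 0/1 adjacency matrix, where row i, column j is 1 when gene i regulates gene j. The sketch below assumes the tab has already been read into (regulator, target) pairs. The edges shown are hypothetical placeholders, not the real gold standard.

```python
def adjacency_from_edges(genes, edges):
    """Return a 0/1 matrix where A[i][j] == 1 means genes[i] regulates genes[j].
    Each edge is (regulator, target): first column regulates second column."""
    index = {g: k for k, g in enumerate(genes)}
    n = len(genes)
    A = [[0] * n for _ in range(n)]
    for regulator, target in edges:
        A[index[regulator]][index[target]] = 1
    return A

genes = ["x1", "x2", "x3", "x4", "x5"]
edges = [("x1", "x2"), ("x2", "x3")]  # hypothetical, not the real gold standard
A = adjacency_from_edges(genes, edges)
print(A[0][1], A[1][0])  # x1 -> x2 is present, x2 -> x1 is not
```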
Attachments
FiveGene_qPCR.xls
(23 KiB)
FiveGene_chip-3.xlsx
(117.71 KiB)
Last edited by meninonas on Fri Jan 16, 2015 1:12 pm, edited 11 times in total.
meninonas
 
Posts: 137
Joined: Tue Jun 24, 2014 3:25 pm

DREAM 2 Challenge 4.1

Postby meninonas » Tue Oct 07, 2014 9:01 pm

Challenge 4 has 9 datasets, as follows:

These datasets were produced from a gene network with 50 genes, in which the rate of mRNA synthesis of each gene is affected by the mRNA levels of other genes.

    InSilico1-heterozygous.xls contains steady state levels for the wild-type and 50 heterozygous knock-down strains for each gene.

    InSilico1-null-mutants.xls contains steady state levels for the wild-type and 50 null mutant strains for each gene.

    InSilico1-trajectories.xls contains time courses (trajectories) of the network recovering from several external perturbations. There are 23 different perturbations and 26 time points for each one.

For the heterozygous and null-mutant files, the @_G columns correspond to the 50 measurements that were taken. In the trajectories file, each of the 23 perturbations was measured at 26 time points. In other words, perturbation 1 was measured 26 times, perturbation 2 was measured 26 times, and so on up to perturbation 23.
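If the 23 × 26 layout is stacked one perturbation after another, the trajectories sheet should hold 23 × 26 = 598 data rows. The sketch below labels each row with two meta variables. The names @_perturbation and @_timepoint are hypothetical, chosen only to follow the @_ convention. The perturbation-major row order is an assumption to verify against InSilico1-trajectories.xlsx.

```python
def trajectory_labels(n_perturbations, n_timepoints):
    """Label each row of a stacked trajectories table with the hypothetical
    meta variables @_perturbation and @_timepoint (both 1-based).
    Assumes rows are grouped by perturbation, time points in order."""
    return [
        {"@_perturbation": p, "@_timepoint": t}
        for p in range(1, n_perturbations + 1)
        for t in range(1, n_timepoints + 1)
    ]

labels = trajectory_labels(23, 26)
print(len(labels))  # 598 rows in total
print(labels[26])   # first row of perturbation 2
```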

As in the previous cases, the Gold Standard is located in the second tab.
Attachments
InSilico1-heterozygous-2.xlsx
(101.03 KiB)
InSilico1-trajectories.xlsx
(415.18 KiB)
InSilico1-null-mutants.xlsx
(95.11 KiB)
Last edited by meninonas on Fri Jan 16, 2015 2:13 pm, edited 12 times in total.

DREAM 2 Challenge 4.2

Postby meninonas » Tue Oct 07, 2014 9:04 pm

InSilico2 dataset:

The InSilico2 datasets are structured like the InSilico1 datasets. However, the data from the InSilico2 network are qualitatively different from those of the InSilico1 network. The InSilico2 datasets were produced from a gene network with 50 genes, in which the rate of mRNA synthesis of each gene is affected by the mRNA levels of other genes.

    InSilico2-heterozygous.xls contains steady state levels for the wild-type and 50 heterozygous knock-down strains for each gene.

    InSilico2-null-mutants.xls contains steady state levels for the wild-type and 50 null mutant strains for each gene.

    InSilico2-trajectories.xls contains time courses (trajectories) of the network recovering from several external perturbations. There are 23 different perturbations and 26 time points for each one.

As before, the Gold Standard is in the second tab.

The coding is the same as in Challenge 4.1.
Attachments
InSilico2-heterozygous.xlsx
(88.9 KiB)
InSilico2-null-mutants.xlsx
(88.43 KiB)
InSilico2-trajectories.xlsx
(399.35 KiB)
Last edited by meninonas on Thu Oct 30, 2014 3:35 pm, edited 16 times in total.

DREAM 2 Challenge 4.3

Postby meninonas » Tue Oct 07, 2014 9:04 pm

InSilico3 dataset:

The InSilico3 dataset was produced from a full in silico biochemical network that includes 24 metabolites, 23 proteins and 20 genes. The network has transcription, translation, some signaling, and metabolism. Variables are named Mxx for metabolites, Pyy for proteins, and Gzz for mRNA, where xx, yy and zz are numbers from 1 to 24, 1 to 23, and 1 to 20, respectively.
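When building the spreadsheets, it can help to check every column name against this naming scheme. The sketch below maps a name to its kind and number and rejects out-of-range indices. It assumes the names are exactly one letter followed by digits, possibly zero-padded (for example M07). That assumption should be confirmed against the actual headers.

```python
import re

# Upper bound for each prefix, from the description:
# 24 metabolites, 23 proteins, 20 genes (mRNA).
KINDS = {"M": ("metabolite", 24), "P": ("protein", 23), "G": ("mRNA", 20)}
NAME = re.compile(r"^([MPG])(\d+)$")

def classify(name):
    """Map a variable name such as 'M07' or 'G14' to (kind, number)."""
    m = NAME.match(name)
    if not m:
        raise ValueError(f"unexpected variable name: {name!r}")
    kind, limit = KINDS[m.group(1)]
    number = int(m.group(2))
    if not 1 <= number <= limit:
        raise ValueError(f"{name!r} is outside 1..{limit}")
    return kind, number

print(classify("G14"))  # ('mRNA', 14)
```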

    InSilico3-heterozygous.xls contains steady state levels of metabolites, proteins and mRNA for the wild-type and the 20 heterozygous knock-down strains for each gene.

    InSilico3-null-mutants.xls contains steady state levels of metabolites, proteins and mRNA for the wild-type and the 20 null mutant strains for each gene. NOTE: the knockout of G14 was "lethal".

    InSilico3-trajectories.xls contains time courses (trajectories) of the network recovering from several external perturbations. There are 22 different perturbations and 26 time points for each one.

As before, the Gold Standard is located in the second tab, and the @_G columns are coded much as in Challenges 4.1 and 4.2.
Attachments
InSilico3-heterozygous.xlsx
(109.59 KiB)
InSilico3-null-mutants.xlsx
(108.08 KiB)
InSilico3-trajectories.xlsx
(458.96 KiB)
Last edited by meninonas on Thu Oct 30, 2014 3:38 pm, edited 11 times in total.

Challenge 5

Postby meninonas » Tue Oct 07, 2014 9:04 pm

For Challenge 5, this is the description given:

This challenge dataset consists of two files. One file, data.csv, contains the experimental data, and the other, tfs.csv, lists the transcription factors.

    data.csv: This file contains a 3456 genes × 300 experiments dataset. The names of both genes and experiments have been withheld, and operon information is not provided. As described above, the experiments represent both published and not-yet-released data from a variety of sources. The 3456 genes include all known and putative transcription factors and all genes whose interactions will be used for testing, as well as a number of other recognized coding sequences. This file is comma-separated and is easily imported into Excel or any other program.

    tfs.csv: This file contains the indices of the rows in data.csv that belong to transcription factors, one per line.
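According to the description, each line of tfs.csv is a row number into data.csv. Under that reading, the two files are combined by selecting rows, not by aligning the files side by side. The sketch below assumes the indices count from 1. That is an assumption to confirm against the files, so it can be switched off. The toy matrix and index list are hypothetical.

```python
def tf_submatrix(data_rows, tf_indices, one_based=True):
    """Return the rows of data.csv that belong to transcription factors.
    tf_indices holds row numbers from tfs.csv; whether they count from 1
    or 0 is an assumption to confirm against the files."""
    offset = 1 if one_based else 0
    return [data_rows[i - offset] for i in tf_indices]

# Hypothetical toy matrix: 5 genes x 3 experiments
data_rows = [[g * 10 + e for e in range(3)] for g in range(5)]
tfs = [2, 5]  # hypothetical tfs.csv content: rows 2 and 5 are TFs
print(tf_submatrix(data_rows, tfs))  # [[10, 11, 12], [40, 41, 42]]
```

Applied to the real files, the 320 entries in tfs would pick out a 320 × 300 block of transcription-factor profiles from the 3456 × 300 data matrix.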

I'm having a hard time understanding the relationship between the two files, and maybe you could help me, Dr. Yoo. The main difficulty is that tfs has 320 rows, whereas data has 3456 rows and 300 columns, so I don't know how they fit together.

I have added the link below:

http://wiki.c2b2.columbia.edu/dream/index.php/D2c5full
Attachments
tfs.xlsx
(31.78 KiB)
data.xlsx
(8.89 MiB)
Last edited by meninonas on Sun Oct 26, 2014 11:19 am, edited 3 times in total.

Re: Challenge 3

Postby cwyoo » Tue Oct 28, 2014 7:03 pm

meninonas wrote: Dr. Yoo,

For Challenge 3, there are two datasets. The first one, FiveGene_qPCR.xls is explained below:

This dataset contains two time series, 0 and 1, for the 5 genes (x1, x2, x3, x4, and x5) of the network. In each series, the organism was given an initial perturbation, and qPCR measurements were then collected at different times. The two time series correspond to samples taken at regular intervals of 3 hr (0-42) and 5 hr (0-50).

In addition, a tab has been added at the bottom of the Excel sheet. In it, you'll find two tables.

    In the first table, you'll find the list of gene pairs, stating whether they have a relationship (1) or not (0).

    In the second table, you'll find how they influence each other. For example, if gene 1 influences gene 5, that pair appears with a 1 representing the relationship. If gene 5 also influences gene 1, that pair appears as well (row 7), also with a 1. Consequently, in this example, genes 1 and 5 influence each other.
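In the directed table, finding genes that influence each other amounts to looking for pairs that appear in both directions. Below is a small sketch, assuming the rows flagged 1 have already been read into (source, target) tuples. The example pairs are hypothetical.

```python
def mutual_pairs(directed):
    """Given directed (source, target) pairs flagged 1 in the directed table,
    return the pairs where each gene influences the other, once per pair."""
    edges = set(directed)
    return sorted(
        (a, b) for a, b in edges
        if (b, a) in edges and a < b  # a < b reports each mutual pair once
    )

# Hypothetical directed table: gene 1 -> 5, gene 5 -> 1, gene 2 -> 3
print(mutual_pairs([(1, 5), (5, 1), (2, 3)]))  # [(1, 5)]
```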

For the second dataset of Challenge 3, the description is as follows:

This dataset contains two time series corresponding to two different treatments. 588 genes from the original Affymetrix microarray data were selected, which include the 5 genes in the synthetic network plus genes known in the literature to be regulated by some of these 5 genes. The 5-gene network is oscillating with the cell cycle.

    The Gold Standard works the same way as above and can be found in the second tab of the Excel sheet. The first table states which genes are related, while the second table states the direction of each relationship.

If you have any questions, do not hesitate to ask me.


As we discussed, the meta variables should start with "@_" (without the quotation marks). Also, you need to describe what those meta variables represent. For the gold standard, we are interested in DIRECTED-SIGNED_EXCITATORY_FiveGene_qPCR.txt and DIRECTED-SIGNED_INHIBITORY_FiveGene_qPCR.txt. Similarly, for the chip data we will use the two gold standards. Please do further research and update your post accordingly. Also, rename the titles of all your postings to reflect that the challenges you have posted are from Dream 2.
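The "@_" rule is easy to check mechanically when preparing a sheet. The sketch below separates meta variables from data variables in a header row. It also shows a hypothetical helper, as_meta, that adds the prefix only when it is missing. The header used here is illustrative.

```python
META_PREFIX = "@_"

def split_columns(header):
    """Separate meta variables (names starting with '@_') from data variables."""
    meta = [c for c in header if c.startswith(META_PREFIX)]
    data = [c for c in header if not c.startswith(META_PREFIX)]
    return meta, data

def as_meta(name):
    """Hypothetical helper: return name with the '@_' prefix, adding it only if missing."""
    return name if name.startswith(META_PREFIX) else META_PREFIX + name

meta, data = split_columns(["@_X", "x1", "x2", "x3", "x4", "x5"])
print(meta, data)
```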

DREAM 3 Challenge 4

Postby meninonas » Thu Nov 13, 2014 12:59 am

Dr. Yoo,

The goal of the in silico challenges is to reverse engineer gene networks from steady-state and time-series data, predicting the directed, unsigned network topology from the given in silico gene expression datasets. There are three in silico challenges, corresponding to gene networks with 10, 50, and 100 genes.

For every network, the following experiments are simulated:

Heterozygous knock-down. The -heterozygous files contain the steady-state levels for the wild-type and the heterozygous knock-down strain of each gene.

Null-mutants. The -null-mutants files contain the steady-state levels for the wild-type and the null-mutant strain of each gene.

Trajectories. The -trajectories files contain time courses of the network recovering from several external perturbations. For the networks of size 10, we are given 4 perturbations, each with 21 time points.
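For the size-10 networks, 4 perturbations × 21 time points gives 84 data rows per trajectories file. If those rows are stacked series by series, they can be split back into one block per perturbation. The sketch below assumes there are no header or separator rows between series, which should be checked in the actual files. The stand-in data is just the row numbers.

```python
def split_trajectories(rows, points_per_series):
    """Split a stacked trajectories table into consecutive series.
    Assumes rows are ordered series by series with no separator rows."""
    if len(rows) % points_per_series:
        raise ValueError("row count is not a multiple of the series length")
    return [rows[i:i + points_per_series]
            for i in range(0, len(rows), points_per_series)]

# Size-10 networks: 4 perturbations x 21 time points = 84 rows
stacked = list(range(84))  # stand-in for 84 data rows
series = split_trajectories(stacked, 21)
print(len(series), len(series[0]))  # 4 21
```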

The Gold Standard simply states the presence of a relationship, not the direction.
Attachments
InSilicoSize10-Ecoli1-trajectories.xlsx
(25.88 KiB)
InSilicoSize10-Ecoli1-null-mutants.xlsx
(13.25 KiB)
InSilicoSize10-Ecoli1-heterozygous.xlsx
(13.3 KiB)
Last edited by meninonas on Thu Nov 13, 2014 1:01 am, edited 1 time in total.

DREAM 3 Challenge 4

Postby meninonas » Thu Nov 13, 2014 1:00 am

Dr. Yoo,

The description is the same as in the previous post; the attachments below are for the InSilicoSize10-Ecoli2 network.
Attachments
InSilicoSize10-Ecoli2-trajectories.xlsx
(25.76 KiB)
InSilicoSize10-Ecoli2-null-mutants.xlsx
(13.28 KiB)
InSilicoSize10-Ecoli2-heterozygous.xlsx
(13.29 KiB)
Last edited by meninonas on Thu Nov 13, 2014 1:01 am, edited 1 time in total.

DREAM 3 Challenge 4

Postby meninonas » Thu Nov 13, 2014 1:01 am

Dr. Yoo,

The description is the same as in the first DREAM 3 Challenge 4 post; the attachments below are for the InSilicoSize10-Yeast1 network.
Attachments
InSilicoSize10-Yeast1-trajectories.xlsx
(25.75 KiB)
InSilicoSize10-Yeast1-null-mutants.xlsx
(14.26 KiB)
InSilicoSize10-Yeast1-heterozygous.xlsx
(13.28 KiB)