SMLG (Statistical Machine Learning Group) Discussion Forum

Posted: **Sun Sep 17, 2017 10:42 pm**

One very good way to get TCGA glioblastoma data (or any TCGA data, for that matter) is through cBioPortal. For glioblastoma, it has data on 580 patients, published in Cell in 2013. The cBioPortal site gives a good breakdown of the kinds of data available, as well as excellent visualization tools for preliminary analysis.

Of the 580 patients, 154 have RNASeq data. The RNASeq expression data has been cleaned and presented as a single value per gene, so it is ready to use in Excel or MS Access. The data downloads as a compressed tar file which, when uncompressed, is a folder of .txt files totaling about 250 MB. These can be opened on the Path4 server and converted into Excel files (through LibreOffice) relatively quickly.
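If you would rather skip the Excel conversion, the .txt files can also be read directly. Below is a minimal Python sketch. It assumes each file is tab-delimited, with a header row of sample IDs, one row per gene, and a fixed number of leading gene-identifier columns. In cBioPortal expression downloads these are typically `Hugo_Symbol` and `Entrez_Gene_Id`, but check the header of your own files. The function name and the `n_id_columns` parameter are hypothetical.

```python
import csv
import io


def load_expression_table(handle, n_id_columns=2):
    """Read a tab-delimited gene-by-sample table into {gene: {sample_id: value}}.

    Assumed (hypothetical) layout: a header row, `n_id_columns` leading
    identifier columns (the first one is used as the gene key), then one
    numeric column per sample. Cells that cannot be parsed as numbers
    (e.g. 'NA' or blanks) become None.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[n_id_columns:]
    table = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        gene = row[0]
        values = {}
        for sample_id, cell in zip(sample_ids, row[n_id_columns:]):
            try:
                values[sample_id] = float(cell)
            except ValueError:
                values[sample_id] = None
        table[gene] = values  # note: a repeated gene symbol overwrites the earlier row
    return table


# Self-contained demo on an in-memory file; the sample IDs are made up.
demo = io.StringIO(
    "Hugo_Symbol\tEntrez_Gene_Id\tTCGA-AA-0001-01\tTCGA-AA-0002-01\n"
    "EGFR\t1956\t10.5\tNA\n"
)
print(load_expression_table(demo)["EGFR"])
```

For a real file, open it with `open(path, newline="")` (where `path` is your own file) and pass the handle in. Rows sharing a gene symbol would overwrite each other in this sketch, so check for duplicates first.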

To obtain the associated raw, unprocessed data, one can go to the NCI Genomic Data Commons (GDC) website and find it by matching patient IDs. Although the raw expression values are openly accessible via GDC, access to the raw sequence reads is controlled. We might be able to obtain it by applying for authorization as an educational institution.
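Matching can be done on the TCGA barcode. A sample barcode such as `TCGA-02-0001-01` begins with the 12-character participant (patient) barcode `TCGA-02-0001`, that is, its first three dash-separated fields. The sketch below assumes you already have the cBioPortal sample IDs and a list of TCGA barcodes exported from the GDC file metadata. It does not cover how that list is obtained, and the function names are hypothetical.

```python
def patient_id(barcode):
    """Return the 12-character TCGA participant barcode (e.g. 'TCGA-02-0001')
    from a longer sample or aliquot barcode by keeping its first three fields."""
    parts = barcode.strip().upper().split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(parts[:3])


def match_patients(cbio_sample_ids, gdc_barcodes):
    """Group GDC barcodes by patient, keeping only patients also present in cBioPortal."""
    wanted = {patient_id(s) for s in cbio_sample_ids}
    matches = {}
    for barcode in gdc_barcodes:
        pid = patient_id(barcode)
        if pid in wanted:
            matches.setdefault(pid, []).append(barcode)
    return matches
```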

Posted: **Tue Jan 23, 2018 4:59 pm**

Attached is a file showing the correlation of each of 19,739 genes with patient overall survival in months (OS). As the graph shows, no gene correlates with OS with a Pearson coefficient higher than 0.45, so none of the 19,739 genes analyzed by RNASeq shows a strong linear association with OS. Whether any of the weaker correlations is statistically significant depends on the number of patients and on correcting for testing 19,739 genes.
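The size of a correlation and its statistical significance are different questions. The sketch below uses only the Python standard library to compute per-gene Pearson correlations with OS. It attaches approximate p-values via Fisher's z-transform, which is a large-sample approximation rather than the exact t-based test, and applies Benjamini-Hochberg false-discovery-rate control across genes. It also shows the smallest |r| that reaches significance for a given number of patients. The data layout (a dict of gene to values aligned with an OS list) and all function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist, fmean

_Z = NormalDist()


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * _Z.cdf(-abs(z))


def critical_r(n, alpha):
    """Smallest |r| reaching two-sided significance at level alpha (Fisher z)."""
    z = _Z.inv_cdf(1 - alpha / 2)
    return math.tanh(z / math.sqrt(n - 3))


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests rejected while controlling the false discovery rate at q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank  # largest rank that meets the BH condition
    return sorted(order[:k])


def genes_associated_with_os(expression, os_months, q=0.05):
    """Hypothetical helper: expression maps gene -> values aligned with os_months."""
    n = len(os_months)
    genes = list(expression)
    rs = [pearson_r(expression[g], os_months) for g in genes]
    ps = [fisher_p_value(r, n) for r in rs]
    return [(genes[i], rs[i], ps[i]) for i in benjamini_hochberg(ps, q)]
```

The post does not say how many patients went into this analysis. If all 154 were used, `critical_r(154, 0.05)` is roughly 0.16, and the Bonferroni-corrected threshold `critical_r(154, 0.05 / 19739)` is roughly 0.37. A maximum |r| of 0.45 therefore does not by itself rule out statistically significant, if modest, associations.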

Posted: **Tue Jan 23, 2018 5:33 pm**

Overall analysis strategy so far:
- Took the patients whose OS status is deceased (97 patients), 153 genes and 5 clinical variables, and ran Banjo three times for 36 h each to determine the genes in the first-degree Markov blanket of OS (months).
- Split the deceased patients into a low-OS group (<6 months, 38 patients) and a high-OS group (>1 year, 39 patients), with 153 genes and 5 clinical variables in each case, and ran Banjo for 1 h, 2 h, 8 h and 36 h. In each case, log normalization was used to estimate the probability that the highest-BDe-scoring network is the best fit to the data among all networks scanned so far.
- Took the first-degree Markov blanket genes and re-ran the analysis to check whether the observed patterns are reproducible.
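The log normalization in the second step can be sketched as follows. The sketch assumes the input is a list of natural-log BDe scores for the top-scoring networks retained from a run. If the scores are base-10 logs, multiply them by ln 10 first. Each network's relative probability is exp(score) divided by the sum of exp(score) over the networks supplied. Subtracting the maximum first (the log-sum-exp trick) avoids underflow, because BDe log scores are typically large negative numbers. The result is a probability relative to the networks supplied only, not to all possible networks. The function name and the example scores are hypothetical.

```python
import math


def normalize_log_scores(log_scores):
    """Turn log-scale network scores (e.g. natural-log BDe) into relative probabilities.

    p_i = exp(s_i) / sum_j exp(s_j), computed via log-sum-exp so that
    very negative scores do not underflow to zero.
    """
    top = max(log_scores)
    log_total = top + math.log(sum(math.exp(s - top) for s in log_scores))
    return [math.exp(s - log_total) for s in log_scores]


# Hypothetical scores for three retained networks; probs[0] is the share of the top one.
probs = normalize_log_scores([-10234.7, -10236.1, -10240.3])
print([round(p, 3) for p in probs])
```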

**Need help with the power calculations for this particular analysis.**
Also need advice on how best to take this forward from here so I can start writing up the results as an actual publication.
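This sketch is only a generic starting point. It is a normal-approximation power calculation for detecting a difference in a single gene's expression between the low-OS (38 patients) and high-OS (39 patients) groups, expressed as a standardized effect size d (difference in means divided by the common standard deviation). It is not a power analysis for Banjo's network search itself, which would more likely have to be estimated by simulation. The effect size d and the significance level alpha are assumptions you would choose, and with many genes alpha should be adjusted for multiple testing. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def two_group_power(d, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for standardized effect d."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    delta = d * math.sqrt(n1 * n2 / (n1 + n2))  # expected test statistic under H1
    # Probability of landing in either rejection region.
    return _Z.cdf(delta - z_crit) + _Z.cdf(-delta - z_crit)


def n_per_group(d, power=0.8, alpha=0.05):
    """Approximate equal group size needed to reach `power` for effect size d."""
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2)
```

For example, `two_group_power(0.8, 38, 39)` comes out at about 0.94 with alpha = 0.05, and `n_per_group(0.5)` gives 63 per group. An exact t-test calculation gives slightly larger sizes.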

Posted: **Thu Feb 01, 2018 5:06 pm**

Data on all 154 patients: RNASeqV2 RSEM values and z-scores

Posted: **Thu Feb 01, 2018 5:22 pm**

This week I am focusing on comparing the graph structures of the consensus graphs and top graphs from the triplicate 36h Banjo runs for the two groups.
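The post does not say how the graphs are being compared. One simple option is to count the directed edges that two graphs share, reverse, or hold uniquely, and then summarize them as a structural Hamming distance (SHD) and a Jaccard similarity of the undirected edge skeletons. The Python sketch below assumes each consensus or top graph has already been parsed into a list of directed (parent, child) edges. Parsing Banjo's output files is not shown, and the function name is hypothetical.

```python
def compare_graphs(edges_a, edges_b):
    """Compare two directed graphs given as iterables of (parent, child) pairs.

    Each missing, extra, or reversed edge adds 1 to the structural Hamming
    distance; the Jaccard similarity ignores edge direction.
    """
    a, b = set(edges_a), set(edges_b)
    shared = a & b
    reversed_edges = {(u, v) for (u, v) in a - b if (v, u) in b}
    only_a = {(u, v) for (u, v) in a - b if (v, u) not in b}
    only_b = {(u, v) for (u, v) in b - a if (v, u) not in a}
    skel_a = {frozenset(e) for e in a}
    skel_b = {frozenset(e) for e in b}
    union = skel_a | skel_b
    return {
        "shared": shared,
        "reversed": reversed_edges,  # listed with graph A's orientation
        "only_a": only_a,
        "only_b": only_b,
        "shd": len(only_a) + len(only_b) + len(reversed_edges),
        "jaccard": len(skel_a & skel_b) / len(union) if union else 1.0,
    }


# Placeholder node names, not results.
result = compare_graphs([("A", "B"), ("B", "C"), ("C", "D")],
                        [("A", "B"), ("C", "B"), ("D", "E")])
print(result["shd"], result["jaccard"])
```

The same function can compare a run's top graph against its consensus graph, or the low-OS graphs against the high-OS graphs.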

Posted: **Thu Feb 01, 2018 5:34 pm**

On the source data: snapshot of the TCGA info page

Posted: **Thu Feb 01, 2018 5:41 pm**

Attached is prior work by others on the molecular classification of long-term survivors (>3 years) using the same dataset I am looking at.

Posted: **Fri Feb 09, 2018 1:18 pm**

This week I am focusing on determining the first-degree Markov blanket genes for OS in the combined run, excluding the middle group of patients (OS between 6 months and 1 year). I am also designing follow-up runs, since the earlier runs produced some reproducible gene networks.
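One way to sketch this step is shown below. It assumes each run's learned network has been parsed into (parent, child) edges. It also assumes that "first-degree Markov blanket" means the standard Markov blanket: parents, children, and the children's other parents. If only OS's direct neighbours are meant, drop `spouses`. Counting how many runs include each gene gives a simple measure of reproducibility across runs. The gene names in the test data are placeholders, and the function names are hypothetical.

```python
from collections import Counter


def markov_blanket(edges, target):
    """Markov blanket of `target` in a DAG given as (parent, child) pairs:
    its parents, its children, and its children's other parents."""
    edges = set(edges)
    parents = {p for p, c in edges if c == target}
    children = {c for p, c in edges if p == target}
    spouses = {p for p, c in edges if c in children and p != target}
    return parents | children | spouses


def blanket_frequency(runs, target):
    """Count in how many runs each node appears in `target`'s Markov blanket."""
    counts = Counter()
    for edges in runs:
        counts.update(markov_blanket(edges, target))
    return counts
```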

Getting TCGA Glioblastoma data through cBioportal
