GEO datasets

Re: GEO datasets

Postby lsand039 » Mon Apr 04, 2016 12:50 pm

So I finally had the chance to run the gene-probe_id.py script with GSE29378.csv and Probe IDs and Gene Names.csv, but I'm getting a blank output. Attached are the files I used. Do I need to change the layout of GSE29378 to get results?
Attachments
Probe IDs and Gene Names.csv
(1.36 MiB) Downloaded 176 times
GSE29378.csv
(35.4 MiB) Downloaded 164 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Thu Apr 07, 2016 12:56 pm

I just finished matching the probe IDs and gene names to GSE29378 using the script, after I figured out what I was doing wrong. I made a new key of gene names and probe IDs that's more comprehensive than the first one I posted. To build it, I took the additional gene and probe ID names directly from the original GSE files and added them to my original key, which was created from the GDS810 & GDS4136 probe IDs and gene names and the Illumina key found at http://www.genomequebec.mcgill.ca/compg ... robes.html. Attached are the new key and the matched file for GSE29378, along with the work done to make sure the genes were matched to the right probe IDs. I still have 2 more files to match, so those should be up soon!
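
For anyone following along, here is a minimal sketch of what matching probe IDs to gene names with a key can look like. It is not the gene-probe_id.py script itself (its contents are not shown in this thread). It assumes a two-column key (probe ID, gene name) and an expression file whose first column is the probe ID. The function names and the sample IDs are made up for illustration.

```python
import csv
import io


def load_key(key_file):
    """Read a two-column key (probe ID, gene name) into a dict."""
    key = {}
    for row in csv.reader(key_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        probe_id, gene = row[0].strip(), row[1].strip()
        if probe_id and gene:
            key[probe_id] = gene
    return key


def annotate(expr_file, key):
    """Yield each expression row with the gene name inserted after the probe ID.

    Probes missing from the key get an empty gene name so they are easy to spot.
    """
    for row in csv.reader(expr_file):
        if not row:
            continue
        probe_id = row[0].strip()
        yield [probe_id, key.get(probe_id, "")] + row[1:]


# Tiny in-memory stand-ins for the real CSV files (hypothetical values).
key_csv = io.StringIO("PROBE_1,GENE_A\nPROBE_2,GENE_B\n")
expr_csv = io.StringIO("PROBE_1,7.1,6.9\nPROBE_3,5.0,5.2\n")

key = load_key(key_csv)
matched = list(annotate(expr_csv, key))
```

With real files, the two `io.StringIO` objects would be replaced by `open(path, newline="")` handles. Rows that come back with an empty gene name are the ones the key does not cover yet.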
Attachments
key2c.csv
(2.09 MiB) Downloaded 166 times
GSE29378M .xlsx
(83.07 MiB) Downloaded 173 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Wed Apr 13, 2016 12:27 pm

Attached are the final two datasets with their matched genes.
Attachments
GSE63060M.xlsx.tar.gz
(315.71 MiB) Downloaded 187 times
GSE48350M.xlsx.tar.gz
(319.74 MiB) Downloaded 177 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Thu Apr 21, 2016 12:58 pm

Here are the normalized values and correlation calculations for GSE29378 & GSE48350. I had to make separate files for the normalizations & calculations and separate the Alzheimer and control samples from GSE48350 because the files were getting too big to open.
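
As a rough illustration of the two calculations mentioned above, the sketch below z-scores one gene's expression values and computes a Pearson correlation. It assumes the correlation is taken between a gene's expression and a 0/1 disease label, and that the z-score uses the population standard deviation; the spreadsheets may do either differently. All values are made up.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Center values on their mean and scale by the population standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return [0.0] * len(values)  # constant input: every z-score is 0
    return [(v - m) / s for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series; report 0
    return cov / (vx * vy) ** 0.5


# Hypothetical expression values for one gene across four samples,
# and a label per sample (1 = Alzheimer, 0 = control).
expression = [5.1, 6.3, 7.0, 4.8]
status = [0, 1, 1, 0]

normalized = z_scores(expression)
r = pearson(normalized, status)
```

Pearson correlation does not change when one input is z-scored, so the normalization matters for comparing values across datasets, not for the correlation itself.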

I combined the correlation values for all the genes from GDS810, GDS4136, GSE29378, and GSE48350 and found the 20 genes with the highest correlations that appear in all four datasets. All samples in these datasets came from brain tissue. I have not yet included GSE63060 in the correlation calculations because its samples came from blood.
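
One way to pick top genes shared by several datasets is sketched below. The ranking rule is an assumption: it keeps only genes present in every dataset and orders them by mean absolute correlation, which may not be the rule used in Top Genes.xlsx. The function name and the numbers are illustrative only.

```python
def top_common_genes(corr_by_dataset, n=20):
    """Return up to n genes found in every dataset, ranked by mean |correlation|."""
    datasets = list(corr_by_dataset.values())
    if not datasets:
        return []
    # Genes that appear in all datasets.
    common = set(datasets[0]).intersection(*datasets[1:])

    def score(gene):
        return sum(abs(d[gene]) for d in datasets) / len(datasets)

    # Sort by descending score; break ties alphabetically for a stable result.
    return sorted(common, key=lambda g: (-score(g), g))[:n]


# Hypothetical per-dataset correlations (gene name -> correlation).
corr = {
    "GDS810": {"GENE_A": 0.5, "GENE_B": -0.8, "GENE_C": 0.1},
    "GDS4136": {"GENE_A": 0.3, "GENE_B": 0.6, "GENE_D": 0.9},
}

top = top_common_genes(corr, n=20)
```

Here GENE_C and GENE_D drop out because each is missing from one dataset, even though GENE_D has the single largest correlation.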

The data from GDS810, GDS4136, GSE29378, and GSE48350 are just about ready to be analyzed by Banjo. I read the user guide, but I have not yet had the chance to try it out.
Attachments
Top Genes.xlsx
compares correlation values of datasets to determine top genes
(7.57 MiB) Downloaded 176 times
CombinedData.xlsx
Contains data of top 20 genes from GDS810, GDS4136, GSE29378, and GSE48350
(116.34 KiB) Downloaded 174 times
CorrelationsGSE48350.xlsx
correlation calculations
(203.57 MiB) Downloaded 177 times
CorrelationsGSE29378.xlsx
correlation calculations
(49.29 MiB) Downloaded 166 times
GSE48350a.xlsx.tar.gz
z-score & calculations for control samples
(257.82 MiB) Downloaded 169 times
GSE48350b.xlsx
z-score & calculations for Alzheimer samples
(147.69 MiB) Downloaded 157 times
GSE29378.xlsx
z-score & calculations
(108.7 MiB) Downloaded 167 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Wed Apr 27, 2016 1:11 pm

So I've made a list of all the genes common to the 4 datasets. I'm now trying to create a file for each dataset that contains only the common genes. I think Steve's script would work, but I'm running into some issues. I tried using CG810.csv as a key to match all the demographic data and expression values of dataset GDS810 to the file Common Genes.csv. When I run the script, all I get is a blank output, and I'm not sure why. Is there anything wrong with the format of my files? The script seems to run fine; I just don't know why it keeps giving me a blank output.
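
A minimal sketch of filtering a dataset down to a list of common genes is below. It is not Steve's script, and it cannot say why that script produced no output. One thing worth checking in cases like this is whether the names in the two files match exactly, since stray spaces, different letter case, or a byte-order mark on the first cell will all prevent a match. The sketch normalizes names before comparing; the helper names and data are hypothetical, and it assumes the gene name sits in the first column.

```python
import csv
import io


def normalize(name):
    """Remove a byte-order mark and surrounding spaces, and ignore letter case."""
    return name.replace("\ufeff", "").strip().upper()


def filter_rows(dataset_file, common_genes, gene_col=0):
    """Keep only the rows whose gene name appears in common_genes."""
    wanted = {normalize(g) for g in common_genes}
    kept = []
    for row in csv.reader(dataset_file):
        if len(row) > gene_col and normalize(row[gene_col]) in wanted:
            kept.append(row)
    return kept


# In-memory stand-ins for the real files (hypothetical values).
# Note the trailing space in the gene list and the BOM and lowercase in the data.
common = ["GENE_A", "GENE_B "]
data = io.StringIO("\ufeffgene_a,1.2,3.4\nGENE_X,0.1,0.2\nGENE_B,5.6,7.8\n")

kept = filter_rows(data, common)
```

If `kept` comes back empty on real files, printing a few values of `repr(row[gene_col])` next to `repr()` of the gene list entries usually shows where the two differ.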
Attachments
Common Genes.csv
(79.58 KiB) Downloaded 166 times
CG810.csv
(5.29 MiB) Downloaded 173 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Tue May 03, 2016 9:49 am

Here are the datasets with the matched genes. There are 11,272 genes common to all 4 datasets. In total, there are 377 samples: 154 AD & 223 controls.
Attachments
Top Genes.xlsx
(12.61 MiB) Downloaded 178 times
CG29378M.csv
(8.5 MiB) Downloaded 170 times
CG810M.csv
(4.23 MiB) Downloaded 161 times
CG48350M.csv
(33.99 MiB) Downloaded 157 times
CG4136M.csv
(4.1 MiB) Downloaded 164 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Wed May 04, 2016 10:31 am

Here are the discretized files for the 4 datasets.
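
The discretization method used for these files is not stated in the thread. As one possible approach, the sketch below does equal-width binning of a single variable into 3 states (0 = low, 1 = middle, 2 = high). Both the method and the number of states are assumptions, and the values are made up.

```python
def discretize_interval(values, bins=3):
    """Map each value to a state 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant variable: a single state
    width = (hi - lo) / bins
    # The maximum value would land in bin index `bins`, so clamp it to the top state.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical z-scored expression values for one gene.
values = [0.0, 1.0, 2.0, 3.0, 6.0]
states = discretize_interval(values, bins=3)
```

Equal-width bins are sensitive to outliers: one extreme value stretches the range and can push most samples into the lowest state. Quantile-based bins are a common alternative when that happens.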
Attachments
810D.xlsx
(8.44 MiB) Downloaded 169 times
29378D.xlsx
(16.95 MiB) Downloaded 160 times
4136D.xlsx
(8.18 MiB) Downloaded 164 times
48350D.xlsx
(67.64 MiB) Downloaded 161 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Thu Jun 02, 2016 12:54 pm

I combined the 4 datasets into one file since Banjo wouldn't let me both specify multiple observation files and specify variable names inFile. I wasn't sure how I should discretize ages. I'm still playing around with some settings to convince Banjo that it can analyze all 377 subjects and 11,275 variables, but I'm hoping Jairo will email me back soon to help me with a solution.

If I had fewer variables, Banjo would let me run the data; I'd just need to know how to decide which variables to include. Once I can get Banjo to analyze this dataset, I think we'll be ready to move on to the next step.
Attachments
Allfilesc.txt
(8.19 MiB) Downloaded 165 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Mon Jun 06, 2016 2:05 pm

I was able to run Banjo with all the variables & observations by setting the cache to fastLevel1, precomputeLogGamma to no, and Proposer to RandomLocalMove.
Here are the settings and output files of the 1 hr run. I couldn't upload the .svg file that allows me to view the network, and the jpg image won't show any of the nodes. I think that if I reduce the number of variables, I can get a viewable jpg image; I just need to know how to choose which variables I should remove from the search.
Attachments
statictest.settings.txt
(5.82 KiB) Downloaded 164 times
Allfilesc.static.report.2016.06.06.11.21.10.txt
(827.38 KiB) Downloaded 169 times
top.graph.Allfilesc2016.06.06.11.21.10.txt
(313.52 KiB) Downloaded 166 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Thu Jun 09, 2016 9:00 am

I found the correlation of each variable with Alzheimer's. Attached is the file that contains all the variables, sorted from highest to lowest correlation.

I ran 1-hour Banjo searches for the top 50, 100, 250, 500, 1000, 2500, & 5000 variables, and I've included the settings I used for all searches.
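
A minimal sketch of building the reduced variable sets for runs like these is below. It assumes variables are ranked by the absolute value of their correlation with the Alzheimer label (the thread does not say whether sign was considered) and that the data is a table with one column per variable. The names and numbers are illustrative only.

```python
def top_variables(correlations, n):
    """Return the n variable names with the largest |correlation|."""
    ranked = sorted(correlations, key=lambda name: (-abs(correlations[name]), name))
    return ranked[:n]


def select_columns(header, rows, keep):
    """Return rows restricted to the columns named in keep, in that order."""
    idx = [header.index(name) for name in keep]
    return [[row[i] for i in idx] for row in rows]


# Hypothetical correlations with the Alzheimer label and discretized data.
correlations = {"GENE_A": 0.42, "GENE_B": -0.61, "GENE_C": 0.05, "GENE_D": 0.30}
header = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
rows = [[0, 1, 2, 1],
        [2, 0, 1, 1]]

# One reduced table per cutoff, analogous to the 50 / 100 / 250 ... runs.
subsets = {n: select_columns(header, rows, top_variables(correlations, n))
           for n in (1, 2, 3)}
```

Each entry in `subsets` could then be written out as its own observations file, with the variable count in the settings adjusted to match.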
Attachments
otopgenes.settings.txt
(5.81 KiB) Downloaded 152 times
CombinedFilesCorrelationsD.xlsx
(51.03 MiB) Downloaded 144 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm
