GEO datasets

Re: GEO datasets

Post by lsand039 » Mon Apr 17, 2017 3:05 pm

A continuation of the last post
Attachments
MinADriskAge<65.png
(26.85 KiB)
MaxADriskAge<65.png
(23.51 KiB)
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Post by lsand039 » Tue Apr 18, 2017 10:09 am

The files below score as we expect on both GeNIe and BaNJO (the GMLS derived from Bene scores better than the CS), and their parameters match the data.

Here are their significance values:
BaNJO:
CS: -10836.4009
GMLS: -10827.6995
min: 0.012897781
max: 0.113568398

GeNIe:
CS: -8517.694735
GMLS: -8493.975566
min: 7.07E-06
max: 0.002659
Attachments
Bene.graph.2017.04.05.17.09.00.png
GMLS-BaNJO image
(128.57 KiB)
CS.graph.2017.04.04.15.20.19.png
CS-BaNJO image
(105.14 KiB)
FulldataLP1SS1753NBDRALPC001.xdsl
CS- GeNIe file
(34.87 KiB) Downloaded 156 times
GMLSLPFulldataC001.xdsl
GMLS- GeNIe file
(33.97 KiB) Downloaded 154 times

Re: GEO datasets

Post by lsand039 » Tue Apr 18, 2017 11:40 am

I validated the GMLS & CS files posted previously, since they have the correct parameters in GeNIe. I'm getting the same Accuracy, ROC, and prediction values from both of them using the Leave One Out test. The only difference is that now some of the AD/Non AD prediction values in the GMLS and CS files actually match the prediction values in the validation output files. I'm still not sure why not all of the AD/Non AD prediction values match the values in the output files, for either the CS or the GMLS.

Questions I still have:
How is GeNIe scoring and predicting structures?
Why are only some of the prediction values from the validation file matching the structure prediction values for the GMLS & CS files?
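
One possible source of the mismatch, offered as an assumption and not as a description of GeNIe's internals: in a leave-one-out test each record is predicted by a model fitted to the other n-1 records, so its prediction does not have to equal the one given by the network fitted to all n records. A minimal sketch of that procedure, using a hypothetical one-feature frequency classifier in place of a Bayesian network:

```python
from collections import Counter

def loo_predictions(records):
    """records: list of (feature, label) pairs with discrete values.

    Hypothetical stand-in classifier: predict the most common label seen
    with the same feature value among the other n-1 records.
    """
    predictions = []
    for i, (feature, _) in enumerate(records):
        training = records[:i] + records[i + 1:]      # hold out record i
        matching = Counter(lbl for f, lbl in training if f == feature)
        if not matching:                               # feature value unseen in training
            matching = Counter(lbl for _, lbl in training)
        predictions.append(matching.most_common(1)[0][0])
    return predictions

def loo_accuracy(records):
    """Fraction of held-out records whose label was predicted correctly."""
    preds = loo_predictions(records)
    hits = sum(p == lbl for p, (_, lbl) in zip(preds, records))
    return hits / len(records)
```

If the validation tool really does refit parameters for every held-out record, comparing its output against predictions read off the full-data network would be expected to disagree on some records.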
Attachments
CSLOO.txt
Leave One Out Validation output for CS
(21.16 KiB) Downloaded 164 times
GMLSLOO.txt
Leave One Out Validation output for GMLS
(21.16 KiB) Downloaded 165 times

Re: GEO datasets

Post by lsand039 » Tue Apr 18, 2017 10:26 pm

I tried to change the ESS (equivalent sample size) on BaNJO from the default value of 1.0 to 0.001 to match what I've been using on GeNIe. Unfortunately, I don't think I can specify anything lower than 1.0. Attached are the settings files and output summary files from my attempt.
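
To see why the ESS matters when comparing scores between the two tools, here is a rough sketch of a BDeu-style local score for one node. It assumes both tools use a score of this family, which nothing in this thread confirms; the function name and the count layout are made up for illustration.

```python
from math import lgamma

def bdeu_local_score(counts, ess):
    """Log BDeu score of one node given its parents.

    counts[j][k] = number of records with parent configuration j and
    child state k. ess = equivalent sample size (the prior strength).
    """
    q = len(counts)            # number of parent configurations
    r = len(counts[0])         # number of child states
    a_j = ess / q              # prior mass per parent configuration
    a_jk = ess / (q * r)       # prior mass per (configuration, state) cell
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score
```

Calling it on the same count table with `ess=1.0` and `ess=0.001` gives different scores, so network scores computed under different ESS values are not directly comparable.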
Attachments
CS.static.report.2017.04.18.22.16.50.txt
CS results summary
(13.52 KiB) Downloaded 159 times
Bene.static.report.2017.04.18.22.17.16.txt
GMLS results summary
(11.75 KiB) Downloaded 150 times
benesetting.txt
GMLS setting
(5.88 KiB) Downloaded 148 times
CSsetting.txt
(5.88 KiB) Downloaded 138 times

Re: GEO datasets

Post by lsand039 » Wed Apr 19, 2017 11:08 am

Here are the dot structures for the GMLS & CS. The thickness of each arc corresponds to the magnitude of its influence score. The influence scores can be found in the results summaries attached to the previous post.
blue with arrow: positive influence score
red with perpendicular end: negative influence score.
black: influence score of 0

Please let me know if they are difficult to read. I had to play around with the thickness of arcs so all of them could be visible and not overly obnoxious.
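
For anyone who wants to restyle the arcs, the colour and thickness scheme above could be generated along these lines. This is only a sketch: the width formula 1 + 15 * |score| is a guess based on the attachment names (`1+absx15`), the plain arrowhead on black arcs is my assumption, and the node names in the usage note are placeholders.

```python
def edge_attributes(score, base=1.0, scale=15.0):
    """Return a Graphviz attribute list for one arc.

    Width grows with |score| (assumed formula: base + scale * |score|).
    """
    width = base + abs(score) * scale
    if score > 0:
        color, head = "blue", "normal"   # positive influence: arrow
    elif score < 0:
        color, head = "red", "tee"       # negative influence: perpendicular end
    else:
        color, head = "black", "normal"  # zero influence (arrowhead assumed)
    return f"[color={color}, arrowhead={head}, penwidth={width:.2f}]"

def to_dot(edges):
    """edges: iterable of (parent, child, influence_score) tuples."""
    lines = ["digraph influence {"]
    for parent, child, score in edges:
        lines.append(f'  "{parent}" -> "{child}" {edge_attributes(score)};')
    lines.append("}")
    return "\n".join(lines)
```

For example, `print(to_dot([("GeneA", "AD", 0.1), ("GeneB", "AD", -0.2)]))` produces text that can be saved as a .dot file and rendered with Graphviz. Lowering `scale` makes the thick arcs less dominant.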
Attachments
GMLS.dot.1+absx15.png
(114.36 KiB)
CS.dot.1+absx15.png
(91.34 KiB)

Re: GEO datasets

Post by lsand039 » Thu May 11, 2017 12:09 pm

I found that the 12 datasets we have share only 8257 genes. I removed the extra genes from the files in my previous post, which contained data for what I thought were 8286 common genes. Attached are the datasets restricted to those 8257 common genes, with the gene expression levels already discretized.

The data from GSE48350 with the 8257 common genes is not up yet, since I need Access or Base to find the common genes in this dataset. Next I'll post the input files for BaNJO, which have Age, Sex, AD, and Brain Region discretized.
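
Finding the common genes is a set intersection, so it can also be done without Access or Base. A minimal sketch, assuming each dataset has already been reduced to a list of gene symbols (the function names and the dataset keys in the usage note are placeholders, not the real files):

```python
def common_genes(datasets):
    """datasets: dict mapping dataset name -> iterable of gene symbols.

    Returns the set of genes present in every dataset.
    """
    gene_sets = [set(genes) for genes in datasets.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)

def restrict_to_common(rows, common):
    """rows: dict mapping gene -> list of values for one dataset.

    Keeps only the common genes, in sorted order so that every dataset
    ends up with its genes in the same order.
    """
    return {gene: rows[gene] for gene in sorted(common) if gene in rows}
```

Usage would look like `common = common_genes({"GSE_A": genes_a, "GSE_B": genes_b})` followed by `restrict_to_common(rows_a, common)` for each dataset.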
Attachments
29378D.xlsx
(9.46 MiB) Downloaded 143 times
28146D.xlsx
(4.46 MiB) Downloaded 146 times
16759D.xlsx
(455.14 KiB) Downloaded 139 times
15222D.xlsx
(7.87 MiB) Downloaded 160 times
26927D.xlsx
(490.77 KiB) Downloaded 148 times
44768D.xlsx
(5.14 MiB) Downloaded 166 times
44770D.xlsx
(5.14 MiB) Downloaded 161 times
44771D.xlsx
(5.16 MiB) Downloaded 154 times
84422_570D.xlsx
(2.34 MiB) Downloaded 142 times
84422_96D.xlsx
(23.33 MiB) Downloaded 151 times
Last edited by lsand039 on Wed May 17, 2017 1:04 pm, edited 3 times in total.

Re: GEO datasets

Post by lsand039 » Thu May 11, 2017 12:10 pm

A continuation of the last post.
Attachments
1297D.xlsx
(5.25 MiB) Downloaded 142 times
28146Z.xlsx
Z-scores for GSE28146
(8.44 MiB) Downloaded 143 times
48350D.xlsx
(44.21 MiB) Downloaded 180 times
48350Z.xlsx
Z-scores for GSE48350
(108.3 MiB) Downloaded 158 times
5281D.xlsx
(3.39 MiB) Downloaded 153 times
Last edited by lsand039 on Wed May 17, 2017 12:14 pm, edited 3 times in total.

Re: GEO datasets

Post by lsand039 » Thu May 11, 2017 5:29 pm

Here I've discretized the age, sex, brain region, and Alzheimer's status of the samples.
Age>65=1, Age<65=0
Hippocampus=1, Non-hippocampus=0
Female=1, Male=0
Alzheimer=1, Control/Non-Alzheimer=0

Sheet3 for 84422 contains only the samples that were definitely AD or Normal.
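
The coding above can be written down as a small function. This is a sketch only: the field names and the accepted spellings are my own, and since the scheme does not say how an age of exactly 65 is coded, the sketch assigns it 0 as an assumption.

```python
def encode_sample(age, sex, region, status):
    """Map one sample's metadata to the 0/1 coding listed above.

    Assumption: an age of exactly 65 is coded 0 (the scheme only
    defines Age>65 and Age<65).
    """
    return {
        "Age": 1 if age > 65 else 0,
        "Sex": 1 if sex.strip().lower() == "female" else 0,
        "Region": 1 if region.strip().lower() == "hippocampus" else 0,
        "AD": 1 if status.strip().lower() in ("alzheimer", "ad") else 0,
    }
```

For instance, `encode_sample(72, "Female", "Hippocampus", "AD")` gives 1 for all four variables. Samples whose status is uncertain should be filtered out before encoding, as was done for Sheet3 of 84422, because this sketch would silently code them as 0.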
Attachments
29378bp.xlsx
(2.97 MiB) Downloaded 162 times
28146bp.xlsx
(1.44 MiB) Downloaded 142 times
44771bp.xlsx
(10.29 MiB) Downloaded 138 times
16759bp.xlsx
(713.06 KiB) Downloaded 138 times
15222bp.xlsx
(15.7 MiB) Downloaded 146 times
26927bp.xlsx
(950.46 KiB) Downloaded 148 times
44768bp.xlsx
(10.24 MiB) Downloaded 130 times
44770bp.xlsx
(10.25 MiB) Downloaded 121 times
84422_96bp.xlsx
(58.49 MiB) Downloaded 116 times
84422_570bp.xlsx
(6.09 MiB) Downloaded 114 times
Last edited by lsand039 on Wed May 17, 2017 1:03 pm, edited 4 times in total.

Re: GEO datasets

Post by lsand039 » Thu May 11, 2017 5:30 pm

Continued from the last post.
Attachments
1297bp.xlsx
(1.5 MiB) Downloaded 115 times
5281bp.xlsx
(6.75 MiB) Downloaded 103 times
Last edited by lsand039 on Wed May 17, 2017 12:15 pm, edited 2 times in total.

Re: GEO datasets

Post by lsand039 » Fri May 12, 2017 2:56 pm

Here is a table of the 12 datasets I plan to be using. They have 8257 genes in common.
Dataset Summary.png (44.77 KiB)


The GSE # refers to each dataset's GEO accession number. GSE84422 used two platforms, GPL96 and GPL570. I only counted the samples that were definitively AD or controls. I still need to clean up GSE48350 using Base/Access, but right now I'm having issues opening the file in either of those.

To find out how much of each dataset was included in the list of common genes, I went to the list of genes in the GPL file. The column labeled "Original # of genes in GPL" refers to the number of genes I found in the GPL file. The number within the parentheses is the GPL#.

Not all the genes in the GPL file are always shown in the GSE dataset. Because multiple probe IDs can match the same gene, I couldn't directly determine how many genes were available in each dataset. I could find out using Base or Access, but I'm running into a couple of issues. Base needs the Java Runtime Environment, which doesn't seem to be installed on Path 3 and maybe Path 5. The Java Runtime Environment is installed on Path 4, but Base keeps freezing up. I think it's because of the size of the files I'm using.
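
Counting the genes actually present in a dataset, with several probe IDs collapsing onto one gene, could be sketched like this. It assumes the GPL annotation has been loaded into a probe-to-gene dictionary; the function name and the probe IDs in the usage note are hypothetical.

```python
def genes_present(probe_to_gene, dataset_probes):
    """Return the set of distinct genes covered by a dataset.

    probe_to_gene: dict mapping probe ID -> gene symbol (from the GPL file).
    dataset_probes: iterable of probe IDs that appear in the GSE data.
    Probes with no gene annotation (missing, None, or empty) are skipped.
    """
    return {
        probe_to_gene[probe]
        for probe in dataset_probes
        if probe_to_gene.get(probe)
    }
```

With `{"p1": "APP", "p2": "APP", "p3": "APOE"}` as the annotation and probes `["p1", "p2", "p3"]` in the dataset, the result has two genes even though three probes are present.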

I think Access lets me work with larger files, but the Virtual Machine on Path-3 is too low on disk space. I've tried to increase the memory and delete any unnecessary files, but I can't get enough free space to open my files. I will also eventually need Excel so I can include all 2221 samples during a BaNJO run. LibreOffice Calc has a 1024 column limit, so there won't be enough room to format the data in either variables as columns/samples as rows or samples as columns/variables as rows.
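
One way around the spreadsheet column limit would be to write the BaNJO input as a plain text file directly, since a text file has no column cap. A minimal sketch, assuming a tab-delimited layout with one sample per row and an optional header row; whether BaNJO expects a header or this exact delimiter should be checked against the settings file.

```python
import csv
import io

def format_observations(rows, header=None):
    """Return tab-delimited text with one sample per line.

    rows: iterable of sequences of discretized values.
    header: optional sequence of variable names written as the first line.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned string can be written out with `open(path, "w").write(...)`. A row with 2221 values or 8257 values is handled the same way as a short one.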

Once I can use Access or Base, the data should be ready to go through BaNJO!
