GEO datasets

Re: GEO datasets

Post by lsand039 » Wed Jun 22, 2016 9:33 am

Here are the graphs with influence scores for 100 variables. Some of the arcs are very thin because their influence scores are close to 0. Let me know if I should just change those arcs to plain black, like the other arcs that have an influence score of 0.
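
For reference, the styling rule described above could be sketched roughly as follows. This is only an illustration, not the script used to make the attached graphs: the function names, the 0.05 cutoff, the blue/red sign colors, and the assumption that scores lie in [-1, 1] are all hypothetical. Only the Graphviz edge attributes `color` and `penwidth` are standard.

```python
def edge_statement(src, dst, score, eps=0.05, max_width=5.0):
    """Return one DOT edge line styled by influence score.

    Assumptions (not from the thread): scores lie in [-1, 1], and any
    |score| below `eps` is treated as zero and drawn plain black.
    """
    magnitude = min(abs(score), 1.0)
    if magnitude < eps:
        color, width = "black", 1.0
    else:
        # Hypothetical color convention: blue for positive, red for negative.
        color = "blue" if score > 0 else "red"
        width = 1.0 + (max_width - 1.0) * magnitude
    return f'  "{src}" -> "{dst}" [color={color}, penwidth={width:.2f}];'


def to_dot(arcs, eps=0.05):
    """Build a DOT digraph from (source, target, influence_score) tuples."""
    lines = ["digraph G {"]
    lines += [edge_statement(s, d, w, eps) for s, d, w in arcs]
    lines.append("}")
    return "\n".join(lines)
```

With this rule, near-zero arcs stay visible at a fixed width instead of shrinking to hairlines, and only arcs with a meaningful score are colored.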
Attachments
100.4.2MB2.dot.png (355.73 KiB): 100 variables MB2
100.4.2.dot.png (1000.03 KiB): Original graph
100.4.2MB1.dot.png (106.5 KiB): 100 variables MB1
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Post by lsand039 » Wed Jun 22, 2016 11:27 am

Here are the graphs with influence scores for 250 variables.
Attachments
250.8.4.dot.png (3.32 MiB): 250 variables original graph
250.8.4MB1.dot.png (81.01 KiB): 250 variables MB1
250.8.4MB2.dot.png (286.59 KiB): 250 variables MB2

Re: GEO datasets

Post by lsand039 » Tue Jul 19, 2016 2:05 pm

Here are the graphs, influence scores, and likelihood percentages for the four well-known AD genes: APOE, APP, PSEN1, and PSEN2.
Attachments
literatureknowledgeB.dot.png (26.88 KiB)
literatureknowledge.dot.png (27.91 KiB)
dataB.dot.png (28.77 KiB)
data.dot.png (26.32 KiB)

Re: GEO datasets

Post by lsand039 » Mon Aug 01, 2016 7:07 pm

Here's an update on the dataset/series search for AD samples. The 4 previously used datasets are included in this list and are highlighted in green. Yellow-highlighted rows indicate new, potentially useful datasets. Red text indicates cell line samples.

Sheet 2: 28 datasets are from the USA and include both gender and age. There are 22 new datasets/series I need to check to see whether we can use them for future tests.

Sheet 3: 44 datasets from the USA and other countries that include age & gender.

Sheet 4: 42 studies that are from the USA but may not include age & gender.
Attachments
Dataset Updates.xlsx (39.24 KiB)

Re: GEO datasets

Post by lsand039 » Fri Sep 09, 2016 10:26 am

So I discovered that the discretized data for a couple of datasets didn't copy correctly when I combined all 4 studies. I redid the correlations, and the order of the top correlated genes was completely different. (For example, age went from the top correlated variable to the 969th.)
I reran the tests for all 11273 genes, along with the top 20, 50, 100, 250, 500, 1000, 2500, and 5000 variables, in 1-, 2-, 4-, and 8-hour Banjo runs in all 3 terminals. Sex and age were only included if they fell within the appropriate variable interval. The inputs folder under the Banjo Runs file contains the files I used for all 3 terminals. The outputs are organized by terminal and number of variables. I combined the scores and percentages in Scores & Percentages.xlsx. In Markov Blanket genes.xlsx, I identified the first- and second-degree Markov blanket genes in the top-scoring graph of each variable interval. The graphs of the top score for each interval are labeled [number of variables].[length of time of Banjo run].[terminal of Banjo run].
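
For anyone who wants to reproduce the Markov blanket step, here is a minimal sketch of how it could be computed from a graph's arc list. It is not the procedure used for Markov Blanket genes.xlsx. The function names are hypothetical, the arcs are assumed to be available as (parent, child) pairs, and "second degree" is assumed to mean the nodes in the blankets of the first-degree genes that are not already first-degree.

```python
from collections import defaultdict


def markov_blanket(edges, target):
    """First-degree Markov blanket: parents, children, and the other
    parents of the target's children, given (parent, child) arcs."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
        children[src].add(dst)
    blanket = set(parents[target]) | set(children[target])
    for child in children[target]:
        blanket |= parents[child]  # co-parents of each child
    blanket.discard(target)
    return blanket


def second_degree_blanket(edges, target):
    """Assumed definition: union of the blankets of the first-degree
    nodes, minus the first-degree nodes and the target itself."""
    first = markov_blanket(edges, target)
    second = set()
    for node in first:
        second |= markov_blanket(edges, node)
    return second - first - {target}
```

For example, with the toy arcs age -> APP, APP -> PSEN1, APOE -> PSEN1, PSEN1 -> PSEN2, and sex -> APOE (made up for illustration, not taken from the Banjo output), the first-degree blanket of APP is age, PSEN1, and APOE, since APOE is a co-parent of PSEN1.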

I'm still running more Banjo runs to figure out the best graph that includes APP, APOE, PSEN1, PSEN2, age, and sex, but I've included the tests I've done so far playing around with the arcs. The Bene file for those genes is named l.png and was included with the Banjo output files.
Attachments
Revised.tar.gz (61.81 MiB)

Re: GEO datasets

Post by lsand039 » Wed Jan 18, 2017 12:41 pm

Here are new datasets that I've cleaned up using Access and Excel. I've only included the demographic data and the genes we're currently looking at (APOE, APP, PSEN1, PSEN2).
Attachments
GSE37263MA.xlsx (21.99 KiB)
GSE39420MA.xlsx (23.54 KiB)
GSE36980MA.xlsx (41.28 KiB)
GSE26927MA.xlsx (25.6 KiB)
GSE5281MA.xlsx (79.48 KiB)
15222D.xlsx (142.86 KiB): GSE15222

Re: GEO datasets

Post by lsand039 » Wed Jan 18, 2017 1:13 pm

Attached are all the samples I have so far. Sheets 1 and 2 contain the same data; one is just the transpose of the other.

Total Samples: 1681 (852 AD, 829 control)
731 Females, 950 Males
Ages: 20-106

Brain Regions Include:
Cerebellum
cortical tissue
Entorhinal Cortex
frontal cortex
frontal temporal cortex
Hippocampus
Medial Temporal Gyrus
Middle temporal gyrus
neocortex
parietal cortex
Parietal lobe
Post-Central Gyrus
Posterior Cingulate
posterior cingulate at thalamus level
Posterior cingulate cortex
Prefrontal Cortex
Primary Visual Cortex
Superior Frontal Gyrus
temporal cortex
Visual Cortex
Unknown cortex regions

Datasets included:
GSE1297
GSE28146
GSE29378
GSE48350
GSE16759
GSE44768
GSE44770
GSE44771
GSE26927
GSE5281
GSE15222
GSE36980
GSE39420
GSE37263

I'm still hoping to add GSE84422, which has 2004 samples, once I resolve the memory and possible/probable AD issues. GSE37264 and GSE26972 may be added in the future; currently, their GPL files are too large to open in Excel.
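
One possible way around the Excel size limit is to stream the GPL annotation file line by line and keep only the probes for the genes of interest. The sketch below is a rough illustration under several assumptions that would need checking against the actual files: that the annotation is tab-delimited text, that metadata lines start with `#`, `^`, or `!`, that the columns are named `ID` and `Gene Symbol`, and that multiple symbols in one cell are separated by `///`. The function name is hypothetical.

```python
import csv


def filter_annotation(lines, wanted, symbol_col="Gene Symbol", id_col="ID"):
    """Map each wanted gene symbol to its probe IDs, reading one line at a time.

    `lines` can be an open file handle, so the whole file never has to fit
    in memory. Column names and the '///' separator are assumptions.
    """
    wanted = set(wanted)
    # Skip blank lines and assumed metadata/comment lines before the header.
    rows = (ln for ln in lines if ln.strip() and not ln.startswith(("#", "^", "!")))
    reader = csv.DictReader(rows, delimiter="\t")
    hits = {}
    for row in reader:
        symbols = {s.strip() for s in (row.get(symbol_col) or "").split("///")}
        for gene in symbols & wanted:
            hits.setdefault(gene, []).append(row[id_col])
    return hits
```

In practice this would be called with a file handle opened on the downloaded annotation file and the set {"APOE", "APP", "PSEN1", "PSEN2"}, and the resulting probe IDs could then be used to pull just those rows out of the expression matrix.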
Attachments
Full Data.xlsx (161.61 KiB): v1

Re: GEO datasets

Post by lsand039 » Tue Jan 24, 2017 5:27 pm

Here are the data for GSE84422. I could only include platforms GPL570 and GPL96.
Attachments
84422aD.xlsx (452.6 KiB): from GPL96
GSE84422cMA.xlsx (52.74 KiB): from GPL570

Re: GEO datasets

Post by lsand039 » Wed Jan 25, 2017 12:16 pm

Attached is the full dataset that I plan to use and descriptions of the original GEO datasets. I've randomized the samples using random.org. Please let me know how large I should make the training group so I can start analysis on Bene & Genie.
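
Once a training-group size is chosen, the split itself could be done along these lines. This sketch swaps in Python's seeded pseudo-random generator instead of random.org so the shuffle can be repeated exactly; the function name, the 0.8 training fraction, and the seed are placeholders, not decisions made in this thread.

```python
import random


def split_samples(sample_ids, train_fraction=0.8, seed=0):
    """Shuffle sample IDs reproducibly and split them into training and test groups.

    `train_fraction` and `seed` are illustrative defaults only.
    """
    ids = list(sample_ids)
    rng = random.Random(seed)  # local generator, so global random state is untouched
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]
```

Because the seed is fixed, rerunning the function on the same sample list gives the same training and test groups, which makes it easier to compare Bene and Genie results on identical data.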
Attachments
Dataset Summary.xlsx (23.13 KiB)
Full Data.xlsx (358.79 KiB): v2

Re: GEO datasets

Post by lsand039 » Mon Jan 30, 2017 1:16 pm

Here are the datasets that have more than just the age and biological sex of the samples:

Post Mortem Interval: 1430 samples
GSE84422
GSE44771
GSE44770
GSE44768
GSE37263
GSE29378
GSE26927
GSE16759
GSE1297

Race/ethnicity: 1128 samples
GSE5281
GSE84422
GSE15222

BraakStage: 698 samples
GSE84422
GSE29378
GSE1297

NeurofibrillaryTangles: 635 samples
GSE1297
GSE84422

LOAD specified: 228 samples
GSE15222
GSE44768
GSE44770
GSE44771

Some individual datasets had other information worth noting:
GSE39420: samples are specified as FAD and PSEN1 mutation types
GSE29378: samples' plaque score and disease duration are listed
