SMLG (Statistical Machine Learning Group) Discussion Forum

by **lsand039** » Wed Jun 22, 2016 9:33 am

Here are the graphs with influence scores for 100 variables. Some of the arcs are very thin since their influence scores are close to 0. Let me know if I should just change those arcs into plain black, like other nodes that have an influence score of 0.

by **lsand039** » Wed Jun 22, 2016 11:27 am

Here are the graphs with influence scores for 250 variables.

by **lsand039** » Tue Jul 19, 2016 2:05 pm

Here are the graphs, influence scores, and likelihood percentages for the 4 well-known AD genes APOE, APP, PSEN1 & PSEN2.

by **lsand039** » Mon Aug 01, 2016 7:07 pm

Here's an update on the dataset/series search for AD samples. The 4 datasets used previously are included in this list and are highlighted in green. Yellow-highlighted rows indicate new, potentially useful datasets. Red text indicates cell line samples.

Sheet 2: 28 are from the USA & have both gender and age. There are 22 new datasets/series I need to check to see if we can use them for future tests.

Sheet 3: 44 datasets from the USA and other countries that include age & gender.

Sheet 4: 42 studies that are from the USA but may not include age & gender.

by **lsand039** » Fri Sep 09, 2016 10:26 am

So I discovered that the discretized data for a couple of datasets didn't copy correctly when I combined all 4 studies. I redid the correlations, and the order of the top correlated genes was completely different. (For example, age went from the top correlated variable to the 969th.)
I reran the tests for all 11273 genes along with the top 20, 50, 100, 250, 500, 1000, 2500, and 5000 variables in 1-, 2-, 4-, and 8-hour Banjo runs in all 3 terminals. Sex and age were only included if they fell within the appropriate variable interval. The inputs folder under the Banjo Runs file contains the files I used for all 3 terminals. The outputs are organized by terminal and number of variables. I combined the scores and percentages and organized them in Scores & Percentages.xlsx. In Markov Blanket genes.xlsx, I identified the first- and second-degree Markov blanket genes in the top-scoring graph of each variable interval. The graphs of the top score for each interval are labeled [number of variables].[length of time of Banjo run].[terminal of Banjo run].
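
For anyone who wants to check the Markov blanket lists by script, here is a minimal sketch of reading first- and second-degree Markov blanket members off a directed arc list. The arcs and function names are hypothetical, and "second-degree" is assumed to mean the blankets of the first-degree members, excluding the first-degree members and the target itself; the spreadsheet may have been built differently.

```python
from collections import defaultdict

def markov_blanket(arcs, target):
    """Return the parents, children, and children's other parents of target."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for src, dst in arcs:
        parents[dst].add(src)
        children[src].add(dst)
    blanket = set(parents[target]) | set(children[target])
    for child in children[target]:
        blanket |= parents[child]  # co-parents (spouses) of target
    blanket.discard(target)
    return blanket

def second_degree_blanket(arcs, target):
    """Assumed definition: blankets of first-degree members, minus those members and target."""
    first = markov_blanket(arcs, target)
    second = set()
    for node in first:
        second |= markov_blanket(arcs, node)
    return second - first - {target}

# Hypothetical arcs for illustration only (not taken from any Banjo output).
example_arcs = [
    ("age", "APOE"),
    ("APOE", "APP"),
    ("PSEN1", "APP"),
    ("APP", "PSEN2"),
    ("sex", "PSEN1"),
]

first_degree = markov_blanket(example_arcs, "APOE")
second_degree = second_degree_blanket(example_arcs, "APOE")
```

In practice the arc list would come from the top-scoring graph of each variable interval.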

I'm still running some more Banjo runs to figure out the best graph that includes APP, APOE, PSEN1, PSEN2, age, and sex, but I've included the tests I've done so far playing around with the arcs. The Bene file for those genes is named l.png and was included with the Banjo output files.

by **lsand039** » Wed Jan 18, 2017 12:41 pm

Here are new datasets that I've cleaned up using Access and Excel. I've only included the demographic data & the genes we're currently looking at (APOE, APP, PSEN1, PSEN2).

by **lsand039** » Wed Jan 18, 2017 1:13 pm

Attached are all the samples I have so far. Sheets 1 & 2 contain the same data, just transposed.

Total Samples: 1681
(852 AD, 829 control)
731 Females, 950 Males
Ages: 20-106

Brain Regions Include:
Cerebellum
cortical tissue
Entorhinal Cortex
frontal cortex
frontal temporal cortex
Hippocampus
Medial Temporal Gyrus
Middle temporal gyrus
neocortex
parietal cortex
Parietal lobe
Post-Central Gyrus
Posterior Cingulate
posterior cingulate at thalamus level
Posterior cingulate cortex
Prefrontal Cortex
Primary Visual Cortex
Superior Frontal Gyrus
temporal cortex
Visual Cortex
Unknown cortex regions

Datasets included:
GSE1297
GSE28146
GSE29378
GSE48350
GSE16759
GSE44768
GSE44770
GSE44771
GSE26927
GSE5281
GSE15222
GSE36980
GSE39420
GSE37263

I'm still hoping to add GSE84422, which has 2004 samples, once I resolve the memory & possible/probable AD issues. GSE37264 and GSE26972 may be added in the future; currently, their GPL file is too large to open in Excel.

by **lsand039** » Tue Jan 24, 2017 5:27 pm

Here are the data for GSE84422. I could only include platforms GPL570 and GPL96.

by **lsand039** » Wed Jan 25, 2017 12:16 pm

Attached is the full dataset that I plan to use and descriptions of the original GEO datasets. I've randomized the samples using random.org. Please let me know how large I should make the training group so I can start analysis on Bene & Genie.
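
Once a training size is settled, the split could be scripted along these lines. This is a minimal sketch that assumes a seeded pseudo-random shuffle in place of the random.org ordering; the sample IDs, the 0.7 training fraction, and the seed are placeholders, not a recommendation.

```python
import random

def shuffle_and_split(sample_ids, train_fraction, seed):
    """Return (train, test) lists after a seeded shuffle of sample_ids."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(sample_ids)            # copy so the input order is untouched
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder IDs; the real sample IDs come from the attached dataset.
ids = [f"sample_{i}" for i in range(10)]
train, test = shuffle_and_split(ids, train_fraction=0.7, seed=2017)
```

If the random.org order is already saved in the file, the shuffle step can be dropped and only the cut applied.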

by **lsand039** » Mon Jan 30, 2017 1:16 pm

Here are the datasets that have more than just the age and biological sex of the samples:

Post Mortem Interval: 1430 samples
GSE84422
GSE44771
GSE44770
GSE44768
GSE37263
GSE29378
GSE26927
GSE16759
GSE1297

Race/ethnicity: 1128 samples
GSE5281
GSE84422
GSE15222

BraakStage: 698 samples
GSE84422
GSE29378
GSE1297

NeurofibrillaryTangles: 635 samples
GSE1297
GSE84422

LOAD specified: 228 samples
GSE15222
GSE44768
GSE44770
GSE44771

Some individual datasets had other information worth noting:
GSE39420: samples are specified as FAD and PSEN1 mutation types
GSE29378: samples' plaque score and disease duration are listed
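
To see at a glance which datasets carry several of these extra fields, the lists above can be inverted into a per-dataset view. A minimal sketch: the dataset IDs and field names are copied from the lists above, while the function name is just for illustration.

```python
from collections import defaultdict

# Field -> datasets, copied from the lists in this post.
FIELD_TO_DATASETS = {
    "Post Mortem Interval": [
        "GSE84422", "GSE44771", "GSE44770", "GSE44768", "GSE37263",
        "GSE29378", "GSE26927", "GSE16759", "GSE1297",
    ],
    "Race/ethnicity": ["GSE5281", "GSE84422", "GSE15222"],
    "BraakStage": ["GSE84422", "GSE29378", "GSE1297"],
    "NeurofibrillaryTangles": ["GSE1297", "GSE84422"],
    "LOAD specified": ["GSE15222", "GSE44768", "GSE44770", "GSE44771"],
}

def fields_by_dataset(field_to_datasets):
    """Invert a field -> datasets mapping into dataset -> list of fields."""
    by_dataset = defaultdict(list)
    for field, datasets in field_to_datasets.items():
        for gse in datasets:
            by_dataset[gse].append(field)
    return dict(by_dataset)

coverage = fields_by_dataset(FIELD_TO_DATASETS)

# Print datasets with the most extra fields first.
for gse, fields in sorted(coverage.items(), key=lambda kv: -len(kv[1])):
    print(gse, len(fields), ", ".join(fields))
```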
