GEO datasets

Re: GEO datasets

Postby lsand039 » Tue Mar 07, 2017 12:54 pm

Attached are the genomic data for the sub-datasets of superseries GSE44772. They still have to be discretized, but the attached files have the raw expression values matched to the gene names.
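The discretization step is not specified in this thread. As a minimal sketch, assuming each gene is binned into Low / Normal / High relative to its own mean plus or minus one standard deviation (the threshold rule, the "Low" state, and the sample values are all assumptions made for illustration), it could look like this:

```python
from statistics import mean, stdev

def discretize(values, k=1.0):
    """Label each value Low / Normal / High relative to mean +/- k sample standard deviations."""
    mu, sigma = mean(values), stdev(values)
    labels = []
    for v in values:
        if v < mu - k * sigma:
            labels.append("Low")
        elif v > mu + k * sigma:
            labels.append("High")
        else:
            labels.append("Normal")
    return labels

# Made-up expression values for one gene across six samples (not from the GSE files).
example = [7.1, 7.3, 7.2, 9.8, 7.0, 4.9]
labels = discretize(example)
print(labels)
```

Any other binning rule (tertiles, fixed fold-change cutoffs) would slot into the same function; the choice affects every downstream score, so it is worth recording alongside the data.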
Attachments
GSE44768MAc.xlsx (38.05 MiB): Cerebellum data
GSE44770MAc.xlsx (38.11 MiB): Dorsolateral Prefrontal Cortex data
GSE44771MAc.xlsx (19 MiB): Visual Cortex data
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Tue Mar 07, 2017 1:47 pm

Attached are the genomic data for GSE16759 and GSE26927. They still have to be discretized, but the attached files have the raw expression values matched to the gene names.

All the genes for GSE1297, GSE29378, GSE48350, and GSE28146 have been previously matched. GSE29378 has Braak Stage & Plaque score information I can add if we plan to include those variables.

I'm still looking into the 3 datasets that require extra work to match (GSE39420, GSE37263, GSE36980) so I can add their genes to the common genes list.
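A minimal sketch of how a common genes list can be built once each dataset has a gene-symbol column: intersect the per-dataset symbol sets. The toy lists below are invented for illustration, and normalizing symbols by trimming whitespace and upper-casing is an assumption about how the spreadsheets were matched.

```python
def common_genes(datasets):
    """Return the sorted gene symbols present in every dataset."""
    gene_sets = [{g.strip().upper() for g in genes} for genes in datasets.values()]
    return sorted(set.intersection(*gene_sets))

# Toy gene lists; the real lists would come from the attached spreadsheets.
toy = {
    "GSE1297": ["APOE", "APP", "PSEN1", "PSEN2", "MAPT"],
    "GSE29378": ["apoe", "APP", "PSEN1", "PSEN2"],
    "GSE48350": ["APOE", "APP ", "PSEN1", "PSEN2", "BACE1"],
}
shared = common_genes(toy)
print(shared)
```

Datasets that need extra matching work (for example probe IDs instead of symbols) would first need a mapping step before they can join the intersection.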
Attachments
GSE16759MAc.xlsx (1.55 MiB)
GSE26927MAc.xlsx (2.48 MiB)

Re: GEO datasets

Postby lsand039 » Wed Mar 08, 2017 11:49 am

Log (p) (From Genie): -8629.672524
Log Likelihood Bene (calculated on Excel): -389.6868248381
Accuracy: 0.585616 (342/584)
True Positives/ AD: 0.491909  (152/309)
True Negatives/ Non-AD: 0.690909  (190/275)
False Negatives (AD predicted as Non-AD): 157/309
False Positives (Non-AD predicted as AD): 85/275

Log (p) (From Genie): -8617.419198
Log Likelihood Confounding Structure/ Background knowledge: -389.6868248381
Accuracy: 0.585616 (342/584)
True Positives/ AD: 0.491909  (152/309)
True Negatives/ Non-AD: 0.690909  (190/275)
False Negatives (AD predicted as Non-AD): 157/309
False Positives (Non-AD predicted as AD): 85/275

With the Leave One Out validation tests, the Log Likelihoods of the Bene structure and the Confounding Structure were exactly the same. I checked the input data and the predicted probabilities of AD/Non-AD for the true diagnosis (see CSLOO, Comparing Bene & CS tab), and these probabilities were also exactly the same for both structures.
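For reference, a minimal sketch of the log likelihood calculation, assuming the Excel figure is the sum of natural logs of the probability assigned to each sample's true diagnosis. The records below are illustrative only and are not taken from BeneLOO.xlsx or CSLOO.xlsx.

```python
import math

def loo_log_likelihood(records):
    """Sum of log P(true class) over samples; records are (true_label, p_ad) pairs."""
    total = 0.0
    for true_label, p_ad in records:
        p_true = p_ad if true_label == "AD" else 1.0 - p_ad
        total += math.log(p_true)
    return total

# Illustrative probabilities only.
records = [("AD", 0.62), ("AD", 0.41), ("NonAD", 0.30), ("NonAD", 0.55)]
print(round(loo_log_likelihood(records), 4))
```

Under this definition, two structures that give identical per-sample probabilities necessarily give identical log likelihoods and identical accuracy, which is consistent with the matching numbers above.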

I'm still waiting on a reply from Jinang on why his code won't score these structures.
Attachments
BeneROC-AD.png (9.36 KiB)
BeneROC-NonAD.png (9.5 KiB)
ConfoundingStructureROC-AD.png (9.29 KiB)
ConfoundingStructureROC-NonAD.png (9.46 KiB)
BeneLOO.xlsx (56.19 KiB): Bene Structure validation probabilities
CSLOO.xlsx (145.33 KiB): Confounding Structure validation probabilities
Last edited by lsand039 on Fri Mar 10, 2017 10:21 am, edited 3 times in total.

Re: GEO datasets

Postby lsand039 » Wed Mar 08, 2017 1:50 pm

I checked the structures again after randomizing all my samples and creating new training and test groups from them. Again, the results for the Bene structure and the Confounding structure are exactly the same. Here are the results:

Log (p) of Bene Structure: -8924.075942
Log Likelihood Bene (calculated via Excel): -361.121832041
Accuracy: 0.609665 (328/538)
True Positives/ AD: 0.507634 (133/262)
True Negatives/ Non-AD: 0.706522 (195/276)
False Negatives (AD predicted as Non-AD): 129/262
False Positives (Non-AD predicted as AD): 81/276

Log (p) of Confounding Structure: -8916.99562
Log Likelihood Confounding Structure (calculated via Excel): -361.121832041
Accuracy: 0.609665 (328/538)
True Positives/ AD: 0.507634 (133/262)
True Negatives/ Non-AD: 0.706522 (195/276)
False Negatives (AD predicted as Non-AD): 129/262
False Positives (Non-AD predicted as AD): 81/276
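As a quick check on the figures from this run, the rates can be recomputed from the counts. This sketch treats AD as the positive class, so the 129 misclassified AD samples count as false negatives and the 81 misclassified Non-AD samples as false positives; the function name is just illustrative.

```python
def rates(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts (AD = positive)."""
    n_ad, n_non_ad = tp + fn, tn + fp
    return {
        "accuracy": (tp + tn) / (n_ad + n_non_ad),
        "sensitivity": tp / n_ad,
        "specificity": tn / n_non_ad,
    }

# Counts from the second run: 133/262 AD and 195/276 Non-AD classified correctly.
r = rates(tp=133, fn=129, tn=195, fp=81)
print({k: round(v, 6) for k, v in r.items()})
# {'accuracy': 0.609665, 'sensitivity': 0.507634, 'specificity': 0.706522}
```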
Attachments
CSLOO2.xlsx (134.01 KiB): Confounding Structure validation probabilities
BeneLOO2.xlsx (52.1 KiB): Bene Structure validation probabilities
ConfoundingStructureROC-NonAD2.png (9.43 KiB)
ConfoundingStructureROC-AD2.png (9.29 KiB)
BeneROC-NonAD2.png (9.43 KiB)
BeneROC-AD2.png (9.32 KiB)
CSTraining2.png (52.09 KiB): Results of parameter learning, Confounding Structure
BeneTraining2.png (50.04 KiB): Results of parameter learning, Bene Structure
Data2.xlsx (176.16 KiB): Randomization of Total/ Training/ Test samples

Re: GEO datasets

Postby lsand039 » Wed Mar 15, 2017 9:11 am

I experimented with simpler structures to see whether their BDe scores, log likelihood scores, leave-one-out accuracy, or ROC areas under the curve were better than those of the confounding structure/ background knowledge structure we found. Here are 5 structures I've looked at. I've normalized their log likelihood scores to see how they compare; Structures 1 & 3 account for nearly 100% of the normalized total.

Structure1:
Age, Sex, and Brain Region kept independent; order from Bene retained as a serial connection for the genes
BDe (from Genie): -9029.977970
Log Likelihood : -367.724669671385
Log Likelihood %: 79.3032254155456 %
Accuracy: 0.666096 (389/584)
True Positive: 0.877023 (271/309)
True Negative: 0.429091 (118/275)
False Negative (AD predicted as Non-AD): 0.122977 (38/309)
False Positive (Non-AD predicted as AD): 0.570909 (157/275)
ROC Non AD: 0.620953
ROC AD: 0.620953

Structure2:
Same as Structure1, but PSEN1 and PSEN2 are flipped.
BDe (from Genie): -9066.151640
Log Likelihood : -375.896489627824
Log Likelihood %: 0.0224034311606427 %
Accuracy: 0.655822 (383/584)
True Positive: 0.906149 (280/309)
True Negative: 0.374545 (103/275)
False Negative (AD predicted as Non-AD): 0.093851 (29/309)
False Positive (Non-AD predicted as AD): 0.625455 (172/275)
ROC Non AD: 0.598694
ROC AD: 0.598835

Structure3:
Age, Sex, and Brain Region kept independent; literature review information incorporated
BDe (from Genie): -9056.664191
Log Likelihood : -369.070008981655
Log Likelihood %: 20.6546296650185 %
Accuracy: 0.667808 (390/584)
True Positive: 0.886731 (274/309)
True Negative: 0.421818 (116/275)
False Negative (AD predicted as Non-AD): 0.113269 (35/309)
False Positive (Non-AD predicted as AD): 0.578182 (159/275)
ROC Non AD: 0.619647
ROC AD: 0.6206

Structure4:
Age, Sex, and Brain Region kept independent; order from Bene retained and literature review information incorporated
BDe (from Genie): -9003.542740
Log Likelihood : -376.022983654362
Log Likelihood %: 0.0197414423242391 %
Accuracy: 0.650685 (380/584)
True Positive: 0.812298 (251/309)
True Negative: 0.469091 (129/275)
False Negative (AD predicted as Non-AD): 0.187702 (58/309)
False Positive (Non-AD predicted as AD): 0.530909 (146/275)
ROC Non AD: 0.647849
ROC AD: 0.647849

Structure5:
Age, Sex, and Brain Region kept independent; order from Bene retained as a serial connection for the genes
BDe (from Genie): -8989.949771
Log Likelihood : -389.686824838119
Log Likelihood %: 2.29746049108818e-08 %
Accuracy: 0.585616 (342/584)
True Positive: 0.491909 (152/309)
True Negative: 0.690909 (190/275)
False Negative (AD predicted as Non-AD): 0.508091 (157/309)
False Positive (Non-AD predicted as AD): 0.309091 (85/275)
ROC Non AD: 0.628997
ROC AD: 0.628997
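The "Log Likelihood %" values above appear to be each structure's likelihood as a share of the summed likelihoods of all five structures. A minimal sketch of that normalization, assuming this is how the spreadsheet computed it, subtracts the largest log likelihood before exponentiating so that values near exp(-370) do not underflow:

```python
import math

def normalize_log_likelihoods(log_liks):
    """Convert log likelihoods to percentages of the total likelihood (log-sum-exp trick)."""
    top = max(log_liks.values())
    weights = {name: math.exp(ll - top) for name, ll in log_liks.items()}
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}

# Log likelihoods as listed for the five structures.
log_liks = {
    "Structure1": -367.724669671385,
    "Structure2": -375.896489627824,
    "Structure3": -369.070008981655,
    "Structure4": -376.022983654362,
    "Structure5": -389.686824838119,
}
percentages = normalize_log_likelihoods(log_liks)
for name, pct in percentages.items():
    print(f"{name}: {pct:.6g} %")
```

Because the percentages depend only on differences between log likelihoods, a gap of about 1.35 between Structures 1 and 3 already yields a roughly 79/21 split, and the remaining structures are pushed close to zero.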
Attachments
LogLikelihoods of Simple Structures.xlsx (345.08 KiB)
Structure5.png (6.3 KiB)
Structure4.png (10.6 KiB)
Structure3.png (5.77 KiB)
Structure2.png (5.74 KiB)
Structure1.png (5.53 KiB)

Re: GEO datasets

Postby lsand039 » Thu Mar 16, 2017 12:03 pm

Here are three structures: the Bene Structure (with its equivalent Confounding/ Background Knowledge structure), the simpler structure with the best Log Likelihood (as calculated in Excel), and the structure with the best ROC curve. The prediction values come from the Leave One Out accuracy tests. Also attached are the training and test dataset files I used.

Bene Structure & Confounding Structure equivalent
Prediction value: 0.609665 (328/538)
Log Likelihood: -361.121832041
ROC Non-AD: 0.628997
ROC AD: 0.628997
Genie score: -8924.075942 (Bene Structure) (equivalent Confounding Structure)

Best Log Likelihood: (Structure 1 on previous post)
Prediction value: 0.666096 (389/584)
Log Likelihood: -367.724669671385
ROC Non-AD: 0.620953
ROC AD: 0.620953
Genie score: -9029.977970

Best ROC: (Structure 4 on previous post)
Prediction value: 0.650685 (380/584)
Log Likelihood : -376.022983654362
ROC Non AD: 0.647849
ROC AD: 0.647849
Genie score: -9003.542740
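A minimal sketch of how an ROC area can be computed directly from predicted probabilities, using the rank interpretation of AUC (the chance that a randomly chosen positive sample scores higher than a randomly chosen negative one). The probabilities are illustrative, not taken from the validation files. It also shows why, with only two classes and complementary scores, the AD and Non-AD areas come out equal.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve as the probability that a positive outranks a negative."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))

# Illustrative P(AD) values for truly AD and truly Non-AD samples.
p_ad_for_ad = [0.9, 0.7, 0.6, 0.4]
p_ad_for_non_ad = [0.8, 0.5, 0.3, 0.2]

auc_ad = auc(p_ad_for_ad, p_ad_for_non_ad)
# With Non-AD as the positive class, the score is P(Non-AD) = 1 - P(AD).
auc_non_ad = auc([1 - p for p in p_ad_for_non_ad], [1 - p for p in p_ad_for_ad])
print(auc_ad, auc_non_ad)
```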
Attachments
BestLogLikelihoodROCNonAD.png (8.85 KiB)
BestLogLikelihooodROCAD.png (8.82 KiB)
BestROCNonAD.png (9.08 KiB)
BeneROC-AD.png (9.36 KiB)
Structure4.png (10.6 KiB): Best ROC
Structure1.png (5.53 KiB): Best Log Likelihood
Full data.txt (27.44 KiB): Training Dataset
Test.txt (9.75 KiB): Test Dataset

Re: GEO datasets

Postby lsand039 » Fri Mar 17, 2017 8:04 am

Here are the Genie files for the structures mentioned in the previous post.
Attachments
BestROC.xdsl (5.41 KiB)
BestLogLikelihood.xdsl (4.54 KiB)
Hippocampus full data.xdsl (23.86 KiB): Bene Structure
ConfoundingStructure.xdsl (29.33 KiB): Equivalent Structure

Re: GEO datasets

Postby lsand039 » Mon Mar 20, 2017 12:50 pm

Here are the gene expression combinations that maximize and minimize AD risk according to the Confounding Structure. The AD/ Non-AD risk percentages are the highest values I could find while setting Age, Brain Region, and Sex as evidence.

Some groups had multiple combinations with the same AD/ Non-AD risk. For the AD group, Age up to 65, and Males Age over 65 in the Hippocampus, had more than one combination of genes that maximized AD risk. For the Non-AD group, Females up to Age 65 in the Hippocampus had multiple combinations that minimized AD risk.

I could not find a gene expression combination for AD Males up to Age 65 within the Hippocampus with an AD risk higher than 50%. Our dataset had only one AD male up to Age 65 within the Hippocampus, and this sample was in the test dataset. It had Normal APOE and APP and High PSEN1 and PSEN2, a pattern consistent with the one that maximizes AD risk for Males up to Age 65.
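Instead of searching by hand, the gene-state combinations can be enumerated exhaustively. The sketch below is tool-agnostic: `p_ad` stands for whatever returns P(AD) from the network once Age, Sex, and Brain Region evidence is fixed. The `toy_p_ad` function is a hypothetical stand-in and is NOT the Confounding Structure, and the "Low" state is an assumption (the thread only names Normal and High).

```python
from itertools import product

GENES = ["APOE", "APP", "PSEN1", "PSEN2"]
STATES = ["Low", "Normal", "High"]  # "Low" is assumed

def extreme_combinations(p_ad, genes=GENES, states=STATES, tol=1e-9):
    """Return (max_p, combos_at_max, min_p, combos_at_min) over all gene-state combinations."""
    scored = [(p_ad(dict(zip(genes, combo))), combo)
              for combo in product(states, repeat=len(genes))]
    hi = max(p for p, _ in scored)
    lo = min(p for p, _ in scored)
    best = [c for p, c in scored if abs(p - hi) <= tol]
    worst = [c for p, c in scored if abs(p - lo) <= tol]
    return hi, best, lo, worst

def toy_p_ad(assignment):
    """Hypothetical stand-in for network inference, for illustration only."""
    p = 0.30
    if assignment["PSEN1"] == "High":
        p += 0.10
    if assignment["PSEN2"] == "High":
        p += 0.10
    return p  # APOE and APP are ignored here, so ties are guaranteed

hi, best, lo, worst = extreme_combinations(toy_p_ad)
print(hi, len(best), lo, len(worst))
```

Returning every combination within a tolerance of the extreme, not just the first one found, makes ties such as those described above visible.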
Attachments
ADcombinationsCS.png (32.45 KiB)
NonADcombinationsCS.png (60.99 KiB)
ADFemaleCombinations.png (13.71 KiB)
NonADFemaleCombinations.png (24.06 KiB)
ADMalesCombinations.png (17.48 KiB)
NonADMalesCombinations.png (21.99 KiB)
ADHippocampusCombinations.png (16.01 KiB)
ADOver65Combinations.png (12.27 KiB)
NonADHippocampusCombinations.png (35.86 KiB)
NonADOver65combinations.png (21.68 KiB)

Re: GEO datasets

Postby lsand039 » Tue Mar 21, 2017 11:43 am

The remaining tables for the previous post
Attachments
ADUpTo65Combinations.png (14.3 KiB)

Re: GEO datasets

Postby lsand039 » Wed Mar 22, 2017 4:07 pm

I compared the prediction values from the Leave One Out accuracy test outputs with the values obtained from manually inputting the evidence, and the values generally don't match up. Attached are the validation result output files for both the Bene and Confounding Structure and an Excel file for the 10 randomly selected samples I chose to compare.

I really don't know why the prediction values are different. I even used samples from the training data for the comparison. Please help.
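One possible cause, offered as an assumption and not a diagnosis: leave-one-out validation relearns the parameters with the tested sample held out, while manually entered evidence queries a network whose parameters were learned from every sample. If that is what happens here, the two probabilities would differ even for training samples. A toy illustration with a single made-up predictor and Laplace-smoothed counts (none of this is the real network or data):

```python
from collections import Counter

def p_ad_given_state(train, state, alpha=1.0):
    """Laplace-smoothed estimate of P(AD | gene state) from (state, label) pairs."""
    counts = Counter(label for s, label in train if s == state)
    return (counts["AD"] + alpha) / (counts["AD"] + counts["NonAD"] + 2 * alpha)

# Made-up (PSEN1 state, diagnosis) pairs.
data = [("High", "AD"), ("High", "AD"), ("High", "NonAD"),
        ("Normal", "NonAD"), ("Normal", "NonAD"), ("Normal", "AD")]

i = 0  # index of the sample being checked
state, _ = data[i]
full_fit = p_ad_given_state(data, state)                    # learned from all samples
loo_fit = p_ad_given_state(data[:i] + data[i + 1:], state)  # sample i held out first
print(full_fit, loo_fit)
```

A way to test this would be to compare the manual values against a validation run that only tests the already-trained network without relearning, if the tool offers one.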
Attachments
TestingTrainingConfoundingStructure.txt (63.16 KiB)
TestingTrainingBene.txt (63.16 KiB)
Checking Predictions .xlsx (6.45 KiB)