GEO datasets

Re: GEO datasets

Postby lsand039 » Thu Mar 30, 2017 3:43 pm

Below is an updated list of GEO datasets I found that could potentially be useful. The highlighting within the file is not quite up to date, but it will give you an idea of which ones I've already looked into.
Attachments
Dataset Updates.xlsx
(92.18 KiB) Downloaded 158 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Mon Apr 03, 2017 9:52 am

Attached are all 8286 genes that match among the 12 datasets that contained gene name information. GSE 28146 (GDS4136), GSE48350, GSE29378, and GSE 1297 (GDS810) are not included in this post because they have already been posted.

GSE15222 was not part of the compressed file.

GEO datasets included in this file:
GSE44768
GSE44770
GSE44771
GSE5281
GSE16759
GSE26927
GSE84422 (GPL96 & 570)
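
As an illustration of the gene matching described in this post, here is a minimal sketch that intersects gene-symbol lists across datasets. The dataset keys and gene lists are toy values, not the real GEO data, and the case/whitespace normalisation is an assumption about how symbols might need cleaning before matching.

```python
from functools import reduce


def common_genes(gene_lists):
    """Return the sorted gene symbols present in every dataset."""
    # Normalise case and whitespace so "psen1 " and "PSEN1" count as the same gene.
    gene_sets = [{g.strip().upper() for g in genes if g.strip()} for genes in gene_lists]
    if not gene_sets:
        return []
    return sorted(reduce(set.intersection, gene_sets))


# Toy example with made-up gene lists (hypothetical, not the attached files).
datasets = {
    "GSE_A": ["APP", "PSEN1", "PSEN2", "APOE"],
    "GSE_B": ["psen1", "APP", "MAPT", "PSEN2 "],
    "GSE_C": ["PSEN2", "APP", "PSEN1", "BACE1"],
}
shared = common_genes(datasets.values())
print(shared)
```

Running the same function on the real per-dataset gene columns would give the list of matched genes, provided every dataset uses the same symbol convention.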
Attachments
GSE15222MAc consolidation test.xlsx
(66.89 MiB) Downloaded 163 times
MatchedGenes.tar.gz
(607.75 MiB) Downloaded 159 times

Re: GEO datasets

Postby lsand039 » Mon Apr 03, 2017 12:13 pm

Below are the files with the calculated z-scores and discretized values for the following datasets:
GSE15222
GSE44768
GSE44770
GSE44771
GSE5281
GSE16759
GSE26927
GSE84422 (GPL96 & 570)

There are 8285 unique genes common to these datasets.
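
For readers who want to reproduce this kind of calculation, here is a rough sketch of z-scoring one gene's expression values and discretizing them into three states. The cutoff of 1.0 standard deviations and the use of the population standard deviation are assumptions made for illustration; the attached files may have used different choices. The 0/1/2 coding follows the low/normal/high convention used elsewhere in this thread.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise one gene's expression values across samples."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation (an assumption)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


def discretize(z, cutoff=1.0):
    """Map a z-score to 0 (low), 1 (normal) or 2 (high); the cutoff is hypothetical."""
    if z < -cutoff:
        return 0
    if z > cutoff:
        return 2
    return 1


# Toy expression values for a single gene (not real data).
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = zscores(expression)
states = [discretize(v) for v in z]
print(z)
print(states)
```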
Attachments
descritized.tar.gz
(276.13 MiB) Downloaded 149 times
zscores.tar.gz
(607.88 MiB) Downloaded 149 times
Last edited by lsand039 on Mon Apr 10, 2017 9:12 am, edited 1 time in total.

Re: GEO datasets

Postby lsand039 » Wed Apr 05, 2017 4:26 pm

I'm trying to make the prediction script work to check the validation results obtained from GeNIe. Attached are the files I'm using.

ConfoundingStructure0.xdsl
GeNIe file made from scratch
(28.9 KiB) Downloaded 154 times
is the same structure as the Confounding Structure I've been using to write my thesis, but I haven't changed the numeric state values into descriptive names, in an attempt to make the data file match the structure more closely. I keep getting the message "Segmentation fault (core dumped)" when I run the script on this file.
These are what the values mean:
0= up to age 65 (Age); Non-Hippocampus (Brainregion); Male (Sex); Non-AD (Alzheimer); Low expression (genes)
1= over age 65 (Age); Hippocampus (Brainregion); Female (Sex); AD (Alzheimer); Normal expression (genes)
2= High expression (genes)
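
The coding above can be written down as a small lookup table, which makes it easier to read rows of the data file. This is only a sketch: the column names (Age, Brainregion, Sex, Alzheimer) are taken from the list above, and treating every other column as a gene is an assumption.

```python
# State codes for the non-gene variables, as listed in the post.
STATE_LABELS = {
    "Age": {0: "up to 65", 1: "over 65"},
    "Brainregion": {0: "Non-Hippocampus", 1: "Hippocampus"},
    "Sex": {0: "Male", 1: "Female"},
    "Alzheimer": {0: "Non-AD", 1: "AD"},
}
# Assumption: any column not named above is a gene with three expression states.
GENE_LABELS = {0: "Low", 1: "Normal", 2: "High"}


def decode(column, code):
    """Translate a numeric state code into its descriptive label."""
    labels = STATE_LABELS.get(column, GENE_LABELS)
    return labels[int(code)]


# Hypothetical row, as it might be read from a CSV file (values arrive as strings).
row = {"Age": "1", "Sex": "0", "Alzheimer": "1", "PSEN1": "2"}
decoded = {col: decode(col, val) for col, val in row.items()}
print(decoded)
```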


Since the script wouldn't run on structures I made from scratch, I tried building the structure with Data>Learn New Network while the data file was open in GeNIe.

FulldataEr.xdsl
GeNIe file learned from the training data
(29.87 KiB) Downloaded 164 times
was made using Data>Learn New Network, deleting the arcs that GeNIe initially created, then adding the arcs we needed to match the Confounding Structure. The prediction script worked, but the distributions of the variable states did not match the training dataset. Additionally, the prediction values were the same for every sample.

CSLFulldata.xdsl
GeNIe file learned from the training data then restructured to fit the structure of interest
(29.79 KiB) Downloaded 160 times
was also made using Data>Learn New Network, but I entered the structure of interest as background knowledge. The distributions in this GeNIe file matched, but I kept running into the "Segmentation fault (core dumped)" error with this file.

FulldataM.xdsl
GeNIe file learned from the training data, restructured to fit the structure of interest, parameters relearned until distributions matched data
(46.18 KiB) Downloaded 153 times
was developed the same way as FulldataEr.xdsl, except that I had to keep relearning the parameters from the training data to make the distributions in the structure match. The more times I relearned the parameters, the better the Log (p) value became. In this file, too, the prediction values were the same for every sample; they equaled the marginal distribution of AD in the training data.
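
A quick way to confirm the symptom described here is to compute the marginal frequency of AD in the training data and check whether every prediction equals it. The sketch below uses toy values rather than Fulldata.csv, and the function names are hypothetical. If the check passes, one possible reading is that the evidence is not reaching the Alzheimer node, but that is a guess and not something the script output confirms.

```python
from collections import Counter


def marginal(values):
    """Relative frequency of each state in a column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


def all_equal_to(predictions, target, tol=1e-6):
    """True when every predicted probability matches the target within tol."""
    return all(abs(p - target) <= tol for p in predictions)


# Toy Alzheimer column (1 = AD, 0 = Non-AD); not the real training data.
ad_column = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
p_ad = marginal(ad_column)[1]

# Toy predictions that are identical for every sample.
predictions = [0.6] * len(ad_column)
print(p_ad, all_equal_to(predictions, p_ad))
```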
Attachments
Fulldata.csv
Data file containing only the training samples
(27.44 KiB) Downloaded 164 times

Re: GEO datasets

Postby lsand039 » Wed Apr 05, 2017 4:44 pm

I'm currently trying another method, using BaNJO, to get the structures of interest and their correct parameters into GeNIe.

So far, I've specified the Confounding Structure in the settings file and run BaNJO on the training data for a minute to get a score and a Dot file. I need to convert this Dot file into an .xdsl (GeNIe) file. From there I will use the prediction script to see whether the prediction values differ from case to case.
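
Before converting, it can help to list the arcs in the Dot file and compare them with the intended structure. The sketch below pulls `a -> b` pairs out of Dot text with a regular expression. The sample graph is made up, and the pattern assumes simple node names with one arc per statement; if the real BaNJO output uses numeric node ids with separate labels, or chained arcs, it would need adjusting.

```python
import re

# Matches simple arcs such as: Age -> Alzheimer;  or  "Sex" -> Alzheimer;
EDGE = re.compile(r'"?([\w.]+)"?\s*->\s*"?([\w.]+)"?')


def dot_arcs(dot_text):
    """Return (parent, child) pairs found in Dot source text."""
    return [(m.group(1), m.group(2)) for m in EDGE.finditer(dot_text)]


# Hypothetical Dot content, not the attached CS.graph file.
sample = '''
digraph G {
  Age -> Alzheimer;
  "Sex" -> Alzheimer;
  Alzheimer -> PSEN1;
}
'''
arcs = dot_arcs(sample)
print(arcs)
```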

Below are the files I used to get the Confounding Structure and Bene Structures.
Attachments
CS.graph.2017.03.29.08.30.42.txt
Confounding Structure for Dot
(637 Bytes) Downloaded 156 times
Bene.graph.2017.03.29.08.17.46.txt
Bene structure for Dot
(659 Bytes) Downloaded 163 times
CSsetting.txt
Settings File to produce Confounding Structure
(5.8 KiB) Downloaded 154 times
CS.txt
Must Have arcs file for Confounding Structure
(111 Bytes) Downloaded 145 times
CSmn.txt
Must have not arcs file for Confounding Structure
(27 Bytes) Downloaded 154 times
benesetting.txt
Settings file to produce Bene Structure
(5.81 KiB) Downloaded 176 times
training0.txt
Must have arcs file for Bene Structure
(148 Bytes) Downloaded 163 times
trainingdata.txt
(27.44 KiB) Downloaded 154 times

Re: GEO datasets

Postby lsand039 » Wed Apr 05, 2017 5:33 pm

It turns out that using BaNJO with the dot-to-xdsl program won't give the structure and parameters I need. It seems that when the Dot structure is converted to the .xdsl (GeNIe) file, the variables can only hold 2 states.
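
One way to verify the two-state problem is to read the .xdsl file as XML and count the states per node. The sketch below also checks that each probability table has the length implied by the node's states and its parents' states, since a mismatch there is one plausible (unconfirmed) source of trouble for a script that loads the network. The element names (`cpt`, `state`, `parents`, `probabilities`) reflect my understanding of the XDSL layout and should be checked against the actual files; the embedded network is a toy example.

```python
import xml.etree.ElementTree as ET


def state_counts(xdsl_text):
    """Map each node id to its number of declared states."""
    root = ET.fromstring(xdsl_text.strip())
    return {cpt.get("id"): len(cpt.findall("state")) for cpt in root.iter("cpt")}


def cpt_size_problems(xdsl_text):
    """List (node, expected, actual) where the probability table has the wrong length."""
    root = ET.fromstring(xdsl_text.strip())
    n_states = state_counts(xdsl_text)
    problems = []
    for cpt in root.iter("cpt"):
        node = cpt.get("id")
        parents = (cpt.findtext("parents") or "").split()
        expected = n_states[node]
        for parent in parents:
            expected *= n_states.get(parent, 1)
        actual = len((cpt.findtext("probabilities") or "").split())
        if actual != expected:
            problems.append((node, expected, actual))
    return problems


# Toy network in an assumed XDSL layout (not one of the attached files).
toy = """
<smile version="1.0" id="Toy">
  <nodes>
    <cpt id="Alzheimer">
      <state id="State0" /><state id="State1" />
      <probabilities>0.4 0.6</probabilities>
    </cpt>
    <cpt id="PSEN1">
      <state id="State0" /><state id="State1" /><state id="State2" />
      <parents>Alzheimer</parents>
      <probabilities>0.2 0.5 0.3 0.3 0.4 0.3</probabilities>
    </cpt>
  </nodes>
</smile>
"""
counts = state_counts(toy)
problems = cpt_size_problems(toy)
print(counts, problems)
```

A gene node reporting 2 states instead of 3 in the converted files would match the behaviour described above.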
Attachments
CSconverted.xdsl
(11.39 KiB) Downloaded 163 times
BeneConverted.xdsl
(12.44 KiB) Downloaded 161 times

Re: GEO datasets

Postby lsand039 » Wed Apr 12, 2017 9:19 am

I've been able to get the parameters/distributions of the structure to match the data file, but I keep running into the message "Segmentation fault (core dumped)" when I use these structures with the prediction script. Below is a description of how I made each:

CSLPFulldataC001: I took the file ConfoundingStrutureNoParameters, which I had drawn by hand, manually specified uniform distributions for all the variable states, and then chose Learn Parameters from the data file with a confidence of 0.001. The distributions of the variable states matched the data file, and GeNIe gave this structure a Log (p)= 8517.694735.

FulldataLP001SS1753: I learned a new network from the data file, providing background structure so the output would be the structure I wanted. I changed the link probability to 0.001 and the sample size to 1753 (all the samples in our data file). In the output structure, the distributions for PSEN1 and PSEN2 did not match our data.

FulldataLP1SS1753NBDRALPC001: I learned a new network from the data file but provided no background structure. I kept the link probability at its default (0.1) but changed the sample size to 1753. Since the resulting structure was not the structure I wanted, I redrew the arcs and chose Learn Parameters from the data file with a confidence of 0.001 and the "Uniformize" option checked. The distributions of the variable states matched the data file, and GeNIe also gave this structure a Log (p)= 8517.694735.
Attachments
FulldataLP1SS1753NBDRALPC001.xdsl
(34.87 KiB) Downloaded 147 times
FulldataLP001SS1753.xdsl
(29.74 KiB) Downloaded 143 times
CSLPFulldataC001.xdsl
(34.47 KiB) Downloaded 156 times

Re: GEO datasets

Postby lsand039 » Wed Apr 12, 2017 9:41 am

FulldataLP1LOOSS1753NBDRALPC001: I followed the same steps as for FulldataLP1SS1753NBDRALPC001, but this time I specified that accuracy should be used as the score, and I chose the Leave One Out option. Again I ran into the same message, "Segmentation fault (core dumped)".
Attachments
FulldataLP1LOOSS1753NBDRALPC001.xdsl
(34.89 KiB) Downloaded 157 times

Re: GEO datasets

Postby lsand039 » Wed Apr 12, 2017 10:10 am

Here is the link where I got the prediction script: http://smlg.fiu.edu/gitlab/meninonas/prediction.git
Also attached is the configuration file. I've just been changing the name for the .xdsl file for the script to use and keeping the same datafile.
Note: I haven't found a difference in getting the script to work by specifying State1/State0 or 1/0 for the Outcome/NotOutcomeState.
Attachments
predict.config.txt
(247 Bytes) Downloaded 138 times

Re: GEO datasets

Postby lsand039 » Mon Apr 17, 2017 3:04 pm

I'm currently checking patterns in my results while we try to get the predict script to work. Below I've compared the gene expression patterns in the Confounding Structure (CS) and the Bene structure (GMLS) that have the highest and lowest AD risk. The combinations for the highest/lowest AD risk are the same, but the percentages differ between the CS and GMLS, as shown in red.
Attachments
MinADriskAge>65.png (24.83 KiB) Viewed 43264 times
MaxADriskAge>65.png (28.93 KiB) Viewed 43264 times
MinADriskHippocampus.png (37.5 KiB) Viewed 43264 times
MaxADriskHippocampus.png (39.46 KiB) Viewed 43264 times
MinADriskMales.png (23.97 KiB) Viewed 43264 times
MaxADriskMales.png (24.68 KiB) Viewed 43264 times
MinADriskFemales.png (25.4 KiB) Viewed 43264 times
MaxADriskFemales.png (26.45 KiB) Viewed 43264 times
MinADrisk.png (63.35 KiB) Viewed 43264 times
MaxADrisk.png (65.62 KiB) Viewed 43264 times
