SMLG (Statistical Machine Learning Group) Discussion Forum

by **lsand039** » Thu Mar 30, 2017 3:43 pm

Below is an updated list of GEO datasets I found that could be useful. The highlighting within the file is not quite up to date, but it will give you an idea of which ones I've already looked into.

by **lsand039** » Mon Apr 03, 2017 9:52 am

Attached are all 8286 genes that match among the 12 datasets that contained gene name information. GSE28146 (GDS4136), GSE48350, GSE29378, and GSE1297 (GDS810) are not included in this post because they have already been posted.

GSE15222 was not part of the compressed file.

GEO datasets included in this file:
GSE44768
GSE44770
GSE44771
GSE5281
GSE16759
GSE26927
GSE84422 (GPL96 & 570)
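For anyone who wants to reproduce the matching step, here is a minimal sketch of intersecting gene lists across datasets. The toy gene lists are made up, the function name `common_genes` is hypothetical, and normalizing symbols to upper case is my assumption, not necessarily how the attached file was built.

```python
def common_genes(gene_lists):
    """Return the sorted gene symbols that appear in every dataset.

    gene_lists maps a dataset accession to an iterable of gene symbols.
    Symbols are stripped and upper-cased before comparison (an assumption).
    """
    gene_sets = [
        {g.strip().upper() for g in genes if g.strip()}
        for genes in gene_lists.values()
    ]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Toy example with made-up gene lists (not the real GEO contents).
toy = {
    "GSE44768": ["APP", "PSEN1", "PSEN2", "APOE"],
    "GSE5281": ["psen1", "APP", "MAPT", "PSEN2"],
    "GSE16759": ["APP", "PSEN2", "PSEN1 "],
}
shared = common_genes(toy)
print(shared)  # ['APP', 'PSEN1', 'PSEN2']
```

Adding or removing a dataset from the dictionary changes the intersection, which is one way the gene count can shift between different dataset selections.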

by **lsand039** » Mon Apr 03, 2017 12:13 pm

Below are the files with the z-score calculations and discretized values for the following datasets:
GSE15222
GSE44768
GSE44770
GSE44771
GSE5281
GSE16759
GSE26927
GSE84422 (GPL96 & 570)

There are 8285 unique genes common to these datasets.
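As a rough illustration of the z-score and discretization step, the sketch below standardizes one gene's expression values and bins them into three states. The cutoff of one standard deviation, the use of the population standard deviation, and the function names are all assumptions on my part; the attached files may use different thresholds.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant gene carries no information; report all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def discretize(z, cutoff=1.0):
    """Map a z-score to 0 (low), 1 (normal), or 2 (high); cutoff is an assumption."""
    if z < -cutoff:
        return 0
    if z > cutoff:
        return 2
    return 1


# Made-up expression values for a single gene across eight samples.
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
states = [discretize(z) for z in z_scores(expression)]
print(states)  # [0, 1, 1, 1, 1, 1, 1, 2]
```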

by **lsand039** » Wed Apr 05, 2017 4:26 pm

I'm trying to make the prediction script work to check the validation results obtained from GeNIe. Attached are the files I'm using.

ConfoundingStructure0.xdsl: GeNIe file made from scratch (28.9 KiB)

This is the same structure as the Confounding Structure I've been using to write my thesis, but I haven't changed the numbers into useful descriptions, in an attempt to have the data file better match up with the structure. I keep getting the message "Segmentation fault (core dumped)" when I run the script on this file.
These are what the values mean:
0= up to age 65 (Age); Non-Hippocampus (Brainregion); Male (Sex); Non-AD (Alzheimer); Low expression (genes)
1= over age 65 (Age); Hippocampus (Brainregion); Female (Sex); AD (Alzheimer); Normal expression (genes)
2= High expression (genes)
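If it helps when reading the data file, the coding above can be written as a small lookup. This is only a sketch: the dictionary and function names are hypothetical, and treating every variable other than Age, Brainregion, Sex, and Alzheimer as a gene is my assumption.

```python
# Labels for the non-gene variables, copied from the coding listed above.
STATE_LABELS = {
    "Age": {0: "up to age 65", 1: "over age 65"},
    "Brainregion": {0: "Non-Hippocampus", 1: "Hippocampus"},
    "Sex": {0: "Male", 1: "Female"},
    "Alzheimer": {0: "Non-AD", 1: "AD"},
}

# Every other variable is assumed to be a gene with three expression states.
GENE_LABELS = {0: "Low expression", 1: "Normal expression", 2: "High expression"}


def decode(variable, code):
    """Translate a numeric state code into its description."""
    labels = STATE_LABELS.get(variable, GENE_LABELS)
    if code not in labels:
        raise ValueError(f"{variable} has no state {code}")
    return labels[code]


print(decode("Sex", 1))    # Female
print(decode("PSEN1", 2))  # High expression
```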

Since the script wouldn't run on structures I made from scratch, I tried building the structure with Data>Learn New Network while the data file was open in GeNIe.

FulldataEr.xdsl: GeNIe file learned from the training data (29.87 KiB)

This was made using Data>Learn New Network, deleting the arcs that GeNIe initially created, then adding the arcs we needed to match the Confounding Structure. The prediction script worked, but the distributions of the variable states did not match the training dataset. Additionally, the prediction values were the same for every sample.

CSLFulldata.xdsl: GeNIe file learned from the training data then restructured to fit the structure of interest (29.79 KiB)

This was also made using Data>Learn New Network, but I entered the structure of interest as background knowledge. The distributions in this GeNIe file matched, but I kept running into the "Segmentation fault (core dumped)" error with this file.

FulldataM.xdsl: GeNIe file learned from the training data, restructured to fit the structure of interest, parameters relearned until distributions matched data (46.18 KiB)

This was developed the same way as FulldataEr.xdsl, except that I kept relearning the parameters from the training data until the distributions on the structure matched. The more times I relearned the parameters from the training data, the better the Log (p) value output. In this file too, the prediction values were the same for every sample, and they were equal to the marginal distribution of AD within the training data.
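A quick way to confirm the symptom described for these files is to compare the prediction values with the marginal distribution of AD in the training data. The sketch below uses made-up numbers and hypothetical function names; it does not call GeNIe or the prediction script.

```python
from collections import Counter


def marginal(values):
    """Return the relative frequency of each state in a column."""
    counts = Counter(values)
    total = len(values)
    return {state: counts[state] / total for state in sorted(counts)}


def all_predictions_equal(predictions, tol=1e-9):
    """True when every sample received (numerically) the same prediction."""
    return max(predictions) - min(predictions) <= tol


# Made-up Alzheimer column (1 = AD, 0 = Non-AD) and made-up script output.
ad_column = [1, 0, 1, 1, 0, 1, 0, 1]
predictions = [0.625] * len(ad_column)

ad_marginal = marginal(ad_column)
print(ad_marginal)                                    # {0: 0.375, 1: 0.625}
print(all_predictions_equal(predictions))             # True
print(abs(predictions[0] - ad_marginal[1]) < 1e-9)    # True
```

If both checks come back true, the script is returning the prior for AD for every sample rather than a value that depends on that sample's evidence.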

by **lsand039** » Wed Apr 05, 2017 4:44 pm

I'm currently trying another method, using BaNJO, to get the structure of interest and its correct parameters into GeNIe.

So far, I've specified the Confounding Structure in the settings file and run BaNJO on the training data for a minute to get a score and a Dot file. I need to convert this Dot file into an .xdsl (GeNIe) file. From there I will use the prediction script to see whether the prediction values differ from case to case.
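Before converting, it can be useful to list the arcs in the Dot file and check them against the intended structure. The sketch below assumes the file contains plain `parent -> child;` lines with simple node names; I have not shown BaNJO's actual output here, so the pattern may need adjusting, and the sample graph is made up.

```python
import re

# Matches one arc such as:  Age -> Alzheimer;   or   "PSEN1" -> Alzheimer;
# Chained arcs (a -> b -> c) are not handled by this simple pattern.
EDGE_PATTERN = re.compile(r'"?([\w.]+)"?\s*->\s*"?([\w.]+)"?')


def dot_arcs(dot_text):
    """Return (parent, child) pairs found in Dot-formatted text."""
    return [(m.group(1), m.group(2)) for m in EDGE_PATTERN.finditer(dot_text)]


def parents_of(arcs):
    """Group arcs into a child -> list of parents mapping."""
    result = {}
    for parent, child in arcs:
        result.setdefault(child, []).append(parent)
    return result


# Made-up Dot text, not the real Confounding Structure.
sample = """
digraph abstract {
  Age -> Alzheimer;
  Sex -> Alzheimer;
  "PSEN1" -> Alzheimer;
  Alzheimer -> PSEN2;
}
"""
arcs = dot_arcs(sample)
print(parents_of(arcs))
# {'Alzheimer': ['Age', 'Sex', 'PSEN1'], 'PSEN2': ['Alzheimer']}
```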

Below are the files I used to get the Confounding Structure and Bene Structures.

by **lsand039** » Wed Apr 05, 2017 5:33 pm

So using BaNJO and the dot-to-xdsl program won't work to get the correct structure and parameters needed. It seems that when the Dot structure is converted to the .xdsl (GeNIe) file, the variables can only hold 2 states.

by **lsand039** » Wed Apr 12, 2017 9:19 am

I've been able to get the parameters/distributions of the structure to match the data file, but I keep running into the message "Segmentation fault (core dumped)" when I use these structures with the prediction script. Below is a description of how I made each:

CSLPFulldataC001: I took the file ConfoundingStrutureNoParameters, which I drew myself, manually specified uniform distributions for all the variable states, then chose Learn Parameters from the data file using a confidence of 0.001. The distributions of the variable states matched the data file, and GeNIe gave this structure a Log (p)= 8517.694735.

FulldataLP001SS1753: I learned a new network from the data file, providing background structure so the output would be the structure I wanted. I changed the link probability to 0.001 and the sample size to 1753 (all the samples in our data file). Here the distributions for PSEN1 and PSEN2 in the output structure didn't match our data.

FulldataLP1SS1753NBDRALPC001: I learned a new network from the data file but provided no background structure. I kept the link probability at its default (0.1) but changed the sample size to 1753. Since the resulting structure was not the structure I wanted, I redrew the arcs and chose Learn Parameters from the data file using a confidence of 0.001 with "Uniformize" checked. The distributions of the variable states matched the data file, and GeNIe also gave this structure a Log (p)= 8517.694735.
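To make "distributions matching the data file" concrete, here is a minimal sketch of estimating one conditional probability table by counting complete cases. It is not GeNIe's learning algorithm; the arc Alzheimer -> PSEN1, the toy rows, the function name, and the `pseudo_count` smoothing parameter (which is not GeNIe's confidence setting) are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def learn_cpt(rows, child, parents, child_states, pseudo_count=0.0):
    """Estimate P(child | parents) by counting, with optional uniform smoothing.

    Only parent configurations that occur in the data get an entry.
    """
    counts = defaultdict(Counter)
    for row in rows:
        key = tuple(row[p] for p in parents)
        counts[key][row[child]] += 1

    cpt = {}
    for key, state_counts in counts.items():
        total = sum(state_counts.values()) + pseudo_count * len(child_states)
        cpt[key] = {
            s: (state_counts[s] + pseudo_count) / total for s in child_states
        }
    return cpt


# Made-up complete cases (Alzheimer: 0/1, PSEN1: 0/1/2).
rows = [
    {"Alzheimer": 1, "PSEN1": 2},
    {"Alzheimer": 1, "PSEN1": 2},
    {"Alzheimer": 1, "PSEN1": 1},
    {"Alzheimer": 0, "PSEN1": 1},
    {"Alzheimer": 0, "PSEN1": 1},
    {"Alzheimer": 0, "PSEN1": 0},
]
cpt = learn_cpt(rows, child="PSEN1", parents=["Alzheimer"], child_states=[0, 1, 2])
for key in sorted(cpt):
    print(key, cpt[key])
```

Each row of the resulting table should sum to 1, and with `pseudo_count=0.0` it reproduces the observed conditional frequencies exactly, which is the property being checked when comparing a learned network against the data file.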

by **lsand039** » Wed Apr 12, 2017 9:41 am

FulldataLP1LOOSS1753NBDRALPC001: I followed the same steps as for FulldataLP1SS1753NBDRALPC001, but this time I specified that accuracy should be used as the score, and I chose the Leave One Out option. Again I ran into the same message "Segmentation fault (core dumped)".

by **lsand039** » Wed Apr 12, 2017 10:10 am

Here is the link where I got the prediction script: http://smlg.fiu.edu/gitlab/meninonas/prediction.git
Also attached is the configuration file. I've just been changing the name of the .xdsl file for the script to use and keeping the same data file.
Note: I haven't found that specifying State1/State0 versus 1/0 for the Outcome/NotOutcomeState makes any difference in getting the script to work.

by **lsand039** » Mon Apr 17, 2017 3:04 pm

I'm currently checking patterns in my results while we're trying to get the prediction script to work. Below I've compared the gene expression patterns in the Confounding Structure (CS) and Bene structure (GMLS) that have the highest and lowest AD risk. The combinations for the highest/lowest AD risk are the same, but the percentages differ between the CS and GMLS, as shown in red.
