I'm trying to make the prediction script work to check the validation results obtained from GeNIe. Attached are the files I'm using.
The first attached file has the same structure as the Confounding Structure I've been using to write my thesis, but I haven't changed the numbers into descriptive labels, so that the data file matches the structure more closely. I keep getting the message "Segmentation fault (core dumped)" when I run the script on this file.
These are what the values mean:
0 = up to age 65 (Age); Non-Hippocampus (Brainregion); Male (Sex); Non-AD (Alzheimer); Low expression (genes)
1 = over age 65 (Age); Hippocampus (Brainregion); Female (Sex); AD (Alzheimer); Normal expression (genes)
2 = High expression (genes)
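For reference, the coding above can be written as a small lookup table and used to check that every value in the data file is a known code. This is only a sketch in Python: the column names (`Age`, `Brainregion`, `Sex`, `Alzheimer`) are assumed and may differ from the headers in the actual data file, `GeneX` is a made-up gene column, and every column not listed by name is treated as a gene column.

```python
# Code-to-label lookup for the coding described above.
# Column names are assumptions; "GeneX" below is a hypothetical gene column.
FIXED_LABELS = {
    "Age": {0: "up to age 65", 1: "over age 65"},
    "Brainregion": {0: "Non-Hippocampus", 1: "Hippocampus"},
    "Sex": {0: "Male", 1: "Female"},
    "Alzheimer": {0: "Non-AD", 1: "AD"},
}
# Any column not named above is assumed to be a gene with three levels.
GENE_LABELS = {0: "Low expression", 1: "Normal expression", 2: "High expression"}


def labels_for(column):
    """Return the code-to-label table for one column."""
    return FIXED_LABELS.get(column, GENE_LABELS)


def decode_row(row):
    """Translate one record of integer codes into labels.

    Raises ValueError if a code is not defined for its column.
    """
    decoded = {}
    for column, raw in row.items():
        code = int(raw)
        table = labels_for(column)
        if code not in table:
            raise ValueError(f"{column}: unexpected code {code}")
        decoded[column] = table[code]
    return decoded


# Made-up record, only to show the call.
example = {"Age": "1", "Brainregion": "0", "Sex": "1", "Alzheimer": "1", "GeneX": "2"}
print(decode_row(example))
```

Running `decode_row` over every record of the data file would flag any value that falls outside the coding, such as a 2 in a two-state variable.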
Since the script wouldn't run on structures I made from scratch, I tried building the structure with Data>Learn New Network while the data file was open in GeNIe.
FulldataEr.xdsl (GeNIe file learned from the training data) was made using Data>Learn New Network, then deleting the arcs that GeNIe initially created and adding the arcs we needed to match the Confounding Structure. The prediction script ran, but the distributions over the states of the variables did not match the training dataset. Additionally, the predicted values were the same for every sample.
CSLFulldata.xdsl (GeNIe file learned from the training data, then restructured to fit the structure of interest) was also made using Data>Learn New Network, but I entered the structure of interest as background knowledge. The distributions in this GeNIe file matched the data, but I kept running into the "Segmentation fault (core dumped)" error with this file.
FulldataM.xdsl (GeNIe file learned from the training data, restructured to fit the structure of interest, parameters relearned until the distributions matched the data) was developed the same way as FulldataEr.xdsl, except that I had to keep relearning the parameters from the training data until the distributions in the network matched. The more times I relearned the parameters, the better the Log(p) value in the output. In this file, too, the predicted values were the same for every sample: each one equaled the marginal distribution of AD in the training data.
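The observation that every prediction equals the marginal of AD can be confirmed with a quick check outside GeNIe. The sketch below is plain Python and does not use the prediction script or any GeNIe/SMILE call; the column name `Alzheimer`, the helper names, and the sample numbers are all made up for illustration.

```python
from collections import Counter


def marginal(records, column):
    """Relative frequency of each state of `column` in the records."""
    counts = Counter(r[column] for r in records)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


def all_predictions_equal(predictions, value, tol=1e-6):
    """True if every predicted probability is within `tol` of `value`."""
    return all(abs(p - value) <= tol for p in predictions)


# Made-up training records and predicted P(AD) values, for illustration only.
training = [{"Alzheimer": 1}, {"Alzheimer": 0}, {"Alzheimer": 1}, {"Alzheimer": 1}]
predictions = [0.75, 0.75, 0.75]

p_ad = marginal(training, "Alzheimer")[1]
print(p_ad, all_predictions_equal(predictions, p_ad))
```

If the check returns `True` on the real data, the predictions carry no information from the other variables in each sample, which is the behavior described above.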