Attached are the datasets used for the naive Bayes and logistic regression analyses to predict glioblastoma. The results are also attached for reference.
A total of 17,813 genes were present in TCGA and more than 20,000 in GEO. Matching TCGA against GEO yielded 6,769 shared genes, of which the 19 genes below were significantly correlated with the disease (all P < 0.05) in 184 patients:
Gene       R value     P value
ABCB5      -0.30821    <0.0001
CD36       -0.26107    0.0002
CIT        -0.26494    0.0002
FCRLB      -0.26777    0.0002
GOLT1A     -0.25901    0.0003
SNRPE      -0.28957    <0.0001
ADAM7      -0.15648    0.0298
ALPK2      -0.15346    0.0331
FAM26F     -0.17889    0.0128
MCOLN3     -0.24179    0.0007
MDM4       -0.19583    0.0063
PAPPA2     -0.17849    0.013
PCDHB18    -0.18433    0.0103
PERP       -0.22455    0.0017
PLEKHA5    -0.15757    0.0286
RSPO3      -0.17301    0.0161
SLC5A9     -0.14229    0.0484
TMTC1      -0.17525    0.015
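
The gene matching and correlation screen described above can be sketched roughly as follows. This is a minimal illustration in Python, not the R workflow used for the attached results. The function names, the toy gene lists, the expression values, and the 0/1 label coding are all assumptions made for the example. The p-value uses a normal approximation to the usual t-test for Pearson's r, so it will differ slightly from values reported by statistical software.

```python
import math


def shared_genes(tcga_genes, geo_genes):
    """Return the sorted gene symbols present on both platforms."""
    return sorted(set(tcga_genes) & set(geo_genes))


def correlation_with_label(expression, labels):
    """Pearson r between expression values and a 0/1 disease label,
    plus an approximate two-sided p-value (normal approximation)."""
    n = len(expression)
    mx = sum(expression) / n
    my = sum(labels) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(expression, labels))
    sxx = sum((a - mx) ** 2 for a in expression)
    syy = sum((b - my) ** 2 for b in labels)
    r = sxy / math.sqrt(sxx * syy)
    # Test statistic for H0: r = 0; assumes |r| < 1.
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return r, p


# Toy data, invented for illustration only (not the attached datasets).
tcga = ["ABCB5", "CD36", "CIT", "TP53"]
geo = ["CD36", "CIT", "ABCB5", "EGFR"]
common = shared_genes(tcga, geo)

expr = [2.1, 1.9, 2.4, 2.2, 1.1, 0.9, 1.3, 1.0]  # hypothetical expression values
label = [0, 0, 0, 0, 1, 1, 1, 1]                  # assumed coding: 1 = disease, 0 = control
r, p = correlation_with_label(expr, label)
print(common, round(r, 3), p)
```

In a screen like this, the correlation would be computed once per shared gene and only the genes below a chosen p-value threshold would be kept.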
These data were analyzed in R using a naive Bayes classifier (NBC) and logistic regression.
NBC: the prior probabilities are attached.
Training error for NBC: 0.04663212
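
For readers who want to see how class priors and a training error of this kind are obtained, here is a minimal naive Bayes sketch in Python. It assumes Gaussian likelihoods for each gene, which may or may not match the R routine used for the attached results. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)  # prior = class frequency in the training set
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (prior, means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


def training_error(model, X, y):
    """Fraction of training samples whose predicted label is wrong."""
    wrong = sum(predict_gaussian_nb(model, row) != label for row, label in zip(X, y))
    return wrong / len(X)


# Toy expression values for two hypothetical genes; not taken from the attached data.
X = [[2.0, 0.5], [2.2, 0.4], [1.8, 0.6], [0.5, 2.0], [0.4, 2.2], [0.6, 1.8]]
y = [0, 0, 0, 1, 1, 1]  # assumed coding: 1 = glioblastoma, 0 = control

model = fit_gaussian_nb(X, y)
priors = {label: params[0] for label, params in model.items()}
nb_error = training_error(model, X, y)
print(priors, nb_error)
```

Because the error is measured on the same samples used for fitting, it tends to be optimistic compared with the error on held-out data.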
Logistic regression:
Training error: 0.02590674
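
The logistic regression training error can be illustrated in the same spirit. The sketch below fits the model by plain batch gradient descent in Python, whereas R typically fits it by iteratively reweighted least squares, so this is an illustration of the idea and not a reproduction of the attached analysis. The function names, learning rate, epoch count, and toy data are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            # Prediction error for this sample: p(y=1 | row) minus the true label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, row):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5 else 0


def training_error(w, b, X, y):
    """Fraction of training samples whose predicted label is wrong."""
    wrong = sum(predict(w, b, row) != label for row, label in zip(X, y))
    return wrong / len(X)


# Toy expression values for two hypothetical genes; not taken from the attached data.
X = [[2.0, 0.5], [2.2, 0.4], [1.8, 0.6], [0.5, 2.0], [0.4, 2.2], [0.6, 1.8]]
y = [0, 0, 0, 1, 1, 1]  # assumed coding: 1 = glioblastoma, 0 = control

w, b = fit_logistic(X, y)
lr_error = training_error(w, b, X, y)
print(w, b, lr_error)
```

Comparing the two classifiers on held-out samples or by cross-validation would give a fairer picture than comparing training errors alone.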