SMLG (Statistical Machine Learning Group) Discussion Forum

Posted: **Fri Nov 26, 2021 12:54 pm**

Data analysis is complete. I have extracted the approximate inference and exact inference results for the models, and I chose the model that had the best score. I will work on the paper this week.

Posted: **Tue Nov 30, 2021 7:02 am**

Attachment: Wilcox_2021.11.29.pptx (155.09 KiB)

Attached is an updated PowerPoint showing the progress on my class project. After my last discussion with Dr. Yoo at the SMLG meeting, I decided to scale back my project and perform a differential abundance analysis to identify operational taxonomic units (OTUs) that are associated with IBD diagnosis.

I completed the following steps:

1. Normalized counts within each sample to account for potential unequal quantities of starting RNA. I used the total-sum scaling normalization method, which essentially transforms the abundance count table into a relative abundance table.
2. Calculated log2 fold-changes for Crohn's Disease (CD) relative to control, and for ulcerative colitis (UC) relative to control.
3. Selected the 20 OTUs with the largest absolute log2 fold-change for each of CD and UC, resulting in ~40 OTUs.
4. Discretized the relative abundance of each selected OTU based on its mean value.
5. Used bnlearn in R to learn the Bayesian network (BN). The model inputs were IBD diagnosis (CD, UC, or nonIBD) and the discretized relative abundances of the ~40 selected OTUs. I used the hill-climbing algorithm with the log-likelihood score, allowing a maximum of 3 parents per node.
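
Steps 1-4 above can be sketched roughly as follows. This is a minimal Python illustration on a made-up count table, not the R code used for the project. The sample data, the function names, the pseudocount, and the two-level "high"/"low" split at the mean are all assumptions made for the example.

```python
import math
from statistics import mean

# Hypothetical toy data: (diagnosis, raw OTU counts) for each sample.
samples = [
    ("CD",     {"otu_a": 160, "otu_b": 20, "otu_c": 20}),
    ("CD",     {"otu_a": 60,  "otu_b": 15, "otu_c": 25}),
    ("nonIBD", {"otu_a": 20,  "otu_b": 40, "otu_c": 40}),
    ("nonIBD", {"otu_a": 10,  "otu_b": 60, "otu_c": 30}),
]

def total_sum_scale(counts):
    """Step 1: divide each count by the sample total (relative abundance)."""
    total = sum(counts.values())
    return {otu: c / total for otu, c in counts.items()}

def log2_fold_change(case, control, pseudo=1e-6):
    """Step 2: log2 ratio of group means. The pseudocount is an assumption
    added here only to avoid taking the log of zero."""
    return math.log2((mean(case) + pseudo) / (mean(control) + pseudo))

def top_k_by_abs(scores, k):
    """Step 3: keep the k OTUs with the largest absolute log2 fold-change."""
    return sorted(scores, key=lambda otu: abs(scores[otu]), reverse=True)[:k]

def discretize_at_mean(values):
    """Step 4: assumed binary split, 'high' if above the OTU's mean."""
    m = mean(values)
    return ["high" if v > m else "low" for v in values]

rel = [(dx, total_sum_scale(c)) for dx, c in samples]
otus = sorted(samples[0][1])
lfc = {
    otu: log2_fold_change(
        [r[otu] for dx, r in rel if dx == "CD"],
        [r[otu] for dx, r in rel if dx == "nonIBD"],
    )
    for otu in otus
}
selected = top_k_by_abs(lfc, k=2)
levels = {otu: discretize_at_mean([r[otu] for _, r in rel]) for otu in selected}
```

In the real analysis the same selection would be run once for CD vs. control and once for UC vs. control with k = 20, and the two lists combined.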

The structure of the learned BN is attached. The nodes that are part of the Markov blanket of IBD diagnosis are shown in bold.
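
For reference, the Markov blanket of a node in a DAG is its parents, its children, and the other parents of its children. Below is a minimal Python sketch of that definition. The toy graph and node names are hypothetical and are not the learned structure. In R, bnlearn provides an `mb()` function for the same purpose.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps every node to the set of its parent nodes.
    """
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = {p for child in children for p in parents[child]} - {target}
    return set(parents.get(target, set())) | children | spouses

# Hypothetical toy DAG, described by each node's parent set.
toy_dag = {
    "diagnosis": {"otu_1"},
    "otu_1": set(),
    "otu_2": {"diagnosis", "otu_3"},
    "otu_3": set(),
    "otu_4": {"otu_3"},
}

blanket = markov_blanket(toy_dag, "diagnosis")
```

Here `otu_3` enters the blanket only because it shares the child `otu_2` with the diagnosis node, while `otu_4` stays outside it.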

I have also started drafting the background and methods sections of my paper.

Posted: **Thu Dec 02, 2021 11:05 am**

I decided to focus on a gene that is integral to the mTOR pathway and to use just one BN. I focused specifically on this gene and its relationship with a zinc finger protein in zinc-deficient rats. The paper is broken into two parts. Part 1 is the limma analysis, which identifies the DEGs that are used in the BN analysis. Part 2 is the BN analysis, but it focuses on only a small subset of the DEGs.

Posted: **Sun Dec 05, 2021 10:47 pm**

I've made some modifications to my analysis since my last post and the last SMLG meeting. I decided to go back and normalize the raw counts using the DESeq method, which has been shown to be less biased than the total-sum scaling method. After re-normalizing the counts, I recomputed the log2 fold changes for Crohn's Disease vs. control and for ulcerative colitis vs. control to select the OTUs to include in my models. I generated 2 structures: (1) a Naive Bayes classifier and (2) a structure learned with a hill-climbing algorithm with random restarts. I compared the goodness of fit and the predictive performance of the 2 structures (the latter using 3-fold cross-validation). Based on these assessments, I selected the structure learned by hill-climbing with random restarts and used Bayesian parameter estimation to estimate its parameters.
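
As a rough illustration of how the DESeq-style normalization differs from total-sum scaling, here is a minimal Python sketch of median-of-ratios size factors. It is not the R code used in the analysis. The toy counts and function names are hypothetical, and restricting the reference to features that are nonzero in every sample is an assumption of this sketch (sparse OTU tables may need a different handling of zeros).

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` is a list of samples, each a list of raw counts in the same
    feature order.
    """
    n_features = len(counts[0])
    # Assumption: use only features observed (count > 0) in every sample.
    usable = [j for j in range(n_features) if all(s[j] > 0 for s in counts)]
    if not usable:
        raise ValueError("no feature is nonzero in every sample")
    # Log of the per-feature geometric mean across samples.
    log_geo_mean = {
        j: sum(math.log(s[j]) for s in counts) / len(counts) for j in usable
    }
    # Size factor = median ratio of a sample's counts to the geometric means.
    return [
        math.exp(median([math.log(s[j]) - log_geo_mean[j] for j in usable]))
        for s in counts
    ]

# Hypothetical toy table: the second sample was sequenced about twice as deep.
raw = [
    [10, 20, 30, 0],
    [20, 40, 60, 5],
]
factors = size_factors(raw)
normalized = [[c / f for c in s] for s, f in zip(raw, factors)]
```

Unlike total-sum scaling, the size factor is a median over features, so a few very abundant OTUs have less influence on how each sample is rescaled.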

A draft of my paper is attached. I'm still working on revising the introduction to reflect the change in my research question, and working on the Discussion section. Also attached is the data used in the analysis (normalized counts) and my R code.

Probabilistic Graphical Models, Fall 2021
