Attached is an updated PowerPoint showing the progress on my class project. After my last discussion with Dr. Yoo at the SMGL meeting, I decided to scale back my project and perform a differential abundance analysis to identify operational taxonomic units (OTUs) that are associated with IBD diagnosis.
I completed the following steps:
1. Normalized counts within each sample to account for potentially unequal quantities of starting RNA. I used total-sum scaling, which divides each count by the sample's total count and so transforms the abundance count table into a relative abundance table.
2. Calculated log2 fold-changes for Crohn's Disease (CD) relative to control, and for ulcerative colitis (UC) relative to control.
3. Selected the 20 OTUs with the largest absolute log2 fold-change for CD and the 20 with the largest for UC, resulting in ~40 OTUs.
4. Discretized the relative abundance of each selected OTU based on its mean value.
5. Used bnlearn in R to learn the Bayesian network (BN). The model inputs were IBD diagnosis (CD, UC, or non-IBD) and the relative abundances of the ~40 selected OTUs. I used the hill-climbing algorithm with the log-likelihood score, allowing a maximum of 3 parents per node.
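
To make steps 1 through 4 concrete, here is a minimal Python sketch on an invented toy table. It is not the code used for the analysis. The counts, the sample and OTU names, the pseudocount, the use of a ratio of group means for the fold-change, and the two-level "high"/"low" split at the mean are all assumptions made for illustration.

```python
import math

# Toy OTU count table (rows: samples, columns: OTUs). All values are invented.
counts = {
    "s1": {"otu_a": 80, "otu_b": 10, "otu_c": 10},
    "s2": {"otu_a": 60, "otu_b": 20, "otu_c": 20},
    "s3": {"otu_a": 20, "otu_b": 20, "otu_c": 160},
    "s4": {"otu_a": 10, "otu_b": 30, "otu_c": 60},
    "s5": {"otu_a": 20, "otu_b": 60, "otu_c": 20},
    "s6": {"otu_a": 40, "otu_b": 120, "otu_c": 40},
}
diagnosis = {"s1": "CD", "s2": "CD", "s3": "UC",
             "s4": "UC", "s5": "nonIBD", "s6": "nonIBD"}


def tss_normalize(counts):
    """Step 1, total-sum scaling: divide each count by its sample's total."""
    rel = {}
    for sample, row in counts.items():
        total = sum(row.values())
        rel[sample] = {otu: n / total for otu, n in row.items()}
    return rel


def log2_fold_change(rel, diagnosis, case, control="nonIBD", pseudo=1e-6):
    """Step 2: log2 of the ratio of group-mean relative abundances.

    The pseudocount guards against log(0); it is an assumption of this sketch.
    """
    def group_mean(group, otu):
        values = [rel[s][otu] for s in rel if diagnosis[s] == group]
        return sum(values) / len(values)

    otus = list(next(iter(rel.values())))
    return {
        otu: math.log2((group_mean(case, otu) + pseudo)
                       / (group_mean(control, otu) + pseudo))
        for otu in otus
    }


def top_k_by_abs(lfc, k):
    """Step 3: the k OTUs with the largest absolute log2 fold-change."""
    return sorted(lfc, key=lambda otu: abs(lfc[otu]), reverse=True)[:k]


def discretize_by_mean(rel, otus):
    """Step 4 (assumed form): 'high' if above the OTU's mean, else 'low'."""
    out = {s: {} for s in rel}
    for otu in otus:
        mean = sum(rel[s][otu] for s in rel) / len(rel)
        for s in rel:
            out[s][otu] = "high" if rel[s][otu] > mean else "low"
    return out


rel = tss_normalize(counts)
lfc_cd = log2_fold_change(rel, diagnosis, "CD")
lfc_uc = log2_fold_change(rel, diagnosis, "UC")
k = 1  # toy value; the analysis described above used 20 per comparison
selected = sorted(set(top_k_by_abs(lfc_cd, k)) | set(top_k_by_abs(lfc_uc, k)))
discrete = discretize_by_mean(rel, selected)
print(selected)
```

Taking the union of the two top-k lists is why the final count is approximate: any OTU that ranks highly for both CD and UC is counted only once.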
The structure of the learned BN is attached. The nodes that are part of the Markov blanket for IBD diagnosis are shown in bold.
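
For reference, the Markov blanket of a node in a directed network consists of its parents, its children, and the other parents of its children. The short Python sketch below computes it from a list of directed edges. The edge list and node names are hypothetical and do not come from the learned network.

```python
def markov_blanket(edges, target):
    """Return the Markov blanket of `target` in a DAG.

    `edges` is a list of (parent, child) pairs.
    """
    parents = {u for u, v in edges if v == target}
    children = {v for u, v in edges if u == target}
    # Other parents of the target's children (sometimes called spouses).
    spouses = {u for u, v in edges if v in children and u != target}
    return parents | children | spouses


# Hypothetical toy network, for illustration only.
toy_edges = [
    ("otu_1", "diagnosis"),  # parent of diagnosis
    ("diagnosis", "otu_2"),  # child of diagnosis
    ("otu_3", "otu_2"),      # other parent of that child
    ("otu_2", "otu_4"),      # grandchild: outside the blanket
    ("otu_5", "otu_3"),      # parent of a spouse: outside the blanket
]
blanket = markov_blanket(toy_edges, "diagnosis")
print(sorted(blanket))
```

In this toy network, otu_4 and otu_5 are connected to the diagnosis node only through blanket members, so they fall outside the blanket.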
I have also started drafting the background and methods sections of my paper.