by woodbench » Mon Apr 29, 2024 12:03 am
4/28 Update:
Natalie and I successfully merged the datasets after normalizing with the TMM (trimmed mean of M-values) method. For the data set that was in raw counts (GSE186332), this involved converting it to a DGEList, converting reads to counts per million (CPM), transposing the matrix to scale it, and transposing it back to long format. For the dataset that was already TMM normalized (GSE154041), the only alteration was a log2(x + 1) transformation.
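For anyone following along, here is a minimal sketch of the two transformations mentioned above (CPM and log2(x + 1)). It is plain Python for illustration only, not the R/edgeR code we actually ran; the toy counts, the function names `cpm` and `log2p1`, and the `norm_factors` argument are all made up for this example. TMM factors, when available, would scale each sample's library size before dividing.

```python
import math

def cpm(counts, norm_factors=None):
    """Convert raw counts to counts per million.

    counts: dict mapping gene symbol -> list of raw counts, one per sample.
    norm_factors: optional per-sample scaling factors (e.g. TMM factors);
                  defaults to 1.0 for every sample (plain CPM).
    """
    n_samples = len(next(iter(counts.values())))
    if norm_factors is None:
        norm_factors = [1.0] * n_samples
    # Library size = total counts in each sample (column sums).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    # Effective library size = library size scaled by the normalization factor.
    effective = [lib_sizes[j] * norm_factors[j] for j in range(n_samples)]
    return {
        gene: [row[j] / effective[j] * 1e6 for j in range(n_samples)]
        for gene, row in counts.items()
    }

def log2p1(expr):
    """Apply log2(x + 1) to every value."""
    return {gene: [math.log2(v + 1.0) for v in row] for gene, row in expr.items()}

# Hypothetical toy data: 3 genes x 2 samples.
toy_counts = {"EGFR": [10, 0], "PTEN": [30, 50], "BRAF": [60, 50]}
toy_cpm = cpm(toy_counts)
toy_log = log2p1(toy_cpm)
```

The + 1 offset keeps zero counts defined after the log (a zero stays at zero).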
We used an `inner_join` to merge the transformed datasets on HUGO gene symbol, which gave us a tibble with 13,799 genes to analyze. We then removed potential batch effects between the two studies using the `removeBatchEffect()` function from the limma package and discretized expression values from -1 to 1. We ran a quick trial of the hill-climbing procedure using some of the genes heavily cited in the literature ("POT1", "HERC2", "BRIP1", "POLE", "EGFR", "SOX9", "PGK1", "CA9", "VEGFA", "SPP1", "HIF1A", "HP1BP3", "ZC3H7B", "BRAF", "PTEN", "MGMT") to obtain the result in the attached BNLearnHC.pdf. Of course, we still need to adjust for each treatment and specify our genes of interest using differential expression analysis.
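To make the merge, batch-adjust, discretize sequence concrete, here is a rough plain-Python sketch on toy data. Several things in it are assumptions and not what we ran: per-gene batch mean centering is only a simplified stand-in for limma's linear-model correction, the three-level coding (-1, 0, 1) with a cutoff of 0.5 is a guess at one reasonable discretization, and all names and values are hypothetical.

```python
from statistics import mean

def inner_join(a, b):
    """Keep only genes present in both datasets and concatenate their samples."""
    shared = sorted(a.keys() & b.keys())
    return {gene: a[gene] + b[gene] for gene in shared}

def center_batches(merged, batches):
    """Subtract each batch's per-gene mean.

    Simplified stand-in for batch correction; NOT limma's implementation.
    batches: one batch label per sample, in column order.
    """
    out = {}
    for gene, row in merged.items():
        batch_means = {
            b: mean(v for v, label in zip(row, batches) if label == b)
            for b in set(batches)
        }
        out[gene] = [v - batch_means[label] for v, label in zip(row, batches)]
    return out

def discretize(centered, threshold=0.5):
    """Map values to -1 (low), 0 (unchanged), or 1 (high); threshold is assumed."""
    def level(v):
        if v > threshold:
            return 1
        if v < -threshold:
            return -1
        return 0
    return {gene: [level(v) for v in row] for gene, row in centered.items()}

# Hypothetical log-scale expression, two samples per study.
study_a = {"EGFR": [5.0, 7.0], "PTEN": [2.0, 2.0], "SOX9": [1.0, 3.0]}
study_b = {"EGFR": [9.0, 11.0], "PTEN": [4.0, 6.0], "MGMT": [0.0, 1.0]}
batches = ["A", "A", "B", "B"]

merged = inner_join(study_a, study_b)      # SOX9 and MGMT drop out
centered = center_batches(merged, batches)
levels = discretize(centered)
```

Genes missing from either study are dropped by the inner join, which is why the merged table ends up smaller than either input.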
- Attachments
- BNLearnHC.pdf (4.7 KiB)