Here is an update on what I’ve done so far:
I began by preprocessing the GSE222494 frontal cortex single-nucleus dataset. Because Bayesian network (BN) structure learning requires subject-level observations rather than individual cells, I generated pseudobulk profiles by summing gene counts across all cells belonging to each subject. This produced a 24 × 33,525 subject-by-gene expression matrix. I verified that all selected transcription factors (NRF1, NFE2L1, NFE2L2, and GABPA) were present in the pseudobulk dataset. Next, I applied a log₂(x+1) transform to the expression values and discretized each transcription factor's expression into three equal-frequency bins (Low/Medium/High). This step was necessary because discrete BN structure learning requires categorical variables. For structure learning, I used the Hill-Climb Search (HC) algorithm implemented in pgmpy with the BDeu score (equivalent sample size = 5). I first ran structure learning on the four transcription factors alone, as a baseline for evaluating how network complexity influences the inferred edges once more genes are added. I generated and saved the BN graph outputs.
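For reference, the preprocessing above (pseudobulk, log₂(x+1), equal-frequency binning) can be sketched roughly as follows. This is a minimal illustration, not my actual pipeline: the helper names `pseudobulk` and `discretize_tfs` are hypothetical, the counts are random toy data with 6 subjects, and breaking ties by rank before binning is an assumption I added so the bins stay equal-sized.

```python
import numpy as np
import pandas as pd

TFS = ["NRF1", "NFE2L1", "NFE2L2", "GABPA"]

def pseudobulk(counts: pd.DataFrame, subject_ids: pd.Series) -> pd.DataFrame:
    """Sum per-cell gene counts into one row per subject."""
    return counts.groupby(subject_ids.to_numpy()).sum()

def discretize_tfs(bulk: pd.DataFrame, tfs, labels=("Low", "Medium", "High")) -> pd.DataFrame:
    """Apply log2(x + 1), then cut each TF into equal-frequency bins."""
    log_expr = np.log2(bulk[list(tfs)] + 1)
    # Assumption: rank first so tied values cannot produce duplicate bin edges.
    return log_expr.apply(
        lambda col: pd.qcut(col.rank(method="first"), q=len(labels), labels=list(labels))
    )

# Toy data (hypothetical): 6 subjects with 10 cells each, TF columns only.
rng = np.random.default_rng(0)
subject_ids = pd.Series(np.repeat([f"S{i}" for i in range(6)], 10))
counts = pd.DataFrame(rng.poisson(3, size=(60, len(TFS))), columns=TFS)

bulk = pseudobulk(counts, subject_ids)   # one row per subject
binned = discretize_tfs(bulk, TFS)       # Low/Medium/High per TF
print(binned)
```

With 24 subjects and three bins, the same call would place 8 subjects in each level of every TF.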
The learned BN shows a regulatory pattern of NFE2L2 → NRF1 → NFE2L1, with an additional edge NFE2L2 → GABPA. I also fitted maximum likelihood CPDs for each TF. The CPDs showed strong conditional dependence along these edges. For example, P(NRF1 = High | NFE2L2 = High) = 0.875, suggesting meaningful regulatory dependencies.
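To make the CPD figure concrete: a maximum likelihood CPD is just the table of conditional frequencies, so it can be reproduced with a column-normalized cross-tabulation. The sketch below uses a made-up 24-subject table, not the real data. With three equal-frequency bins each NFE2L2 level holds 8 subjects, so a value of 0.875 would correspond to 7 of 8 subjects; the toy counts are chosen to match that.

```python
import pandas as pd

levels = ["Low", "Medium", "High"]

# Hypothetical toy data: 24 subjects, 8 per NFE2L2 level.
nfe2l2 = ["Low"] * 8 + ["Medium"] * 8 + ["High"] * 8
nrf1 = (
    ["Low"] * 6 + ["Medium"] * 2                   # subjects with NFE2L2 = Low
    + ["Low"] * 2 + ["Medium"] * 5 + ["High"] * 1  # subjects with NFE2L2 = Medium
    + ["Medium"] * 1 + ["High"] * 7                # subjects with NFE2L2 = High
)
df = pd.DataFrame({"NFE2L2": nfe2l2, "NRF1": nrf1})

# Maximum likelihood CPD: P(NRF1 | NFE2L2) = N(NRF1, NFE2L2) / N(NFE2L2).
cpd = pd.crosstab(df["NRF1"], df["NFE2L2"], normalize="columns")
cpd = cpd.reindex(index=levels, columns=levels)

p_high_given_high = cpd.loc["High", "High"]
print(cpd)
print(p_high_given_high)
```

Each column of `cpd` is one conditional distribution and sums to 1. With only 8 subjects per parent level, each entry moves in steps of 0.125, which is worth keeping in mind when reading the probabilities.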
As part of the evaluation, I am now interpreting the network using biological literature to determine whether the edges reflect known mitochondrial and oxidative stress pathways (NFE2L2 as an upstream antioxidant regulator and NRF1/NFE2L1 in mitochondrial biogenesis).

Attachment: BN_TFs_only.png
Next steps:
I plan to test model stability by altering the discretization scheme (2 bins vs. 3 bins) and by adjusting the BDeu equivalent sample size to see whether the learned edges remain consistent. I also plan to extend the BN to include downstream mitochondrial genes (such as TFAM, COX4I1, and ATP5F1B) to build a more complete regulatory network. If needed, I will generate additional CPDs and visualizations to compare how parameter changes affect the final structure.
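As a rough illustration of the equivalent-sample-size check, the sketch below computes the standard BDeu local score for one node with and without a candidate parent, over several ESS values. It is written from the textbook formula in plain Python and is not pgmpy's implementation; `bdeu_local_score` is a hypothetical helper, the data are the same kind of made-up 24-subject toy table as before, and a single local score difference is only a proxy for what a full hill-climb search would do.

```python
from collections import Counter
from math import lgamma

def bdeu_local_score(child, parents, ess, n_states=3):
    """BDeu local score of one node.

    child:   list of child states, one per subject
    parents: list of tuples of parent states, one per subject (use () for no parents)
    """
    r = n_states                        # number of child states
    q = n_states ** len(parents[0])     # number of possible parent configurations
    a_j, a_jk = ess / q, ess / (q * r)  # Dirichlet hyperparameters
    n_j = Counter(parents)              # counts per parent configuration
    n_jk = Counter(zip(parents, child)) # counts per (configuration, child state)
    # Unobserved configurations and cells contribute exactly 0, so they are skipped.
    score = sum(lgamma(a_j) - lgamma(a_j + n) for n in n_j.values())
    score += sum(lgamma(a_jk + n) - lgamma(a_jk) for n in n_jk.values())
    return score

# Hypothetical toy data: 24 subjects, 8 per NFE2L2 level.
nfe2l2 = ["Low"] * 8 + ["Medium"] * 8 + ["High"] * 8
nrf1 = (["Low"] * 6 + ["Medium"] * 2
        + ["Low"] * 2 + ["Medium"] * 5 + ["High"] * 1
        + ["Medium"] * 1 + ["High"] * 7)

with_parent = [(p,) for p in nfe2l2]
no_parent = [()] * len(nrf1)

# Score gain from adding the edge NFE2L2 -> NRF1, for each ESS value.
edge_gain = {
    ess: bdeu_local_score(nrf1, with_parent, ess) - bdeu_local_score(nrf1, no_parent, ess)
    for ess in (1, 5, 10)
}
for ess, gain in edge_gain.items():
    print(f"ESS={ess}: gain={gain:.3f}")
```

A positive gain means the score favors adding the edge at that ESS. An edge whose gain changes sign across ESS values, or across 2-bin versus 3-bin discretization, would be one I would treat as unstable.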