SMLG (Statistical Machine Learning Group) Discussion Forum

by **samanthagonzales** » Wed Dec 13, 2023 2:25 pm

A workflow for Bayesian network analysis with the bnlearn package in R, using gene expression data as an example.

by **mlwilcox** » Tue Mar 24, 2026 12:42 pm

External Timer to Set Maximum Runtime for Hill-Climbing Search
The hill-climbing search implemented in bnlearn does not allow the user to set a maximum runtime. Instead, the runtime depends on the specified number of random restarts and the maximum number of iterations.

The algorithm starts with an initial structure (e.g., an empty structure, a randomly generated structure, or a structure specified by the user), then makes changes to the structure in an attempt to improve the structure score. Each attempt to improve the structure is an iteration. This initial part of the search terminates when either (1) no further improvement can be made to the structure, or (2) the maximum number of iterations is reached. The default for the maximum number of iterations is max.iter = Inf, but this can be changed by the user. The structure learned at this point is considered a local maximum.

To escape from this local maximum in search of a better structure, the algorithm can perform a number of random restarts. At each restart, one or more features of the current structure are randomly changed (i.e., a randomly selected edge is reversed or removed, or an edge is added). The number of random feature changes is referred to as the number of perturbations. The default number of perturbations is perturb = 1, but this can be changed by the user. After the random feature change(s) are made, the algorithm again attempts to improve the structure, as described in the previous paragraph. The best structure found from this restart is compared to the best structure found in the previous search round, and the better of the two is kept. If the number of random restarts is >1, the process continues until the specified number of random restarts is completed. The default number of random restarts is restart = 0, but this can be changed by the user.
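As a minimal sketch of how max.iter, restart, and perturb fit together in a single hc() call, the example below uses learning.test, a small discrete data set that ships with bnlearn, as a stand-in for real data. The seed and the values restart = 10 and perturb = 2 are illustrative choices only.

```r
library(bnlearn)

# learning.test is a small discrete data set shipped with bnlearn;
# it stands in here for your own data.
data(learning.test)

set.seed(42)  # arbitrary seed so the random restarts are reproducible

# Plain hill-climbing from the empty graph: stops at the first local maximum.
dag_plain <- hc(learning.test, score = "bde")

# Same search, but with 10 random restarts, 2 perturbations per restart,
# and at most 1000 iterations.
dag_restart <- hc(learning.test, score = "bde",
                  restart = 10, perturb = 2, max.iter = 1000)

# Network scores (BDe) of the two learned structures.
score_plain   <- score(dag_plain, learning.test, type = "bde")
score_restart <- score(dag_restart, learning.test, type = "bde")
```

Comparing score_plain and score_restart shows whether the restarts found a better-scoring structure on this data set; on small problems the two are often identical.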

We may want to run a hill-climbing search for a specified duration of time (e.g., 1, 2, 4, 8, 16, 32 hrs). An external timer is required to do this. Attached is an R script that I wrote that uses an external timer. The hc() function is called with restart = 0 and max.iter = 1000. The initial structure is a random graph generated with the melancon method, with the maximum number of parents and the maximum number of children set to 10 each. After this initial search terminates, the random_change_arc() function is called to randomly change one feature. The hc() function is then called again, using this new structure as the starting structure. This process repeats within a 'while' loop until the runtime set by the user (object 'runtime_hr') is reached.

Note that the runtime is only checked at the start of each pass through the while loop: a new pass begins if the runtime has not yet been reached, and a pass that is already under way is not interrupted when the runtime is reached partway through. Thus, the final runtime may exceed the runtime set by the user. The score of the learned structure is saved in the dataframe "runtime_results_run1" and the model string of the learned structure is saved as "bestdag_run1". Depending on the number of variables in your dataset and the length of the variable names, R may not be able to print the full model string. Thus, the R environment is also saved to allow you to reference the learned structure for further analysis (see last line of the code).
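For readers without the attachment, the loop described above can be sketched roughly as follows. This is not the attached script: perturb_one_arc() is a hypothetical stand-in for random_change_arc(), learning.test replaces the real data, the degree limits are lowered to 3 to suit a six-node example, the budget is 2 seconds rather than hours, and keeping the best-scoring structure across passes is an assumption of this sketch.

```r
library(bnlearn)

data(learning.test)
dat <- learning.test            # stand-in data; all columns are already factors
node_names <- names(dat)

runtime_hr <- 2 / 3600          # time budget in hours (2 seconds, for illustration)

# Hypothetical helper (not the author's random_change_arc()):
# add, drop, or reverse one randomly chosen arc, retrying if the
# change would introduce a cycle.
perturb_one_arc <- function(dag) {
  repeat {
    pair <- sample(nodes(dag), 2)
    from <- pair[1]
    to <- pair[2]
    a <- arcs(dag)
    has_arc <- any(a[, "from"] == from & a[, "to"] == to)
    changed <- tryCatch({
      if (!has_arc) {
        set.arc(dag, from, to)        # errors if the new arc creates a cycle
      } else if (runif(1) < 0.5) {
        drop.arc(dag, from, to)
      } else {
        reverse.arc(dag, from, to)    # errors if the reversal creates a cycle
      }
    }, error = function(e) NULL)
    if (!is.null(changed)) return(changed)
  }
}

set.seed(1)                     # arbitrary seed for reproducibility
t0 <- Sys.time()                # external timer starts here
elapsed_hr <- function() as.numeric(difftime(Sys.time(), t0, units = "hours"))

# Random starting structure, then an initial search without restarts.
start_dag <- random.graph(node_names, method = "melancon",
                          max.in.degree = 3, max.out.degree = 3)
best_dag <- hc(dat, start = start_dag, score = "bde",
               restart = 0, max.iter = 1000)
init_score <- score(best_dag, dat, type = "bde")
best_score <- init_score

# The budget is checked only at the top of each pass, so the last pass
# can run past the budget.
while (elapsed_hr() < runtime_hr) {
  candidate <- hc(dat, start = perturb_one_arc(best_dag), score = "bde",
                  restart = 0, max.iter = 1000)
  cand_score <- score(candidate, dat, type = "bde")
  if (cand_score > best_score) {   # assumption: keep only improvements
    best_dag <- candidate
    best_score <- cand_score
  }
}

best_string <- modelstring(best_dag)   # compact text form of the learned DAG
```

Because the elapsed time is compared against runtime_hr only in the while condition, the total runtime overshoots the budget by at most the duration of one hc() call.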

Several aspects of this code can be altered to fit your analysis. For example, this code is written for discrete data and thus converts all variables to factors and uses BDe scoring. For the hill-climbing search: the starting structure, the maximum number of parents, the maximum number of restarts, and the scoring method can all be changed. If you want to perform more than 1 perturbation with every random restart, the function random_change_arc() will need to be modified. You will also need to update the line of code that reads in your data (object 'dat') and update the file name for the saved R environment (see last line of code).
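The factor conversion mentioned above can be done in base R along these lines. The small data frame is a made-up stand-in for the object 'dat', and its column names are hypothetical.

```r
# Made-up stand-in for 'dat': discretized values stored as numbers and strings.
dat <- data.frame(
  gene_a = c(0, 1, 1, 0),
  gene_b = c("low", "high", "high", "low"),
  gene_c = c(2L, 0L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Convert every column to a factor; dat[] keeps the data.frame structure.
dat[] <- lapply(dat, as.factor)

# Confirm that all columns are now factors.
all_factors <- all(vapply(dat, is.factor, logical(1)))
```

Discrete scores such as BDe expect every column to be a factor, so a check like all_factors is a cheap safeguard before calling hc().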

Please let me know if you have any questions or any suggestions for the code.

Bayesian Network Analysis using R package bnlearn
