How does BNlearn report the best scoring structure

How does BNlearn report the best scoring structure

Postby cwyoo » Mon Mar 06, 2023 3:52 pm

Searching for the best fitting BN structure given a dataset with a large number of random variables, e.g., > 30, cannot be achieved within a reasonable time. e.g., within 100 years. Typically, searching for the best fitting BN structure is implemented as an any-time-algorithm, i.e., limit the algorithm to run for a specified time. I would ask us to discuss how does BNlearn report the best scoring structure in this topic.
cwyoo
Site Admin
 
Posts: 377
Joined: Sun Jun 22, 2014 2:38 pm

Re: How does BNlearn report the best scoring structure

Postby zgong001 » Wed Mar 08, 2023 1:13 pm

The algorithm is as following:

Hill-Climbing Algorithm
1. Choose a network structure G over V, usually (but not necessarily) empty.
2. Compute the score of G, denoted as ScoreG=Score(G).
3. Set maxscore=ScoreG.
4. Repeat the following steps as long as maxscore increases:
1. for every possible arc addition, deletion or reversal not resulting in a cyclic network:
1. compute the score of the modified network G*, ScoreG*=Score(G*):
2. if ScoreG*>ScoreG, set G=G* and ScoreG=ScoreG*.
2. update maxscore with the new value of ScoreG.
5. Return the DAG G.

_____________________________________________
hc in bnlearn:
Score-based structure learning algorithms
Description
Learn the structure of a Bayesian network using a hill-climbing (HC) or a Tabu search (TABU) greedy search.

Usage
hc(x, start = NULL, whitelist = NULL, blacklist = NULL, score = NULL, ...,
debug = FALSE, restart = 0, perturb = 1, max.iter = Inf, maxp = Inf, optimized = TRUE)

Arguments
x
a data frame containing the variables in the model.

start
an object of class bn, the preseeded directed acyclic graph used to initialize the algorithm. If none is specified, an empty one (i.e. without any arc) is used.

whitelist
a data frame with two columns (optionally labeled "from" and "to"), containing a set of arcs to be included in the graph.

blacklist
a data frame with two columns (optionally labeled "from" and "to"), containing a set of arcs not to be included in the graph.

score
a character string, the label of the network score to be used in the algorithm. If none is specified, the default score is the Bayesian Information Criterion for both discrete and continuous data sets. See network scores for details.

...
additional tuning parameters for the network score. See score for details.

debug
a boolean value. If TRUE a lot of debugging output is printed; otherwise the function is completely silent.

restart
an integer, the number of random restarts.

tabu
a positive integer number, the length of the tabu list used in the tabu function.

max.tabu
a positive integer number, the iterations tabu search can perform without improving the best network score.

perturb
an integer, the number of attempts to randomly insert/remove/reverse an arc on every random restart.

max.iter
an integer, the maximum number of iterations.

maxp
the maximum number of parents allowed for a node in any network that is considered in the search, including that that is returned. The default value is Inf.

optimized
a boolean value. If TRUE (the default), score caching is used to speed up structure learning.

Value
An object of class bn. See bn-class for details.
zgong001
 
Posts: 463
Joined: Thu Nov 16, 2017 11:10 am

Re: How does BNlearn report the best scoring structure

Postby cwyoo » Thu Mar 09, 2023 10:09 am

zgong001 wrote:The algorithm is as following:

Hill-Climbing Algorithm
1. Choose a network structure G over V, usually (but not necessarily) empty.
2. Compute the score of G, denoted as ScoreG=Score(G).
3. Set maxscore=ScoreG.
4. Repeat the following steps as long as maxscore increases:
1. for every possible arc addition, deletion or reversal not resulting in a cyclic network:
1. compute the score of the modified network G*, ScoreG*=Score(G*):
2. if ScoreG*>ScoreG, set G=G* and ScoreG=ScoreG*.
2. update maxscore with the new value of ScoreG.
5. Return the DAG G.

_____________________________________________
hc in bnlearn:
Score-based structure learning algorithms
Description
Learn the structure of a Bayesian network using a hill-climbing (HC) or a Tabu search (TABU) greedy search.

Usage
hc(x, start = NULL, whitelist = NULL, blacklist = NULL, score = NULL, ...,
debug = FALSE, restart = 0, perturb = 1, max.iter = Inf, maxp = Inf, optimized = TRUE)

Arguments
x
a data frame containing the variables in the model.

start
an object of class bn, the preseeded directed acyclic graph used to initialize the algorithm. If none is specified, an empty one (i.e. without any arc) is used.

whitelist
a data frame with two columns (optionally labeled "from" and "to"), containing a set of arcs to be included in the graph.

blacklist
a data frame with two columns (optionally labeled "from" and "to"), containing a set of arcs not to be included in the graph.

score
a character string, the label of the network score to be used in the algorithm. If none is specified, the default score is the Bayesian Information Criterion for both discrete and continuous data sets. See network scores for details.

...
additional tuning parameters for the network score. See score for details.

debug
a boolean value. If TRUE a lot of debugging output is printed; otherwise the function is completely silent.

restart
an integer, the number of random restarts.

tabu
a positive integer number, the length of the tabu list used in the tabu function.

max.tabu
a positive integer number, the iterations tabu search can perform without improving the best network score.

perturb
an integer, the number of attempts to randomly insert/remove/reverse an arc on every random restart.

max.iter
an integer, the maximum number of iterations.

maxp
the maximum number of parents allowed for a node in any network that is considered in the search, including that that is returned. The default value is Inf.

optimized
a boolean value. If TRUE (the default), score caching is used to speed up structure learning.

Value
An object of class bn. See bn-class for details.


I suggest using the maximum number of parents (maxp) as five (5). Seems, restart parameter can be used to somewhat make the search run for a longer or a shorter time. Depending on the time it takes for a nonzero value of restart, double the restart value each time you are running follow up experiments.
cwyoo
Site Admin
 
Posts: 377
Joined: Sun Jun 22, 2014 2:38 pm

Re: How does BNlearn report the best scoring structure

Postby mlwilcox » Mon Mar 13, 2023 2:14 pm

I was looking more into the 'perturb' argument, which I brought up toward the end of today's SMLG meeting. I think that 'perturb' in the number of random modifications (i.e., the number of relationships in the structure that are randomly modified by adding/removing/reversing arcs) made to the structure initially found as the local maxima. This new, perturbed structure is then used as the initial structure in the next random restart. The default for the 'perturb' argument is 1. See section 2 "Computational complexity of greedy search" in the attached paper.
Attachments
s11222-019-09857-1.pdf
(484.46 KiB) Downloaded 28 times
mlwilcox
 
Posts: 11
Joined: Thu Apr 29, 2021 7:51 pm

Re: How does BNlearn report the best scoring structure

Postby rpazo001 » Mon Mar 27, 2023 6:05 pm

Some points on TABU:

TABU is a modified hill-climbing algorthim able to escape local optima by selecting a network that minimally decreases the score function.

It does not have a restart option like hc.

It does not stop at the first DAG for which every possible edge addition, deletion, or reversal does not improve the score, but further explores the space of DAGs to find a better DAG.

When you set the tabu argument (i.e. tabu = 25), it will perform an additional 25 iterations to make sure no other and potentially better local optimun is discoverd
rpazo001
 
Posts: 58
Joined: Fri May 29, 2020 12:34 pm

Re: How does BNlearn report the best scoring structure

Postby rpazo001 » Mon Mar 27, 2023 6:06 pm

tabu(x, start = NULL, whitelist = NULL, blacklist = NULL, score = NULL, ...,
debug = FALSE, [color=#FFFF00]tabu = 10[/color], max.tabu = tabu, max.iter = Inf, maxp = Inf, optimized = TRUE)
rpazo001
 
Posts: 58
Joined: Fri May 29, 2020 12:34 pm


Return to R package for Bayesian network structure learning (BNlearn)

Who is online

Users browsing this forum: No registered users and 2 guests

cron