Here I will walk through parts of the code and explain how they work. All of this is done with the help of a simple data set that I have attached below.
The order that we will be looking at in this example is <A,B,C>.
PART 1: Let's begin with the step in the code labeled as follows:
//Total Families Ui,alpha for a particular variable in the order
The great thing about this part of the code is that its result does not depend on which variable sits where in the order. It depends only on the number of variables in your order and the maximum number of parents allowed for each variable. The number of entries is always equal to the number of variables in your order. In our example we only have 3 variables, so the vector labeled families will only have three entries. The value of the families vector will be <1,2,4>, assuming that we allow the maximum number of parents. The maximum number of parents in this case is 2, because C, which is the last variable in the order, can have both A and B as parents. If we had set the maximum number of parents to 1, then the families vector would be <1,2,3>. The code goes one variable at a time, so let's work out the code by hand for our example:
First variable is A:
A cannot have any parents, so its only parent set is the empty one. The number of parent sets is 0 choose 0, which is 1.
Second variable is B:
B can be without parents or it can have A as a parent, so the total number of parent sets for B is 2, which is the sum of 1 choose 0 and 1 choose 1.
Third variable is C:
C can be without parents, it can have A as a parent, it can have B as a parent, or it can have both A and B as parents. Therefore the total number of parent sets for C is 4, which is the sum of 2 choose 0, 2 choose 1, and 2 choose 2.
Since the maximum number of parents limits how large a parent set can be, a variable can end up with fewer parent sets than it would have with no limit. For example, with the maximum set to 1, C has 3 parent sets instead of 4.
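To check the hand calculation, here is a small Python sketch of the same sum of binomial coefficients. It is not taken from the original code. The function name count_families and its parameters are made up for illustration, and it assumes the only restriction on a parent set is the maximum number of parents.

```python
from math import comb

def count_families(num_vars, max_parents):
    """Number of candidate parent sets for each position in the order.

    Hypothetical helper, not the original implementation.
    """
    families = []
    for position in range(num_vars):        # position 0 is the first variable
        # A variable cannot have more parents than it has predecessors.
        limit = min(position, max_parents)
        # Sum of (position choose k) for k = 0 .. limit.
        families.append(sum(comb(position, k) for k in range(limit + 1)))
    return families

print(count_families(3, 2))  # [1, 2, 4]
print(count_families(3, 1))  # [1, 2, 3]
```

Note that the function never looks at the variable names, only at how many variables come before each position, which is why the result is the same for any order of the same length.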
PART 2: Now let's look at the part of the code that begins with the comment
//How many parent combinations for each step? As well as there counts
This part of the code finds the number of parent combinations (qi) for each parent set (Ui,alpha) and the counts (Nijk) for each parent combination. We look at the parent combinations first. The number of parent combinations for a parent set depends entirely on how many parents it has and how many categories each parent has. In our example the three variables are binary. The information for these parent combinations is stored in the vector of vectors named ParentCombos. The number of vectors in this vector of vectors is equal to the number of variables in your order. In this manner the vector in the first row represents the parent combinations for the first variable in the order, and the vector in the last row represents the parent combinations for the last variable in the order. The length of each vector depends on the number of parent sets recorded for that variable in the families vector. So for our example with the maximum number of parents, ParentCombos will look like this:
1
1 2
1 2 2 4
With only 1 parent as the maximum, the ParentCombos vector of vectors will look like this:
1
1 2
1 2 2
So what is happening? Recall that each of the variables is binary and that the number of parent sets for each variable can be found in the vector named families. Again we work through the code by hand for our example:
First variable is A:
A cannot have any parents, so its only parent set is the empty one and the value in ParentCombos must be 1.
Second variable is B:
B has 2 different parent sets: in one B has no parents, and in the other B's parent is A. The first case is like the case above for variable A, so the first entry of ParentCombos must be 1. In the second case, since A has two categories, the second entry of ParentCombos must be 2.
Third variable is C:
C has 4 different parent sets: in one it has no parents, in another A is its parent, in another B is its parent, and in the last one A and B are both its parents. Therefore ParentCombos has 4 entries in this row. The first entry is again 1 for the same reason as before. The next two entries are 2 because A and B are binary. The last entry is 4 because, with A and B both binary, there are 2 x 2 = 4 combinations of A and B.
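The same rows can be reproduced with a short Python sketch. This is my own illustration and not the original code. The function name parent_combos and the categories dictionary are hypothetical, and I am assuming that the parent sets are listed smallest first (no parents, then single parents, then pairs), which matches the rows shown above.

```python
from itertools import combinations
from math import prod

def parent_combos(order, categories, max_parents):
    """For each variable, the number of parent combinations per parent set.

    Hypothetical helper; assumes parent sets are enumerated by size.
    """
    rows = []
    for position in range(len(order)):
        predecessors = order[:position]      # only earlier variables can be parents
        limit = min(position, max_parents)
        row = []
        for size in range(limit + 1):
            for parent_set in combinations(predecessors, size):
                # Product of the parents' category counts; the empty product is 1.
                row.append(prod(categories[p] for p in parent_set))
        rows.append(row)
    return rows

order = ["A", "B", "C"]
categories = {"A": 2, "B": 2, "C": 2}        # all three variables are binary

print(parent_combos(order, categories, 2))   # [[1], [1, 2], [1, 2, 2, 4]]
print(parent_combos(order, categories, 1))   # [[1], [1, 2], [1, 2, 2]]
```

The length of each row matches the corresponding entry of the families vector.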
The second part of this step is calculating the counts for the parent combinations. The full set of counts is stored in the vector of vectors named fullNijkvector. The process for iterating through all of the combinations is complex and requires its own post. Here I will instead focus on helping you understand what each vector in fullNijkvector represents. It has one row for each parent set of each variable, so with the families vector <1,2,4> there are 1 + 2 + 4 = 7 rows. For our example with the maximum number of parents set to 2, the following is fullNijkvector:
12 4
7 9
8 1 4 3
12 4
3 1 9 3
3 1 4 8
2 1 1 0 2 7 2 1
The first row represents the counts when A = 1 and when A = 0, respectively.
The second row represents the counts when B = 1 and B = 0, respectively.
The third row represents the counts when (A,B) = (1,0), (A,B) = (0,0), (A,B) = (1,1), and (A,B) = (0,1), respectively.
The fourth row represents the counts when C = 1 and C = 0, respectively.
The fifth row represents the counts when (A,C) = (1,0), (A,C) = (0,0), (A,C) = (1,1), and (A,C) = (0,1), respectively.
The sixth row represents the counts when (B,C) = (1,0), (B,C) = (0,0), (B,C) = (1,1), and (B,C) = (0,1), respectively.
The seventh row represents the counts when (A,B,C) = (1,1,0), (A,B,C) = (1,0,0), (A,B,C) = (0,1,0), (A,B,C) = (0,0,0), (A,B,C) = (1,1,1), (A,B,C) = (1,0,1), (A,B,C) = (0,1,1), and (A,B,C) = (0,0,1), respectively.
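As a sanity check, the first six rows can be recovered from the seventh by summing out the variables that are not involved. The Python sketch below does that. It is not the original iteration code, and the names joint and family_counts are made up for illustration. The joint counts are copied from the seventh row above rather than read from the attached data set, and the results are keyed by value tuples, so the order in which the original code lists the counts within a row is not reproduced.

```python
from collections import Counter
from itertools import combinations

order = ["A", "B", "C"]

# Joint counts for (A, B, C), copied from the seventh row of fullNijkvector.
joint = {
    (1, 1, 0): 2, (1, 0, 0): 1, (0, 1, 0): 1, (0, 0, 0): 0,
    (1, 1, 1): 2, (1, 0, 1): 7, (0, 1, 1): 2, (0, 0, 1): 1,
}

def family_counts(child, parents):
    """Counts for every (parents..., child) value combination.

    Hypothetical helper: sums the joint counts over all other variables.
    """
    cols = [order.index(v) for v in list(parents) + [child]]
    counts = Counter()
    for values, n in joint.items():
        counts[tuple(values[c] for c in cols)] += n
    return dict(counts)

# One entry per (variable, parent set) pair, parent sets listed by size.
rows = []
for position, child in enumerate(order):
    for size in range(position + 1):
        for parent_set in combinations(order[:position], size):
            rows.append((child, parent_set, family_counts(child, parent_set)))

for child, parent_set, counts in rows:
    print(child, parent_set, counts)
```

For example, family_counts("B", ["A"]) gives 8 for (A,B) = (1,0), 1 for (0,0), 4 for (1,1), and 3 for (0,1), which agrees with the third row.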