SMLG (Statistical Machine Learning Group) Discussion Forum

by **efrain.gonzalez0** » Thu Nov 30, 2017 2:00 pm

Here I will include information regarding some of the tools that I used in order to implement my order scoring code.

by **efrain.gonzalez0** » Thu Nov 30, 2017 3:13 pm

Good afternoon,

The lastgamma variable in the below refers to the division of the two last gamma terms in the equation 35 presented by David Heckerman in "A Tutorial on Learning With Bayesian Networks."
You will notice that the lgamma function is being used from the <boost/math/special_functions.hpp>. Since we will often have counts that are large, using the gamma function directly cannot be done. The standard gamma function in the boost library can only accept count of size 171 or less. Therefore the use of the log(gamma) function was required (where here log refers to the natural log or log base e). The log of two gammas being divided by each other is the difference of the log of the gamma in the numerator and the log of the gamma in the denominator. We also know that the log of two numbers being multiplied by each other is the summation of the logs of the two numbers. This is why lastgamma is being added over all categories of the variable i instead of being multiplied over all of the categories of variable i.
The code presented here is a simplification of the code in the actual program:

Code: Select all: for(k=0; k < ri; ++k){ top = (nijkprime + fullNijkvector[posinfull][rightcol]); //Using boost lgamma function for the product over categories and parent combinations lastgamma += boost::math::lgamma(top) - boost::math::lgamma(nijkprime); }

The next for loop presented here actually engulfs the previous and continues to use the lgamma function to complete the second step of Heckerman's formula which involves the second division of two gamma functions. But now we are summing over all combinations of parents (qj) for a particular parent set (Ui):

Code: Select all: for(j < qj_Ui){ double bot = nij + nijprime; seclastgamma += lastgamma + boost::math::lgamma(nijprime) - boost::math::lgamma(bot); }

Since we used logs in our equation in order to convert two of the multiplications to summations we must now cancel these logs with an exponential. Then we must sum the exponential outputs over all parent sets for a particular variable. The summation of these exponential values is done through the use of the logsumexp trick so as to avoid overflow or underflow errors.The following for loop engulfs the previous one:

Code: Select all: for(Ui < |Uialpha|){ //Calculate based on the logsumexp concept //find max value of seclastgamma maxseclastgamma for (size_t what = 0; what < vec2ndlastgamma.size(); ++what) { sumovUialpha += exp(vec2ndlastgamma[what] - maxseclastgamma); } }

Once this all has been calculated I have another for loop that engulfs the previous. This loop calculates the sum of the log of the previous value (sumovUialpha) over all variables. Since we used the logsumexp trick I also had to add the maxseclastgamma value.

Code: Select all: for(i < var){ finlogscore += log(sumovUialpha) + maxseclastgamma; }

SMLG (Statistical Machine Learning Group) Discussion Forum

Boost & Order Scoring

Boost & Order Scoring

Implementation

Who is online