SMLG (Statistical Machine Learning Group) Discussion Forum

by **cwyoo** » Tue Jul 03, 2018 3:11 pm

I want us to use this place to post pseudocode or actual code, and to discuss it, for the bare bones of Order Scoring with Sampling (OSS).

Efrain, can you post simple C++ code that (1) reads in a dataset and a user configuration file (which specifies a starting order); (2) implements a method that returns the BDeu score given the following parameters: (variable index, array of parent indices); and (3) loops through all variables in the dataset and, for each variable, returns the BDeu score of that variable given all the variables higher in the order as parents, using the method developed in (2).

Please attach the relevant files (e.g., configuration file, dataset, etc.) and instructions on how to compile and run the code.
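
As a starting point for discussion, steps (2) and (3) could be sketched roughly as below. This is only a minimal sketch and not the attached code: the names `Dataset`, `bdeuScore`, and `scoreOrder`, the in-memory data layout, and the default equivalent sample size of 1.0 are all assumptions made for illustration. It also assumes that "higher in the order" means "appearing earlier in the order", and it leaves out step (1), the file reading.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical layout: data[row][var] holds a state in 0..card[var]-1.
using Dataset = std::vector<std::vector<int>>;

// Log BDeu local score of `var` given `parents`.
// `ess` is the equivalent sample size (assumed default: 1.0).
double bdeuScore(const Dataset& data, const std::vector<int>& card,
                 int var, const std::vector<int>& parents, double ess = 1.0) {
    assert(var >= 0 && static_cast<std::size_t>(var) < card.size());
    const int r = card[var];  // number of states of `var`
    double q = 1.0;           // number of joint parent configurations
    for (int p : parents) q *= card[p];

    // counts[parent configuration] -> per-state counts N_ijk of `var`
    std::map<std::vector<int>, std::vector<int>> counts;
    for (const auto& row : data) {
        std::vector<int> config;
        for (int p : parents) config.push_back(row[p]);
        auto it = counts.emplace(config, std::vector<int>(r, 0)).first;
        ++it->second[row[var]];
    }

    const double aj = ess / q;         // prior weight per parent configuration
    const double ajk = ess / (q * r);  // prior weight per (configuration, state)
    double score = 0.0;
    // Parent configurations never seen in the data contribute exactly zero,
    // so only the observed ones are visited.
    for (const auto& kv : counts) {
        int nij = 0;
        for (int nijk : kv.second) {
            nij += nijk;
            score += std::lgamma(ajk + nijk) - std::lgamma(ajk);
        }
        score += std::lgamma(aj) - std::lgamma(aj + nij);
    }
    return score;
}

// Step (3): score every variable with all of its predecessors in `order`
// as parents. The result is indexed by variable, not by position.
std::vector<double> scoreOrder(const Dataset& data, const std::vector<int>& card,
                               const std::vector<int>& order, double ess = 1.0) {
    std::vector<double> scores(card.size(), 0.0);
    std::vector<int> predecessors;
    for (int var : order) {
        scores[var] = bdeuScore(data, card, var, predecessors, ess);
        predecessors.push_back(var);
    }
    return scores;
}
```

Counting through a map keyed by the observed parent configuration keeps memory proportional to the number of rows, which matters because using every predecessor as a parent makes the number of possible configurations grow very quickly.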

by **efrain.gonzalez0** » Fri Jul 13, 2018 1:33 pm

Good afternoon,

Below I have attached the current version of the code, which calculates the score of an order based on the parent combinations set by the user. I have also attached the configuration file for this code, the test data set that I used, and a file with the parent combinations that I used.

If you look at the parent combinations file and the configuration file, you will see that the order I tested was "0 1 2 3 4 5 6 7 8" with two parents. In the parent combinations file you will see many values that are separated by tabs and a few that are separated by commas. Tabs separate different parent combinations for the same variable. Commas separate the different variables within a single parent combination. For example, if 0, 1, and 2 are part of a single combination, then you would write the combination as 0,1,2. If, however, 0, 1, and 2 are three separate parent combinations, then you would write them as 0(tab)1(tab)2(tab). You will also notice that the first value in each row is the value of the variable that the row represents. This lets the program know that the variable may have no parents. The value assigned to each variable is based on its position within the data set, and since that position does not change, the value of each variable does not change either. I sometimes refer to these as the global values of the variables.
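
To make the file format above concrete, here is a minimal sketch of how one row could be parsed. It is not taken from the attached code: the names `ParentSet`, `parseParentRow`, and `readParentCombinations` are hypothetical, and the sketch assumes that the leading value of a row stands for the row's variable together with an empty parent set.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using ParentSet = std::vector<int>;

// Parse one row of the form "<var>\t<combo>\t<combo>...", where each combo
// is a comma-separated list of variable indices such as "0,1,2".
std::pair<int, std::vector<ParentSet>> parseParentRow(const std::string& line) {
    std::istringstream fields(line);
    std::string field;
    int var = -1;
    bool first = true;
    std::vector<ParentSet> combos;
    while (std::getline(fields, field, '\t')) {
        if (field.empty()) continue;  // tolerate a trailing tab
        if (first) {
            var = std::stoi(field);
            combos.push_back(ParentSet{});  // assumed: "no parents" is allowed
            first = false;
            continue;
        }
        ParentSet combo;
        std::istringstream members(field);
        std::string member;
        while (std::getline(members, member, ',')) {
            if (!member.empty()) combo.push_back(std::stoi(member));
        }
        combos.push_back(combo);
    }
    if (first) throw std::invalid_argument("row has no variable index");
    return {var, combos};
}

// Read every non-empty row from a stream into a variable -> combinations table.
std::map<int, std::vector<ParentSet>> readParentCombinations(std::istream& in) {
    std::map<int, std::vector<ParentSet>> table;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        auto row = parseParentRow(line);
        table[row.first] = row.second;
    }
    return table;
}
```

With this reading of the format, the row 2(tab)0(tab)1(tab)0,1(tab) would give variable 2 four candidate parent sets: none, {0}, {1}, and {0,1}.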

The code file contains many functions, but the one that matters for our discussion here is the one named OSS. The function is written so that it can eventually keep track of other information (the information used in the caching swap function), but the only useful output it currently produces is the score of the order. At the moment the code takes a single order and calculates its score, but if we use something similar to what I did for the other codes on the GitLab, I believe we can adapt it for true MCMC simulations.
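
For readers who have not opened the attachment, the general idea of scoring an order from a table of candidate parent combinations might be sketched as follows. This is an illustration under assumptions and not the OSS function itself: the names `LocalScore`, `logSumExp`, and `orderLogScore` are hypothetical, the order is assumed to be a permutation of 0..n-1, and the sketch assumes that the local scores of all parent combinations consistent with the order are summed (in log space) for each variable. The attached code may combine them differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <map>
#include <vector>

using ParentSet = std::vector<int>;
// Hypothetical callback returning the log local score (e.g. BDeu) of a
// variable given one parent combination.
using LocalScore = std::function<double(int, const ParentSet&)>;

// Numerically stable log(sum(exp(x))) over a list of log values.
double logSumExp(const std::vector<double>& xs) {
    if (xs.empty()) return -std::numeric_limits<double>::infinity();
    const double m = *std::max_element(xs.begin(), xs.end());
    if (std::isinf(m)) return m;
    double s = 0.0;
    for (double x : xs) s += std::exp(x - m);
    return m + std::log(s);
}

// Log score of `order`: for each variable, combine the local scores of every
// candidate parent combination whose members all precede it in the order.
double orderLogScore(const std::vector<int>& order,
                     const std::map<int, std::vector<ParentSet>>& candidates,
                     const LocalScore& localScore) {
    // position[v] = index of variable v within the order
    std::vector<int> position(order.size());
    for (std::size_t pos = 0; pos < order.size(); ++pos) {
        position[order[pos]] = static_cast<int>(pos);
    }

    double total = 0.0;
    for (int var : order) {
        std::vector<double> terms;
        auto it = candidates.find(var);
        if (it != candidates.end()) {
            for (const ParentSet& ps : it->second) {
                const bool consistent = std::all_of(
                    ps.begin(), ps.end(),
                    [&](int p) { return position[p] < position[var]; });
                if (consistent) terms.push_back(localScore(var, ps));
            }
        }
        // A variable with no consistent combination yields -infinity here.
        total += logSumExp(terms);
    }
    return total;
}
```

Because the empty parent combination is consistent with every order, each variable listed in the table always contributes at least one term. Passing the local score as a callback keeps this sketch independent of how the scores are computed or cached.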

To run this code, the only library we need is Boost. Boost can be installed using: sudo apt-get install libboost-all-dev
If you are using the terminal on one of the servers, you can compile the code using: g++ -x c++ -std=c++11 -o OSSCode Downloads/OSS_start.cpp
To run the code after compiling it in the above manner, execute the following in the terminal: ./OSSCode
