Re: Probabilistic Graphical Models, Fall 2018

Posted: Wed Mar 13, 2019 12:27 am
by cpere117
These are my analysis results using limma for several microarray studies, each named in the corresponding Excel sheet. A few normalization plots of my data distribution for ID1-ID4 are also included.

Re: Probabilistic Graphical Models, Fall 2018

Posted: Tue Apr 16, 2019 4:41 pm
by cpere117
Professor, here is a continued update on my research project, as discussed earlier in our meeting. I've also attached an Excel table with the completed datasets highlighted and ready for discretization for BANJO.

Re: Probabilistic Graphical Models, Fall 2018

Posted: Tue Apr 16, 2019 10:49 pm
by cpere117
Further updates with dataset discretization, Z-scores, and cleaned datasets.

Re: Probabilistic Graphical Models, Fall 2018

Posted: Mon Apr 22, 2019 11:56 am
by cpere117
Attached are the completed discretized and cleaned microarray datasets from the attached master table of Alzheimer's disease datasets from GEO. I tried to upload all 13 datasets to SMLG in a zipped folder but ran into problems because of the sheer size of the file. Later today I plan to post all the genes matched across the datasets and, if possible, a merged dataset of all my sample data along with the discretized file that I plan to run in BANJO this week. Thank you!
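
For reference, a minimal sketch of the gene-matching step, assuming each dataset has already been reduced to a list of gene symbols. The dataset labels and gene lists below are toy values, and `common_genes` is a hypothetical helper written for illustration; it is not part of limma or BANJO.

```python
from functools import reduce


def common_genes(gene_lists):
    """Return the sorted gene symbols present in every dataset.

    gene_lists maps a dataset label to an iterable of gene symbols.
    """
    if not gene_lists:
        return []
    # Normalise case and whitespace so "app " and "APP" count as the same gene.
    sets = [{g.strip().upper() for g in genes} for genes in gene_lists.values()]
    return sorted(reduce(set.intersection, sets))


# Toy data only: hypothetical labels and symbols, not the real datasets.
toy = {
    "dataset_A": ["ID3", "APP", "MAPT", "GFAP"],
    "dataset_B": ["id3", "APP", "PSEN1", "GFAP"],
    "dataset_C": ["ID3", "APP ", "GFAP", "TREM2"],
}
print(common_genes(toy))
```

Intersecting sets keeps only the genes measured on every platform, which is why the common list shrinks as more datasets are added.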

Re: Probabilistic Graphical Models, Fall 2018

Posted: Mon Apr 22, 2019 7:04 pm
by cpere117
After discretizing my data and matching genes across 9 datasets (GSE 1102981, GSE1297, GSE28146, GSE29378, GSE44772, GSE45596, GSE5281, GSE84422-GPL570, GSE84422-GPL96), the final tally of common genes was 6432. Note that, for quality-control reasons, 3 datasets were excluded from further analysis (GSE36980, GSE37263, GSE39420). A total of 1633 samples (950 AD, 683 control) were merged and discretized according to Z-score values. Gender, age, and condition will all be coded as binary variables for the BANJO runs. Furthermore, of the 6432 genes found across the datasets, 846 are known validated targets of ID3 according to the ChIP/RNA integrative data gathered in our lab (Mayur conducted the experiment). My plan is to finalize the cleaning process early tomorrow and then run BANJO in three trials of 1 hour, 3 hours, and 9 hours to obtain a Bayesian network analysis.
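
A minimal sketch of Z-score discretization for one gene, assuming a three-state coding (0 = low, 1 = mid, 2 = high) with cutoffs at plus or minus one standard deviation. The cutoff, the number of states, and the function name `discretize_by_zscore` are illustrative assumptions; the thresholds actually used are not stated in this post.

```python
import statistics


def discretize_by_zscore(values, cutoff=1.0):
    """Map expression values for one gene to states 0 (low), 1 (mid), 2 (high).

    Z-scores are computed across the samples passed in; the +/-cutoff
    thresholds are an illustrative assumption.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant gene carries no information; use the middle state throughout.
        return [1] * len(values)
    states = []
    for v in values:
        z = (v - mean) / sd
        if z < -cutoff:
            states.append(0)
        elif z > cutoff:
            states.append(2)
        else:
            states.append(1)
    return states


# Toy expression values for a single gene across six samples.
expr = [2.0, 2.1, 1.9, 2.0, 5.0, 0.1]
print(discretize_by_zscore(expr))
```

Whether the Z-scores are computed within each dataset before merging or across the merged table changes the result, so that choice should be recorded alongside the cutoff.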

Re: Probabilistic Graphical Models, Fall 2018

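Posted: Wed Apr 24, 2019 7:44 pm
by cpere117
Today I ran my first BANJO trial on the following dataset: control females with ID3 target genes. The console output below starts with the tail of an error notification, followed by the run that began searching.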
Stack trace info:

edu.duke.cs.banjo.data.observations.ObservationsAsArray.loadData(ObservationsAsArray.java:404)
edu.duke.cs.banjo.utility.FileUtil.loadObservations(FileUtil.java:548)
edu.duke.cs.banjo.data.settings.Settings.loadObservations(Settings.java:3187)
edu.duke.cs.banjo.application.Banjo.execute(Banjo.java:129)
edu.duke.cs.banjo.application.Banjo.main(Banjo.java:447)

-----------------------------------------------------------------------------
End of error notification
-----------------------------------------------------------------------------

christianperez@path-four:~/Banjo.2.2.0$ java -jar banjo.jar settingsFile=data/release2.0/static/input/CM5Settings_2
-----------------------------------------------------------------------------
- Banjo Bayesian Network Inference with Java Objects -
- Release 2.2.0 15 Apr 2008 -
- Licensed from Duke University -
- Copyright (c) 2005-08 by Alexander J. Hartemink -
- All rights reserved -
-----------------------------------------------------------------------------
- Project: ID3 CP 2019
- User: christianperez
- Dataset: 850-vars-444-observations
- Notes: static bayesian network inference
-----------------------------------------------------------------------------
- Settings file:
-----------------------------------------------------------------------------
- Input directory: data/release2.0/static/input
- Observations file: Control_Females
- Observation count: 444
- Number of variables: 850
- Variable names: inFile
- Discretization policy: none
- Exceptions to the discretization policy: none
-----------------------------------------------------------------------------
- Initial structure file:
- 'Must be present' edges file:
- 'Must not be present' edges file:
- Min. Markov lag: 0
- Max. Markov lag: 0
- Max. parent count: 10
- Equivalent sample size for Dirichlet parameter prior: 1.0
-----------------------------------------------------------------------------
- Searcher: SearcherSimAnneal
- Proposer: ProposerRandomLocalMove
- Evaluator: defaulted to EvaluatorBDe
- Decider: defaulted to DeciderMetropolis
-----------------------------------------------------------------------------
- Pre-compute logGamma: no
- Cache: fastLevel1
- Cycle checking method: Depth-first Search
-----------------------------------------------------------------------------
- Initial temperature: 10000
- Cooling factor: 0.7
- Reannealing temperature: 1000
- Max. accepted networks before cooling: 5000
- Max. proposed networks before cooling: 20000
- Min. accepted networks before reannealing: 500
-----------------------------------------------------------------------------
- Output directory: data/release2.0/static/output
- Report file: static.report.CPFEMCONTROL.2019.04.24.19.41.56.txt
- Number of best networks tracked: 1
- Max. time: 3.0 h
- Max. restarts: 100000
- Min. networks before checking: 5000
- Screen reporting interval: 10.0 m
- File reporting interval: 10.0 m
-----------------------------------------------------------------------------
- Compute influence scores: yes
- Compute consensus graph: yes
- Create consensus graph as HTML: yes
- Create 'dot' output: yes
- Location of 'dot': /usr/bin/dot
-----------------------------------------------------------------------------
- XML output directory: data/release2.0/static/output
- XML Report file:
- XML settings to export:
- XML parser: org.apache.xerces.parsers.SAXParser
- Banjo XML format version: 1.0
-----------------------------------------------------------------------------
- Seed for starting search: 1556149316392
-----------------------------------------------------------------------------
- Number of threads: 1
-----------------------------------------------------------------------------

Memory info before starting the search: Banjo is using 38 mb of memory
Prep. time used: 153.0 ms
Beginning to search: expect a status report every 10.0 m


I will report the results in tomorrow's meeting.
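
The stack trace above ends in `ObservationsAsArray.loadData`, but the error message itself is not shown, so the cause is not known from this post. One check that can be done before a run is sketched below. It assumes the observations file is whitespace-delimited text with one observation per row and, since "Variable names" is set to "inFile", a first row of variable names. That layout and the helper name `check_observations` are assumptions for illustration.

```python
def check_observations(lines, n_vars, n_obs, header=True):
    """Return a list of problems found in a whitespace-delimited observations table."""
    rows = [ln.split() for ln in lines if ln.strip()]
    problems = []
    if header and rows:
        names = rows.pop(0)
        if len(names) != n_vars:
            problems.append(f"header has {len(names)} names, expected {n_vars}")
    if len(rows) != n_obs:
        problems.append(f"found {len(rows)} observations, expected {n_obs}")
    for i, row in enumerate(rows, start=1):
        if len(row) != n_vars:
            problems.append(f"observation {i} has {len(row)} values, expected {n_vars}")
        # Pre-discretized data should contain only non-negative integers.
        bad = [v for v in row if not v.isdigit()]
        if bad:
            problems.append(f"observation {i} has non-integer values: {bad[:3]}")
    return problems


# Toy table: 3 variables, 2 observations, with one deliberately bad row.
toy = ["ID3 APP GFAP", "0 1 2", "1 x"]
for p in check_observations(toy, n_vars=3, n_obs=2):
    print(p)
```

For the run above, the same function would be called on the lines of the observations file with `n_vars=850` and `n_obs=444`, the counts shown in the settings summary.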

Re: Probabilistic Graphical Models, Fall 2018

Posted: Thu Apr 25, 2019 1:20 pm
by cpere117
Here's the example output for the three-hour run on ID3 targets (including ID3 itself in the input), restricted to samples that are both female and controls. I'm now starting the same three-hour run for female AD patients to compare the top network graph produced. Finally, tomorrow I will post the results for my control males and AD males after three-hour trial runs.
/home/christianperez/Banjo.2.2.0/data/release2.0/static/output/top.graphCPFEMCONTROL2019.04.24.19.41.56.svg

Best Network Score: -245040.5347, first found at iteration 33089595850

Re: Probabilistic Graphical Models, Fall 2018

Posted: Thu Apr 25, 2019 5:08 pm
by cpere117
Three-hour results for male control samples only.
Status: Networks 83665000
Time 3.0 h (100.0% of max. 3.0 h)
Re-anneals 218 (0.2% of max. 100000)
Banjo is using 43 mb of memory

-----------------------------------------------------------------------------
- Intermediate report Best network so far
-----------------------------------------------------------------------------

Network score: -153339.4023, first found at iteration 14514745
850

Re: Probabilistic Graphical Models, Fall 2018

Posted: Thu Apr 25, 2019 8:28 pm
by cpere117
/home/christianperez/Desktop/top.graphCPFemaleAD2019.
Attached is the top graph for AD females produced by BANJO; it involved 237 observations and 850 variables. This was a three-hour run. Following this run, I will post the final BANJO structure for males with AD, and the top graph produced from the genes matched across 9 microarray datasets and 1,116 samples that were also ID3 target genes. Expect my final report tomorrow.
file:///mnt/disk/home/christianperez/Banjo.2.2.0/data/release2.0/static/output/top.graphCPFemaleAD2019.04.25.17.03.30.svg

Re: Probabilistic Graphical Models, Fall 2018

Posted: Mon Apr 29, 2019 5:39 pm
by cpere117
Here is an updated consensus graph from a six-hour BANJO run on my female AD patients. The output is less muddled than the three-hour output for the same group. This seems logical, as a consensus network built over a longer search should yield a more concise directed acyclic graph (DAG). I'm going to run 1-hour trials on my AD female, AD male, control female, and control male datasets to compare with the aforementioned 3-hour outputs across my four datasets.
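
To make the consensus idea concrete, here is a minimal sketch that keeps the directed edges appearing in at least a given fraction of a set of high-scoring networks. It illustrates one common way to form a consensus and is not a description of how BANJO builds its consensus graph; the `consensus_edges` name, the threshold, and the toy networks are all assumptions.

```python
from collections import Counter


def consensus_edges(networks, min_fraction=0.5):
    """Return directed edges present in at least min_fraction of the networks.

    Each network is an iterable of (parent, child) pairs.
    """
    if not networks:
        return []
    counts = Counter()
    for edges in networks:
        counts.update(set(edges))  # count each edge once per network
    needed = min_fraction * len(networks)
    return sorted(edge for edge, n in counts.items() if n >= needed)


# Toy networks with hypothetical edges, not BANJO output.
toy_networks = [
    [("ID3", "GFAP"), ("APP", "MAPT")],
    [("ID3", "GFAP"), ("MAPT", "APP")],
    [("ID3", "GFAP"), ("APP", "MAPT"), ("ID3", "APP")],
]
print(consensus_edges(toy_networks, min_fraction=0.6))
```

Edges that appear in only one network drop out, which is one reason a consensus can look cleaner than any single top graph. A thresholded edge set is not guaranteed to be acyclic, so it would still need a cycle check before being read as a DAG.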