Probabilistic Graphical Models, Fall 2018

Class Projects from courses such as Probabilistic Graphical Network, Biostatistics II, etc.

Re: Probabilistic Graphical Models, Fall 2018

Postby cpere117 » Wed Mar 13, 2019 12:27 am

These are my analysis results using limma for several microarray studies; each study is identified by its Excel sheet name. A few normalization plots of the data distributions for ID1-ID4 are also attached.
Attachments
gse29378.png (49.24 KiB) Viewed 2062 times
GSE37263 DATA.png (23.78 KiB) Viewed 2062 times
ID4_GSE44768.png (20.66 KiB) Viewed 2062 times
gse39420 Cross comparison.png (30.92 KiB) Viewed 2062 times
gse28146 data normalization.png (38.09 KiB) Viewed 2062 times
ALZHEIMER GEO DATA ANALYSIS.xlsx
(10.54 MiB) Downloaded 127 times
cpere117
 
Posts: 38
Joined: Thu Aug 24, 2017 7:15 pm

Re: Probabilistic Graphical Models, Fall 2018

Postby cpere117 » Tue Apr 16, 2019 4:41 pm

Professor, here is a continued update on my research project, as discussed earlier in our meeting. I've also attached an Excel table with the completed datasets highlighted and ready for discretization for BANJO.
Attachments
GEO-Alzheimer-datasets Master Table.xlsx
(17.66 KiB) Downloaded 113 times
GSE5281aftexcel.txt
(117.44 MiB) Downloaded 119 times
GSE5281zscore.txt
(90.39 MiB) Downloaded 105 times
GSE5281dscrt.txt
(19.91 MiB) Downloaded 118 times
GSE110298aftexcel.txt
(27.09 MiB) Downloaded 119 times
GSE110298zscore.txt
(19.15 MiB) Downloaded 118 times

Re: Probabilistic Graphical Models, Fall 2018

Postby cpere117 » Tue Apr 16, 2019 10:49 pm

Further updates with dataset discretization, z-scores, and cleaned datasets.
Attachments
GSE36980aftexcel.txt
(25.82 MiB) Downloaded 112 times
GSE36980zscore.txt
(32.78 MiB) Downloaded 128 times
GSE36980dscrt.txt
(7.55 MiB) Downloaded 108 times
GSE167595702aftexcel.txt
(6.4 MiB) Downloaded 117 times
GSE167595702zscore.txt
(4.75 MiB) Downloaded 117 times
GSE167595702dscrt.txt
(1.37 MiB) Downloaded 121 times
GSE281462aftexcel.txt
(13.64 MiB) Downloaded 108 times
GSE281462zscore.txt
(16.98 MiB) Downloaded 101 times
GSE281462dscrt.txt
(4.05 MiB) Downloaded 105 times

Re: Probabilistic Graphical Models, Fall 2018

Postby cpere117 » Mon Apr 22, 2019 11:56 am

Attached are the completed discretized and cleaned microarray datasets from the attached master table of Alzheimer's disease datasets from GEO. I attempted to upload all 13 datasets in a zipped folder to SMLG but ran into problems because the file was too large for the forum upload. Later today, I plan to post all the genes matched across the datasets and, if possible, a merged dataset of all my sample data along with the discretized file that I plan to run in BANJO this week. Thank you!
Attachments
GEO-Alzheimer-datasets Master Table (3).xlsx
(133.04 KiB) Downloaded 119 times
GSE44772dscrt.txt
(100.36 MiB) Downloaded 103 times
GSE1102981zscore.txt
(19.18 MiB) Downloaded 111 times
GSE1102981aftexcel.txt
(27.14 MiB) Downloaded 110 times
GSE1102981dscrt.txt
(4.53 MiB) Downloaded 111 times
GSE8442296dscrt.txt
(52.69 MiB) Downloaded 115 times
GSE8442296zscore.txt
(236.09 MiB) Downloaded 115 times
GSE84422570zscore.txt
(56.5 MiB) Downloaded 120 times
GSE84422570aftexcel.txt
(69.9 MiB) Downloaded 110 times
GSE84422570dscrt.txt
(12.82 MiB) Downloaded 116 times

Re: Probabilistic Graphical Models, Fall 2018

Postby cpere117 » Mon Apr 22, 2019 7:04 pm

After discretizing my data and matching genes across 9 datasets (GSE1102981, GSE1297, GSE28146, GSE29378, GSE44772, GSE45596, GSE5281, GSE84422-GPL570, GSE84422-GPL96), the final tally of common genes was 6432. Note that, for quality-control reasons, 3 datasets were excluded from further analysis (GSE36980, GSE37263, GSE39420). A total of 1633 samples (950 AD, 683 control) were merged and discretized according to Z-score values. Gender, age, and condition will all be encoded as binary variables for the BANJO runs. Furthermore, of the 6432 genes found across the datasets, 846 are validated targets of ID3 according to the ChIP/RNA integrative data gathered in our lab (Mayur conducted the experiment). My plan is to finalize the cleaning process early tomorrow and then run BANJO in three trials of 1 hour, 3 hours, and 9 hours to obtain a Bayesian network analysis.
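For anyone following along, here is a rough Python sketch of the two steps described above: matching genes across datasets and discretizing by Z-score. It is not the exact procedure I used. The function names, the per-gene Z-scoring within a dataset, the three states, and the cutoff of 1.0 are all assumptions made for illustration.

```python
import numpy as np

def common_genes(gene_lists):
    """Return the sorted gene symbols present in every dataset."""
    shared = set(gene_lists[0])
    for genes in gene_lists[1:]:
        shared &= set(genes)
    return sorted(shared)

def zscore_rows(expr):
    """Z-score each gene (row) across its samples (columns)."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, ddof=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes end up with z = 0 instead of NaN
    return (expr - mean) / std

def discretize(z, cutoff=1.0):
    """Map Z-scores to 0 (low), 1 (middle), 2 (high).

    The cutoff is a hypothetical choice, not a value taken from the thread.
    """
    z = np.asarray(z, dtype=float)
    states = np.ones(z.shape, dtype=int)
    states[z < -cutoff] = 0
    states[z > cutoff] = 2
    return states
```

Z-scoring within each dataset before merging puts the platforms on a comparable scale, which is one reason to discretize from Z-scores rather than from raw intensities.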
Attachments
Book4.xlsx
Discretized and merged file of all samples
(33.2 MiB) Downloaded 113 times

Re: Probabilistic Graphical Models, Fall 2018

Postby cpere117 » Wed Apr 24, 2019 7:44 pm

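Today I ran my first BANJO trial on the following dataset: control females with ID3 target genes. The terminal output is pasted below, beginning with the tail of an error notification raised while loading the observations, followed by the run itself.
Stack trace info: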

edu.duke.cs.banjo.data.observations.ObservationsAsArray.loadData(ObservationsAsArray.java:404)
edu.duke.cs.banjo.utility.FileUtil.loadObservations(FileUtil.java:548)
edu.duke.cs.banjo.data.settings.Settings.loadObservations(Settings.java:3187)
edu.duke.cs.banjo.application.Banjo.execute(Banjo.java:129)
edu.duke.cs.banjo.application.Banjo.main(Banjo.java:447)

-----------------------------------------------------------------------------
End of error notification
-----------------------------------------------------------------------------

christianperez@path-four:~/Banjo.2.2.0$ java -jar banjo.jar settingsFile=data/release2.0/static/input/CM5Settings_2
-----------------------------------------------------------------------------
- Banjo Bayesian Network Inference with Java Objects -
- Release 2.2.0 15 Apr 2008 -
- Licensed from Duke University -
- Copyright (c) 2005-08 by Alexander J. Hartemink -
- All rights reserved -
-----------------------------------------------------------------------------
- Project: ID3 CP 2019
- User: christianperez
- Dataset: 850-vars-444-observations
- Notes: static bayesian network inference
-----------------------------------------------------------------------------
- Settings file:
-----------------------------------------------------------------------------
- Input directory: data/release2.0/static/input
- Observations file: Control_Females
- Observation count: 444
- Number of variables: 850
- Variable names: inFile
- Discretization policy: none
- Exceptions to the discretization policy: none
-----------------------------------------------------------------------------
- Initial structure file:
- 'Must be present' edges file:
- 'Must not be present' edges file:
- Min. Markov lag: 0
- Max. Markov lag: 0
- Max. parent count: 10
- Equivalent sample size for Dirichlet parameter prior: 1.0
-----------------------------------------------------------------------------
- Searcher: SearcherSimAnneal
- Proposer: ProposerRandomLocalMove
- Evaluator: defaulted to EvaluatorBDe
- Decider: defaulted to DeciderMetropolis
-----------------------------------------------------------------------------
- Pre-compute logGamma: no
- Cache: fastLevel1
- Cycle checking method: Depth-first Search
-----------------------------------------------------------------------------
- Initial temperature: 10000
- Cooling factor: 0.7
- Reannealing temperature: 1000
- Max. accepted networks before cooling: 5000
- Max. proposed networks before cooling: 20000
- Min. accepted networks before reannealing: 500
-----------------------------------------------------------------------------
- Output directory: data/release2.0/static/output
- Report file: static.report.CPFEMCONTROL.2019.04.24.19.41.56.txt
- Number of best networks tracked: 1
- Max. time: 3.0 h
- Max. restarts: 100000
- Min. networks before checking: 5000
- Screen reporting interval: 10.0 m
- File reporting interval: 10.0 m
-----------------------------------------------------------------------------
- Compute influence scores: yes
- Compute consensus graph: yes
- Create consensus graph as HTML: yes
- Create 'dot' output: yes
- Location of 'dot': /usr/bin/dot
-----------------------------------------------------------------------------
- XML output directory: data/release2.0/static/output
- XML Report file:
- XML settings to export:
- XML parser: org.apache.xerces.parsers.SAXParser
- Banjo XML format version: 1.0
-----------------------------------------------------------------------------
- Seed for starting search: 1556149316392
-----------------------------------------------------------------------------
- Number of threads: 1
-----------------------------------------------------------------------------

Memory info before starting the search: Banjo is using 38 mb of memory
Prep. time used: 153.0 ms
Beginning to search: expect a status report every 10.0 m


I will report the results in tomorrow's meeting.
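The stack trace above does not show what caused the loading error, so the following is only one thing worth ruling out before a long run: a mismatch between the file and the declared dimensions (850 variables, 444 observations). This is a minimal Python sketch. It assumes one observation per row, whitespace-separated values, variable names on the first line (matching "Variable names: inFile"), and non-negative integer states (since the discretization policy is "none"). The function name is hypothetical.

```python
def check_observations(lines, n_vars, n_obs, has_header=True):
    """Return a list of problems found in an observations table.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The layout (rows = observations, whitespace-delimited, integer
    states) is an assumption, not something confirmed by the log.
    """
    problems = []
    rows = [ln.split() for ln in lines if ln.strip()]
    if has_header:
        if not rows:
            return ["file is empty"]
        header, rows = rows[0], rows[1:]
        if len(header) != n_vars:
            problems.append(f"header has {len(header)} names, expected {n_vars}")
    if len(rows) != n_obs:
        problems.append(f"found {len(rows)} observations, expected {n_obs}")
    for i, row in enumerate(rows, start=1):
        if len(row) != n_vars:
            problems.append(f"row {i} has {len(row)} values, expected {n_vars}")
        elif not all(v.isdigit() for v in row):
            problems.append(f"row {i} contains a non-integer value")
    return problems

# Example use (file name taken from the attachment in this post):
# with open("Control_Females.txt") as fh:
#     print(check_observations(fh, n_vars=850, n_obs=444))
```

An empty list means the file matches the declared shape under these assumptions.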
Attachments
Control_Females.txt
(742.69 KiB) Downloaded 102 times

Re: Probabilistic Graphical Models, Fall 2018

Postby cpere117 » Thu Apr 25, 2019 1:20 pm

Here is the example output for the three-hour run on ID3 targets (including ID3 itself in the input), restricted to samples that were both female and controls. I'm now starting the same three-hour run for female AD patients to compare the top network graphs produced. Tomorrow I will post the results for control males and AD males after their three-hour runs.
Output file: /home/christianperez/Banjo.2.2.0/data/release2.0/static/output/top.graphCPFEMCONTROL2019.04.24.19.41.56.svg

Best Network Score: -245040.5347, first found at iteration 33089595850
Attachments
top.graphCPFEMCONTROL2019.04.24.19.41.56.png (6.16 MiB) Viewed 2041 times
top.graphCPFEMCONTROL2019.04.24.19.41.56.txt
(33.66 KiB) Downloaded 120 times
static.report.CPFEMCONTROL.2019.04.24.19.41.56.txt
(269.54 KiB) Downloaded 112 times

Re: Probabilistic Graphical Models, Fall 2018

Postby cpere117 » Thu Apr 25, 2019 5:08 pm

Three-hour results for male control samples only.
Status: Networks 83665000
Time 3.0 h (100.0% of max. 3.0 h)
Re-anneals 218 (0.2% of max. 100000)
Banjo is using 43 mb of memory

-----------------------------------------------------------------------------
- Intermediate report Best network so far
-----------------------------------------------------------------------------

Network score: -153339.4023, first found at iteration 14514745
850
Attachments
top.graphCPMALEAD2019.04.25.20.10.00.pdf
(575.56 KiB) Downloaded 111 times
static.report.CPMALECONTROL.2019.04.25.13.40.39.txt
(248.72 KiB) Downloaded 118 times
Control_Males.txt
(400.5 KiB) Downloaded 102 times
Last edited by cpere117 on Tue Apr 30, 2019 3:19 am, edited 2 times in total.

Re: Probabilistic Graphical Models, Fall 2018

Postby cpere117 » Thu Apr 25, 2019 8:28 pm

Attached is the top graph for AD females produced by BANJO; this run involved 237 observations and 850 variables. This is a three-hour run. Following it, I will post the final BANJO structure for males with AD, as well as the top graph for the ID3 target genes matched across 9 microarray datasets and 1,116 samples. Expect my final report tomorrow.
/home/christianperez/Desktop/top.graphCPFemaleAD2019.
file:///mnt/disk/home/christianperez/Banjo.2.2.0/data/release2.0/static/output/top.graphCPFemaleAD2019.04.25.17.03.30.svg
Attachments
top.graphCPFemaleAD2019.04.25.17.03.30.txt
(31.96 KiB) Downloaded 116 times

Re: Probabilistic Graphical Models, Fall 2018

Postby cpere117 » Mon Apr 29, 2019 5:39 pm

Here is an updated consensus graph from a six-hour BANJO run on my female AD patients. The output is less muddled than my three-hour output for the same group. This seems plausible, since a consensus network from a longer search should yield a more concise directed acyclic graph (DAG). I'm going to run 1-hour trials on my AD female, AD male, control female, and control male datasets to compare with the aforementioned 3-hour outputs across the four datasets.
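To make the idea of a consensus graph concrete, here is a small Python sketch of one generic approach: keep only the edges that appear in at least a given fraction of several high-scoring networks. This is an illustration only and is not claimed to be the procedure BANJO uses internally. The function name, the edge representation, and the 0.5 threshold are assumptions.

```python
from collections import Counter

def consensus_edges(networks, min_fraction=0.5):
    """Return edges (parent, child) found in at least min_fraction of networks.

    `networks` is a list of edge collections, one per high-scoring network.
    Note: the result of edge voting is not guaranteed to be acyclic.
    """
    if not networks:
        return set()
    # set(net) ensures an edge is counted at most once per network
    counts = Counter(edge for net in networks for edge in set(net))
    needed = min_fraction * len(networks)
    return {edge for edge, n in counts.items() if n >= needed}
```

Under this reading, edges that show up only sporadically get filtered out as more networks are pooled, which would be consistent with the cleaner six-hour graph.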
Attachments
consensus.graphCPFemaleAD_6.2019.04.26.11.19.25.jpeg (3.05 MiB) Viewed 2036 times
static.report.FEMALEAD_6.2019.04.26.11.19.25.txt
(1.76 MiB) Downloaded 78 times
top.graphCPFemaleAD_62019.04.26.11.19.25.txt
(32.04 KiB) Downloaded 50 times
consensus.graphCPFemaleAD_6.2019.04.26.11.19.25.txt
(32.1 KiB) Downloaded 45 times
