SMLG (Statistical Machine Learning Group) Discussion Forum

by **lsand039** » Mon Apr 04, 2016 12:50 pm

So I finally had the chance to run the gene-probe_id.py script with GSE29378.csv and Probe IDs and Gene Names.csv, but I'm getting a blank output. Attached are the files I used. Do I need to change the layout of GSE29378 to get results?

by **lsand039** » Thu Apr 07, 2016 12:56 pm

I just finished matching the probe IDs and gene names to GSE29378 using the script, after I figured out what I was doing wrong. I made a new key of gene names and probe IDs that is more comprehensive than the first one I posted. The additional gene/probe ID pairs were taken directly from the original GSE files and added to my original key, which I built from the GDS810 and GDS4136 probe IDs and gene names and the Illumina key found at http://www.genomequebec.mcgill.ca/compg ... robes.html. Attached are the new key and the matched GSE29378 file, along with the checks I did to make sure each gene was matched to the right probe ID. I still have two more files to match, so those should be up soon!
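The gene-probe_id.py script isn't posted in this thread, so this is only a rough illustration. Matching probe IDs to gene names comes down to a dictionary lookup against the combined key. The minimal Python sketch below assumes the key has the probe ID in the first column and the gene name in the second, and that the expression file has the probe ID in its first column. When a probe ID appears in more than one source, it keeps the first entry it sees, which is an assumption. The probe IDs and values in the example are made up.

```python
import csv
import io

def load_key(rows):
    """Build a probe ID -> gene name dict from (probe_id, gene_name) rows.

    If a probe ID appears twice, the first entry wins (assumed policy).
    """
    key = {}
    for probe_id, gene in rows:
        probe_id = probe_id.strip()
        if probe_id and probe_id not in key:
            key[probe_id] = gene.strip()
    return key

def annotate(expression_rows, key):
    """Return (probe_id, gene_name, values) for every probe found in the key."""
    matched = []
    for probe_id, *values in expression_rows:
        gene = key.get(probe_id.strip())
        if gene is not None:
            matched.append((probe_id.strip(), gene, values))
    return matched

# Made-up example: two probes are in the key, one is not.
key_csv = "ILMN_0001,APOE\nILMN_0002,APP\n"
expr_csv = "ILMN_0001,7.1,6.9\nILMN_9999,5.0,5.2\nILMN_0002,8.3,8.0\n"
key = load_key(csv.reader(io.StringIO(key_csv)))
matched = annotate(csv.reader(io.StringIO(expr_csv)), key)
for probe_id, gene, values in matched:
    print(probe_id, gene, values)
```

Probes missing from the key are silently dropped, so an empty result usually means the two files' probe ID columns don't line up.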

by **lsand039** » Wed Apr 13, 2016 12:27 pm

Attached are the final two datasets with their matched genes.

by **lsand039** » Thu Apr 21, 2016 12:58 pm

Here are the normalized values and correlation calculations for GSE29378 and GSE48350. I had to put the normalizations and the correlation calculations in separate files, and split the Alzheimer's and control samples of GSE48350 into separate files, because the files were getting too big to open.
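The post doesn't say which normalization was used. One common possibility is per-gene z-scoring, where each gene's values are centered on its mean and scaled by its standard deviation across samples. The minimal sketch below assumes that method. If the files were normalized another way (for example, quantile normalization), this would not reproduce them.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize one gene's expression values across samples.

    Assumed method: subtract the gene's mean and divide by its population
    standard deviation. A gene with constant expression maps to all zeros.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

# Made-up expression values for one gene across three samples.
z = zscore([2.0, 4.0, 6.0])
print(z)
```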

I combined the correlation values of all the genes from GDS810, GDS4136, GSE29378, and GSE48350 and found the 20 genes with the highest correlations among the genes present in all four datasets. All samples in these datasets are brain samples. I have not included GSE63060 in the correlation calculations yet because all of its samples are blood samples.
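Finding the top genes shared by every dataset takes two steps: intersect the gene sets, then rank the shared genes by a combined score. The post doesn't say how the per-dataset correlations were combined. This minimal sketch assumes the mean absolute correlation, so that strong negative correlations also rank high. The correlation values in the example are made up.

```python
def top_common_genes(per_dataset, k=20):
    """Return the k genes with the highest mean |r| among genes present in every dataset.

    per_dataset maps a dataset name to {gene: correlation}. Averaging the
    absolute correlations is an assumed way to combine datasets.
    """
    gene_sets = [set(corrs) for corrs in per_dataset.values()]
    common = set.intersection(*gene_sets)
    n = len(per_dataset)
    scores = {g: sum(abs(c[g]) for c in per_dataset.values()) / n for g in common}
    # Sort by descending score; break ties alphabetically for a stable result.
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]

# Made-up correlations, only to show the mechanics.
per_dataset = {
    "GDS810": {"APOE": 0.6, "APP": -0.4, "MAPT": 0.1},
    "GDS4136": {"APOE": 0.5, "APP": -0.5, "GFAP": 0.9},
    "GSE29378": {"APOE": 0.7, "APP": 0.3, "MAPT": 0.2},
}
print(top_common_genes(per_dataset, k=2))
```

GFAP has the largest single value here but is dropped because it appears in only one dataset.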

The data from GDS810, GDS4136, GSE29378, and GSE48350 are just about ready to be analyzed by Banjo. I read the user guide, but I have not yet had the chance to try it out.

by **lsand039** » Wed Apr 27, 2016 1:11 pm

So I've made a list of all the genes common to the 4 datasets. I'm now trying to create a file for each dataset that contains only the common genes. I think Steve's script would work for this, but I'm running into some issues. I tried using CG810.csv as a key to match all the demographic data and expression values of dataset GDS810 to the file Common Genes.csv. When I run the script, all I get is a blank output, and I'm not sure why. Is there anything wrong with the format of my files? The script itself seems to run fine; I just don't know why it keeps giving me a blank output.
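A frequent cause of empty output from key-matching scripts is keys that look identical but aren't. Examples include trailing spaces, a UTF-8 byte-order mark (BOM) that Excel's "CSV UTF-8" format puts at the start of the file, or case differences. This may or may not be what is happening with Steve's script. The sketch below is a hypothetical stand-alone filter. It assumes genes are in rows with the gene name in the first column under a single header row, and it matches names case-insensitively, which is also an assumption.

```python
import csv
import io

def normalize(name):
    """Canonical form for matching: drop a UTF-8 BOM, surrounding spaces, and case."""
    return name.lstrip("\ufeff").strip().upper()

def keep_common(rows, common_genes, gene_col=0):
    """Keep the header row plus every data row whose gene is in common_genes."""
    wanted = {normalize(g) for g in common_genes}
    header, *body = rows
    kept = [row for row in body if row and normalize(row[gene_col]) in wanted]
    return [header] + kept

# Made-up data: the key has a BOM on its first entry and a trailing space on the second.
common = ["\ufeffAPOE", "app "]
dataset_csv = "Gene,S1,S2\nAPOE,7.1,6.9\nMAPT,5.0,5.2\nAPP,8.3,8.0\n"
rows = list(csv.reader(io.StringIO(dataset_csv)))
filtered = keep_common(rows, common)
for row in filtered:
    print(row)
```

If a filtered file still comes out empty, printing `repr()` of a few keys from each file is a quick way to spot hidden characters.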

by **lsand039** » Tue May 03, 2016 9:49 am

Here are the datasets with the matched genes. There are 11,272 genes common to all 4 datasets. In total, there are 377 samples: 154 AD and 223 controls.

by **lsand039** » Wed May 04, 2016 10:31 am

Here are the discretized files for the 4 datasets.
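Banjo searches over discrete variable states, so continuous expression values have to be binned first. The post doesn't say which scheme was used. This minimal sketch assumes per-gene equal-frequency (tertile) binning into three states. The values in the example are made up.

```python
def discretize_tertiles(values):
    """Map each value to 0 (low), 1 (mid), or 2 (high) by its rank within the gene.

    Equal-frequency binning: each state gets roughly a third of the samples.
    Ties are broken by sample order because Python's sort is stable.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    states = [0] * n
    for rank, i in enumerate(order):
        states[i] = rank * 3 // n
    return states

# Made-up expression values for one gene across six samples.
print(discretize_tertiles([5.2, 1.0, 3.3, 9.8, 4.4, 2.1]))
```

Binning within each gene keeps every state populated, even when datasets differ in overall expression scale.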

by **lsand039** » Thu Jun 02, 2016 12:54 pm

I combined the 4 datasets into one file because Banjo wouldn't let me specify multiple observation files and also set the variable names to inFile. I wasn't sure how to discretize ages. I'm still playing around with some settings to convince Banjo that it can analyze all 377 subjects and 11,275 variables, but I'm hoping Jairo will email me back soon to help me with a solution.
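One simple option for age is to use fixed age bands. The cut points below are hypothetical placeholders, not taken from the thread. Alternatives include choosing cut points so that each band has enough of the 377 subjects, or reusing the same equal-frequency binning applied to the genes.

```python
from bisect import bisect_right

# Hypothetical age cut points (not from the thread): <70, 70-79, 80-89, 90+.
AGE_CUTS = [70, 80, 90]

def discretize_age(age, cuts=AGE_CUTS):
    """Return the age band index: 0 for age < 70, 1 for 70-79, 2 for 80-89, 3 for 90+."""
    return bisect_right(cuts, age)

ages = [65, 70, 79, 84, 93]
print([discretize_age(a) for a in ages])
```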

If I had fewer variables, Banjo would let me run the data; I'd just need to know how to decide which variables to include. Once I can get Banjo to analyze this dataset, I think we'll be ready to move on to the next step.

by **lsand039** » Mon Jun 06, 2016 2:05 pm

I was able to run Banjo with all the variables & observations by setting the cache to fastLevel1, precomputeLogGamma to no, and Proposer to RandomLocalMove.
Here are the settings and output files of the 1 hr run. I couldn't upload the .svg file that allows me to view the network, and the jpg image won't show any of the nodes. I think that if I reduce the number of variables, I can get a viewable jpg image; I just need to know how to choose which variables I should remove from the search.

by **lsand039** » Thu Jun 09, 2016 9:00 am

I calculated the correlation of each variable with Alzheimer's. Attached is the file that contains all the variables, sorted from highest to lowest correlation.
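When the diagnosis is coded as 0/1, the Pearson correlation between a variable and the label is the point-biserial correlation. The minimal sketch below ranks variables that way. It sorts by absolute value, which is an assumption because the post doesn't say whether negative correlations were ranked by magnitude. Sorting this way keeps genes that are strongly down-regulated in AD near the top. The data are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def rank_by_correlation(data, label):
    """Sort variables by |r| with the 0/1 Alzheimer's label, strongest first."""
    scores = {name: pearson(vals, label) for name, vals in data.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up data: 1 = AD, 0 = control.
label = [1, 1, 0, 0]
data = {"G1": [5, 6, 1, 2], "G2": [1, 2, 5, 6], "G3": [3, 1, 3, 1]}
for name, r in rank_by_correlation(data, label):
    print(name, round(r, 3))
```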

I ran 1-hour Banjo searches for the top 50, 100, 250, 500, 1000, 2500, and 5000 variables, and I've included the settings I used for all searches.
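Because the variables are already sorted by correlation, each search's variable set is a prefix of the ranked list, and the smaller sets are nested inside the larger ones. The minimal sketch below assumes the diagnosis variable should stay in every subset. That variable is given the hypothetical name "AD" here, and it is counted on top of the N genes. The ranked list is made up.

```python
# Subset sizes taken from the searches described above.
SIZES = [50, 100, 250, 500, 1000, 2500, 5000]

def top_n_subsets(ranked_names, sizes=SIZES, always_keep=("AD",)):
    """For each size n, return the always-kept variables plus the top n others.

    "AD" is a hypothetical name for the diagnosis column; keeping it in every
    subset lets each Banjo search include the disease node.
    """
    others = [v for v in ranked_names if v not in always_keep]
    return {n: list(always_keep) + others[:n] for n in sizes}

# Made-up ranked list: diagnosis column first, then 6000 genes.
ranked = ["AD"] + [f"G{i}" for i in range(6000)]
subsets = top_n_subsets(ranked)
print({n: len(cols) for n, cols in subsets.items()})
```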

GEO datasets
