GEO datasets

Re: GEO datasets

Post by lsand039 » Tue Jun 27, 2017 12:43 pm

I spoke with Efrain and he fixed the problem. The code now picks up all the samples with missing data. I compared the fixed outputs (the aftexcel, dscrt, and matched files) with the outputs I got in trial 1, and they're identical. It looks like including more decimal places from the original series matrix file didn't change the final discretization. Trial 1 has the correct input files, so its results should be analyzed instead of trial 2's.
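For anyone repeating this check, one way to confirm that two sets of output files are identical is to compare their hashes. This is only a minimal sketch: it assumes each trial's outputs sit in their own directory under the same file names, and the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_dirs(dir_a, dir_b):
    """Return names of files in dir_a that are missing from or differ in dir_b."""
    mismatched = []
    for path_a in sorted(Path(dir_a).iterdir()):
        if not path_a.is_file():
            continue
        path_b = Path(dir_b) / path_a.name
        if not path_b.is_file() or file_digest(path_a) != file_digest(path_b):
            mismatched.append(path_a.name)
    return mismatched
```

An empty list from compare_dirs means every file in the first directory has a byte-identical counterpart in the second.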
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Post by lsand039 » Wed Jun 28, 2017 1:39 pm

I found that APP was missing from the 7097 genes in the results of the BaNJO runs. Only GSE15222 was missing APP, which I found odd since this dataset has previously been included in other analyses. As I was going through the files, I could not find the original GPL file for GSE15222 that contained gene names and probe IDs.

I was able to find a list that had the gene names and probe IDs for GSE15222 on this other site:
http://www.chibi.ubc.ca/Gemma/expressio ... ml?id=5643
This is the link where I found the appropriate GPL information: http://www.chibi.ubc.ca/Gemma/arrays/sh ... tml?id=293
This site seemed to have the same information that is available on GEO (plus the gene name/probe ID info I couldn't find on GEO), so I assumed it was a legitimate source for gene names. I downloaded the GPL information as a text file and used it to match the gene names to probe IDs using Efrain's code.

It turns out that the GPL file I downloaded did not have APP listed as a gene with a corresponding probe ID. Today I looked at all the different types of files available for GSE15222's GPL file, and its Annotation SOFT table contains APP and other genes with corresponding probe names that don't seem to be included in the outside link I previously found.
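A quick check like the one below could catch this kind of gap before the matching step. It is a rough sketch, not Efrain's code: it assumes the annotation table is tab-delimited with columns named "ID" and "Gene symbol" and that metadata lines start with #, ! or ^. The probe IDs in the sample are made up.

```python
import csv
import io


def load_probe_to_gene(handle, id_col="ID", gene_col="Gene symbol"):
    """Build a probe ID -> gene symbol map from a tab-delimited annotation table."""
    # Skip blank lines and metadata lines (assumed to start with #, ! or ^).
    rows = (line for line in handle if line.strip() and line[0] not in "#!^")
    reader = csv.DictReader(rows, delimiter="\t")
    mapping = {}
    for row in reader:
        gene = (row.get(gene_col) or "").strip()
        if gene:  # probes without a gene symbol are left out
            mapping[row[id_col]] = gene
    return mapping


def missing_genes(mapping, required):
    """Return the required gene symbols that no probe maps to."""
    present = set(mapping.values())
    return sorted(set(required) - present)


# Illustrative table only; real probe IDs and columns may differ.
sample = io.StringIO(
    "#ID = probe identifier\n"
    "ID\tGene symbol\n"
    "probe_001\tAPP\n"
    "probe_002\t\n"
    "probe_003\tPSEN1\n"
)
probe_to_gene = load_probe_to_gene(sample)
print(missing_genes(probe_to_gene, ["APOE", "APP", "PSEN1", "PSEN2"]))
```

Running missing_genes against the genes of interest for each dataset's annotation file would show right away which file lacks APP.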

Now that all 15 datasets include APP, I found the common genes again, and there are currently 8092 genes.
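The common-gene step is a set intersection over the gene lists of the datasets. A minimal sketch, assuming each dataset's gene names have already been read into a list; every dataset name other than GSE15222 is a placeholder, and the helper names are hypothetical.

```python
def common_genes(gene_sets):
    """Return the genes present in every dataset."""
    sets = [set(genes) for genes in gene_sets.values()]
    return set.intersection(*sets) if sets else set()


def datasets_missing(gene_sets, gene):
    """Return the names of datasets that do not contain the given gene."""
    return sorted(name for name, genes in gene_sets.items() if gene not in set(genes))


# Toy data for illustration; the real lists hold thousands of genes.
gene_sets = {
    "GSE15222": ["APOE", "PSEN1", "PSEN2"],
    "dataset_B": ["APOE", "APP", "PSEN1", "PSEN2"],
    "dataset_C": ["APOE", "APP", "PSEN1"],
}
print(sorted(common_genes(gene_sets)))
print(datasets_missing(gene_sets, "APP"))
```

Checking datasets_missing for each gene of interest before intersecting shows which dataset is responsible when an expected gene drops out of the common list.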

The new GSE15222 aftexcel & dscrt files that include APP:
GSE15222aftexcel.txt
(68.72 MiB) Downloaded 152 times

GSE15222dscrt.txt
(30.81 MiB) Downloaded 167 times

The 15 datasets with 8092 genes:
8092 genes.tar.gz
(50.24 MiB) Downloaded 153 times

I'm currently rerunning BaNJO with this data.

Combined datasets in Excel and txt form:
7097Combined Datasets.xlsx
(59 MiB) Downloaded 169 times

8096genes.txt
(34.86 MiB) Downloaded 171 times

Settings Files:
settings1.txt
(5.79 KiB) Downloaded 172 times
settings2.txt
(5.79 KiB) Downloaded 159 times
settings4.txt
(5.79 KiB) Downloaded 151 times
settings8.txt
(5.79 KiB) Downloaded 163 times

Re: GEO datasets

Post by lsand039 » Thu Jun 29, 2017 1:49 pm

Below are the results from the BaNJO run:

The top scoring graph was from Path-2 at 8 hours. It scored significantly better than the 2nd and 3rd best scoring graphs (Path-3 at 8 hours and Path-5 at 8 hours, see Best Scores.xlsx).
Dot images.tar.gz
(1.48 MiB) Downloaded 172 times

Best Scores.xlsx
(9.06 KiB) Downloaded 169 times

The top scoring graph had 10 MB genes.
1st degree MB:
ELOVL4
EIF5A2
SRRM2

2nd degree MB:
ATP6V1D
DIRAS2
HOOK1
AZIN1
PGK1
ADCY1
PAX6

Age, Sex, and the genes of interest (APOE, APP, PSEN1, and PSEN2) were all connected to a subnetwork, but it's unclear whether they're part of the same network that contains Alzheimer's.

Since the best structures were found at 8 hours, I went ahead and started a 12 hour BaNJO run on Paths 2, 3, and 5.

Re: GEO datasets

Post by lsand039 » Thu Jun 29, 2017 4:56 pm

This is a general order of the genes & molecules found on the KEGG Alzheimer's pathway (http://www.kegg.jp/kegg-bin/show_pathwa ... ption=show). Nodes in purple are characteristic of Alzheimer's. Nodes in gray are inorganic molecules. Nodes in green are organelles or other cell components. The rectangular nodes with rounded edges are complexes or enzymes made up of different genes.

Please comment with any edits.
Attachments
KEGGorder.xdsl
(4.34 MiB) Downloaded 157 times
Last edited by lsand039 on Mon Jul 03, 2017 9:57 pm, edited 1 time in total.

Re: GEO datasets

Post by lsand039 » Fri Jun 30, 2017 9:36 am

After a 12 hour BaNJO run on Paths 2, 3, and 5, the top scoring graph was from Path 3 at 12 hours. It made up 100% of the total score, while the graph from Path 2 at 12 hours made a tiny contribution.
Best Scores.xlsx
(6.99 KiB) Downloaded 161 times
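A note on how one graph can take 100% of the total: if the scores are on a log scale (an assumption here, though Bayesian network scores such as BDe usually are), then a gap of even a few units leaves the runner-up with a near-zero share once the scores are exponentiated and normalized. The sketch below illustrates that arithmetic with made-up scores; it is not necessarily how Best Scores.xlsx does the calculation.

```python
import math


def relative_contributions(log_scores):
    """Normalize log-scale scores into shares that sum to 1."""
    top = max(log_scores.values())
    # Subtract the best score before exponentiating to avoid underflow.
    weights = {name: math.exp(score - top) for name, score in log_scores.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical scores, for illustration only.
scores = {"Path 3, 12 h": -1000.0, "Path 2, 12 h": -1020.0}
for name, share in relative_contributions(scores).items():
    print(f"{name}: {share:.10f}")
```

With a gap of 20 log units the second graph's share is on the order of exp(-20), which rounds to 0% in a spreadsheet.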

I'm getting the message "segmentation fault (core dumped)" when I try to make the image of the full structure with dot. I've tried .svg, .png, and .jpeg on Paths 2, 3, and 4 and get the same result each time. Below is the MB dot file that I was able to produce. Based on the text file, it seems that the clinical variables and the genes of interest are not standing alone in the full graph. I've also included the dot text file for the full structure.
Hour12-3MB.dot.svg.tar.gz
(2.49 KiB) Downloaded 165 times

Hour12-3.dot
(357 KiB) Downloaded 165 times
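Since dot crashes on the full structure, the edges can also be read straight from the dot text without rendering. The sketch below assumes edges are written as A -> B with simple node names (chained edges like A -> B -> C are not handled). It computes the Markov blanket of a node as its parents, children, and the other parents of its children. The second_ring function is one possible reading of "2nd degree MB" (the blankets of the first-degree members) and may not match how the lists here were produced.

```python
import re
from collections import defaultdict

# Matches edges such as  A -> B  or  "A"->"B"  (assumed dot edge syntax).
EDGE_RE = re.compile(r'"?(\w+)"?\s*->\s*"?(\w+)"?')


def read_edges(dot_text):
    """Return (source, destination) pairs found in dot text."""
    return EDGE_RE.findall(dot_text)


def markov_blanket(edges, target):
    """Parents, children, and co-parents of the children of target."""
    parents, children = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
        children[src].add(dst)
    blanket = set(parents[target]) | set(children[target])
    for child in children[target]:
        blanket |= parents[child]
    blanket.discard(target)
    return blanket


def second_ring(edges, target):
    """Nodes in the blankets of first-degree members, excluding the first ring."""
    first = markov_blanket(edges, target)
    ring = set()
    for node in first:
        ring |= markov_blanket(edges, node)
    return ring - first - {target}


# Toy graph for illustration; node names are placeholders.
toy = "digraph g { A -> T; T -> C; B -> C; D -> A; }"
edges = read_edges(toy)
print(sorted(markov_blanket(edges, "T")))
print(sorted(second_ring(edges, "T")))
```

The same edge list can be used to check whether Age, Sex, and the genes of interest have any edges at all in the full graph.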

1st degree MB:
NCALD
GSTP1
VKORC1

2nd degree MB:
MAOB
CTSC
KYNU
PRPF8
MAF
C16orf45
BHMT2
SERTAD3
POLR3D
ATF1
DPY19L4
NELL2

I am currently running a 16 hour BaNJO run.

Re: GEO datasets

Post by lsand039 » Sat Jul 01, 2017 8:55 am

The top scoring graph after the 16 hour BaNJO run was from Path 5 at 16 hours. It made up 100% of the total score. When this graph was excluded, Path 2 at 16 hours had the next best score, also at 100% of the total score. The 3rd best graph was from Path 3 at 12 hours.
Best Scores.xlsx
(7.58 KiB) Downloaded 166 times

Hour16-5 images.tar.gz
(1.59 MiB) Downloaded 157 times

1st degree MB:
COL1A1
GRAMD1C

2nd degree MB:
CHRNA10
CNOT8
KCNK10
SLPI

I'm currently running BaNJO at 20 hours to see if the scores can get any better.
Last edited by lsand039 on Sun Jul 02, 2017 7:06 pm, edited 1 time in total.

Re: GEO datasets

Post by lsand039 » Sun Jul 02, 2017 3:57 pm

The top 3 scoring graphs after the 20 hour BaNJO run were all from the 20 hour runs. Path 2 had the best score, followed by Path 3 and then Path 5, with each higher-scoring graph excluded in turn for comparison.
Best Scores.xlsx
(7.75 KiB) Downloaded 170 times

Hour20-2.tar.gz
(1.65 MiB) Downloaded 164 times

The MB for Path 2 at 20 Hours:
1st degree MB:
CDC42EP4
NEO1

2nd degree MB:
HTR2A
GNG12
NDUFB3
MCCC1
PDHX
CXXC4
ACAA2
RGS20
APLP1
MORF4L2
NOL4
TRAFD1
TMEM140

I'm running a 24 hour BaNJO run to see if the scores can get better.

Re: GEO datasets

Post by lsand039 » Mon Jul 03, 2017 9:26 pm

Just got the results for the 24 hour run. The top scoring graph was still from Path 2 Hour 20, which was significantly better (100%) than the next highest scoring structure, from Path 3 Hour 24. That structure was in turn significantly better than the 3rd top scoring graph, which is from Path 3 Hour 20.
Best Scores.xlsx
(7.49 KiB) Downloaded 171 times


Since I'll be gone for the 4th, I'm letting BaNJO run for 36 and 48 hours just in case those produce better scores.

Re: GEO datasets

Post by lsand039 » Thu Jul 06, 2017 2:21 pm

The 36 hour run had significantly better scores than the previous best score at Path 2 Hour 20. The best scoring structure is now from Path 3 Hour 36, followed by Path 5 Hour 36, Path 2 Hour 36, and Path 2 Hour 20.
Best Scores.xlsx
(8.49 KiB) Downloaded 152 times

There were 50 MB genes in the best structure.
Hour 36-3 images.tar.gz
(1.59 MiB) Downloaded 159 times


I'm currently waiting for the 48 hour run to finish.

Re: GEO datasets

Post by lsand039 » Fri Jul 07, 2017 1:12 pm

The 48 Hour BaNJO run at Path 3 gave the highest score and was significantly better (100%) than the next top scoring graph at Path 3 Hour 36.
Best Scores.xlsx
(8.1 KiB) Downloaded 163 times

The MB genes in this structure are:

1st degree:
AACS
VRK1

2nd degree:
CREB3L2
MCHR1
DEPDC5
NKG7
TNFRSF11A
NKG7
BRSK2
SLC12A7
C12orf43
IFT81
CDH8
CTBP2

Here are the MB genes for all the BaNJO runs done so far:
MB .xlsx
(47.51 KiB) Downloaded 153 times
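To see whether any MB gene recurs across runs, the lists can be tallied as sketched below. For brevity only the 1st degree MB genes posted in this thread are included, keyed by the best structure of each post; the same function works on the full lists from the spreadsheet. The function name is made up for illustration.

```python
from collections import Counter

# 1st degree MB genes as posted above, keyed by the best structure of each post.
first_degree_mb = {
    "Path 2, 8 h": ["ELOVL4", "EIF5A2", "SRRM2"],
    "Path 3, 12 h": ["NCALD", "GSTP1", "VKORC1"],
    "Path 5, 16 h": ["COL1A1", "GRAMD1C"],
    "Path 2, 20 h": ["CDC42EP4", "NEO1"],
    "Path 3, 48 h": ["AACS", "VRK1"],
}


def recurring_genes(mb_by_run, min_runs=2):
    """Return genes that appear in at least min_runs different runs."""
    counts = Counter()
    for genes in mb_by_run.values():
        counts.update(set(genes))  # count each gene once per run
    return sorted(gene for gene, n in counts.items() if n >= min_runs)


print(recurring_genes(first_degree_mb))
```

Genes that show up in several runs would be the more stable candidates; an empty result means each run produced a different set.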