GEO datasets

Re: GEO datasets

Post by lsand039 » Fri May 19, 2017 7:00 pm

I've had to go through each of the datasets several times to make sure that the right number of genes is included and that the calculations are correct. It started to get messy, but I've put the whole process, from matching the probe IDs to gene names in Access through to discretizing the values, into two separate files for each dataset. Files ending in CZ include the results of the query searches in Access, the consolidated (averaged) expression values for probes that map to the same gene, and the z-scores of those values. Files ending in D include the discretization of the genes and of the demographic/clinical variables.
1297.CZ.xlsx
(14.68 MiB) Downloaded 159 times

1297.D.xlsx
(6.72 MiB) Downloaded 139 times

29378.CZ.xlsx
(31.21 MiB) Downloaded 153 times

29378.D.xlsx
(13.69 MiB) Downloaded 157 times

48350.CZ.xlsx
(124.99 MiB) Downloaded 150 times

48350.D.xlsx
(54.59 MiB) Downloaded 156 times

28146.CZ.xlsx
(13.24 MiB) Downloaded 145 times

28146.D.xlsx
(5.9 MiB) Downloaded 154 times

44768.CZ.xlsx
(98.77 MiB) Downloaded 141 times

44768.D.xlsx
(45.47 MiB) Downloaded 154 times
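
For readers who don't want to open the spreadsheets, the CZ/D workflow described above can be sketched roughly as follows. This is only an illustrative sketch, not the actual Access queries or Excel formulas: the probe IDs, the values, the per-gene standardization across samples, and the three-level discretization with a cutoff of 1.0 are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def average_by_gene(probe_values, probe_to_gene):
    """Average the expression rows of all probes that map to the same gene."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # skip probes without a gene annotation
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(column) for column in zip(*rows)]
            for gene, rows in grouped.items()}


def z_scores(values):
    """Standardize one gene's values across samples (population SD)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def discretize(z, cutoff=1.0):
    """Map a z-score to -1 (low), 0 (baseline) or 1 (high). Cutoff is assumed."""
    if z > cutoff:
        return 1
    if z < -cutoff:
        return -1
    return 0


# Hypothetical probe IDs and values, three samples per probe.
probe_to_gene = {"p1": "APP", "p2": "APP", "p3": "MAPT", "p4": ""}
probe_values = {
    "p1": [1.0, 2.0, 3.0],
    "p2": [3.0, 4.0, 5.0],
    "p3": [5.0, 5.0, 8.0],
    "p4": [9.0, 9.0, 9.0],
}

averaged = average_by_gene(probe_values, probe_to_gene)  # the "C" step
standardized = {g: z_scores(v) for g, v in averaged.items()}  # the "Z" step
discretized = {g: [discretize(z) for z in zs]  # the "D" step
               for g, zs in standardized.items()}
print(discretized)
```

The two `p1`/`p2` rows collapse into a single APP row, and the unannotated probe `p4` is dropped before the z-scores are computed.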
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Post by lsand039 » Sun May 21, 2017 1:29 pm

A continuation of the last post
44770.CZ.xlsx
(98.55 MiB) Downloaded 152 times

44770.D.xlsx
(50.19 MiB) Downloaded 143 times

44771.CZ.xlsx
(98.57 MiB) Downloaded 155 times

44771.D.xlsx
(50.24 MiB) Downloaded 151 times

5281.CZ.xlsx
(80.47 MiB) Downloaded 155 times

5281.D.xlsx
(30.3 MiB) Downloaded 144 times

16759.CZ.xlsx
(8.49 MiB) Downloaded 157 times

16759.D.xlsx
(2.65 MiB) Downloaded 142 times

26927.CZ.xlsx
(19.75 MiB) Downloaded 147 times

26927.D.xlsx
(4.52 MiB) Downloaded 151 times

Re: GEO datasets

Post by lsand039 » Sun May 21, 2017 2:39 pm

A continuation of the last post
15222.CZ.xlsx
(122.52 MiB) Downloaded 134 times

15222.D.xlsx
(74.45 MiB) Downloaded 148 times

84422.96.CZ.xlsx.tar.gz
(391.52 MiB) Downloaded 159 times

84422.96.D.xlsx
(224.08 MiB) Downloaded 149 times

84422.570.CZ.xlsx
(50.91 MiB) Downloaded 138 times

84422.570.D.xlsx
(23.38 MiB) Downloaded 152 times

Re: GEO datasets

Post by lsand039 » Tue May 23, 2017 11:35 am

Here are the files that include all the datasets. This is the combined data I've been using to run BaNJO.
Attachments
Combined Datasets.txt
(35.05 MiB) Downloaded 143 times
Combined Datasets.xlsx
(131.99 MiB) Downloaded 167 times

Re: GEO datasets

Post by lsand039 » Wed May 24, 2017 4:06 pm

Below is an updated list of the GSE datasets from a GEO dataset and series search for the word "Alzheimer". I've only included results under "Homo sapiens". The first page includes all the datasets listed, but I've removed the Super Series results, since those just contain a list of their subseries, which are already listed on the first page of the file.
Dataset Updates.xlsx
(145.04 KiB) Downloaded 152 times


The GSE datasets in red do not match the search criteria. They may include samples from cell lines, induced pluripotent stem cells (iPSCs), non-human samples, or may lack AD samples. The entries highlighted in green are already included in the cleaned combined dataset (except for GSE84422, GPL97).

I haven't listed the numbers of diseased/control samples for every dataset, since not all the studies focused on AD. Some focused on another disease and happened to include samples with AD. The AD categorization also differed among the datasets: some samples were categorized as having AD pathology but being non-demented, and some as probable AD. I can post the numbers of AD samples once it's decided which samples really count as AD.

In the sheet labeled "Datasets potentially included", the entries in white would still need to be cleaned up but these include brain samples with age, sex, and gene names. The Notes column lists problems I'd run into cleaning these entries. For some, the gene name would be difficult to extract because of the GPL file layout. Other datasets would greatly reduce the number of common genes in the combined dataset.
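
To make the last point concrete, here is a small sketch of how the loss of common genes could be checked before adding a dataset. The dataset labels and gene lists are made up for illustration, and the helper names are hypothetical; this is not how the combined file was actually built.

```python
def common_genes(gene_sets):
    """Return the genes that are present in every dataset."""
    sets = [set(genes) for genes in gene_sets.values()]
    return set.intersection(*sets) if sets else set()


def genes_lost_by_adding(shared, candidate_genes):
    """Return the currently shared genes that the candidate dataset lacks."""
    return set(shared) - set(candidate_genes)


# Hypothetical datasets that are already in the combined file.
included = {
    "dataset_A": {"APP", "MAPT", "APOE", "PSEN1"},
    "dataset_B": {"APP", "MAPT", "APOE", "BACE1"},
}
# Hypothetical candidate whose platform covers few of the shared genes.
candidate = {"APP", "TREM2"}

shared = common_genes(included)
lost = genes_lost_by_adding(shared, candidate)
print(len(shared), "shared genes now;", len(lost), "would be lost")
```

A candidate that would drop most of the shared genes is one of the cases flagged in the Notes column.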

I think that the studies labeled "high throughput sequencing" are the deep sequencing studies previously mentioned in the meetings. 17 of the 131 search results were labeled "high throughput sequencing"; these can be found in the last tab. It doesn't look like the gene expression values can be found in the SOFT files or series matrices. Several supplemental files are available, including a few large ones (a couple of GB). I'd need to try opening them on path 5 because of their size.

Re: GEO datasets

Post by lsand039 » Wed May 24, 2017 4:43 pm

Here are the BaNJO settings & results for path 2:
1 hour
settings1.txt
(5.8 KiB) Downloaded 149 times

IRun1.graph.2017.05.23.11.32.50.txt
(278.46 KiB) Downloaded 146 times

2 hours
settings2.txt
(5.8 KiB) Downloaded 129 times

IRun2.graph.2017.05.23.12.34.01.txt
(332.85 KiB) Downloaded 118 times

4 hours
settings4.txt
(5.8 KiB) Downloaded 117 times

IRun4.graph.2017.05.23.14.48.04.txt
(356.88 KiB) Downloaded 113 times

8 hours
settings8.txt
(5.8 KiB) Downloaded 118 times

IRun8.graph.2017.05.23.21.07.27.txt
(355.84 KiB) Downloaded 119 times



Images of the structures had to be compressed since they could only be viewed in the svg format:
Path-2 Graph images.tar.gz
(4.45 MiB) Downloaded 109 times


Last edited by lsand039 on Tue May 30, 2017 1:02 pm, edited 1 time in total.

Re: GEO datasets

Post by lsand039 » Wed May 24, 2017 4:48 pm

Here are the BaNJO settings & results for path 3:
1 hour:
settings1.txt
(5.8 KiB) Downloaded 91 times

IRun1.graph.2017.05.17.14.50.06.txt
(276.17 KiB) Downloaded 93 times

2 hours:
settings2.txt
(5.8 KiB) Downloaded 94 times

IRun2.graph.2017.05.23.11.31.52.txt
(331.48 KiB) Downloaded 81 times

4 hours:
settings4.txt
(5.8 KiB) Downloaded 81 times

IRun4.graph.2017.05.23.13.44.23.txt
(358.02 KiB) Downloaded 91 times

8 hours:
settings8.txt
(5.8 KiB) Downloaded 75 times

IRun8.graph.2017.05.23.20.11.49.txt
(358.84 KiB) Downloaded 79 times


Images of the structures had to be compressed since they could only be viewed in the svg format:
Path-3 Graph images.tar.gz
(4.49 MiB) Downloaded 76 times
Last edited by lsand039 on Tue May 30, 2017 12:51 pm, edited 2 times in total.

Re: GEO datasets

Post by lsand039 » Wed May 24, 2017 4:53 pm

Here are the BaNJO settings & results for path 5:
1 hour:
settings1.txt
(5.8 KiB) Downloaded 77 times

IRun1.graph.2017.05.23.11.30.43.txt
(279.13 KiB) Downloaded 77 times

2 hours:
settings2.txt
(5.8 KiB) Downloaded 80 times

IRun2.graph.2017.05.23.12.31.57.txt
(332.61 KiB) Downloaded 72 times

4 hours:
settings4.txt
(5.8 KiB) Downloaded 68 times

IRun4.graph.2017.05.23.14.32.52.txt
(356.12 KiB) Downloaded 63 times

8 hours:
settings8.txt
(5.8 KiB) Downloaded 58 times

IRun8.graph.2017.05.23.18.33.39.txt
(358.29 KiB) Downloaded 60 times


Images of the structures had to be compressed since they could only be viewed in the svg format:
Path-5 Graph images.tar.gz
(4.46 MiB) Downloaded 61 times
Last edited by lsand039 on Tue May 30, 2017 12:40 pm, edited 1 time in total.

Re: GEO datasets

Post by lsand039 » Thu May 25, 2017 1:14 pm

Here is the list of Markov Blanket genes. The networks with the Markov Blanket genes colored are posted by path number in a compressed file, together with their settings files and graph txt files. The 1st degree Markov Blanket genes are in pink and the 2nd degree Markov Blanket genes are in orange.
Attachments
MB genes.xlsx
(14.69 KiB) Downloaded 55 times
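
For anyone who wants to reproduce the gene lists from a graph file, here is a minimal sketch of the Markov Blanket idea on a toy network. The edge list and node names are made up, the target node name "AD" is hypothetical, and "2nd degree" is taken here to mean the blanket members of the 1st degree genes that are not already in the 1st degree set; that reading is an assumption for the example.

```python
def markov_blanket(edges, target):
    """Parents, children, and the children's other parents of `target`.

    `edges` is a list of (parent, child) pairs from a directed acyclic graph.
    """
    parents = {p for p, c in edges if c == target}
    children = {c for p, c in edges if p == target}
    spouses = {p for p, c in edges if c in children and p != target}
    return parents | children | spouses


def second_degree_blanket(edges, target):
    """Blanket members of the 1st degree nodes, minus the 1st degree set."""
    first = markov_blanket(edges, target)
    second = set()
    for node in first:
        second |= markov_blanket(edges, node)
    return second - first - {target}


# Toy network; "AD" stands for the disease-status variable (hypothetical).
edges = [
    ("A", "AD"),   # A is a parent of AD
    ("AD", "B"),   # B is a child of AD
    ("C", "B"),    # C shares the child B with AD
    ("D", "A"),    # D is one step further out
    ("B", "E"),    # E is one step further out
    ("F", "G"),    # unrelated edge
]

first = markov_blanket(edges, "AD")
second = second_degree_blanket(edges, "AD")
print(sorted(first), sorted(second))
```

In this toy network A, B and C would be colored pink and D and E orange, while F and G stay uncolored.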
Last edited by lsand039 on Tue May 30, 2017 1:04 pm, edited 2 times in total.

Re: GEO datasets

Post by lsand039 » Fri May 26, 2017 11:05 am

Here is the first draft of my Methods section. I have some comments with questions on what I should include. Please let me know what revisions I need to make. I'll add more information on Gene Ontology once I get initial results.
Attachments
Methods.docx
Version 1
(207.26 KiB) Downloaded 45 times
