GEO datasets

Re: GEO datasets

Postby lsand039 » Fri May 19, 2017 7:00 pm

I've had to go through each of the datasets several times to make sure the right number of genes is included and the calculations are correct. It started to get messy, but I've put the whole process, from matching the probe IDs to gene names in Access through to discretizing the values, into two separate files for each dataset. Files ending in CZ include the results of the Access queries, the consolidation/averaging of expression values for probes that map to the same gene, and the z-scores of those values. Files ending in D include the discretization of the genes and of the demographic/clinical variables.
1297.CZ.xlsx
(14.68 MiB) Downloaded 190 times

1297.D.xlsx
(6.72 MiB) Downloaded 172 times

29378.CZ.xlsx
(31.21 MiB) Downloaded 184 times

29378.D.xlsx
(13.69 MiB) Downloaded 191 times

48350.CZ.xlsx
(124.99 MiB) Downloaded 185 times

48350.D.xlsx
(54.59 MiB) Downloaded 186 times

28146.CZ.xlsx
(13.24 MiB) Downloaded 178 times

28146.D.xlsx
(5.9 MiB) Downloaded 187 times

44768.CZ.xlsx
(98.77 MiB) Downloaded 183 times

44768.D.xlsx
(45.47 MiB) Downloaded 193 times
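
For anyone following along without Access or Excel, here is a minimal sketch of the CZ and D steps described in this post: map probes to genes, average probes that share a gene, z-score each gene across samples, then discretize. It is only an illustration under stated assumptions, not the workflow actually used for the attached files. The probe IDs, gene assignments, expression values, the use of the population standard deviation, and the three-level cutoff of 1.0 are all hypothetical; the post does not say which discretization rule was applied.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical probe-to-gene annotation (in the post this lookup was done in Access).
probe_to_gene = {"p1": "APP", "p2": "APP", "p3": "PSEN1", "p4": "MAPT"}

# Hypothetical expression values: probe id -> one value per sample.
expression = {
    "p1": [2.0, 4.0, 6.0],
    "p2": [4.0, 6.0, 8.0],
    "p3": [1.0, 1.0, 4.0],
    "p4": [5.0, 3.0, 1.0],
}


def average_by_gene(expression, probe_to_gene):
    """Average, sample by sample, all probes that map to the same gene."""
    grouped = defaultdict(list)
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without a gene name are dropped
            grouped[gene].append(values)
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


def z_scores(values):
    """Standardize one gene across samples (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def discretize(z, cutoff=1.0):
    """Map a z-score to -1 (low), 0 (baseline) or 1 (high); the cutoff is assumed."""
    if z > cutoff:
        return 1
    if z < -cutoff:
        return -1
    return 0


by_gene = average_by_gene(expression, probe_to_gene)               # consolidation
cz = {gene: z_scores(values) for gene, values in by_gene.items()}   # "CZ" step
d = {gene: [discretize(z) for z in zs] for gene, zs in cz.items()}  # "D" step
```

With the toy values above, the two APP probes collapse into one row before standardizing, so each gene contributes exactly one discretized variable.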
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Sun May 21, 2017 1:29 pm

A continuation of the last post
44770.CZ.xlsx
(98.55 MiB) Downloaded 186 times

44770.D.xlsx
(50.19 MiB) Downloaded 179 times

44771.CZ.xlsx
(98.57 MiB) Downloaded 195 times

44771.D.xlsx
(50.24 MiB) Downloaded 185 times

5281.CZ.xlsx
(80.47 MiB) Downloaded 192 times

5281.D.xlsx
(30.3 MiB) Downloaded 178 times

16759.CZ.xlsx
(8.49 MiB) Downloaded 187 times

16759.D.xlsx
(2.65 MiB) Downloaded 176 times

26927.CZ.xlsx
(19.75 MiB) Downloaded 180 times

26927.D.xlsx
(4.52 MiB) Downloaded 185 times

Re: GEO datasets

Postby lsand039 » Sun May 21, 2017 2:39 pm

A continuation of the last post
15222.CZ.xlsx
(122.52 MiB) Downloaded 169 times

15222.D.xlsx
(74.45 MiB) Downloaded 184 times

84422.96.CZ.xlsx.tar.gz
(391.52 MiB) Downloaded 221 times

84422.96.D.xlsx
(224.08 MiB) Downloaded 214 times

84422.570.CZ.xlsx
(50.91 MiB) Downloaded 175 times

84422.570.D.xlsx
(23.38 MiB) Downloaded 190 times

Re: GEO datasets

Postby lsand039 » Tue May 23, 2017 11:35 am

Here are the files that include all the datasets. This is the file I've been using to run BaNJO.
Attachments
Combined Datasets.txt
(35.05 MiB) Downloaded 179 times
Combined Datasets.xlsx
(131.99 MiB) Downloaded 201 times

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:06 pm

Below is an updated list of the GSE datasets from a GEO dataset and series search for the word "Alzheimer". I've only included results under "Homo sapiens". The first page includes all the datasets listed, but I've removed the Super Series results, since they just contain a list of subseries that are already listed on the first page of the file.
Dataset Updates.xlsx
(145.04 KiB) Downloaded 185 times


The GSE datasets in red do not match the search criteria. They may include samples from cell lines, induced pluripotent stem cells (iPSCs), non-human samples, or may lack AD samples. The entries highlighted in green are already included in the cleaned combined dataset (except for GSE84422, GPL97).

I haven't listed all the numbers for the diseased/control samples since not all the studies were focused on AD. Some focused on another disease and also happened to include AD samples. Also, AD categorization differed among the datasets: some samples were categorized as having AD pathology but being non-demented, and some as probable AD. I can post the numbers of AD samples once it's decided which samples really count as AD.

In the sheet labeled "Datasets potentially included", the entries in white would still need to be cleaned up, but these include brain samples with age, sex, and gene names. The Notes column lists the problems I ran into while cleaning these entries. For some, the gene name would be difficult to extract because of the GPL file layout. Other datasets would greatly reduce the number of common genes in the combined dataset.

I think that the studies labeled with "high throughput sequencing" are the deep sequencing studies previously mentioned in the meetings. 17 of the 131 search results included "high throughput sequencing"; these can be found in the last tab. It doesn't look like the gene expression values can be found in the SOFT files or series matrices. Several supplemental files are available, a few of which are large (a couple of GB each). I'd need to try opening them on path 5 because of their size.

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:43 pm

Here are the BaNJO settings & results for path 2:
1 hour:
settings1.txt
(5.8 KiB) Downloaded 182 times

IRun1.graph.2017.05.23.11.32.50.txt
(278.46 KiB) Downloaded 180 times

2 hours:
settings2.txt
(5.8 KiB) Downloaded 165 times

IRun2.graph.2017.05.23.12.34.01.txt
(332.85 KiB) Downloaded 152 times

4 hours:
settings4.txt
(5.8 KiB) Downloaded 151 times

IRun4.graph.2017.05.23.14.48.04.txt
(356.88 KiB) Downloaded 147 times

8 hours:
settings8.txt
(5.8 KiB) Downloaded 152 times

IRun8.graph.2017.05.23.21.07.27.txt
(355.84 KiB) Downloaded 154 times



Images of the structures had to be compressed since they could only be viewed in the svg format:
Path-2 Graph images.tar.gz
(4.45 MiB) Downloaded 143 times


Last edited by lsand039 on Tue May 30, 2017 1:02 pm, edited 1 time in total.

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:48 pm

Here are the BaNJO settings & results for path 3:
1 hour:
settings1.txt
(5.8 KiB) Downloaded 127 times

IRun1.graph.2017.05.17.14.50.06.txt
(276.17 KiB) Downloaded 126 times

2 hours:
settings2.txt
(5.8 KiB) Downloaded 128 times

IRun2.graph.2017.05.23.11.31.52.txt
(331.48 KiB) Downloaded 114 times

4 hours:
settings4.txt
(5.8 KiB) Downloaded 117 times

IRun4.graph.2017.05.23.13.44.23.txt
(358.02 KiB) Downloaded 124 times

8 hours:
settings8.txt
(5.8 KiB) Downloaded 109 times

IRun8.graph.2017.05.23.20.11.49.txt
(358.84 KiB) Downloaded 113 times


Images of the structures had to be compressed since they could only be viewed in the svg format:
Path-3 Graph images.tar.gz
(4.49 MiB) Downloaded 109 times
Last edited by lsand039 on Tue May 30, 2017 12:51 pm, edited 2 times in total.

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:53 pm

Here are the BaNJO settings & results for path 5:
1 hour:
settings1.txt
(5.8 KiB) Downloaded 110 times

IRun1.graph.2017.05.23.11.30.43.txt
(279.13 KiB) Downloaded 111 times

2 hours:
settings2.txt
(5.8 KiB) Downloaded 113 times

IRun2.graph.2017.05.23.12.31.57.txt
(332.61 KiB) Downloaded 108 times

4 hours:
settings4.txt
(5.8 KiB) Downloaded 102 times

IRun4.graph.2017.05.23.14.32.52.txt
(356.12 KiB) Downloaded 96 times

8 hours:
settings8.txt
(5.8 KiB) Downloaded 91 times

IRun8.graph.2017.05.23.18.33.39.txt
(358.29 KiB) Downloaded 95 times


Images of the structures had to be compressed since they could only be viewed in the svg format:
Path-5 Graph images.tar.gz
(4.46 MiB) Downloaded 94 times
Last edited by lsand039 on Tue May 30, 2017 12:40 pm, edited 1 time in total.

Re: GEO datasets

Postby lsand039 » Thu May 25, 2017 1:14 pm

Here is the list of Markov Blanket genes. The networks with the Markov Blanket genes colored are posted by path number in a compressed file, together with their settings files and graph txt files. The 1st degree Markov Blanket genes are in pink and the 2nd degree Markov Blanket genes are in orange.
Attachments
MB genes.xlsx
(14.69 KiB) Downloaded 88 times
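
As a rough illustration of how 1st and 2nd degree Markov Blanket genes can be read off a directed network, here is a small sketch. The 1st degree set uses the standard definition (parents, children, and the children's other parents). The 2nd degree definition used here, "in the blanket of a 1st degree member but not itself 1st degree", is an assumption, since the post does not spell it out. The edge list and node names are hypothetical and are not taken from the BaNJO graph files.

```python
def markov_blanket(edges, target):
    """Parents, children, and the children's other parents of `target`."""
    parents = {src for src, dst in edges if dst == target}
    children = {dst for src, dst in edges if src == target}
    spouses = {src for src, dst in edges if dst in children and src != target}
    return parents | children | spouses


def second_degree(edges, target):
    """Nodes in the blanket of a 1st degree member, excluding 1st degree nodes.

    This reading of "2nd degree" is an assumption, not the post's definition.
    """
    first = markov_blanket(edges, target)
    second = set()
    for node in first:
        second |= markov_blanket(edges, node)
    return second - first - {target}


# Hypothetical directed edges (parent, child); "AD" stands in for the target node.
edges = [
    ("AGE", "AD"),
    ("AD", "GENE_A"),
    ("GENE_B", "GENE_A"),
    ("GENE_C", "GENE_B"),
    ("GENE_A", "GENE_D"),
    ("GENE_E", "GENE_D"),
]

first = markov_blanket(edges, "AD")   # would be colored pink
second = second_degree(edges, "AD")   # would be colored orange
```

In this toy network GENE_B is 1st degree only because it shares the child GENE_A with the target, which is why co-parents have to be collected alongside parents and children.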
Last edited by lsand039 on Tue May 30, 2017 1:04 pm, edited 2 times in total.

Re: GEO datasets

Postby lsand039 » Fri May 26, 2017 11:05 am

Here is the first draft of my Methods section. I have some comments with questions on what I should include. Please let me know what revisions I need to make. I'll add more information on Gene Ontology once I get initial results.
Attachments
Methods.docx
Version 1
(207.26 KiB) Downloaded 79 times