GEO datasets

Re: GEO datasets

Postby lsand039 » Fri May 19, 2017 7:00 pm

I've had to go through each of the datasets several times to make sure the right number of genes is included and the calculations are correct. It started to get messy, but I've put the whole process, from matching the probe IDs to gene names in Access through discretizing the values, into two separate files for each dataset. Files ending in CZ contain the results of the Access queries, the consolidation (averaging) of expression values for probes that map to the same gene, and the z-scores of those values. Files ending in D contain the discretization of the genes and of the demographic/clinical variables.
1297.CZ.xlsx
(14.68 MiB) Downloaded 170 times

1297.D.xlsx
(6.72 MiB) Downloaded 151 times

29378.CZ.xlsx
(31.21 MiB) Downloaded 163 times

29378.D.xlsx
(13.69 MiB) Downloaded 169 times

48350.CZ.xlsx
(124.99 MiB) Downloaded 162 times

48350.D.xlsx
(54.59 MiB) Downloaded 166 times

28146.CZ.xlsx
(13.24 MiB) Downloaded 157 times

28146.D.xlsx
(5.9 MiB) Downloaded 167 times

44768.CZ.xlsx
(98.77 MiB) Downloaded 153 times

44768.D.xlsx
(45.47 MiB) Downloaded 166 times
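
For anyone who wants to reproduce the CZ/D steps outside of Access and Excel, here is a minimal Python sketch of the same sequence: average the probes that map to one gene, z-score each gene, then discretize. It is only an illustration under assumptions, not the exact procedure used for the files above. In particular, the z-score being taken per gene across samples, the three-level coding, the cutoff of 1.0, and all probe and gene names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def collapse_probes(probe_values, probe_to_gene):
    """Average the expression vectors of all probes that map to the same gene."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # probes without a gene annotation are dropped
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


def z_scores(values):
    """Standardize one gene across samples (assumption: population SD)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant gene: no variation to score
    return [(v - mu) / sd for v in values]


def discretize(z, cutoff=1.0):
    """Code a z-score as -1 (low), 0 (baseline) or 1 (high); cutoff is an assumption."""
    if z > cutoff:
        return 1
    if z < -cutoff:
        return -1
    return 0


# Hypothetical toy input: four probes, three samples each.
probe_values = {
    "p1": [1.0, 2.0, 3.0],
    "p2": [3.0, 4.0, 5.0],
    "p3": [5.0, 5.0, 5.0],
    "p4": [9.0, 9.0, 9.0],  # has no gene annotation below
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}

per_gene = collapse_probes(probe_values, probe_to_gene)
standardized = {g: z_scores(v) for g, v in per_gene.items()}
discrete = {g: [discretize(z) for z in zs] for g, zs in standardized.items()}
```

Keeping the three steps as separate functions mirrors the CZ/D split, so the intermediate values can be checked against the spreadsheets one step at a time.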
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Sun May 21, 2017 1:29 pm

A continuation of the last post
44770.CZ.xlsx
(98.55 MiB) Downloaded 164 times

44770.D.xlsx
(50.19 MiB) Downloaded 155 times

44771.CZ.xlsx
(98.57 MiB) Downloaded 167 times

44771.D.xlsx
(50.24 MiB) Downloaded 162 times

5281.CZ.xlsx
(80.47 MiB) Downloaded 168 times

5281.D.xlsx
(30.3 MiB) Downloaded 157 times

16759.CZ.xlsx
(8.49 MiB) Downloaded 167 times

16759.D.xlsx
(2.65 MiB) Downloaded 155 times

26927.CZ.xlsx
(19.75 MiB) Downloaded 159 times

26927.D.xlsx
(4.52 MiB) Downloaded 162 times

Re: GEO datasets

Postby lsand039 » Sun May 21, 2017 2:39 pm

A continuation of the last post
15222.CZ.xlsx
(122.52 MiB) Downloaded 146 times

15222.D.xlsx
(74.45 MiB) Downloaded 162 times

84422.96.CZ.xlsx.tar.gz
(391.52 MiB) Downloaded 198 times

84422.96.D.xlsx
(224.08 MiB) Downloaded 189 times

84422.570.CZ.xlsx
(50.91 MiB) Downloaded 151 times

84422.570.D.xlsx
(23.38 MiB) Downloaded 165 times

Re: GEO datasets

Postby lsand039 » Tue May 23, 2017 11:35 am

Here are the files that include all the datasets. This is the file I've been using to run BaNJO.
Attachments
Combined Datasets.txt
(35.05 MiB) Downloaded 156 times
Combined Datasets.xlsx
(131.99 MiB) Downloaded 179 times

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:06 pm

Below is an updated list of the GSE datasets from a GEO dataset and series search using the word "Alzheimer". I've only included results under "homo sapiens". The first page lists all the datasets except the Super Series results, which I've removed. A Super Series entry just contains a list of its subseries, and those are already listed on the first page of the file.
Dataset Updates.xlsx
(145.04 KiB) Downloaded 163 times


The GSE datasets in red do not match the search criteria. They may include samples from cell lines, induced pluripotent stem cells (iPSCs), non-human samples, or may lack AD samples. The entries highlighted in green are already included in the cleaned combined dataset (except for GSE84422, GPL97).

I haven't listed all the numbers for the diseased/control samples since not all the studies were focused on AD. Some focused on another disease and only some of their samples also had AD. The AD categorization also differed among the datasets: some samples were categorized as having AD pathology but being non-demented, and some were categorized as probable AD. I can post the numbers of AD samples once it's decided which samples really count as AD.

In the sheet labeled "Datasets potentially included", the entries in white would still need to be cleaned up but these include brain samples with age, sex, and gene names. The Notes column lists problems I'd run into cleaning these entries. For some, the gene name would be difficult to extract because of the GPL file layout. Other datasets would greatly reduce the number of common genes in the combined dataset.
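
To put a number on how much a candidate dataset would shrink the common gene list, a quick set intersection is enough. This is a rough sketch, not something I've run on the real files; the dataset labels and gene names below are made up for illustration.

```python
def common_genes(datasets):
    """Return the genes present in every dataset (dict of name -> iterable of genes)."""
    gene_sets = [set(genes) for genes in datasets.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def genes_lost_by_adding(datasets, candidate_genes):
    """Return the currently shared genes that the candidate dataset does not measure."""
    before = common_genes(datasets)
    return before - set(candidate_genes)


# Hypothetical example: two datasets already combined, one candidate to add.
combined = {
    "dataset_1": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "dataset_2": ["GENE_A", "GENE_B", "GENE_C"],
}
candidate = ["GENE_A", "GENE_C", "GENE_E"]

shared_now = common_genes(combined)
lost = genes_lost_by_adding(combined, candidate)
shared_after = shared_now - lost
```

Comparing the size of `shared_now` with `shared_after` for each white entry would show which datasets are worth the cleanup effort.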

I think that the studies labeled with "high throughput sequencing" are the deep sequencing studies previously mentioned in the meetings. 17 out of the 131 search results included "high throughput sequencing". These can be found in the last tab. It doesn't look like the gene expression values can be found in the SOFT files or the series matrices. Several supplemental files are available, a few of them large (a couple of GB each). I'd need to try opening them on path 5 because of their size.

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:43 pm

Here are the BaNJO settings & results for path 2:
1 hour
settings1.txt
(5.8 KiB) Downloaded 161 times

IRun1.graph.2017.05.23.11.32.50.txt
(278.46 KiB) Downloaded 158 times

2 hours
settings2.txt
(5.8 KiB) Downloaded 141 times

IRun2.graph.2017.05.23.12.34.01.txt
(332.85 KiB) Downloaded 130 times

4 hours
settings4.txt
(5.8 KiB) Downloaded 128 times

IRun4.graph.2017.05.23.14.48.04.txt
(356.88 KiB) Downloaded 125 times

8 hours
settings8.txt
(5.8 KiB) Downloaded 130 times

IRun8.graph.2017.05.23.21.07.27.txt
(355.84 KiB) Downloaded 131 times



Images of the structures had to be compressed since they could only be viewed in the svg format:
Path-2 Graph images.tar.gz
(4.45 MiB) Downloaded 121 times


Last edited by lsand039 on Tue May 30, 2017 1:02 pm, edited 1 time in total.

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:48 pm

Here are the BaNJO settings & results for path 3:
1 hour:
settings1.txt
(5.8 KiB) Downloaded 103 times

IRun1.graph.2017.05.17.14.50.06.txt
(276.17 KiB) Downloaded 104 times

2 hours:
settings2.txt
(5.8 KiB) Downloaded 107 times

IRun2.graph.2017.05.23.11.31.52.txt
(331.48 KiB) Downloaded 94 times

4 hours:
settings4.txt
(5.8 KiB) Downloaded 94 times

IRun4.graph.2017.05.23.13.44.23.txt
(358.02 KiB) Downloaded 103 times

8 hours:
settings8.txt
(5.8 KiB) Downloaded 87 times

IRun8.graph.2017.05.23.20.11.49.txt
(358.84 KiB) Downloaded 91 times


Images of the structures had to be compressed since they could only be viewed in the svg format:
Path-3 Graph images.tar.gz
(4.49 MiB) Downloaded 89 times
Last edited by lsand039 on Tue May 30, 2017 12:51 pm, edited 2 times in total.

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:53 pm

Here are the BaNJO settings & results for path 5:
1 hour:
settings1.txt
(5.8 KiB) Downloaded 89 times

IRun1.graph.2017.05.23.11.30.43.txt
(279.13 KiB) Downloaded 90 times

2 hours:
settings2.txt
(5.8 KiB) Downloaded 93 times

IRun2.graph.2017.05.23.12.31.57.txt
(332.61 KiB) Downloaded 84 times

4 hours:
settings4.txt
(5.8 KiB) Downloaded 81 times

IRun4.graph.2017.05.23.14.32.52.txt
(356.12 KiB) Downloaded 75 times

8 hours:
settings8.txt
(5.8 KiB) Downloaded 70 times

IRun8.graph.2017.05.23.18.33.39.txt
(358.29 KiB) Downloaded 72 times


Images of the structures had to be compressed since they could only be viewed in the svg format:
Path-5 Graph images.tar.gz
(4.46 MiB) Downloaded 73 times
Last edited by lsand039 on Tue May 30, 2017 12:40 pm, edited 1 time in total.

Re: GEO datasets

Postby lsand039 » Thu May 25, 2017 1:14 pm

Here is the list of Markov Blanket genes. The networks with the Markov Blanket genes colored are posted by path number in a compressed file, together with their settings files and graph txt files. The 1st degree Markov Blanket genes are in pink and the 2nd degree Markov Blanket genes are in orange.
Attachments
MB genes.xlsx
(14.69 KiB) Downloaded 67 times
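
As a reference for how the two groups can be derived from a learned network, here is a small Python sketch. It assumes the network is available as a list of directed (parent, child) edges; it does not parse the BaNJO graph files. It also assumes that "1st degree" means the standard Markov blanket of the target (parents, children, and the children's other parents) and that "2nd degree" means the blanket members of those 1st degree nodes that are not already counted. The node names are hypothetical.

```python
from collections import defaultdict


def build_maps(edges):
    """Index directed (parent, child) edges as parent and child lookups."""
    parents, children = defaultdict(set), defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
        children[parent].add(child)
    return parents, children


def markov_blanket(node, parents, children):
    """Parents, children, and the children's other parents of `node`."""
    blanket = set(parents[node]) | set(children[node])
    for child in children[node]:
        blanket |= parents[child]  # co-parents (spouses)
    blanket.discard(node)
    return blanket


def second_degree(node, first, parents, children):
    """Blanket members of the 1st degree nodes, excluding the 1st degree set and `node`."""
    second = set()
    for member in first:
        second |= markov_blanket(member, parents, children)
    return second - first - {node}


# Hypothetical network with target node "T".
edges = [("A", "T"), ("T", "C"), ("B", "C"), ("D", "A"), ("C", "E"), ("F", "E")]
parents, children = build_maps(edges)

first = markov_blanket("T", parents, children)
second = second_degree("T", first, parents, children)
```

In this toy network `first` would be the pink nodes and `second` the orange ones.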
Last edited by lsand039 on Tue May 30, 2017 1:04 pm, edited 2 times in total.

Re: GEO datasets

Postby lsand039 » Fri May 26, 2017 11:05 am

Here is the first draft of my Methods section. I have some comments with questions on what I should include. Please let me know what revisions I need to make. I'll add more information on Gene Ontology once I get initial results.
Attachments
Methods.docx
Version 1
(207.26 KiB) Downloaded 58 times
