GEO datasets

Re: GEO datasets

Postby lsand039 » Fri May 19, 2017 7:00 pm

I've had to go through each of the datasets several times to make sure the right number of genes is included and the calculations are correct. It started to get messy, but I've put the whole process, from matching the probe IDs to gene names in Access through discretizing the values, into two separate files for each dataset. Files ending in CZ include the results of the Access queries, the consolidated (averaged) expression values for probes that map to the same gene, and the z-scores of those values. Files ending in D include the discretization of the genes and the demographic/clinical variables.
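For anyone who wants to reproduce the CZ and D steps outside of Access and Excel, here is a minimal Python sketch of the same idea: average the probes that map to one gene, z-score each gene across samples, then discretize. The function names, the toy probe and gene IDs, the per-gene z-scoring, and the three-level cutoff of 1.0 are all assumptions made for illustration; they are not necessarily what was used to build the attached files.

```python
from collections import defaultdict
from statistics import mean, pstdev


def average_by_gene(probe_rows, probe_to_gene):
    """Average probe-level expression vectors that map to the same gene."""
    grouped = defaultdict(list)
    for probe_id, values in probe_rows.items():
        gene = probe_to_gene.get(probe_id)
        if gene:  # skip probes without a gene annotation
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


def z_scores(values):
    """Standardize one gene's values across samples (population SD)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def discretize(z, cutoff=1.0):
    """Map a z-score to 0 (low), 1 (middle) or 2 (high). The cutoff is an assumption."""
    if z < -cutoff:
        return 0
    if z > cutoff:
        return 2
    return 1


# Toy data: three samples, two probes for GENE_A, one probe for GENE_B (hypothetical IDs)
probe_rows = {
    "p1": [2.0, 4.0, 6.0],
    "p2": [4.0, 6.0, 8.0],
    "p3": [1.0, 1.0, 10.0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}

by_gene = average_by_gene(probe_rows, probe_to_gene)
discrete = {g: [discretize(z) for z in z_scores(v)] for g, v in by_gene.items()}
print(by_gene)
print(discrete)
```

If the real pipeline standardizes per sample rather than per gene, or uses a different number of levels, only `z_scores` and `discretize` would need to change.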
1297.CZ.xlsx
(14.68 MiB) Downloaded 160 times

1297.D.xlsx
(6.72 MiB) Downloaded 140 times

29378.CZ.xlsx
(31.21 MiB) Downloaded 154 times

29378.D.xlsx
(13.69 MiB) Downloaded 158 times

48350.CZ.xlsx
(124.99 MiB) Downloaded 151 times

48350.D.xlsx
(54.59 MiB) Downloaded 157 times

28146.CZ.xlsx
(13.24 MiB) Downloaded 146 times

28146.D.xlsx
(5.9 MiB) Downloaded 155 times

44768.CZ.xlsx
(98.77 MiB) Downloaded 143 times

44768.D.xlsx
(45.47 MiB) Downloaded 155 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Sun May 21, 2017 1:29 pm

A continuation of the last post
44770.CZ.xlsx
(98.55 MiB) Downloaded 153 times

44770.D.xlsx
(50.19 MiB) Downloaded 144 times

44771.CZ.xlsx
(98.57 MiB) Downloaded 156 times

44771.D.xlsx
(50.24 MiB) Downloaded 152 times

5281.CZ.xlsx
(80.47 MiB) Downloaded 156 times

5281.D.xlsx
(30.3 MiB) Downloaded 145 times

16759.CZ.xlsx
(8.49 MiB) Downloaded 158 times

16759.D.xlsx
(2.65 MiB) Downloaded 143 times

26927.CZ.xlsx
(19.75 MiB) Downloaded 148 times

26927.D.xlsx
(4.52 MiB) Downloaded 152 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Sun May 21, 2017 2:39 pm

A continuation of the last post
15222.CZ.xlsx
(122.52 MiB) Downloaded 135 times

15222.D.xlsx
(74.45 MiB) Downloaded 149 times

84422.96.CZ.xlsx.tar.gz
(391.52 MiB) Downloaded 160 times

84422.96.D.xlsx
(224.08 MiB) Downloaded 150 times

84422.570.CZ.xlsx
(50.91 MiB) Downloaded 139 times

84422.570.D.xlsx
(23.38 MiB) Downloaded 154 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Tue May 23, 2017 11:35 am

Here are the files that include all the datasets. This is the file I've been using to run BaNJO.
Attachments
Combined Datasets.txt
(35.05 MiB) Downloaded 144 times
Combined Datasets.xlsx
(131.99 MiB) Downloaded 168 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:06 pm

Below is an updated list of the GSE datasets from a GEO dataset and series search using the word "Alzheimer". I've only included results under "Homo sapiens". The first page includes all the datasets listed, except that I've removed the Super Series results. A Super Series result just contains a list of its subseries, which are already listed on the first page of the file.
Dataset Updates.xlsx
(145.04 KiB) Downloaded 153 times


The GSE datasets in red do not match the search criteria. They may include samples from cell lines, induced pluripotent stem cells (iPSCs), non-human samples, or may lack AD samples. The entries highlighted in green are already included in the cleaned combined dataset (except for GSE84422, GPL97).

I haven't listed all the numbers of diseased/control samples, since not all the studies were focused on AD. Some studies focused on another disease and happened to include samples with AD. Also, AD categorization differed among the datasets: some samples were categorized as having AD pathology but were non-demented, and some were categorized as probable AD. I can post the numbers of AD samples once it's decided which samples really count as AD.

In the sheet labeled "Datasets potentially included", the entries in white would still need to be cleaned up but these include brain samples with age, sex, and gene names. The Notes column lists problems I'd run into cleaning these entries. For some, the gene name would be difficult to extract because of the GPL file layout. Other datasets would greatly reduce the number of common genes in the combined dataset.
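To put a number on how much a candidate dataset would shrink the common gene list, a quick set intersection is enough. This is only a sketch: the dataset labels and gene lists below are toy values, and `common_genes` and `genes_lost_by_adding` are hypothetical helper names, not part of any existing script.

```python
def common_genes(datasets):
    """Return the genes present in every dataset (dict of name -> gene list)."""
    gene_sets = [set(genes) for genes in datasets.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def genes_lost_by_adding(datasets, candidate_genes):
    """Count how many currently shared genes would drop out if a candidate were added."""
    before = common_genes(datasets)
    after = before & set(candidate_genes)
    return len(before) - len(after)


# Toy example with made-up dataset labels and gene lists
combined = {
    "dataset_1": ["APOE", "APP", "MAPT", "PSEN1"],
    "dataset_2": ["APOE", "APP", "MAPT", "BIN1"],
}
candidate = ["APOE", "BIN1", "CLU"]

shared = common_genes(combined)
lost = genes_lost_by_adding(combined, candidate)
print(sorted(shared), lost)
```

Running this over the gene name column of each cleaned file would show which of the white entries are worth the cleanup effort.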

I think that the studies labeled with "high throughput sequencing" are the deep sequencing studies previously mentioned in the meetings. 17 of the 131 search results included "high throughput sequencing". These can be found in the last tab. It doesn't look like the gene expression values can be found in the SOFT files or series matrices. Several large supplemental files (a couple of GBs) are available. I'd need to try opening them on path 5 because of their sizes.
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:43 pm

Here are the BaNJO settings & results for path 2:
1 hour
settings1.txt
(5.8 KiB) Downloaded 150 times

IRun1.graph.2017.05.23.11.32.50.txt
(278.46 KiB) Downloaded 147 times

2 hours
settings2.txt
(5.8 KiB) Downloaded 131 times

IRun2.graph.2017.05.23.12.34.01.txt
(332.85 KiB) Downloaded 119 times

4 hours
settings4.txt
(5.8 KiB) Downloaded 118 times

IRun4.graph.2017.05.23.14.48.04.txt
(356.88 KiB) Downloaded 114 times

8 hours
settings8.txt
(5.8 KiB) Downloaded 120 times

IRun8.graph.2017.05.23.21.07.27.txt
(355.84 KiB) Downloaded 121 times



Images of the structures had to be compressed since they could only be viewed in the SVG format:
Path-2 Graph images.tar.gz
(4.45 MiB) Downloaded 110 times


Last edited by lsand039 on Tue May 30, 2017 1:02 pm, edited 1 time in total.
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:48 pm

Here are the BaNJO settings & results for path 3:
1 hour:
settings1.txt
(5.8 KiB) Downloaded 92 times

IRun1.graph.2017.05.17.14.50.06.txt
(276.17 KiB) Downloaded 94 times

2 hours:
settings2.txt
(5.8 KiB) Downloaded 95 times

IRun2.graph.2017.05.23.11.31.52.txt
(331.48 KiB) Downloaded 82 times

4 hours:
settings4.txt
(5.8 KiB) Downloaded 82 times

IRun4.graph.2017.05.23.13.44.23.txt
(358.02 KiB) Downloaded 92 times

8 hours:
settings8.txt
(5.8 KiB) Downloaded 76 times

IRun8.graph.2017.05.23.20.11.49.txt
(358.84 KiB) Downloaded 80 times


Images of the structures had to be compressed since they could only be viewed in the SVG format:
Path-3 Graph images.tar.gz
(4.49 MiB) Downloaded 77 times
Last edited by lsand039 on Tue May 30, 2017 12:51 pm, edited 2 times in total.
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Wed May 24, 2017 4:53 pm

Here are the BaNJO settings & results for path 5:
1 hour:
settings1.txt
(5.8 KiB) Downloaded 78 times

IRun1.graph.2017.05.23.11.30.43.txt
(279.13 KiB) Downloaded 78 times

2 hours:
settings2.txt
(5.8 KiB) Downloaded 81 times

IRun2.graph.2017.05.23.12.31.57.txt
(332.61 KiB) Downloaded 73 times

4 hours:
settings4.txt
(5.8 KiB) Downloaded 69 times

IRun4.graph.2017.05.23.14.32.52.txt
(356.12 KiB) Downloaded 64 times

8 hours:
settings8.txt
(5.8 KiB) Downloaded 59 times

IRun8.graph.2017.05.23.18.33.39.txt
(358.29 KiB) Downloaded 61 times


Images of the structures had to be compressed since they could only be viewed in the SVG format:
Path-5 Graph images.tar.gz
(4.46 MiB) Downloaded 62 times
Last edited by lsand039 on Tue May 30, 2017 12:40 pm, edited 1 time in total.
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Thu May 25, 2017 1:14 pm

Here is the list of Markov Blanket genes. The networks with the Markov Blanket genes colored are posted by path number in a compressed file, together with their settings files and graph txt files. The 1st degree Markov Blanket genes are in pink and the 2nd degree Markov Blanket genes are in orange.
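As a cross-check on the colored networks, the Markov Blanket of a node can be read off a directed edge list: its parents, its children, and the other parents of its children. The sketch below assumes the edges have already been pulled out of a graph file as (parent, child) pairs, and it assumes "2nd degree" means the blankets of the 1st degree genes minus the 1st degree genes and the target itself. Both assumptions, and the node names, are for illustration only.

```python
def markov_blanket(edges, target):
    """Parents, children, and co-parents of the children of `target` in a DAG."""
    parents = {p for p, c in edges if c == target}
    children = {c for p, c in edges if p == target}
    spouses = {p for p, c in edges if c in children and p != target}
    return parents | children | spouses


def second_degree_blanket(edges, target):
    """Assumed definition: blanket members of the 1st degree genes,
    excluding the 1st degree genes and the target."""
    first = markov_blanket(edges, target)
    second = set()
    for node in first:
        second |= markov_blanket(edges, node)
    return second - first - {target}


# Toy network with hypothetical node names; "T" plays the role of the target variable
edges = [("A", "T"), ("T", "C"), ("S", "C"), ("X", "A"), ("C", "Y")]

first_degree = markov_blanket(edges, "T")
second_degree = second_degree_blanket(edges, "T")
print(sorted(first_degree), sorted(second_degree))
```

Here A is a parent of T, C is a child, and S is a co-parent of C, so all three are 1st degree; X and Y are only reachable through those genes.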
Attachments
MB genes.xlsx
(14.69 KiB) Downloaded 56 times
Last edited by lsand039 on Tue May 30, 2017 1:04 pm, edited 2 times in total.
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm

Re: GEO datasets

Postby lsand039 » Fri May 26, 2017 11:05 am

Here is the first draft of my Methods section. I have some comments with questions on what I should include. Please let me know what revisions I need to make. I'll add more information on Gene Ontology once I get initial results.
Attachments
Methods.docx
Version 1
(207.26 KiB) Downloaded 47 times
lsand039
 
Posts: 237
Joined: Thu Jan 14, 2016 12:17 pm
