SMLG (Statistical Machine Learning Group) Discussion Forum

by **lsand039** » Fri May 19, 2017 7:00 pm

I've had to go through each of the datasets several times to make sure the right number of genes is included and the calculations are correct. It started to get messy, but I've put the whole process, from matching the probe IDs to gene names in Access to discretizing the values, into two separate files for each dataset. Files ending in CZ include the results of the query searches in Access, the consolidation/averaging of gene expression values that map to the same gene, and the z-scores of those values. Files ending in D include the discretization of the genes and the demographic/clinical variables.
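For anyone who wants to reproduce the CZ/D steps outside Access and Excel, here is a minimal sketch of the same pipeline: average probes that map to the same gene, compute z-scores, then discretize. It is not the exact procedure used in the spreadsheets. The function names, the per-gene z-score across samples, the population standard deviation, and the three-level cutoff of ±1 are all assumptions made for illustration, and the probe IDs in the usage note are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def average_by_gene(probe_values, probe_to_gene):
    """Average the expression rows of all probes that map to the same gene.

    probe_values: {probe_id: [value per sample]}
    probe_to_gene: {probe_id: gene_name}; probes without a gene are skipped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:
            grouped[gene].append(values)
    # zip(*rows) walks the rows sample by sample, so each sample is averaged
    # across the probes of that gene.
    return {gene: [mean(col) for col in zip(*rows)]
            for gene, rows in grouped.items()}


def z_scores(values):
    """Z-score one gene across samples (population SD is an assumption)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant gene: no variation to scale
    return [(v - mu) / sigma for v in values]


def discretize(z, cutoff=1.0):
    """Map a z-score to -1 (low), 0 (baseline) or 1 (high); cutoff is assumed."""
    if z > cutoff:
        return 1
    if z < -cutoff:
        return -1
    return 0
```

Usage would look like `average_by_gene({"p1": [1.0, 2.0, 3.0], "p2": [3.0, 4.0, 5.0]}, {"p1": "APP", "p2": "APP"})`, followed by `z_scores` and `discretize` on each gene's averaged row.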

1297.CZ.xlsx: (14.68 MiB) Downloaded 199 times

1297.D.xlsx: (6.72 MiB) Downloaded 182 times

29378.CZ.xlsx: (31.21 MiB) Downloaded 196 times

29378.D.xlsx: (13.69 MiB) Downloaded 203 times

48350.CZ.xlsx: (124.99 MiB) Downloaded 193 times

48350.D.xlsx: (54.59 MiB) Downloaded 206 times

28146.CZ.xlsx: (13.24 MiB) Downloaded 188 times

28146.D.xlsx: (5.9 MiB) Downloaded 197 times

44768.CZ.xlsx: (98.77 MiB) Downloaded 194 times

44768.D.xlsx: (45.47 MiB) Downloaded 202 times

by **lsand039** » Sun May 21, 2017 1:29 pm

A continuation of the last post

44770.CZ.xlsx: (98.55 MiB) Downloaded 203 times

44770.D.xlsx: (50.19 MiB) Downloaded 189 times

44771.CZ.xlsx: (98.57 MiB) Downloaded 212 times

44771.D.xlsx: (50.24 MiB) Downloaded 197 times

5281.CZ.xlsx: (80.47 MiB) Downloaded 203 times

5281.D.xlsx: (30.3 MiB) Downloaded 189 times

16759.CZ.xlsx: (8.49 MiB) Downloaded 199 times

16759.D.xlsx: (2.65 MiB) Downloaded 187 times

26927.CZ.xlsx: (19.75 MiB) Downloaded 190 times

26927.D.xlsx: (4.52 MiB) Downloaded 195 times

by **lsand039** » Sun May 21, 2017 2:39 pm

A continuation of the last post

15222.CZ.xlsx: (122.52 MiB) Downloaded 182 times

15222.D.xlsx: (74.45 MiB) Downloaded 198 times

84422.96.CZ.xlsx.tar.gz: (391.52 MiB) Downloaded 234 times

84422.96.D.xlsx: (224.08 MiB) Downloaded 224 times

84422.570.CZ.xlsx: (50.91 MiB) Downloaded 188 times

84422.570.D.xlsx: (23.38 MiB) Downloaded 200 times

by **lsand039** » Tue May 23, 2017 11:35 am

Here is the file that includes all the datasets. This is the file I've been using to run BaNJO.

by **lsand039** » Wed May 24, 2017 4:06 pm

Below is an updated list of the GSE datasets from a GEO dataset and series search using the word "Alzheimer". I've only included results under "homo sapiens". The first page includes all the datasets listed, but I've removed the Super Series results. The Super Series results just contain a list of the subseries, which are already listed on the first page of the file.

Dataset Updates.xlsx: (145.04 KiB) Downloaded 195 times

The GSE datasets in red do not match the search criteria. They may include samples from cell lines, induced pluripotent stem cells (iPSCs), non-human samples, or may lack AD samples. The entries highlighted in green are already included in the cleaned combined dataset (except for GSE84422, GPL97).

I haven't listed all the numbers of diseased/control samples since not all the studies were focused on AD. Some studies focused on another disease, and the samples also had AD. Also, AD categorization differed among the datasets: some samples were categorized as having AD pathology but were non-demented, and some were categorized as probable AD. I can post the numbers of AD samples once it's decided which samples really count as AD.

In the sheet labeled "Datasets potentially included", the entries in white would still need to be cleaned up, but they include brain samples with age, sex, and gene names. The Notes column lists the problems I'd run into cleaning these entries. For some, the gene names would be difficult to extract because of the GPL file layout. Other datasets would greatly reduce the number of common genes in the combined dataset.
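To make the "common genes" concern concrete, here is a small sketch of how one could check how many shared genes would be lost by adding a candidate dataset to the combined set. The function names and the dataset and gene labels in the usage note are hypothetical. It assumes each dataset has already been reduced to a collection of gene names.

```python
def common_genes(datasets):
    """Return the genes present in every dataset.

    datasets: {dataset_name: iterable of gene names}
    """
    gene_sets = [set(genes) for genes in datasets.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def genes_lost_if_added(datasets, candidate_genes):
    """Count how many currently shared genes the candidate dataset lacks."""
    before = common_genes(datasets)
    after = before & set(candidate_genes)
    return len(before) - len(after)
```

For example, with `{"dataset_A": ["G1", "G2", "G3"], "dataset_B": ["G2", "G3", "G4"]}` the common genes are G2 and G3, and a candidate containing only G3 and G5 would cost one shared gene.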

I think that the studies labeled with "high throughput sequencing" are the deep sequencing studies previously mentioned in the meetings. 17 of the 131 search results included "high throughput sequencing"; these can be found in the last tab. It doesn't look like the gene expression values can be found in the SOFT files or series matrices. Several large supplemental files (a couple of GB) are available. I'd need to try opening them on path 5 because of their sizes.

by **lsand039** » Wed May 24, 2017 4:43 pm

Here are the BaNJO settings & results for path 2:
1 hour:

settings1.txt: (5.8 KiB) Downloaded 192 times

IRun1.graph.2017.05.23.11.32.50.txt: (278.46 KiB) Downloaded 190 times

2 hours:

settings2.txt: (5.8 KiB) Downloaded 174 times

IRun2.graph.2017.05.23.12.34.01.txt: (332.85 KiB) Downloaded 163 times

4 hours:

settings4.txt: (5.8 KiB) Downloaded 161 times

IRun4.graph.2017.05.23.14.48.04.txt: (356.88 KiB) Downloaded 157 times

8 hours:

settings8.txt: (5.8 KiB) Downloaded 160 times

IRun8.graph.2017.05.23.21.07.27.txt: (355.84 KiB) Downloaded 163 times

Images of the structures had to be compressed since they could only be viewed in the svg format:

Path-2 Graph images.tar.gz: (4.45 MiB) Downloaded 153 times

by **lsand039** » Wed May 24, 2017 4:48 pm

Here are the BaNJO settings & results for path 3:
1 hour:

settings1.txt: (5.8 KiB) Downloaded 137 times

IRun1.graph.2017.05.17.14.50.06.txt: (276.17 KiB) Downloaded 136 times

2 hours:

settings2.txt: (5.8 KiB) Downloaded 137 times

IRun2.graph.2017.05.23.11.31.52.txt: (331.48 KiB) Downloaded 123 times

4 hours:

settings4.txt: (5.8 KiB) Downloaded 127 times

IRun4.graph.2017.05.23.13.44.23.txt: (358.02 KiB) Downloaded 134 times

8 hours:

settings8.txt: (5.8 KiB) Downloaded 119 times

IRun8.graph.2017.05.23.20.11.49.txt: (358.84 KiB) Downloaded 123 times

Images of the structures had to be compressed since they could only be viewed in the svg format:

Path-3 Graph images.tar.gz: (4.49 MiB) Downloaded 121 times

by **lsand039** » Wed May 24, 2017 4:53 pm

Here are the BaNJO settings & results for path 5:
1 hour:

settings1.txt: (5.8 KiB) Downloaded 120 times

IRun1.graph.2017.05.23.11.30.43.txt: (279.13 KiB) Downloaded 121 times

2 hours:

settings2.txt: (5.8 KiB) Downloaded 124 times

IRun2.graph.2017.05.23.12.31.57.txt: (332.61 KiB) Downloaded 118 times

4 hours:

settings4.txt: (5.8 KiB) Downloaded 112 times

IRun4.graph.2017.05.23.14.32.52.txt: (356.12 KiB) Downloaded 107 times

8 hours:

settings8.txt: (5.8 KiB) Downloaded 102 times

IRun8.graph.2017.05.23.18.33.39.txt: (358.29 KiB) Downloaded 105 times

Images of the structures had to be compressed since they could only be viewed in the svg format:

Path-5 Graph images.tar.gz: (4.46 MiB) Downloaded 104 times

by **lsand039** » Thu May 25, 2017 1:14 pm

Here is the list of Markov Blanket genes. The networks with the Markov Blanket genes colored are posted by path number in a compressed file, together with their settings files and graph txt files. The 1st degree Markov Blanket genes are in pink and the 2nd degree Markov Blanket genes are in orange.
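For reference, here is a minimal sketch of how a Markov blanket can be read off a learned network once its edges are available as (parent, child) pairs. The Markov blanket of a node in a directed acyclic graph is its parents, its children, and the other parents of its children. Treating "2nd degree" as the blankets of the 1st degree genes, minus the 1st degree genes and the target itself, is an assumption about the coloring used here, and the function names are hypothetical. Parsing the BaNJO graph txt files into an edge list is not shown.

```python
def markov_blanket(edges, target):
    """Parents, children, and co-parents of target in a DAG.

    edges: iterable of (parent, child) pairs.
    """
    edges = list(edges)
    parents = {p for p, c in edges if c == target}
    children = {c for p, c in edges if p == target}
    # Co-parents ("spouses"): other parents of the target's children.
    spouses = {p for p, c in edges if c in children and p != target}
    return parents | children | spouses


def second_degree_blanket(edges, target):
    """Blankets of the 1st degree nodes, excluding those nodes and the target.

    This definition of "2nd degree" is an assumption.
    """
    edges = list(edges)
    first = markov_blanket(edges, target)
    second = set()
    for node in first:
        second |= markov_blanket(edges, node)
    return second - first - {target}
```

With the edges `[("A", "T"), ("T", "C"), ("S", "C"), ("X", "A"), ("C", "Y")]` and target `"T"`, the 1st degree blanket is A, C, and S, and the 2nd degree set under this definition is X and Y.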

by **lsand039** » Fri May 26, 2017 11:05 am

Here is the first draft of my Methods section. I have some comments with questions on what I should include. Please let me know what revisions I need to make. I'll add more information on Gene Ontology once I get initial results.

