Here is a table of the 12 datasets I plan to use. They have 8257 genes in common.

[Attachment: Dataset Summary.png]
The GSE # refers to each dataset's GEO accession number. GSE84422 used two platforms, GPL96 and GPL570. I only counted the samples that were definitively AD or controls. I still need to clean up GSE48350 using Base or Access, but right now I'm having trouble opening the file in either of them.
To find out how much of each dataset was included in the list of common genes, I went to the list of genes in its GPL file. The column labeled "Original # of genes in GPL" refers to the number of genes I found in the GPL file. The number in parentheses is the GPL #.
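For reference, the common-gene list and each platform's share of it could also be computed outside a spreadsheet. This is only a minimal Python sketch, assuming the gene lists have already been pulled out of the GPL files. The names `common_genes`, `coverage`, `platform_1`, and `GENE_A` are made up for illustration and are not from my actual data.

```python
def common_genes(genes_by_platform):
    """Return the genes present in every platform's gene list."""
    gene_sets = [set(genes) for genes in genes_by_platform.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def coverage(genes_by_platform):
    """Fraction of each platform's genes that are in the common list."""
    shared = common_genes(genes_by_platform)
    return {
        name: len(shared) / len(set(genes))
        for name, genes in genes_by_platform.items()
        if genes  # skip empty lists to avoid dividing by zero
    }


# Hypothetical toy input; real lists would come from the GPL files.
toy_platforms = {
    "platform_1": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "platform_2": ["GENE_A", "GENE_B", "GENE_E"],
}
shared = common_genes(toy_platforms)
fractions = coverage(toy_platforms)
```

With the toy input, `shared` holds the two genes found on both platforms, and `fractions` gives the share of each platform's genes that made it into the common list.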
Not all the genes in the GPL file always appear in the GSE dataset. Because multiple probe IDs can match the same gene, I couldn't directly determine how many genes were available in each dataset. I could find out using Base or Access, but I'm running into a couple of issues. Base needs the Java Runtime Environment, which doesn't seem to be installed on Path 3 and maybe Path 5. It is installed on Path 4, but Base keeps freezing there. I think it's because of the size of the files I'm using.
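As a possible workaround while Base and Access are unavailable, the probe-to-gene counting could be done with a short script. The sketch below is only an illustration under stated assumptions: that the GPL annotation has been loaded into a dictionary from probe ID to a gene-symbol field, and that a probe matching several genes lists them separated by `///` (the separator is an assumption and should be checked against the actual GPL file). The function name and the probe and gene names are hypothetical.

```python
def genes_in_dataset(probe_to_gene, dataset_probe_ids, separator="///"):
    """Collect the distinct genes covered by the probes in one dataset.

    probe_to_gene: dict mapping probe ID -> gene-symbol field from the GPL file
    dataset_probe_ids: probe IDs that actually appear in the GSE dataset
    separator: assumed delimiter when one probe lists several genes
    """
    genes = set()
    for probe_id in dataset_probe_ids:
        symbol_field = probe_to_gene.get(probe_id, "")
        for symbol in symbol_field.split(separator):
            symbol = symbol.strip()
            if symbol:  # ignore probes with no gene annotation
                genes.add(symbol)
    return genes


# Hypothetical toy annotation: two probes hit GENE_A, one probe is unannotated.
toy_annotation = {
    "probe_1": "GENE_A",
    "probe_2": "GENE_A",
    "probe_3": "GENE_B /// GENE_C",
    "probe_4": "",
    "probe_5": "GENE_D",
}
# Only the probes present in the dataset are counted; probe_5 is absent here.
toy_dataset_probes = ["probe_1", "probe_2", "probe_3", "probe_4"]
toy_genes = genes_in_dataset(toy_annotation, toy_dataset_probes)
```

Because the result is a set, probes that map to the same gene are counted once, and `len(toy_genes)` gives the number of genes available in that dataset.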
I think Access lets me work with larger files, but the virtual machine on Path 3 is too low on disk space. I've tried to increase the memory and delete any unnecessary files, but I can't get enough free space to open my files. I will also eventually need Excel so I can include all 2221 samples in a BaNJO run. LibreOffice Calc has a 1024-column limit, so the data won't fit with either variables as columns and samples as rows or samples as columns and variables as rows.
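If the spreadsheet column limit stays a blocker, switching between the two layouts does not strictly need a spreadsheet. Here is a minimal Python sketch, assuming the data is a rectangular tab-delimited text table that fits in memory. The function name `transpose_table` and the toy values are hypothetical, and the sketch says nothing about the exact input format BaNJO expects.

```python
import csv
import io


def transpose_table(in_stream, out_stream, delimiter="\t"):
    """Swap rows and columns of a delimited text table."""
    rows = list(csv.reader(in_stream, delimiter=delimiter))
    # Refuse ragged input, since zip() would silently drop extra cells.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same number of columns")
    writer = csv.writer(out_stream, delimiter=delimiter, lineterminator="\n")
    writer.writerows(zip(*rows))


# Hypothetical toy table: genes as rows, samples as columns.
toy_input = io.StringIO(
    "gene\tsample_1\tsample_2\n"
    "GENE_A\t1.0\t2.0\n"
    "GENE_B\t3.0\t4.0\n"
)
toy_output = io.StringIO()
transpose_table(toy_input, toy_output)
transposed_text = toy_output.getvalue()
```

For real files, the two streams would be opened with `open(path, newline="")` instead of `io.StringIO`, and the number of columns is limited only by available memory.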
Once I can use Access or Base, the data should be ready to go through BaNJO!