GEO Microarray Data Cleaning
Posted: Tue Feb 20, 2018 3:41 pm
To clean the GEO Dataset GSE84010, I did the following:
1) Go to https://www.ncbi.nlm.nih.gov/geo/ and search for the dataset GSE84010.
2) Download the Series Matrix File(s) (TXT Format).
3) Click on its corresponding platform GPL22111.
4) Click "View full table..." and copy & paste text into Notepad, saving it as a TXT file.
5) Open RStudio and use Efrain's code posted in GitLab to clean the dataset.
a. The most recently updated version of the code RCleanDscret.R as of 20Feb18 is attached.
b. Be sure to install the necessary libraries before running the code (pryr, MASS, dplyr, tidyr, readr, stringr).
c. Copy & paste the code into RStudio & run.
d. Select the downloaded Series Matrix File(s).
e. Select the relevant platform file.
f. Check for errors.
6) When the code has finished running, it should produce 3 files in your working directory (copies are attached):
a. GSE84010aftexcel - Clean dataset
b. GSE84010zscore - Normalized dataset
c. GSE84010dscrt - Discretized dataset
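
For step 5b, here is a quick way to see which of the listed libraries are still missing before you run the script. This is only a sketch: the helper name missing_packages is my own and is not part of RCleanDscret.R, and the install line is left commented out so nothing gets installed unless you want it to.

```r
# Libraries listed in step 5b.
needed <- c("pryr", "MASS", "dplyr", "tidyr", "readr", "stringr")

# Hypothetical helper (not from RCleanDscret.R): returns the subset of
# 'pkgs' that cannot be loaded in the current R library.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All required libraries are installed.")
}
```

If anything is reported as missing, uncomment the install.packages() line, run it once, and then run the cleaning script.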
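
To give a rough idea of how the normalized and discretized outputs relate to the clean dataset, here is a toy example. Everything in it is an assumption for illustration: the numbers are made up, the z-score is taken per probe (row), and the cutoffs of -1 and 1 are arbitrary. RCleanDscret.R may do any of this differently, so check the script itself for the actual method.

```r
# Toy expression matrix: 3 probes (rows) x 4 samples (columns).
# Values are invented for illustration and are not from GSE84010.
expr <- matrix(c( 1, 2, 3, 4,
                  5, 5, 7, 7,
                 10, 8, 6, 4),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("probe1", "probe2", "probe3"),
                               paste0("sample", 1:4)))

# Row-wise z-score: for each probe, subtract its mean and divide by
# its standard deviation. scale() works on columns, hence the t().
zscore <- t(scale(t(expr)))

# Three-level discretization with assumed cutoffs of -1 and 1:
# -1 = low, 0 = near the probe's mean, 1 = high.
dscrt <- ifelse(zscore > 1, 1L, ifelse(zscore < -1, -1L, 0L))

print(round(zscore, 2))
print(dscrt)
```

After this runs, each row of zscore has mean 0 and standard deviation 1, and dscrt contains only -1, 0, and 1.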