The code labeled RMatchGenes.R on the GitLab site finds the genes that are common to all of the data sets you are using. It works on the dscrt.txt and aftexcel.txt files produced by the previously mentioned cleaning code, whose function is named THEFT(). I have not yet posted the version that also works on the Z-score files, but if anyone needs it, I believe Lauren and I have a working version that either of us can send you.
How to use RMatchGenes.R:
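To illustrate what "common among all of the data sets" means, here is a minimal sketch in R. It is not the RMatchGenes.R code itself; the data set names and gene identifiers are made up for the example, and it assumes each data set can be reduced to a vector of gene identifiers.

```r
# Hypothetical gene identifiers from three data sets (toy values, not real files).
gene_lists <- list(
  GSE_A = c("TP53", "BRCA1", "EGFR", "MYC"),
  GSE_B = c("BRCA1", "MYC", "TP53", "KRAS"),
  GSE_C = c("MYC", "TP53", "PTEN", "BRCA1")
)

# Reduce() applies intersect() to the lists one pair at a time,
# so only genes present in every data set survive.
common_genes <- Reduce(intersect, gene_lists)
print(common_genes)
```

A gene that is missing from even one data set (EGFR, KRAS, or PTEN here) is dropped from the result.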
- Make sure that you have installed the libraries needed to run the code. I discussed this in a previous post. If you have already installed them, there is no need to install them again unless it has been a while, in which case the libraries may have been updated since.
- Set your working directory in R to the folder that contains the aftexcel.txt and the dscrt.txt files. I explained how to do this in an earlier post.
- Count the number of rows of clinical data in each file that you plan to use with the code. Write these numbers down, and keep track of which file each number belongs to.
- Now copy and paste the RMatchGenes.R code into the R window and press enter.
- You will see a statement that reads "Choose the file/files you want to analyze:" followed by a list of the files found in the directory that you chose earlier. Follow the directions on the screen for entering the file numbers. Make sure to put the file numbers in ascending order.
- Next you will see a question that reads "How many rows of clinical data are there in each data set ... ?" Use the numbers that you wrote down in step three. Separate the numbers with commas and no spaces. For example, if you have 3 data sets and each data set has 6 rows of clinical data, type 6,6,6. The numbers must be in the same order as the files that you selected in the previous question.
- No more user input is required.
- When the program finishes, you will find new files named GSE#####matched.txt in the directory that you specified. These files contain only the clinical data and the data for the genes that all of the selected files had in common.
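Before running the script, it can help to check your inputs. The sketch below is a hypothetical pre-flight check, not part of RMatchGenes.R: the helper parse_row_counts() is my own name, and the file pattern assumes that your input file names end in aftexcel.txt or dscrt.txt.

```r
# setwd("path/to/your/data")  # placeholder path; point this at your own folder

# List candidate input files in the working directory
# (assumes the names end in "aftexcel.txt" or "dscrt.txt").
candidates <- list.files(pattern = "(aftexcel|dscrt)\\.txt$")
print(candidates)

# Hypothetical helper: check an answer such as "6,6,6" before you type it in.
parse_row_counts <- function(answer, n_files) {
  # Only digits separated by single commas are allowed, with no spaces.
  if (!grepl("^[0-9]+(,[0-9]+)*$", answer)) {
    stop("Use comma-separated whole numbers with no spaces, e.g. 6,6,6")
  }
  counts <- as.integer(strsplit(answer, ",", fixed = TRUE)[[1]])
  # There must be exactly one count per selected file.
  if (length(counts) != n_files) {
    stop("Expected ", n_files, " numbers but got ", length(counts))
  }
  counts
}

# Three selected files, each with 6 rows of clinical data.
row_counts <- parse_row_counts("6,6,6", n_files = 3)
print(row_counts)
```

The check only catches formatting mistakes and a wrong number of entries. It cannot tell whether the counts are in the same order as the selected files, so that still needs to be verified by hand.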
Good luck,
Efrain Gonzalez