Here I will discuss the new R code that I have been working on, which is hosted at http://smlg.fiu.edu/gitlab/efraingonzalez0/cleaning-and-fixing-data-with-r/tree/master. I will also answer questions and help solve any problems you run into when using this code. Please use the most recent version, which is currently RCleanDscret.R. I recommend checking the GitLab site frequently for updates to this version. There are still many problems that I am working on.
The following are some of the problems that I have noticed:
- The code currently does not allow the user to pass a vector of data sets to clean, so the user has to look up the filenames manually for each data set they want to analyze. I have been working on this issue and am currently testing an automated version of the code.
- Because the code is being automated, I am improving the way it handles errors.
- The code currently works only for data sets that are in the series matrix format and are associated with a GPL that has been downloaded using the "Download full table" option. I believe I have found a solution to this problem, and it is included in the current version of the code, but I am still testing it.
- The handling of NA values should be improved so that we do not see as many warnings from R. Although these warnings can easily be ignored, it would be nice not to have any.
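To make the batch, error-handling, and NA items above more concrete, here is a minimal sketch of one way they could fit together. It is not taken from RCleanDscret.R: the function names `to_numeric_quiet`, `clean_many`, and `read_and_clean`, the list of missing-value tokens, and the tiny example file are all hypothetical. It also assumes that the warnings in question are R's "NAs introduced by coercion" messages, which may not be the only source.

```r
# Convert a character vector to numeric without triggering
# "NAs introduced by coercion" warnings. The tokens treated as
# missing are an assumption; adjust them to match the real data.
to_numeric_quiet <- function(x) {
  x <- trimws(as.character(x))
  x[x %in% c("", "NA", "null", "NULL", "NaN")] <- NA_character_
  number_pattern <- "^[-+]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][-+]?[0-9]+)?$"
  is_num <- is.na(x) | grepl(number_pattern, x)
  out <- rep(NA_real_, length(x))
  # Only strings that already look numeric (or are NA) reach as.numeric().
  out[is_num] <- as.numeric(x[is_num])
  out
}

# Apply a cleaning function to every path in a vector. An error on one
# file is recorded for that file, and the remaining files are still run.
clean_many <- function(paths, clean_fun) {
  outcomes <- lapply(paths, function(p) {
    tryCatch(
      list(data = clean_fun(p), error = NA_character_),
      error = function(e) list(data = NULL, error = conditionMessage(e))
    )
  })
  names(outcomes) <- paths
  outcomes
}

# Hypothetical stand-in for the real per-file cleaning step.
read_and_clean <- function(path) {
  if (!file.exists(path)) {
    stop("file not found: ", path)
  }
  tab <- read.delim(path, stringsAsFactors = FALSE)
  tab$GSM1 <- to_numeric_quiet(tab$GSM1)
  tab
}

# Small made-up example: one readable file and one path that does not exist.
good <- tempfile(fileext = ".txt")
writeLines(c("ID_REF\tGSM1", "probe1\t2.5", "probe2\tnull"), good)
absent <- tempfile(fileext = ".txt")  # never created on disk

outcomes <- clean_many(c(good, absent), read_and_clean)
```

Each element of `outcomes` holds either the cleaned table or the error message for that file, so a failed data set can be reviewed afterwards without stopping the whole run.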