by meninonas » Thu Jul 31, 2014 12:06 am
Tharaa,
I apologize for the late response. I have also posted this on the discussion board just in case.
I have checked the dataset, and I believe the biggest issue you're running into is that the data points in the text files aren't separated. The individual codes (whose meanings are given in the PDF) are run together with no delimiters, so each line looks meaningless until it is split into its fields.
An example, from the SEER_2005.lo_2nd_half_OTHER.TXT file, can be seen below:
00000015435010010841921 03122005C42109983199833911 9999 98880098898 98888888888771110102000204400204109890 1090 01 217 3700099999999 0140074 55980461109 01022051 380003800040 99999999919993 0017100171 99USALA8
The first block of digits packs several codes together. For example, 00000015435010010841921 can be divided as:
0000001543: Louisiana
5: Widowed
01: White
0: Non-Spanish/Non-Hispanic
0: Non-Spanish-Hispanic-Latino
1: Male
084: Age
1921: Year of Birth
And it'll continue like that with the other numbers.
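If a script is easier for you than a spreadsheet, the split above can be sketched in Python. This is only a rough sketch: the field names are labels I made up for illustration, and the widths are simply the ones from the breakdown above, so please check them against the PDF before relying on them.

```python
# Field names are illustrative labels (assumptions), not official column names.
# Widths come from the breakdown of the first block of digits shown above.
FIELDS = [
    ("registry", 10),
    ("marital_status", 1),
    ("race", 2),
    ("spanish_hispanic_origin", 1),
    ("hispanic_latino", 1),
    ("sex", 1),
    ("age", 3),
    ("birth_year", 4),
]


def split_fixed_width(block, fields=FIELDS):
    """Slice a fixed-width string into named fields, left to right."""
    result = {}
    pos = 0
    for name, width in fields:
        result[name] = block[pos:pos + width]
        pos += width
    return result


record = split_fixed_width("00000015435010010841921")
for name, value in record.items():
    print(name, value)
```

The same idea extends to the rest of the line: add one (name, width) pair per field, in the order the PDF lists them.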
I think what will work best for you is to import the file into Excel and create a macro that tells Excel how to separate the values according to their meanings.
If you would like, you could add me on Skype (lrami016) so that I can explain this further.