SMLG (Statistical Machine Learning Group) Discussion Forum

by **cwyoo** » Mon Jul 07, 2014 1:12 pm

The following zip file contains the Surveillance, Epidemiology, and End Results (SEER) 1973-2011 data for other tumors. Please read seerdic.pdf to learn how to parse the dataset.

Download the dataset from here: http://131.94.9.17/data/SEER/SEER_1973_2011_OTHER.zip

by **Tharaa** » Thu Jul 17, 2014 3:49 pm

Hello all,
I've been working hard to understand and decode this data. If anyone knows how to read it or has an idea of how to decode it, please let me know. I've opened it in Excel and SPSS, and I still don't get it! I also tried to get the SEER Ab software, but it requires registration authorization. Please help.

by **meninonas** » Thu Jul 17, 2014 4:38 pm

Tharaa,

I'm going to be in school tomorrow if you want to get together to do this. Let me know.

by **cwyoo** » Thu Jul 17, 2014 5:48 pm

Tharaa wrote: Hello all,
I've been working hard to understand and decode this data. If anyone knows how to read it or has an idea of how to decode it, please let me know. I've opened it in Excel and SPSS, and I still don't get it! I also tried to get the SEER Ab software, but it requires registration authorization. Please help.

Have you tried opening it with the Open command from the Excel menu bar? You need to use the Excel Text Import Wizard. See:

http://office.microsoft.com/en-us/excel ... 02244.aspx

by **meninonas** » Thu Jul 31, 2014 12:06 am

Tharaa,

I apologize for the late response. I have also posted this on the discussion board just in case.

I have checked the dataset, and I believe the biggest issue you're running into is that the values in the text files aren't separated by any delimiter. Each individual code (the meaning of each is given in the PDF) is placed right next to the others, so a record looks meaningless until it is split up.

An example, from the SEER_2005.lo_2nd_half_OTHER.TXT file, can be seen below:

00000015435010010841921 03122005C42109983199833911 9999 98880098898 98888888888771110102000204400204109890 1090 01 217 3700099999999 0140074 55980461109 01022051 380003800040 99999999919993 0017100171 99USALA8

The first number contains several codes. For example, 00000015435010010841921 can be divided as:

0000001543: Louisiana

5: Widowed

01: White

0: Non-Spanish/Non-Hispanic

0: Non-Spanish-Hispanic-Latino

1: Male

084: Age

1921: Year of Birth

The other numbers continue in the same way.
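As a rough illustration of that split, here is a minimal Python sketch (Python rather than Excel, simply because it is easy to show). The widths are the ones from the breakdown above; the field names are labels made up for this example, not official SEER names, so check seerdic.pdf for the real layout.

```python
# Widths follow the breakdown in this post. The field names are
# illustrative labels only (assumptions), not official SEER names.
FIELDS = [
    ("registry_id", 10),
    ("marital_status", 1),
    ("race", 2),
    ("spanish_hispanic_origin", 1),
    ("hispanic_latino_flag", 1),
    ("sex", 1),
    ("age", 3),
    ("year_of_birth", 4),
]


def split_fixed_width(record, fields=FIELDS):
    """Cut a record into named pieces using consecutive field widths."""
    out = {}
    pos = 0
    for name, width in fields:
        out[name] = record[pos:pos + width]  # codes are kept as text
        pos += width
    return out


# The first number from the example record above.
first_block = "00000015435010010841921"
decoded = split_fixed_width(first_block)
for name, value in decoded.items():
    print(f"{name}: {value}")
```

The codes are kept as strings on purpose, so values such as 084 and 01 do not lose their leading zeros.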

I think what will work best for you is to import the file into Excel and create a macro that tells Excel how to separate each value according to its meaning.
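If a macro turns out to be awkward, one alternative (a sketch only, not something I have run on the full dataset) is to split the file with a short Python script and open the resulting CSV in Excel. The function name `fixed_width_to_csv` and the `fields` list are hypothetical; you would fill in the column names and widths yourself from seerdic.pdf.

```python
import csv


def fixed_width_to_csv(txt_path, csv_path, fields):
    """Write one CSV row per record, cutting each line by consecutive widths.

    `fields` is a list of (column_name, width) pairs taken from seerdic.pdf.
    Nothing in this function knows the real SEER layout.
    """
    with open(txt_path, "r", encoding="ascii", errors="replace") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow([name for name, _ in fields])  # header row
        for line in src:
            record = line.rstrip("\r\n")
            if not record:
                continue  # skip empty lines
            row, pos = [], 0
            for _, width in fields:
                row.append(record[pos:pos + width])
                pos += width
            writer.writerow(row)


# Example call (the layout here is illustrative, not the full record):
# fixed_width_to_csv(
#     "SEER_2005.lo_2nd_half_OTHER.TXT",
#     "seer_2005_other.csv",
#     [("registry_id", 10), ("marital_status", 1), ("race", 2)],
# )
```

One caveat: when Excel opens a CSV directly it tends to treat codes as numbers and drop leading zeros (084 becomes 84), so it is safer to bring the CSV in through the Text Import Wizard and mark those columns as text.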

If you would like, you can add me on Skype (lrami016) so that I can explain this further.

SEER Other Tumor