SMLG (Statistical Machine Learning Group) Discussion Forum

by **lsand039** » Thu Feb 25, 2016 1:53 pm

Here's a table for GDS4136. According to the GEO Accession viewer, the values in this file are MAS5-calculated signal intensities.

This is the last of my files that has a table neatly matching each gene to each individual sample. For the rest, I'll need to reorganize the data from the downloaded *.family.soft files and match the gene probe ID to the gene to keep a consistent format.

by **lsand039** » Thu Mar 03, 2016 1:38 pm

Attached is a file that matches the different gene probe IDs to the genes.

by **lsand039** » Thu Mar 03, 2016 2:10 pm

Here's a table for GSE63060. All the demographic data is there; the probe names just need to be matched with the genes.

by **cwyoo** » Thu Mar 03, 2016 3:53 pm

lsand039 wrote:Attached is a file that matches the different gene probe IDs to the genes.

Could you post the source of this data and describe what you did if you processed the original data?

by **shstyoo** » Fri Mar 04, 2016 8:13 pm

I'll be working on a script that will combine the probe id with the gene id. It should be done by Monday when I return to Gainesville.

If you would like to look at my parseFile script you can find it here: https://github.com/shstyoo/alzheimer-prediction-model

The script will be uploaded to this repository. If you have any questions about cloning the repo or using Git, let me know.

by **shstyoo** » Mon Mar 07, 2016 4:18 am

Just a quick update: my computer is having trouble opening the actual GSE63060_family.soft file (found on the GEO website). It looks like the file is too large for Open Office to handle. Are you able to open it in Microsoft Excel?
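
A file that is too large for a spreadsheet program can usually still be read line by line in a script. The following is a minimal sketch, assuming the family .soft file encloses its probe annotation table between `!platform_table_begin` and `!platform_table_end` marker lines with tab-separated rows. The function name `iter_platform_rows`, the column names, and the sample records are hypothetical and only illustrate the idea.

```python
import io


def iter_platform_rows(lines):
    """Yield each row of the first platform table as a list of fields.

    The header row comes first. Lines are consumed one at a time, so the
    whole file never has to fit in memory.
    """
    inside = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("!platform_table_begin"):
            inside = True
            continue
        if line.startswith("!platform_table_end"):
            break
        if inside:
            yield line.split("\t")


# Tiny in-memory stand-in for a real file (all values are made up).
sample = io.StringIO(
    "^PLATFORM = GPL_EXAMPLE\n"
    "!platform_table_begin\n"
    "ID\tSymbol\n"
    "PROBE_1\tGENE_A\n"
    "PROBE_2\tGENE_B\n"
    "!platform_table_end\n"
    "^SAMPLE = GSM_EXAMPLE\n"
)
rows = list(iter_platform_rows(sample))
print(rows)
```

For a real file, the same function can be given an open file handle, for example `with open(path) as f: ...`, where `path` points to the downloaded .soft file.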

by **cwyoo** » Mon Mar 07, 2016 10:27 am

shstyoo wrote:Just a quick update: my computer is having trouble opening the actual GSE63060_family.soft file (found on the GEO website). It looks like the file is too large for Open Office to handle. Are you able to open it in Microsoft Excel?

Steve, could you create a script that reads in two text files (in comma-separated (csv) or tab-separated (txt) format) and creates one text file in the same format? You may use the two files that Lauren posted here (Probe IDs and Gene Names.csv is already comma-separated; GSE63060.xlsx should be converted into comma-separated (csv) or tab-separated (txt) format).

So, your script should read in Probe IDs and Gene Names.csv and GSE63060.xlsx (converted into comma-separated (csv) or tab-separated (txt) format; let's call it GSE63060.csv) and produce a result text file that adds a column called GeneID (the gene that corresponds to each Probe ID in Probe IDs and Gene Names.csv) to GSE63060.csv.
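
As a rough illustration of the request above, here is a minimal sketch using only Python's standard `csv` module. It assumes that the data file has one row per probe and that both files share a probe column; the column names `ProbeID` and `GeneName`, the function name `add_gene_column`, and the sample rows are hypothetical, since the actual headers of the posted files are not shown in this thread.

```python
import csv
import io


def add_gene_column(mapping_file, data_file, out_file,
                    probe_col="ProbeID", gene_col="GeneName"):
    """Copy data_file to out_file with an extra GeneID column.

    GeneID is looked up from mapping_file by probe ID. Probes without a
    match get an empty GeneID instead of being dropped.
    """
    mapping = {row[probe_col]: row[gene_col]
               for row in csv.DictReader(mapping_file)}
    reader = csv.DictReader(data_file)
    fieldnames = list(reader.fieldnames) + ["GeneID"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["GeneID"] = mapping.get(row[probe_col], "")
        writer.writerow(row)


# In-memory stand-ins for the two input files (values are made up).
mapping_csv = io.StringIO("ProbeID,GeneName\nPROBE_1,GENE_A\nPROBE_2,GENE_B\n")
data_csv = io.StringIO("ProbeID,Sample1,Sample2\nPROBE_1,1.0,2.0\nPROBE_3,3.0,4.0\n")
result = io.StringIO()
add_gene_column(mapping_csv, data_csv, result)
print(result.getvalue())
```

For tab-separated input, `delimiter="\t"` can be passed to the reader and writer. If the table is instead laid out with samples as rows and probes as columns, the lookup would have to be applied to the header row rather than to each data row.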

Since Lauren has more files like GSE63060.xlsx, she is planning to use your script and produce the text files that are needed to do further analyses. Please let us know if you have any other questions/comments.

by **cwyoo** » Mon Mar 07, 2016 10:29 am

shstyoo wrote:I'll be working on a script that will combine the probe id with the gene id. It should be done by Monday when I return to Gainesville.

If you would like to look at my parseFile script you can find it here: https://github.com/shstyoo/alzheimer-prediction-model

The script will be uploaded to this repository. If you have any questions about cloning the repo or using Git, let me know.

Please push your script to the SMLG Git server as a public domain project.

by **lsand039** » Mon Mar 07, 2016 1:32 pm

cwyoo wrote:
lsand039 wrote:Attached is a file that matches the different gene probe IDs to the genes.

Could you post the source of this data and describe what you did if you processed the original data?

For probe IDs beginning with ILMN, I just copied and pasted the probe IDs with their gene names from this site: http://www.genomequebec.mcgill.ca/compg ... robes.html
For the other probe ID names and their associated genes, I took the information from GDS810, GDS4136, and GDS2795, which already had information linking the probe IDs to the corresponding genes. Some files contained probe IDs that weren't in the other files, but most of the probe IDs were the same, and I deleted the duplicates.

I combined both sets of information into one file, Probe IDs and Gene Names.csv.
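
The same combining and de-duplicating step could also be scripted, which makes it repeatable and shows where two sources disagree. This is only a sketch: the function name `combine_mappings` and the sample pairs are hypothetical, and keeping the first gene seen for a probe is an assumption about how duplicates should be resolved.

```python
def combine_mappings(*sources):
    """Merge (probe_id, gene) pairs from several sources.

    The first gene seen for a probe ID is kept. Later entries for the
    same probe are skipped, and recorded as conflicts if they name a
    different gene.
    """
    combined = {}
    conflicts = []
    for pairs in sources:
        for probe_id, gene in pairs:
            if probe_id not in combined:
                combined[probe_id] = gene
            elif combined[probe_id] != gene:
                conflicts.append((probe_id, combined[probe_id], gene))
    return combined, conflicts


# Made-up pairs standing in for the tables taken from each source.
source_a = [("PROBE_1", "GENE_A"), ("PROBE_2", "GENE_B")]
source_b = [("PROBE_2", "GENE_B"), ("PROBE_3", "GENE_C"), ("PROBE_1", "GENE_X")]
combined, conflicts = combine_mappings(source_a, source_b)
print(combined)
print(conflicts)
```

Reporting conflicts rather than silently dropping them makes it easier to check probes that are annotated differently across datasets.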

by **lsand039** » Mon Mar 07, 2016 1:43 pm

cwyoo wrote:
shstyoo wrote:Just a quick update: my computer is having trouble opening the actual GSE63060_family.soft file (found on the GEO website). It looks like the file is too large for Open Office to handle. Are you able to open it in Microsoft Excel?

Steve, could you create a script that reads in two text files (in comma-separated (csv) or tab-separated (txt) format) and creates one text file in the same format? You may use the two files that Lauren posted here (Probe IDs and Gene Names.csv is already comma-separated; GSE63060.xlsx should be converted into comma-separated (csv) or tab-separated (txt) format).

So, your script should read in Probe IDs and Gene Names.csv and GSE63060.xlsx (converted into comma-separated (csv) or tab-separated (txt) format; let's call it GSE63060.csv) and produce a result text file that adds a column called GeneID (the gene that corresponds to each Probe ID in Probe IDs and Gene Names.csv) to GSE63060.csv.

Since Lauren has more files like GSE63060.xlsx, she is planning to use your script and produce the text files that are needed to do further analyses. Please let us know if you have any other questions/comments.

I wasn't able to open the actual GSE63060_family.soft file, but I did find another file, GSE63060_series_matrix.txt, that contains the probe ID and array information. This file was small enough to open in Excel, and I used it to create GSE63060.xlsx. The file was downloaded from http://www.ncbi.nlm.nih.gov/geo/query/a ... c=GSE63060.
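
A series matrix file can also be converted to *.csv without going through Excel. The sketch below assumes the usual series matrix layout, in which metadata lines start with `!` and the tab-separated expression table sits between `!series_matrix_table_begin` and `!series_matrix_table_end`. The function name `read_series_matrix_table` and the sample values are hypothetical.

```python
import csv
import io


def read_series_matrix_table(lines):
    """Return the expression table as a list of rows (lists of strings)."""
    inside = False
    table_lines = []
    for line in lines:
        if line.startswith("!series_matrix_table_begin"):
            inside = True
        elif line.startswith("!series_matrix_table_end"):
            break
        elif inside:
            table_lines.append(line)
    # csv.reader strips the double quotes placed around IDs.
    return list(csv.reader(table_lines, delimiter="\t"))


# Tiny in-memory stand-in for a series matrix file (values are made up).
sample = io.StringIO(
    '!Series_title\t"Example"\n'
    "!series_matrix_table_begin\n"
    '"ID_REF"\t"GSM_A"\t"GSM_B"\n'
    '"PROBE_1"\t7.5\t8.1\n'
    "!series_matrix_table_end\n"
)
table = read_series_matrix_table(sample)

# Write the table back out in comma-separated form.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(table)
print(out.getvalue())
```

With a real file, `sample` would be replaced by an open file handle and `out` by a file opened for writing with `newline=""`.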

Would you prefer I upload files like GSE63060.xlsx as *.csv files? Also, it would be great if you could let me know how to use Git. Thanks!

GEO datasets
