Datasets

Re: Datasets

Postby meninonas » Fri Feb 13, 2015 5:59 pm

Dr. Yoo,

Find the merged dataset below. It has dimensions of 463x2140. I'll run the correlations on Monday and run Banjo.
Attachments
Data.csv
(3.55 MiB) Downloaded 134 times
Last edited by meninonas on Thu Feb 26, 2015 12:30 pm, edited 1 time in total.
meninonas
 
Posts: 137
Joined: Tue Jun 24, 2014 3:25 pm

Re: Datasets

Postby cwyoo » Sat Feb 14, 2015 7:50 am

meninonas wrote:
Dr. Yoo,

I have coded the datasets according to Mutation, Amplification, and Expression. I was not able to merge them in one large dataset due to LibreOffice Calc not being able to open the original dataset in its entirety. LibreOffice Calc gives me the following error:

The data could not be loaded completely because the maximum number of columns per sheet was exceeded.


I would suggest opening the datasets in Microsoft Office if you have it available to you.

I am currently working on labeling the genes according to _MUT, _AMP, and _EXP. Those should be ready tomorrow.
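A spreadsheet is not strictly needed for the merge described above. The following is a minimal sketch, assuming each of the three datasets is a CSV whose first column is a case ID and whose remaining columns are gene names; the column layout, the toy values, and the helper names `read_table` and `merge_tables` are all hypothetical.

```python
import csv
import io


def read_table(text, suffix):
    """Read CSV text keyed by its first column and suffix every other column name."""
    reader = csv.DictReader(io.StringIO(text))
    id_column = reader.fieldnames[0]  # assumption: first column holds the case ID
    table = {}
    for row in reader:
        case_id = row.pop(id_column)
        table[case_id] = {f"{gene}_{suffix}": value for gene, value in row.items()}
    return table


def merge_tables(*tables):
    """Join the tables on case ID, keeping only cases present in every table."""
    shared_cases = set(tables[0])
    for table in tables[1:]:
        shared_cases &= set(table)
    merged = {}
    for case_id in sorted(shared_cases):
        merged[case_id] = {}
        for table in tables:
            merged[case_id].update(table[case_id])
    return merged


# Toy data standing in for the real Mutation, Amplification, and Expression files.
mut = read_table("case,TP53,NRF1\nA,1,0\nB,0,0\n", "MUT")
amp = read_table("case,TP53,NRF1\nA,0,1\nB,0,0\n", "AMP")
exp = read_table("case,TP53,NRF1\nA,UP,(NULL)\nB,Down,UP\n", "EXP")
merged = merge_tables(mut, amp, exp)
```

Because the suffix is applied while reading, the same gene can appear three times in the merged table without a name clash.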


Please use the following reference coding for expression:
0: Down
1: (NULL)
2: UP
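As a small sketch of the reference coding above, assuming the raw expression cells hold the labels Down, (NULL), and UP as written in this thread (the matching on trimmed, upper-cased text and the function name `code_expression` are assumptions):

```python
# Reference coding from the post: Down -> 0, (NULL) -> 1, UP -> 2.
EXPRESSION_CODES = {"DOWN": 0, "(NULL)": 1, "UP": 2}


def code_expression(value):
    """Map one raw expression label to its integer code."""
    key = value.strip().upper()
    if key not in EXPRESSION_CODES:
        # Fail loudly rather than silently coding an unexpected label.
        raise ValueError(f"unexpected expression label: {value!r}")
    return EXPRESSION_CODES[key]


coded = [code_expression(v) for v in ["Down", "(NULL)", "UP"]]
```

Raising on unknown labels is a design choice here; it makes a stray value in the source file visible before it reaches Banjo.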

Also create the following five binary variables: Luminal_A, Luminal_B, Basal_like, HER2_enriched, Normal_like

For each case, encode each of the above variables as 1 if the case's PAM50_Subtype is the same as the variable name; otherwise encode it as 0.
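The five indicator variables could be derived along these lines. This is only a sketch: it assumes the raw PAM50_Subtype labels differ from the variable names at most by spaces, hyphens, and letter case (for example "Basal-like" or "Luminal A"), which should be checked against the actual data. The function name `subtype_indicators` is hypothetical.

```python
SUBTYPE_VARIABLES = ["Luminal_A", "Luminal_B", "Basal_like", "HER2_enriched", "Normal_like"]


def subtype_indicators(pam50_subtype):
    """Return a 0/1 value for each subtype variable for one case."""
    # Normalize "Basal-like" or "Luminal A" to "basal_like" / "luminal_a".
    normalized = "_".join(pam50_subtype.replace("-", " ").split()).lower()
    return {name: int(normalized == name.lower()) for name in SUBTYPE_VARIABLES}


example = subtype_indicators("Luminal A")
```

A case whose subtype is missing or unrecognized gets 0 in all five variables under this sketch.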

Select 900 gene expressions that are correlated with either PAM50_Subtype or NRF1. Analyze with banjo using 906 variables (900 gene expressions, NRF1 gene expression, Luminal_A, Luminal_B, Basal_like, HER2_enriched, Normal_like).

I have put an Office 2007 ISO image file in /cdrom, so mount it in your Windows virtual machine and use the key (also in /cdrom) to install the program.
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm

Re: Datasets

Postby meninonas » Mon Feb 16, 2015 3:12 pm

Professor,

In an email, you asked me to include the following variables:

    Diagnosis Age
    ER Status
    HER2 Final Status
    Overall Survival Status
    Overall Survival (Months)
    PAM50 Subtype

In your last post, you don't mention ER Status, HER2 Final Status, Overall Survival Status, or Overall Survival (Months). Do you want me to delete those and separate PAM50 Subtype? Or do you want me to keep those?

I added the dataset you sent me for reference.
Attachments
Breast Invasive Carcinoma_TCGA Nature 2012_468 meged cases.xlsx
(3.4 MiB) Downloaded 129 times
meninonas
 
Posts: 137
Joined: Tue Jun 24, 2014 3:25 pm

Re: Datasets

Postby meninonas » Mon Feb 16, 2015 4:02 pm

Professor,

I have coded the data as follows:

    Clinical variables: 1 if the case matches the corresponding variable, 0 otherwise

    Genes: Down=0, (NULL)=1, UP=2

I have added the coded data. I am currently finding the correlated variables.

Once you let me know which clinical variables to keep or delete, I'll start running Banjo.
Attachments
Data.csv
(3.55 MiB) Downloaded 129 times
Recoded Data.csv
(1.91 MiB) Downloaded 138 times
Last edited by meninonas on Thu Feb 26, 2015 12:31 pm, edited 2 times in total.
meninonas
 
Posts: 137
Joined: Tue Jun 24, 2014 3:25 pm

Re: Datasets

Postby cwyoo » Mon Feb 16, 2015 6:49 pm

meninonas wrote:
Professor,

I have coded the data as follows:

    Diagnosis Age: >50=1, <50=0

    ER Status: Negative=0, Positive=1, Otherwise=2

    HER2 Final Status: Negative=0, Positive=1, Otherwise=2

    Overall Survival Status: DECEASED=0, LIVING=1

    Overall Survival Months: <Average=0, >Average=1

    Genes: Down=0, (NULL)=1, UP=2
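The clinical coding listed above can be sketched as follows. The list does not say how an age of exactly 50 or a survival time exactly equal to the average should be coded; this sketch codes both as 0, which is an assumption. The function names are hypothetical.

```python
from statistics import mean


def code_age(age):
    """>50 -> 1, otherwise 0 (an age of exactly 50 is coded 0 by assumption)."""
    return 1 if age > 50 else 0


def code_receptor_status(status):
    """ER or HER2 status: Negative -> 0, Positive -> 1, anything else -> 2."""
    return {"negative": 0, "positive": 1}.get(status.strip().lower(), 2)


def code_survival_status(status):
    """DECEASED -> 0, LIVING -> 1."""
    return {"DECEASED": 0, "LIVING": 1}[status.strip().upper()]


def code_survival_months(months):
    """Above the average over all cases -> 1, otherwise 0 (ties coded 0 by assumption)."""
    average = mean(months)
    return [1 if m > average else 0 for m in months]


coded_months = code_survival_months([10, 20, 60])  # average is 30
```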

I have added the coded data. I am currently finding the correlated variables.

Once you let me know which clinical variables to keep or delete, I'll start running Banjo.


As I stated in the previous message and told you today, please use the following variables in banjo:

Select 900 gene expressions that are correlated with either PAM50_Subtype or NRF1. Analyze with banjo using 907 variables (900 gene expressions, NRF1 gene expression, Age, Luminal_A, Luminal_B, Basal_like, HER2_enriched, Normal_like).
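A quick way to confirm the variable count before running Banjo is to build the column list explicitly and check its length. In this sketch the column names `NRF1_EXP` and `Age` and the placeholder gene names are hypothetical; only the counts come from the post.

```python
SUBTYPE_VARIABLES = ["Luminal_A", "Luminal_B", "Basal_like", "HER2_enriched", "Normal_like"]


def banjo_variables(selected_genes, nrf1="NRF1_EXP", age="Age"):
    """Return the ordered column list: selected genes, NRF1, Age, five subtype indicators."""
    if len(set(selected_genes)) != len(selected_genes):
        raise ValueError("selected genes contain duplicates")
    if nrf1 in selected_genes:
        raise ValueError("NRF1 must not be counted among the selected genes")
    return list(selected_genes) + [nrf1, age] + SUBTYPE_VARIABLES


# Placeholder names standing in for the 900 selected gene expressions.
genes = [f"GENE{i}_EXP" for i in range(900)]
variables = banjo_variables(genes)
variable_count = len(variables)  # 900 + 1 + 1 + 5
```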
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm

Re: Datasets

Postby cwyoo » Wed Feb 18, 2015 6:03 pm

meninonas wrote:
Dr. Yoo,

Please find the reduced dataset below. I'm already working on the banjo analysis.


Please post the intermediate files, especially the ones with the correlation calculations.
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm

Re: Datasets

Postby cwyoo » Wed Feb 18, 2015 8:14 pm

meninonas wrote:
Dr. Yoo,

Please find the datasets below with the correlations. I did the following:

    I took the absolute value of the correlations and ordered the dataset according to the correlations

    Afterwards, I chose the top 220 variables with the highest correlations from each individual dataset and placed them within one large dataset

    Finally, I deleted all of the variables that were repeated and ended with 907 variables.
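The three steps above can be sketched as below. The posts do not say which correlation measure was used, so Pearson correlation is an assumption here, as is treating a constant column as having correlation 0. The toy data and the function names are hypothetical; the real run would use 220 in place of `top_n=2`.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def top_genes(genes, target, top_n):
    """Step 1 and 2: rank genes by absolute correlation with one target, keep the top_n."""
    ranked = sorted(genes, key=lambda g: abs(pearson(genes[g], target)), reverse=True)
    return ranked[:top_n]


def select_genes(genes, targets, top_n):
    """Step 3: pool the per-target selections and drop repeated genes."""
    selected = []
    for target in targets.values():
        for gene in top_genes(genes, target, top_n):
            if gene not in selected:
                selected.append(gene)
    return selected


# Toy coded data: four genes and two target variables over four cases.
genes = {"G1": [0, 1, 2, 2], "G2": [2, 1, 0, 0], "G3": [1, 1, 1, 1], "G4": [0, 2, 0, 2]}
targets = {"NRF1": [0, 1, 2, 2], "Basal_like": [0, 1, 0, 1]}
selected = select_genes(genes, targets, top_n=2)
```

Because duplicates are removed after pooling, the final count depends on how much the per-target lists overlap, so it is worth checking it against the intended number of variables.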

I already ran banjo for one hour and have sent you the results. I already started running for two hours. I'll send you the results tomorrow morning.


Most of the CSV files contain identical entries in their rows. Please delete those files and upload the files with the correlation information.
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm

Re: Datasets

Postby meninonas » Thu Feb 19, 2015 10:49 am

Dr. Yoo,

Yes. The correlations are at the end of the file. I did it the same way you showed me.
meninonas
 
Posts: 137
Joined: Tue Jun 24, 2014 3:25 pm

Re: Datasets

Postby cwyoo » Thu Feb 19, 2015 11:52 am

meninonas wrote:
Dr. Yoo,

Yes. The correlations are at the end of the file. I did it the same way you showed me.


Luis, if the first column contains the same name in every row, how would we know which correlation you are talking about? Also, would you want to look at a dataset in which every entry is the same except for the last column? Please think it through before you post a dataset.
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm

Re: Datasets

Postby cwyoo » Fri Feb 20, 2015 12:20 pm

meninonas wrote:
Professor,

Find the correlated data below. I have also sent you the 2-Hour Banjo results. I am currently running the 4-Hour Banjo Analysis.


The original data has 463 cases, but the data that you attached here has 464 cases. Why is this the case? Also post the genes sorted by correlation value for each variable.
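One way to produce the requested per-variable listing, with a gene name on every row, is sketched below, together with a simple case count. The file layout, column headers, and function names are hypothetical. The cause of the 463 versus 464 discrepancy is not stated in the thread; a header line being counted as a case is only one possibility worth ruling out.

```python
import csv
import io


def write_sorted_correlations(correlations, variable, out):
    """Write gene name and correlation with one variable, sorted by absolute value."""
    writer = csv.writer(out)
    writer.writerow(["gene", f"correlation_with_{variable}", "abs_correlation"])
    ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
    for gene, r in ranked:
        writer.writerow([gene, r, abs(r)])


def count_cases(csv_text, has_header=True):
    """Count non-empty data rows, excluding the header line if there is one."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return len(rows) - (1 if has_header else 0)


# Toy correlations for one variable; real values would come from the analysis.
buffer = io.StringIO()
write_sorted_correlations({"G1": 0.2, "G2": -0.9, "G3": 0.5}, "NRF1", buffer)
lines = buffer.getvalue().splitlines()
n_cases = count_cases("case,G1\nA,1\nB,2\n")
```

Writing one such file per variable keeps each listing self-describing, since the header names the variable and the first column names the gene.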
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm
