Datasets

Re: Datasets

Postby cwyoo » Wed Feb 25, 2015 7:28 pm

meninonas wrote:Professor,

Please find attached the Master Data File ordered by the strength of the correlation with NRF1.

I produced this file as follows:

    Downloaded the "Transposed Data by NRF1.csv" file

    Deleted the #DIV/0! columns

    Ordered the list by the absolute value of the correlations

    Re-added the clinical variables

    Transposed the dataset


I compared the correlation between NRF1 and FIP1L1 in Data.csv (posted on Mon Feb 16, 2015 4:02 pm) with the same correlation in your current data. Data.csv gives 0.410924824 (0.410923691 after excluding the one extra case with SAMPLE_ID "X"), whereas the current data gives 0.302028776. I believe this discrepancy arises because you dropped SAMPLE_ID from the dataset, which makes it hard to backtrack from there. Please note that Data.csv has an extra SAMPLE_ID called "X" that was not in the original data.

I suggest you keep the SAMPLE_ID when you post a dataset here. Please start from Data.csv (without the SAMPLE_ID = X case) and generate a new master file of the genes correlated with NRF1. From there, generate datasets with 25, 26, 27, 28, 29, and 30 variables, and start running bene, beginning with the 25-variable dataset.
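
A minimal sketch of the workflow suggested above, assuming the data has been loaded as one dictionary per case: keep SAMPLE_ID, drop the SAMPLE_ID = X case, skip zero-variance columns (the ones that show up as #DIV/0! in a spreadsheet), and rank the remaining genes by |r| with NRF1. The toy rows, the GENE_A/GENE_B/GENE_C names, and the function names are hypothetical and only illustrate the idea; the real values would come from Data.csv.

```python
import math


def pearson(xs, ys):
    """Pearson r, or None when either variable has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a spreadsheet would report #DIV/0! here
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def rank_by_abs_correlation(rows, target="NRF1", id_col="SAMPLE_ID",
                            drop_ids=("X",)):
    """Return (kept rows, [(gene, r), ...]) sorted by |r| descending."""
    kept = [row for row in rows if row[id_col] not in drop_ids]
    target_values = [row[target] for row in kept]
    ranked = []
    for name in kept[0]:
        if name in (id_col, target):
            continue
        r = pearson(target_values, [row[name] for row in kept])
        if r is not None:
            ranked.append((name, r))
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return kept, ranked


def top_k_genes(ranked, k):
    """Names of the k genes most strongly correlated with the target."""
    return [name for name, _ in ranked[:k]]


# Hypothetical toy data; SAMPLE_ID stays attached to every row.
rows = [
    {"SAMPLE_ID": "S1", "NRF1": 1.0, "GENE_A": 2.0, "GENE_B": 5.0, "GENE_C": 3.0},
    {"SAMPLE_ID": "S2", "NRF1": 2.0, "GENE_A": 4.0, "GENE_B": 5.0, "GENE_C": 1.0},
    {"SAMPLE_ID": "S3", "NRF1": 3.0, "GENE_A": 6.0, "GENE_B": 5.0, "GENE_C": 2.0},
    {"SAMPLE_ID": "X", "NRF1": 9.0, "GENE_A": 0.0, "GENE_B": 5.0, "GENE_C": 9.0},
]
kept, ranked = rank_by_abs_correlation(rows)
```

For the 25- to 30-variable datasets, top_k_genes(ranked, k) could be called for each k. Whether the variable count includes NRF1 itself or the clinical variables is not stated in this thread, so that choice is left open here.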
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm

Re: Datasets

Postby cwyoo » Thu Feb 26, 2015 3:10 pm

meninonas wrote:Professor,

I have coded the data as follows:

    Clinical Variables: 1 if the value equals the corresponding variable's level, 0 otherwise

    Genes: Down=0, (NULL)=1, UP=2

I have added the coded data. I am currently finding the correlated variables.

Once you let me know which clinical variables to keep or delete, I'll start running Banjo.


Please check your work carefully before uploading it. This is the third time that I have had to ask you to revise the dataset. In the message that you initially posted on Mon Feb 16, 2015 4:02 pm (last edited on Thu Feb 26, 2015 12:31 pm, edited 2 times in total), the last case, with SAMPLE_ID = TCGA-E2-A1BD-01, is missing from Recoded Data.csv, even though the original data has an entry for that SAMPLE_ID.

Also, please remove all files whose content is identical apart from the first column.

Please redo your work.
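
For reference, the coding scheme quoted above (genes: Down=0, (NULL)=1, UP=2; clinical variables: one 0/1 indicator per level) could be applied along these lines. This is only a sketch: the exact label spellings in the file, the ER_STATUS column, its levels, and all function names are assumptions made for illustration.

```python
# Gene coding quoted in the thread; label spellings are assumed to match the file.
GENE_CODES = {"Down": 0, "(NULL)": 1, "UP": 2}


def recode_gene(label):
    """Map a gene label to its code; an unexpected label raises KeyError."""
    return GENE_CODES[label]


def indicator_columns(value, levels, prefix):
    """One 0/1 column per level: 1 if the value equals that level, else 0."""
    return {f"{prefix}={level}": int(value == level) for level in levels}


def recode_row(row, gene_cols, clinical_levels, id_col="SAMPLE_ID"):
    """Recode one case while keeping its SAMPLE_ID attached."""
    out = {id_col: row[id_col]}
    for col in gene_cols:
        out[col] = recode_gene(row[col])
    for col, levels in clinical_levels.items():
        out.update(indicator_columns(row[col], levels, col))
    return out


# Hypothetical case; ER_STATUS and its levels are made up for this example.
example = recode_row(
    {"SAMPLE_ID": "S1", "NRF1": "UP", "FIP1L1": "(NULL)", "ER_STATUS": "Positive"},
    gene_cols=["NRF1", "FIP1L1"],
    clinical_levels={"ER_STATUS": ["Positive", "Negative"]},
)
```

Raising on an unknown label, rather than silently mapping it to a default, makes a miscoded or misspelled entry visible before the file is uploaded.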

Re: Datasets

Postby cwyoo » Fri Feb 27, 2015 11:39 am

Always make sure that your new files agree with the original dataset TGCA_2012_all_merged.csv that was posted on Tue Feb 10, 2015 11:59 am.

Re: Datasets

Postby meninonas » Fri Feb 27, 2015 4:26 pm

Dr. Yoo,

What do you mean by "agree"? Remember that TGCA_2012_all_merged.csv was changed because you required changes to all of the clinical variables; also, that file is not coded.
meninonas
 
Posts: 137
Joined: Tue Jun 24, 2014 3:25 pm

Re: Datasets

Postby cwyoo » Fri Feb 27, 2015 4:42 pm

meninonas wrote:Dr. Yoo,

What do you mean by "agree"? Remember that TGCA_2012_all_merged.csv was changed because you required changes to all of the clinical variables; also, that file is not coded.


Agree in terms of number of cases, correct coding, and correlation of variables.
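
A rough sketch of how the first two criteria (number of cases and correct coding) might be checked automatically, assuming both files are loaded as one dictionary per case and share a SAMPLE_ID column. The function name, the report layout, and the toy rows are hypothetical; the correlation criterion is not covered by this sketch.

```python
# Gene coding quoted in the thread; label spellings are assumed to match the file.
GENE_CODES = {"Down": 0, "(NULL)": 1, "UP": 2}


def compare_files(original, recoded, gene_cols, id_col="SAMPLE_ID"):
    """Report cases missing from or extra in the recoded file, and miscoded cells."""
    orig_by_id = {row[id_col]: row for row in original}
    rec_by_id = {row[id_col]: row for row in recoded}
    report = {
        "missing": sorted(set(orig_by_id) - set(rec_by_id)),
        "extra": sorted(set(rec_by_id) - set(orig_by_id)),
        "miscoded": [],
    }
    # Coding can only be checked for cases present in both files.
    for sample_id in sorted(set(orig_by_id) & set(rec_by_id)):
        for col in gene_cols:
            expected = GENE_CODES[orig_by_id[sample_id][col]]
            if rec_by_id[sample_id][col] != expected:
                report["miscoded"].append((sample_id, col))
    return report


# Hypothetical toy files: S3 was dropped, X was added, and S2 is miscoded.
original = [
    {"SAMPLE_ID": "S1", "NRF1": "UP"},
    {"SAMPLE_ID": "S2", "NRF1": "Down"},
    {"SAMPLE_ID": "S3", "NRF1": "(NULL)"},
]
recoded = [
    {"SAMPLE_ID": "S1", "NRF1": 2},
    {"SAMPLE_ID": "S2", "NRF1": 1},
    {"SAMPLE_ID": "X", "NRF1": 0},
]
report = compare_files(original, recoded, gene_cols=["NRF1"])
```

An empty report (no missing, extra, or miscoded entries) would be the condition to meet before posting a new file.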

Re: Datasets

Postby meninonas » Fri Feb 27, 2015 8:51 pm

Dr. Yoo,

The results have been uploaded. Please find attached the Master File.
Attachments
Master File.csv
(1.91 MiB) Downloaded 153 times

Re: Datasets

Postby meninonas » Mon Mar 02, 2015 2:42 pm

Dr. Yoo,

I calculated it and got an r squared of 0.008307222.

I got the t-value from the Excel function TINV, which, with df = n - 2 and alpha = 0.05, gives a t-value of 1.965123216.

This means that 1896 genes have a significant correlation.

Re: Datasets

Postby cwyoo » Tue Mar 03, 2015 8:14 pm

meninonas wrote:Dr. Yoo,

I calculated it and got an r squared of 0.008307222.

I got the t-value from the Excel function TINV, which, with df = n - 2 and alpha = 0.05, gives a t-value of 1.965123216.

This means that 1896 genes have a significant correlation.


Note that r is the Pearson correlation, not r^2. Please double-check the attached calculation: if |r| >= 0.091910682, then p < 0.05.
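
For readers without the attachment, the standard relation between a Pearson correlation and its t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), can be inverted to get a cutoff for |r|. The sketch below assumes that relation is what the attached calculation uses; the sample size n is not stated in this thread, so the values in the example are purely illustrative, and the function names are hypothetical.

```python
import math


def t_statistic(r, n):
    """t statistic for a Pearson correlation r computed from n cases."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that reaches the two-tailed critical t value for n cases."""
    df = n - 2
    return t_crit / math.sqrt(t_crit * t_crit + df)


def is_significant(r, t_crit, n):
    """True when |r| meets or exceeds the critical correlation."""
    return abs(r) >= critical_r(t_crit, n)


# Illustrative values only: t_crit would come from TINV(alpha, n - 2).
cutoff = critical_r(2.0, 102)
```

The cutoff applies to |r| itself. Comparing r^2 against it, or comparing |r| against the squared cutoff, changes the number of genes counted as significant.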
Attachments
corr-significance.pdf
(217.51 KiB) Downloaded 141 times

Re: Datasets

Postby meninonas » Wed Mar 04, 2015 6:26 pm

Dr. Yoo,

According to your calculations, there are 1218 significant genes. I will set up Banjo to run the analysis.

Re: Datasets

Postby meninonas » Thu Mar 05, 2015 1:19 pm

Dr. Yoo,

Please find attached the results from the Banjo analysis that I ran yesterday. It ran for 18 hours.

These are the log scores from the results:

    -333092.3783

    -333004.1935

    -332928.6493

When I ran the log-score normalization on the three scores above, I got the following:

Code:
Total score -332928.6493
Log score -333092.3783 is 7.82345830726534e-70 % of total score
Total score -332928.6493
Log score -333004.1935 is 1.554428678507e-31 % of total score
Total score -332928.6493
Log score -332928.6493 is 100 % of total score
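
One common way to get percentages like these from log scores is log-sum-exp normalization: subtract the largest score before exponentiating so nothing underflows, then express each score as a share of the total. The figures above are consistent with this approach, but whether it is the script that was actually used is an assumption, and the function names below are hypothetical.

```python
import math


def log_sum_exp(log_scores):
    """log(sum(exp(s))) computed stably by factoring out the maximum."""
    m = max(log_scores)
    return m + math.log(sum(math.exp(s - m) for s in log_scores))


def score_shares(log_scores):
    """Percentage of the total (non-log) score carried by each log score."""
    total = log_sum_exp(log_scores)
    return [100.0 * math.exp(s - total) for s in log_scores]


# The three log scores reported above.
scores = [-333092.3783, -333004.1935, -332928.6493]
total = log_sum_exp(scores)
shares = score_shares(scores)
```

Because the best score is more than 75 log units above the next one, it carries essentially all of the total, which is why the total equals the best log score to the precision shown.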
Attachments
Banjo Results.zip
(6.83 MiB) Downloaded 150 times
