SMLG (Statistical Machine Learning Group) Discussion Forum

by **cwyoo** » Wed Feb 25, 2015 7:28 pm

meninonas wrote:Professor,

Please find attached the Master Data File ordered by the strength of the correlation with NRF1.

I produced this file as follows:

Downloaded the "Transposed Data by NRF1.csv" file

Deleted the #DIV/0! columns

Ordered the list by the absolute value of the correlations

Re-added the clinical variables

Transposed the dataset

I have compared the correlation of NRF1 and FIP1L1 in Data.csv (posted on Mon Feb 16, 2015 4:02 pm) with the same correlation in your current data. Data.csv gives me 0.410924824 (0.410923691 when excluding the one extra case with SAMPLE_ID "X"), and the current data gives 0.302028776. I believe this discrepancy arises because you dropped the SAMPLE_ID from the dataset, which makes it hard to backtrack from there. Please note that Data.csv has an extra SAMPLE_ID called "X" that was not in the original data.

I suggest you keep the SAMPLE_ID when you post the dataset here. Please start from Data.csv (without the SAMPLE_ID = X case) and generate a new master file of genes correlated with NRF1. From there, generate datasets with 25, 26, 27, 28, 29, and 30 variables, and start running bene, beginning with the 25-variable dataset.
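The workflow suggested above (keep SAMPLE_ID, drop the extra "X" case, rank genes by the absolute value of their correlation with NRF1, then cut datasets of a given size) could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the rows are assumed to be dictionaries keyed by column name (for example from `csv.DictReader`), and it is an assumption that a "k-variable dataset" counts NRF1 plus the k - 1 most strongly correlated genes.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns None when a column has zero variance
    (the case Excel reports as #DIV/0!)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def rank_by_correlation(rows, target="NRF1", id_column="SAMPLE_ID", drop_ids=("X",)):
    """Drop unwanted cases, then rank every other column by |r| with the target."""
    kept = [row for row in rows if row[id_column] not in drop_ids]
    target_values = [float(row[target]) for row in kept]
    ranked = []
    for name in kept[0]:
        if name in (id_column, target):
            continue
        r = pearson([float(row[name]) for row in kept], target_values)
        if r is not None:  # skip constant columns
            ranked.append((name, r))
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return kept, ranked

def top_k_dataset(kept, ranked, k, target="NRF1", id_column="SAMPLE_ID"):
    """Keep SAMPLE_ID, the target, and the k - 1 most correlated variables
    (assumption: k counts the target itself)."""
    columns = [id_column, target] + [name for name, _ in ranked[: k - 1]]
    return [{column: row[column] for column in columns} for row in kept]
```

Because SAMPLE_ID stays in every output, any later correlation can be traced back to the same cases in Data.csv.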

by **cwyoo** » Thu Feb 26, 2015 3:10 pm

meninonas wrote:Professor,

I have coded the data as follows:

Clinical variables: 1 if the value equals the corresponding variable, 0 otherwise

Genes: Down=0, (NULL)=1, UP=2

I have added the coded data. I am currently finding the correlated variables.

Once you let me know which clinical variables to keep or delete, I'll start running Banjo.

Please check your work carefully before uploading it. This is the third time that I have needed to ask you to revise the dataset. In the message that you initially posted on Mon Feb 16, 2015 4:02 pm (Last edited on Thu Feb 26, 2015 12:31 pm, edited 2 times in total), the last case, with SAMPLE_ID = TCGA-E2-A1BD-01, is missing from Recoded Data.csv; however, the original data has an entry for that SAMPLE_ID.

Also, please remove all files whose content is identical apart from the first column.

Please redo your work.

by **cwyoo** » Fri Feb 27, 2015 11:39 am

cwyoo wrote:
meninonas wrote:Professor,

I have coded the data as follows:

Clinical variables: 1 if the value equals the corresponding variable, 0 otherwise

Genes: Down=0, (NULL)=1, UP=2

I have added the coded data. I am currently finding the correlated variables.

Once you let me know which clinical variables to keep or delete, I'll start running Banjo.

Please check your work carefully before uploading it. This is the third time that I have needed to ask you to revise the dataset. In the message that you initially posted on Mon Feb 16, 2015 4:02 pm (Last edited on Thu Feb 26, 2015 12:31 pm, edited 2 times in total), the last case, with SAMPLE_ID = TCGA-E2-A1BD-01, is missing from Recoded Data.csv; however, the original data has an entry for that SAMPLE_ID.

Also, please remove all files whose content is identical apart from the first column.

Please redo your work.

Always make sure that your new files agree with the original dataset TGCA_2012_all_merged.csv that was posted on Tue Feb 10, 2015 11:59 am.

by **meninonas** » Fri Feb 27, 2015 4:26 pm

Dr. Yoo,

What do you mean by agree? Remember that TGCA_2012_all_merged.csv was changed because you required changes to all of the clinical variables; also, that file is not coded.

by **cwyoo** » Fri Feb 27, 2015 4:42 pm

meninonas wrote:Dr. Yoo,

What do you mean by agree? Remember that TGCA_2012_all_merged.csv was changed because you required changes to all of the clinical variables; also, that file is not coded.

Agree in terms of number of cases, correct coding, and correlation of variables.
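A rough sketch of how the first two kinds of agreement (number of cases and correct coding) might be checked before uploading. The function names are hypothetical, rows are assumed to be dictionaries keyed by column name, and it is an assumption that the original file stores gene states as the literal labels "Down", "(NULL)", and "UP" used in the coding scheme quoted earlier in the thread.

```python
# Gene coding as described in the thread: Down=0, (NULL)=1, UP=2.
GENE_CODES = {"Down": 0, "(NULL)": 1, "UP": 2}

def compare_cases(original_rows, recoded_rows, id_column="SAMPLE_ID"):
    """Return (IDs missing from the recoded file, IDs not in the original)."""
    original_ids = {row[id_column] for row in original_rows}
    recoded_ids = {row[id_column] for row in recoded_rows}
    missing = sorted(original_ids - recoded_ids)
    extra = sorted(recoded_ids - original_ids)
    return missing, extra

def coding_mismatches(original_rows, recoded_rows, gene_columns, id_column="SAMPLE_ID"):
    """List (sample, gene, expected code, found value) wherever the recoded
    value differs from the code implied by the original label."""
    recoded_by_id = {row[id_column]: row for row in recoded_rows}
    mismatches = []
    for row in original_rows:
        recoded = recoded_by_id.get(row[id_column])
        if recoded is None:
            continue  # already reported by compare_cases
        for gene in gene_columns:
            expected = GENE_CODES[row[gene]]
            if int(recoded[gene]) != expected:
                mismatches.append((row[id_column], gene, expected, recoded[gene]))
    return mismatches
```

A check of this kind would flag both a dropped case such as TCGA-E2-A1BD-01 and an added case such as "X". Agreement of correlations can then be verified by recomputing one known pair, for example NRF1 and FIP1L1, on both files.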

by **meninonas** » Fri Feb 27, 2015 8:51 pm

Dr. Yoo,

The results have been uploaded. Please find attached the Master File.

by **meninonas** » Mon Mar 02, 2015 2:42 pm

Dr. Yoo,

I calculated it and it gave me an r squared of 0.008307222.

I got the t-value from the Excel function TINV, which, with df = n - 2 and alpha = 0.05, gives a t-value of 1.965123216.

This means that 1896 genes have a significant correlation.

by **cwyoo** » Tue Mar 03, 2015 8:14 pm

meninonas wrote:Dr. Yoo,

I calculated it and it gave me an r squared of 0.008307222.

I got the t-value from the Excel function TINV, which, with df = n - 2 and alpha = 0.05, gives a t-value of 1.965123216.

This means that 1896 genes have a significant correlation.

Note that r is the Pearson correlation, not r^2. Please double-check the attached calculation. If |r| >= 0.091910682, then p < 0.05.
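For reference, the usual link between a critical t-value and a critical Pearson r is t = r * sqrt(df) / sqrt(1 - r^2), which rearranges to r_crit = t / sqrt(t^2 + df). The sketch below applies that textbook relation; it does not reproduce the attached calculation, which is not shown in the thread. The function names are hypothetical, and df = 461 is an assumption: it is the value that reproduces the posted r squared of 0.008307222 from the posted t-value, since the thread does not state n.

```python
import math

def critical_r(t_critical, df):
    """Smallest |r| that is significant for the given critical t and df."""
    return t_critical / math.sqrt(t_critical ** 2 + df)

def t_from_r(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r ** 2)

def is_significant(r, t_critical, df):
    """Compare |r| itself (not r squared) against the critical value."""
    return abs(r) >= critical_r(t_critical, df)

# t from Excel's TINV as quoted in the thread; df = 461 is an assumption.
t_crit = 1.965123216
r_crit = critical_r(t_crit, 461)
print(f"critical |r| = {r_crit:.6f}, r squared = {r_crit ** 2:.9f}")
```

The point made in the reply is that genes should be counted by comparing |r| with the critical r, not by comparing r with the squared value.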

by **meninonas** » Wed Mar 04, 2015 6:26 pm

Dr. Yoo,

According to your calculations, there are 1218 significant genes. I will set up Banjo to run the analysis.

by **meninonas** » Thu Mar 05, 2015 1:19 pm

Dr. Yoo,

Please find attached the results from the Banjo analysis that I did yesterday. It ran for 18 hours.

These are the logs from the results:

When I ran the Log Normalization score on the three scores above, I got the following:

Code:
Total score -332928.6493 Log score -333092.3783 is 7.82345830726534e-70 % of total score
Total score -332928.6493 Log score -333004.1935 is 1.554428678507e-31 % of total score
Total score -332928.6493 Log score -332928.6493 is 100 % of total score
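The percentages above appear consistent with a log-sum-exp normalization: treat each score as a natural-log value, compute the log of the summed exponentials as the total, and report each score's share as 100 * exp(score - total). The following is a minimal sketch under that assumption; the function names are hypothetical and this is not the script used in the post.

```python
import math

def log_sum_exp(log_scores):
    """Log of the sum of exp(score), computed stably by factoring out the maximum."""
    peak = max(log_scores)
    return peak + math.log(sum(math.exp(s - peak) for s in log_scores))

def percent_of_total(log_scores):
    """Return the total log score and each score's share of it in percent."""
    total = log_sum_exp(log_scores)
    return total, [100.0 * math.exp(s - total) for s in log_scores]

# The three log scores quoted in the post.
scores = [-333092.3783, -333004.1935, -332928.6493]
total, shares = percent_of_total(scores)
for score, share in zip(scores, shares):
    print(f"Total score {total:.4f} Log score {score:.4f} is {share:.6g} % of total score")
```

Subtracting the maximum before exponentiating avoids underflow, since exp(-332928) is far below the smallest representable floating-point number. With scores this far apart, the best network accounts for essentially all of the total.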
