NRF1 and NRF1 target genes and aggressive Breast Cancer

Postby jramo033 » Fri Sep 25, 2015 12:01 am

Hypothesis: Aberrant genetic and epigenetic regulation of NRF1 and its target genes contributes to the pathogenesis of aggressive breast cancer via perturbation of diverse mitochondrial and extra-mitochondrial functions.

Methods: Expression levels of 2,470 identified NRF1 target genes in a set of approximately 460 patients/tumors will be analyzed to determine their association with the development of aggressive breast cancer. We will use a data set from cBioPortal, cross-referenced against the list of 2,500 NRF1 target genes so that only the data for these genes are kept. I am uploading the initial data, but it is subject to revision as we review it for inclusion criteria and missing values.
Attachments
Gene Expression of NRF1 Target Genes rev 1.xlsx
It will be revised as we clean the data.
(13.77 MiB) Downloaded 211 times
jramo033
 
Posts: 38
Joined: Thu Sep 03, 2015 11:21 pm

Re: NRF1 and NRF1 target genes and aggressive Breast Cancer

Postby jramo033 » Mon Nov 02, 2015 5:11 pm

We cleaned the dataset of missing or incomplete observations. The total number of variables is 2,028, of which 4 are clinical variables: age, triple-negative status of the breast tumor (TNBC), primary tumor size (TUMOR), and metastasis. Expression of the 2,024 genes has been categorized as 0, 1, or 2 based on Z values (Z < -1: 0; -1 <= Z < 1: 1; Z >= 1: 2).
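The three-level categorization can be sketched in a few lines of Python. This is only an illustration of the thresholds stated above; the function name `discretize` and the sample Z values are made up and are not part of our data files.

```python
def discretize(z):
    """Map a Z value to 0, 1, or 2 using the cut points -1 and 1."""
    if z < -1:
        return 0   # low expression: Z < -1
    if z < 1:
        return 1   # middle range: -1 <= Z < 1
    return 2       # high expression: Z >= 1

# Made-up Z values, chosen to include both boundary cases (-1.0 and 1.0).
z_values = [-1.8, -1.0, 0.2, 0.99, 1.0, 2.4]
categories = [discretize(z) for z in z_values]
print(categories)  # [0, 1, 1, 1, 2, 2]
```

Note that a Z value of exactly -1 falls in category 1 and a Z value of exactly 1 falls in category 2.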
Attachments
Breastcancer2028var012rankedperparsoncorrelationfewclinvarhorizontal.xlsx
Breast Cancer
(2.89 MiB) Downloaded 209 times

Re: NRF1 and NRF1 target genes and aggressive Breast Cancer

Postby jramo033 » Mon Nov 02, 2015 11:53 pm

Progress update:
1.- We calculated the Pearson correlation of every NRF1 target gene against NRF1 and ranked the genes by the result to see which are most correlated with NRF1. See attached file.
2.- We compared the top 555 NRF1 target genes in our Pearson correlation ranking with the top 555 NRF1 target genes found by Satoh et al. (2013). Only 119 genes (21%) appear in both lists. Satoh used human neuroblastoma cells to identify NRF1 target genes. This result is consistent with transcription factor regulation being dynamic and depending on several factors, tissue being one of them; in this case, neuroblastoma versus breast cancer.
3.- The categorization of the 556 genes included in the model was changed from 0-1-2-3 to 0-1-2 (if Z < -1 then var. = 0; if -1 <= Z < 1 then var. = 1; if Z >= 1 then var. = 2) to make the model simpler.
4.- We tried to run Banjo with 700 variables under the new categorization, but it again stopped with a memory error at the start. We will try a computer with more memory next week.
5.- We calculated the variance of all gene variables to look for values equal or close to zero, which would flag variables that add no information to the model, but we found none. Values ranged from 0.762 to 4.92. We also plotted the variance distribution. See attached file.
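
Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the spreadsheet procedure we actually used: the gene names (`GENE_A`, etc.), the expression values, and the helper names `pearson` and `overlap` are all hypothetical. Only the counts 119 and 555 come from the comparison above.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical Z values for NRF1 and three target genes across five tumors.
nrf1 = [0.1, 0.5, -0.3, 1.2, -0.9]
expression = {
    "GENE_A": [0.2, 0.6, -0.2, 1.0, -1.1],
    "GENE_B": [-0.1, -0.4, 0.3, -1.0, 0.8],
    "GENE_C": [0.5, -0.5, 0.5, -0.5, 0.0],
}

# Step 1: rank genes by their (signed) correlation with NRF1, highest first.
ranked = sorted(expression, key=lambda g: pearson(expression[g], nrf1), reverse=True)

def overlap(list_a, list_b):
    """Return the genes common to both lists."""
    return set(list_a) & set(list_b)

# Step 2: the overlap fraction reported above, 119 shared genes out of 555.
print(f"{119 / 555:.1%}")  # 21.4%
```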
Attachments
Varianceof2024genesranked.xlsx
(150.05 KiB) Downloaded 209 times
Geneexpnew463rev1Acorrelationsranked.xlsx
(53.19 MiB) Downloaded 215 times

Re: NRF1 and NRF1 target genes and aggressive Breast Cancer

Postby cwyoo » Tue Nov 03, 2015 7:26 am

Why do you categorize only 556 genes? Categorize all the genes and then calculate the correlation with NRF1.

How did you calculate the variance, e.g., how did you get 1.03323 for NRF1? You should also calculate the variances (including those of the clinical variables) before and after the discretization.
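
A minimal sketch of the before/after variance check, assuming the sample variance (n - 1 denominator) and the 0-1-2 cut points at -1 and 1. The Z values are made up, and `discretize` is a hypothetical helper, not a function from any posted file.

```python
from statistics import variance

def discretize(z):
    """0 if Z < -1, 1 if -1 <= Z < 1, 2 if Z >= 1."""
    if z < -1:
        return 0
    if z < 1:
        return 1
    return 2

# Made-up Z values for one gene across five tumors.
z_values = [-1.5, -0.5, 0.0, 0.5, 1.5]

var_before = variance(z_values)                           # on raw Z values
var_after = variance([discretize(z) for z in z_values])   # on 0/1/2 categories

print(var_before, var_after)
```

Reporting both numbers for every variable shows how much spread each variable loses when it is reduced to three levels.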
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm

Re: NRF1 and NRF1 target genes and aggressive Breast Cancer

Postby jramo033 » Thu Nov 05, 2015 12:23 pm

Dr. Yoo

I categorized all the genes; I mentioned only 556 because those are the ones included in the model we have been able to run so far.

Re: NRF1 and NRF1 target genes and aggressive Breast Cancer

Postby jramo033 » Thu Nov 12, 2015 12:39 am

Progress update:
1.- We calculated the variance of all the variables, including the clinical variables, after categorization. Results range from 0.03115 to 0.53. See the attached file, which includes a graph of the variance distribution. Two variables have variances below 0.1: "NCAN" (an NRF1 target gene) and METAST (metastasis status). We could consider removing them from the model.

2.- We had previously calculated the Pearson correlation for all the genes using Z values, so we calculated it again using the categorized variables and re-ranked the NRF1 target genes based on these new results. We also adjusted the initial ranking (based on Z-value correlations) so that it uses absolute values. I think we may have made the mistake of ranking by signed values, which left NRF1 target genes with a high but negative correlation out of the model.

3.- We ran the new 560-variable model in Banjo on the Path 1 computer for 1 hour and for 3 hours. A better model (by BDe score) was achieved with the 3-hour run. See the results and graph files for 1 hour and 3 hours.

4.- We installed Banjo on the Path 4 computer, which has more memory, so that we can try running Banjo with a larger number of variables.
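
The ranking adjustment in item 2 can be illustrated with a small sketch. The gene names and correlation values below are invented for the example; the point is only that sorting by signed value pushes strongly negative correlations to the bottom, while sorting by absolute value keeps them near the top.

```python
# Hypothetical Pearson correlations of three target genes with NRF1.
correlations = {"GENE_A": 0.62, "GENE_B": -0.81, "GENE_C": 0.05}

# Ranking by signed value: GENE_B ends up last despite its strong association.
by_signed = sorted(correlations, key=correlations.get, reverse=True)

# Ranking by absolute value: GENE_B moves to the top.
by_absolute = sorted(correlations, key=lambda g: abs(correlations[g]), reverse=True)

top_n = 2  # illustrative cutoff for "genes included in the model"
print(by_signed[:top_n])    # ['GENE_A', 'GENE_C']
print(by_absolute[:top_n])  # ['GENE_B', 'GENE_A']
```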
Attachments
Results560var3hours.txt
(249.73 KiB) Downloaded 216 times
Results560var1hour.txt
(125.44 KiB) Downloaded 211 times
Graph560var3hours.jpg
Graph560var3hours.jpg (1.99 MiB) Viewed 5187 times
Graph560var1hour.jpg
Graph560var1hour.jpg (1.78 MiB) Viewed 5187 times
Breastcancer2028Variancecat012.xlsx
(3.06 MiB) Downloaded 203 times
Correlations.xlsx
(37.24 MiB) Downloaded 196 times

Re: NRF1 and NRF1 target genes and aggressive Breast Cancer

Postby jramo033 » Wed Nov 18, 2015 12:15 pm

Progress update:
I tried to run Banjo on the new Path 4 server. Here are the results:

1.- I tried to run Banjo with all 2,028 variables and got the message "Banjo has run out of available memory".
2.- I got the same message with 1,500 and 1,000 variables.
3.- Finally, I was able to run the program with 800 variables and the same number of observations we have been working with (460). The first run was for 1 hour. See the attached files with results and graph.
4.- I also ran it for 3 hours. I will go to FIU today to get the 3-hour results.
5.- This week I will try a 6-hour run and will color code the nodes "NRF1" and "TNBC" (triple-negative breast cancer status) to make these two variables of interest easier to identify. I will also compare the structures from the 1-, 3-, and 6-hour runs to check whether they are equivalent, especially the substructures around the NRF1 and TNBC nodes.
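
One simple way to compare the structures from two runs is to compare their sets of directed edges. The sketch below assumes each structure has already been read into a set of (parent, child) pairs; the edges shown are hypothetical and the function name `edge_overlap` is my own. Note that this checks for shared edges only. It is not a test of Markov equivalence, which would require comparing the skeletons and v-structures.

```python
def edge_overlap(edges_a, edges_b):
    """Return the shared directed edges and their Jaccard similarity."""
    a, b = set(edges_a), set(edges_b)
    shared = a & b
    return shared, len(shared) / len(a | b)

# Hypothetical edges (parent, child) from two runs of different length.
edges_1h = {("NRF1", "GENE_A"), ("GENE_A", "TNBC"), ("AGE", "TNBC")}
edges_3h = {("NRF1", "GENE_A"), ("GENE_B", "TNBC"), ("AGE", "TNBC")}

shared, jaccard = edge_overlap(edges_1h, edges_3h)
print(sorted(shared), jaccard)
```

A Jaccard similarity of 1.0 would mean the two runs produced exactly the same directed edges.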
Attachments
Graph800vars1hour.jpg
Graph800vars1hour.jpg (2.62 MiB) Viewed 5173 times
Results800vars1hour.txt
(156.74 KiB) Downloaded 197 times

Re: NRF1 and NRF1 target genes and aggressive Breast Cancer

Postby cwyoo » Wed Nov 18, 2015 1:50 pm

When you look at the substructure, first look for the Markov blanket of the variable of interest. The Markov blanket of a random variable X, denoted MB(X), is the set of variables consisting of the parents of X, the children of X, and the other parents of the children of X. You can expand MB(X) with MB(MB(X)), MB(MB(MB(X))), etc.
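
As a rough sketch of this definition, assume the learned structure is stored as a dictionary mapping each node to the set of its parents. The node names and the functions `markov_blanket` and `expand_blanket` are illustrative only and are not taken from Banjo's output.

```python
def markov_blanket(parents, x):
    """Parents of x, children of x, and the other parents of those children."""
    children = {node for node, ps in parents.items() if x in ps}
    co_parents = {p for child in children for p in parents[child]}
    return (set(parents.get(x, ())) | children | co_parents) - {x}

def expand_blanket(parents, nodes):
    """One expansion step: add the Markov blanket of every node in the set."""
    expanded = set(nodes)
    for node in nodes:
        expanded |= markov_blanket(parents, node)
    return expanded

# Hypothetical structure: each key lists its parents.
parents = {
    "GENE_A": set(),
    "AGE": set(),
    "GENE_D": set(),
    "NRF1": {"GENE_A"},
    "GENE_C": {"GENE_D"},
    "TNBC": {"NRF1", "AGE"},
    "GENE_B": {"NRF1", "GENE_C"},
}

mb_nrf1 = markov_blanket(parents, "NRF1")
print(sorted(mb_nrf1))  # ['AGE', 'GENE_A', 'GENE_B', 'GENE_C', 'TNBC']
print(sorted(expand_blanket(parents, mb_nrf1)))
```

In this toy graph, expanding MB(NRF1) once brings in GENE_D (a parent of GENE_C) and NRF1 itself.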
Joined: Sun Jun 22, 2014 2:38 pm

Re: NRF1 and NRF1 target genes and aggressive Breast Cancer

Postby jramo033 » Mon Nov 23, 2015 1:36 am

Progress update:

1.- I was able to run 800 variables in Banjo on the Path 4 server for 3 hours and 6 hours.
2.- I was also able to use the DOT file to color code the nodes of the four clinical variables (Age, TNBC, Tumor, and Metastasis) and NRF1 gene expression. Color coding these five nodes makes them easier to identify. See attached files.
3.- The best BDe score was reached with the 3-hour run: 1 hour = -299049; 3 hours = -298643.45; 6 hours = -2998824.
4.- The substructures for NRF1 and TNBC look different across the three runs.
5.- Next steps to be discussed with Dr. Yoo.
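
The color coding in item 2 can be sketched as a small script that writes Graphviz DOT text with filled nodes for the variables of interest. This is a generic illustration: the edges, the colors, and the function name `to_dot` are assumptions, and it does not reproduce the exact DOT file that Banjo writes.

```python
def to_dot(edges, highlight):
    """Build DOT text; nodes listed in `highlight` get a filled color."""
    lines = ["digraph G {"]
    for node, color in highlight.items():
        lines.append(f'  "{node}" [style=filled, fillcolor={color}];')
    for parent, child in edges:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical edges and an arbitrary color choice for the five variables.
edges = [("NRF1", "GENE_A"), ("GENE_A", "TNBC"), ("AGE", "TNBC")]
highlight = {
    "NRF1": "red",
    "TNBC": "yellow",
    "AGE": "lightblue",
    "TUMOR": "lightblue",
    "METAST": "lightblue",
}

dot_text = to_dot(edges, highlight)
print(dot_text)
```

The resulting text can be saved to a .dot file and rendered with Graphviz.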
Attachments
Graph800var6hour.jpg
Graph800var6hour.jpg (2.77 MiB) Viewed 5166 times
Graph800var3hour.jpg
Graph800var3hour.jpg (3.07 MiB) Viewed 5166 times
Graph800var1hour.jpg
Graph800var1hour.jpg (2.62 MiB) Viewed 5166 times

Re: NRF1 and NRF1 target genes and aggressive Breast Cancer

Postby jramo033 » Mon Nov 30, 2015 10:22 pm

Progress update:
After running Banjo for 1, 2, 3, and 6 hours, we found different substructures for the variables of interest (NRF1 and TNBC). The best BDe score so far came from the 3-hour run. This week we have been running Banjo again for the same durations (1, 2, 3, and 6 hours).
The idea is to choose the best of the 8 models.
We have also been trying to normalize the BDe scores using Perl, but without success so far. I have been following the instructions posted in SMLG, but the script log_norm.pl is not working for us. We will keep trying.
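
In case it helps while the Perl script is not working, here is a minimal Python sketch of one common way to normalize log scores: the log-sum-exp trick, which turns log scores into relative weights that sum to 1. I am assuming that the BDe scores are natural-log values and that this is the kind of normalization intended; I have not checked it against what log_norm.pl does. Only the 1-hour and 3-hour scores reported earlier in this thread are used.

```python
from math import exp, log

def normalize_log_scores(log_scores):
    """Convert log scores to weights that sum to 1 (log-sum-exp trick)."""
    largest = max(log_scores.values())
    # Subtracting the largest score before exponentiating avoids underflow.
    log_total = largest + log(sum(exp(s - largest) for s in log_scores.values()))
    return {name: exp(s - log_total) for name, s in log_scores.items()}

# BDe scores from the 800-variable runs (assumed to be natural logs).
scores = {"1 hour": -299049.0, "3 hours": -298643.45}

weights = normalize_log_scores(scores)
print(weights)
```

Because the two scores differ by about 400 log units, nearly all of the weight goes to the 3-hour model.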