Pulmonary Arterial Hypertension - PAH

Gene-gene and gene-environment interactions. Hamza Assaggaf <hassa001@fiu.edu> will be the moderator of this forum.

Moderator: Hamza

Re: Pulmonary Arterial Hypertension - PAH

Postby cwyoo » Tue Jul 01, 2014 10:44 am

Hamza wrote:Lastly, the 6-hour run networks: none of the runs produced the same network.
As Luis and I were discussing, 6 hours is the maximum run time for Banjo. However, as Luis mentioned, it might help if I ran logistic regression in SPSS first and then ran only the significant genes, which will be fewer in number, in Banjo. What is your suggestion?

I am still working on the best setting for logistic regression in SPSS since I had some issues with it today.


Please summarize the log-likelihood score for each model. Which model has the highest log-likelihood? Logistic regression is not suitable for understanding the interactions among many variables. However, for your class project, you should go ahead and prepare to analyze your data with logistic regression.
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm

Re: Pulmonary Arterial Hypertension - PAH

Postby cwyoo » Tue Jul 01, 2014 10:49 am

meninonas wrote:Dr. Yoo,

I just checked the 6-hour run with Hamza and none of the three networks matched, which means that none of the 1-, 2-, 4-, or 6-hour runs created equal BNs.

I was thinking that if Hamza runs the logistic regression first and finds which genes are associated with hypertension, we could run Banjo with just those genes. I think that makes more sense for building the Bayesian network, and running Banjo with fewer variables might also help.

Lastly, I was wondering about logistic vs. linear regression for his dataset. We found that linear regression works when both the dependent and independent variables are continuous. In Hamza's dataset, PAH (the dependent variable) is binary; hence, linear regression doesn't seem to work very well. In contrast, logistic regression allows the dependent variable to be categorical. Do you think we should try both or just logistic regression?

Websites where I found information about Linear Regression:

http://cjem-online.ca/v9/n2/p111
http://www.adasis-oz.com/tips/2013/8/28 ... regression
http://udel.edu/~mcdonald/statlogistic.html
http://stackoverflow.com/questions/1214 ... regression


Typically that is how most analyses are performed. However, it is better to approach the choice through each model's underlying assumptions. What are the assumptions of linear regression and of logistic regression?

Re: Pulmonary Arterial Hypertension - PAH

Postby Hamza » Tue Jul 01, 2014 3:53 pm

cwyoo wrote:
Hamza wrote:Lastly, the 6-hour run networks: none of the runs produced the same network.
As Luis and I were discussing, 6 hours is the maximum run time for Banjo. However, as Luis mentioned, it might help if I ran logistic regression in SPSS first and then ran only the significant genes, which will be fewer in number, in Banjo. What is your suggestion?

I am still working on the best setting for logistic regression in SPSS since I had some issues with it today.


Please summarize the log-likelihood score for each model. Which model has the highest log-likelihood? Logistic regression is not suitable for understanding the interactions among many variables. However, for your class project, you should go ahead and prepare to analyze your data with logistic regression.


For 1 hour, -21076, -21024, and -21064.
For 2 hours, -21053, -21068, and -21030.
For 4 hours, -20991, -21061, and -21032.
For 6 hours, -21045, -21023, and -21035.
Hamza
 
Posts: 34
Joined: Tue Jun 24, 2014 2:47 am

Re: Pulmonary Arterial Hypertension - PAH

Postby meninonas » Tue Jul 01, 2014 5:53 pm

For Linear Regression, the assumptions are as follows,

(1) For any given value of x, the corresponding value of y has an average value α+βx, which is a linear function of x

(2) For any given value of x, the corresponding value of y is normally distributed about α+βx with the same variance for any x.

(3) For any two data points (x1,y1), (x2,y2), the error terms e1, e2 are independent of each other

For Logistic Regression, the assumptions are as follows,

(1) Assumes a linear relationship between the independent variables and the logit (log-odds) of the dependent variable. However, it does not assume a linear relationship between the actual dependent and independent variables.

(2) The sample is 'large': the reliability of the estimates declines when there are only a few cases

(3) Independent variables are not linear functions of each other

(4) Normal distribution is not necessary or assumed for the dependent variable.
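
To make the contrast between the two sets of assumptions concrete, here is a minimal Python sketch. It is not the SPSS analysis discussed in this thread, and the `x` and `y` values are hypothetical toy data, not the PAH dataset. It fits a straight line by ordinary least squares and a logistic model by Newton-Raphson to the same binary outcome, using only the standard library.

```python
import math

# Hypothetical toy data (NOT the PAH dataset): one predictor, binary outcome.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
n = len(x)

# Linear regression by ordinary least squares: y ~ alpha + beta * x.
x_mean = sum(x) / n
y_mean = sum(y) / n
beta_ols = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
            / sum((xi - x_mean) ** 2 for xi in x))
alpha_ols = y_mean - beta_ols * x_mean
ols_pred = [alpha_ols + beta_ols * xi for xi in x]


def sigmoid(z):
    """Inverse of the logit: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Logistic regression: logit(P(y = 1)) = a + b * x, fitted by Newton-Raphson.
a, b = 0.0, 0.0
for _ in range(25):
    p = [sigmoid(a + b * xi) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    # Gradient of the log-likelihood with respect to (a, b).
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
    # Entries of the (negated) 2x2 Hessian.
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h00 * h11 - h01 * h01
    # Newton step: solve the 2x2 system in closed form.
    a += (h11 * g0 - h01 * g1) / det
    b += (h00 * g1 - h01 * g0) / det

logit_pred = [sigmoid(a + b * xi) for xi in x]

print("OLS fitted values range:", min(ols_pred), "to", max(ols_pred))
print("Logistic fitted values range:", min(logit_pred), "to", max(logit_pred))
```

On this toy data the straight line produces fitted values below 0 and above 1, which cannot be read as probabilities, while the logistic fit always stays strictly between 0 and 1. That is one practical reason a binary outcome such as PAH is usually modelled on the logit scale.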
meninonas
 
Posts: 137
Joined: Tue Jun 24, 2014 3:25 pm

Re: Pulmonary Arterial Hypertension - PAH

Postby cwyoo » Wed Jul 02, 2014 6:13 am

Hamza wrote:
cwyoo wrote:
Hamza wrote:Lastly, the 6-hour run networks: none of the runs produced the same network.
As Luis and I were discussing, 6 hours is the maximum run time for Banjo. However, as Luis mentioned, it might help if I ran logistic regression in SPSS first and then ran only the significant genes, which will be fewer in number, in Banjo. What is your suggestion?

I am still working on the best setting for logistic regression in SPSS since I had some issues with it today.


Please summarize the log-likelihood score for each model. Which model has the highest log-likelihood? Logistic regression is not suitable for understanding the interactions among many variables. However, for your class project, you should go ahead and prepare to analyze your data with logistic regression.


For 1 hour, -21076, -21024, and -21064.
For 2 hours, -21053, -21068, and -21030.
For 4 hours, -20991, -21061, and -21032.
For 6 hours, -21045, -21023, and -21035.


Among these 12 structures, which one has the highest score, and how many times more likely is it than the second-best structure?

Re: Pulmonary Arterial Hypertension - PAH

Postby Hamza » Thu Jul 03, 2014 5:31 am

cwyoo wrote:
Hamza wrote:
For 1 hour, -21076, -21024, and -21064.
For 2 hours, -21053, -21068, and -21030.
For 4 hours, -20991, -21061, and -21032.
For 6 hours, -21045, -21023, and -21035.

Among these 12 structures, which one has the highest score, and how many times more likely is it than the second-best structure?

Taking the negative signs into account, the first network of the 4-hour run has the highest score (-20991). The second-best structure is the second network of the 6-hour run (-21023).
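
The "how many times more likely" part can be worked out from the difference between the two scores. The sketch below is only an illustration and rests on an assumption that is not confirmed in this thread: that the scores Banjo reported are natural-log scores, so that exponentiating the difference gives the ratio. The scores themselves are the twelve values posted above.

```python
import math

# Scores posted in this thread, keyed by (run length in hours, network number).
scores = {
    (1, 1): -21076, (1, 2): -21024, (1, 3): -21064,
    (2, 1): -21053, (2, 2): -21068, (2, 3): -21030,
    (4, 1): -20991, (4, 2): -21061, (4, 3): -21032,
    (6, 1): -21045, (6, 2): -21023, (6, 3): -21035,
}

# Rank from highest (least negative) to lowest score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
(best_key, best_score), (second_key, second_score) = ranked[0], ranked[1]

# ASSUMPTION: scores are on the natural-log scale, so the ratio is exp(difference).
log_diff = best_score - second_score
ratio = math.exp(log_diff)

print("Best:", best_key, best_score)
print("Second best:", second_key, second_score)
print("Log difference:", log_diff)
print("Ratio (best vs. second best):", ratio)
```

Under that assumption, a difference of 32 on the log scale corresponds to a ratio of e^32, roughly 7.9 x 10^13. If the scores were on a different log base, the ratio would change accordingly.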

Re: Pulmonary Arterial Hypertension - PAH

Postby Hamza » Thu Jul 10, 2014 11:37 pm

I checked the 4-hour network and could not find the dependent variable, the disease (PAH), in it. Something might have gone wrong when we ran Banjo.

I will contact Luis to meet and try to figure out the issue, and we will run Banjo again. Once I have the correct map, I will use GeNIe to produce the network and find the Markov blanket for PAH.

For the class proposal, I will update the assumptions and post them in the forum.
For linear regression in SPSS, I will post the results soon.

Re: Pulmonary Arterial Hypertension - PAH

Postby Hamza » Sat Jul 12, 2014 4:00 am

I attached 3 files:
1- Class proposal.
2- Bodyfat results-Linear Regression results.
3- PAH results-Linear Regression results.

Banjo's Bayesian networks are still in progress.
Attachments
PAH results-Linear Regression.pdf
(546.46 KiB) Downloaded 135 times
Bodyfat results-Linear Regression.pdf
(194.62 KiB) Downloaded 142 times
Class Proposal.docx
(132.53 KiB) Downloaded 141 times

Re: Pulmonary Arterial Hypertension - PAH

Postby meninonas » Sun Jul 13, 2014 8:51 pm

Professor,

Earlier today, I finished running Banjo for the PAH data. Unfortunately, none of the runs (1, 2, 4, and 6 hours) produced three identical top BNs. I still strongly believe that the logistic regression needs to be run first so that the BN can be built with only the relevant proteins. I think this will also help produce a consistent BN.

I have sent the results to Hamza. He should post them later.
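
One way to go beyond a yes/no "the networks did not match" is to measure how much two networks overlap. The following is a rough Python sketch of that idea. It does not read Banjo output; the edge lists and node names (`gene1`, `gene2`, `gene3`) are hypothetical placeholders, and each network is assumed to be available as a set of directed (parent, child) edges.

```python
# Hypothetical networks as sets of directed (parent, child) edges.
# These are made-up examples, NOT results from the PAH runs.
run_a = {("gene1", "PAH"), ("gene2", "gene1"), ("gene3", "PAH")}
run_b = {("gene1", "PAH"), ("gene1", "gene2"), ("gene3", "PAH")}


def jaccard(edges_1, edges_2):
    """Share of edges common to both networks (1.0 means identical)."""
    union = edges_1 | edges_2
    return len(edges_1 & edges_2) / len(union) if union else 1.0


def skeleton(edges):
    """Drop edge direction so A->B and B->A count as the same link."""
    return {frozenset(edge) for edge in edges}


shared = run_a & run_b
only_a = run_a - run_b
only_b = run_b - run_a

print("Shared edges:", sorted(shared))
print("Only in run A:", sorted(only_a))
print("Only in run B:", sorted(only_b))
print("Directed similarity:", jaccard(run_a, run_b))
print("Undirected similarity:", jaccard(skeleton(run_a), skeleton(run_b)))
```

In this made-up case the two networks differ only in the direction of one edge, so the undirected comparison reports full agreement even though the directed networks are not equal. A comparison like this could show whether repeated runs disagree everywhere or only on a few edges.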

Re: Pulmonary Arterial Hypertension - PAH

Postby cwyoo » Mon Jul 14, 2014 9:31 am

Hamza wrote:I attached 3 files:
1- Class proposal.
2- Bodyfat results-Linear Regression results.
3- PAH results-Linear Regression results.

Banjo's Bayesian networks are still in progress.


Can you post the class-related materials under Manuscripts & Documentation > Class Projects > QuantII? There, please discuss the results of the bodyfat linear regression, e.g., what does the R-squared suggest? Do the results support linear relationships between the input variables and the outcome? Did this dataset meet the assumptions of linear regression?
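
As a reminder of what R-squared measures, here is a small Python sketch. The numbers are hypothetical and are not the bodyfat data; the point is only to show how the slope, intercept, residuals, and R-squared of a simple linear regression relate to each other.

```python
# Hypothetical data (NOT the bodyfat dataset): one input variable, one outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n

# Least-squares slope and intercept for y ~ intercept + slope * x.
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean

# Residuals: observed minus fitted values.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# R-squared: share of the outcome's variance explained by the fitted line.
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
r_squared = 1.0 - ss_res / ss_tot

print("slope:", slope, "intercept:", intercept)
print("R-squared:", r_squared)
print("residuals:", residuals)
```

A high R-squared only says the line explains most of the variance. Checking the assumptions still means looking at the residuals, for example whether they scatter evenly around zero across the range of x.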
