Microarray Analysis GEO Alzheimer's Dataset

Post by cpere117 » Thu Aug 15, 2019 4:00 pm

I'm posting in this forum the methodology for my microarray analysis of GSE45596 (Brain Angiogenic Vessels in Alzheimer's Disease). The hope is that readers can replicate my steps easily and use them to improve their own analysis of microarray datasets. This dataset uses a dye-swap design applied to brain vessels derived from Alzheimer's disease (AD) patients and normal control patients. Therefore, the odd-numbered samples below represent the control microarray samples and the even-numbered samples represent the dye-swapped (Cy3 (green)/Cy5 (red)) case samples. In this first post, I'm sharing the raw files for the following 20 samples:
Probe values are normalized log10 ratios (Cy5/Cy3)
Sample
GSM1110303
GSM1110305
GSM1110307
GSM1110309
GSM1110311
GSM1110313
GSM1110315
GSM1110317
GSM1110319
GSM1110321


The dye swap was done as a duplicate for each array. The even-numbered datasets are "paired arrays" in which cDNA was labeled with Cy3 for the AD sample and Cy5 for the normal sample. This means that the cDNA used in GSM1110303 was used again in a duplicate array as GSM1110304.


Cy3 AD and Cy5 Normal
Probe values are normalized log10 ratios (Cy3/Cy5)
Sample
GSM1110304
GSM1110306
GSM1110308
GSM1110310
GSM1110312
GSM1110314
GSM1110316
GSM1110318
GSM1110320
GSM1110322

Using this design, the attached file titled "Comparison" shows the raw signal intensity values for each color in each sample, along with my calculation of the log-normalized expression ratio. When compared with the attached published GEO supplementary file for the 417 significant differentially expressed genes (DEGs) above or below a 2.0 fold change, the values I calculated replicated the published values for each sample. The fold change column in the comparison file shows my FC calculation, average(cases)/average(controls). My next post will have a table comparing the gold standard, the GeneSpring FC calculated by the researchers of the study for the 2865 significant genes (using an asymptotic t-test), against my own FC calculations done in Excel.
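For readers who prefer a script to a spreadsheet, the two calculations described above can be sketched roughly as follows. This is only an illustration: the function names and the intensity values are made up, and which channel (Cy3 or Cy5) holds the case signal has to be chosen per array according to the dye orientation.

```python
import math

def log10_ratio(case_intensity, control_intensity):
    """Log10 expression ratio (case over control) for one probe on one array."""
    return math.log10(case_intensity / control_intensity)

def fold_change(case_intensities, control_intensities):
    """FC = average(cases) / average(controls) for one probe across arrays."""
    mean_case = sum(case_intensities) / len(case_intensities)
    mean_control = sum(control_intensities) / len(control_intensities)
    return mean_case / mean_control

# Made-up intensities for a single probe on four arrays (not from GSE45596).
case = [800.0, 1200.0, 1000.0, 1000.0]
control = [400.0, 600.0, 500.0, 500.0]

fc = fold_change(case, control)
ratios = [log10_ratio(c, n) for c, n in zip(case, control)]
```

With these toy numbers the case channel is twice the control channel on every array, so the fold change is 2.0 and each log10 ratio equals log10(2).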
Attachments
published fold change analysis.xlsx (117.17 KiB)
Comparison.txt (22.21 MiB)
Comparison.xlsx (56.73 MiB)
cpere117
 
Posts: 38
Joined: Thu Aug 24, 2017 7:15 pm

Re: Microarray Analysis GEO Alzheimer's Dataset

Post by cpere117 » Fri Aug 16, 2019 3:42 pm

Following the methods discussed in my previous post, I created a table examining the difference between the published FC values and my calculated FC values, to see whether the 417 significant genes above or below the 2.0 FC threshold established in the paper were replicated within standard error. The results show that 237 genes were upregulated in AD compared with control brain microvessels, with a maximum upregulated FC of 4.3 and a most downregulated FC of -3.9, or 0.109575715 (case/control). Following this step, I matched the more than 4000 NRF1 target genes, validated from human ChIP-seq assay data, against the paper's 2865 significant DEGs, which gave a total of 491 genes targeted by NRF1. I then calculated the averaged log10 expression ratio values for all probes across the 20 samples, along with both the z-scores and the discretized values for all 491 NRF1 target genes. The z-score was calculated as z-score = (Gx - Average(Gi)) / StandardDeviation(Gi), where Gx is the expression value of a specific gene of interest in one sample, Average(Gi) is that gene's average expression across all samples, and StandardDeviation(Gi) is its standard deviation across all samples. Discretization was calculated as follows:
z-score(Gi) < -1: D.value = 0
-1 < z-score(Gi) < 1: D.value = 1
z-score(Gi) > 1: D.value = 2
Attached is an Excel table with the 491 averaged NRF1 targets, z-scores, and discretized values calculated with the aforementioned methods. My next plan is to run the discretized targets in BANJO along with variables for gender and Alzheimer's disease. I'll report the results on Monday following three trial runs of 1 hour, 3 hours, and 9 hours.
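The z-score and discretization steps can be sketched in a few lines of Python. Two details here are my assumptions and are not stated in the post: the sample standard deviation is used (as Excel's STDEV would give), and a z-score of exactly -1 or 1 is assigned to the middle bin. The expression values are made up.

```python
import statistics

def z_scores(values):
    """z = (Gx - Average(Gi)) / StandardDeviation(Gi) for one gene across samples."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD, not population SD
    return [(v - mean) / sd for v in values]

def discretize(z):
    """Map a z-score to the three-level value used as BANJO input."""
    if z < -1:
        return 0
    if z > 1:
        return 2
    return 1  # assumption: z of exactly -1 or 1 falls in the middle bin

# Made-up log10 expression values for one gene across five samples.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
z = z_scores(expression)
levels = [discretize(value) for value in z]
```

For this toy gene only the lowest and highest samples fall outside one standard deviation, so the discretized row is 0, 1, 1, 1, 2.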
Attachments
Agilent Comparison.xlsx: FC Comparison Values (211.35 KiB)
NRF1 Targets.txt: BANJO INPUT (22.32 KiB)
GSE45596.xlsx: Z-Score, LOG Expression Values, and Discretization (12.26 MiB)

Re: Microarray Analysis GEO Alzheimer's Dataset

Post by cpere117 » Sat Sep 14, 2019 6:07 pm

Here are the updated tables for my significant genes according to a Pearson correlation comparison between NRF1 expression and the 2865 significant genes found in the study. I also ran a correlation between AD status and all of the significant genes. Next, I conducted a Pearson correlation on the coefficient values found for the NRF1 correlation and the AD correlation across all 2865 genes. Finally, I included only the top 1% and bottom 1% of genes according to the combined correlation test, as seen in the correlation file below.
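As a rough illustration of the ranking step, the sketch below computes a Pearson coefficient for each gene against a reference profile and keeps the top and bottom fraction of the ranking. It is not the exact spreadsheet procedure: the gene names and values are placeholders, the helper names are hypothetical, and how the cut-off count is rounded is my assumption. The same function applies to a 0/1 condition vector in place of the NRF1 profile.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def top_and_bottom(scores, fraction):
    """Return the highest- and lowest-ranked gene names for a given fraction."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # assumption: round down, keep at least one
    return ranked[:k], ranked[-k:]

# Placeholder expression profiles across four samples (not real data).
nrf1 = [1.0, 2.0, 3.0, 4.0]
genes = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [8.0, 6.0, 4.0, 2.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}

r_with_nrf1 = {name: pearson(nrf1, profile) for name, profile in genes.items()}
top, bottom = top_and_bottom(r_with_nrf1, 0.34)
```

On the real data the fraction would be 0.01, which for 2865 genes keeps 28 genes at each end under this rounding rule.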

Next, I added three variables to the BANJO input according to their clinical significance. Age was discretized by the following criteria: 0 = sample age < 70 years, 1 = sample age 70 to 80 years, 2 = sample age > 80 years. For gender, males were 0 and females 1. For condition, AD vessels = 1 and normal vessels = 0. You can see these variables for yourself in the attached NRF1100.txt file. As mentioned in a previous post, I discretized the genes according to z-score values obtained from the log10-transformed raw intensity values, all in Excel. My next post will have my triplicate output graphs following 3 trial runs at 1 hour.
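The clinical coding above is simple enough to write down directly. The sketch below uses made-up sample records, and the dictionary keys and function names are hypothetical; only the coding rules come from the post.

```python
def discretize_age(age_years):
    """0 = under 70, 1 = 70 to 80 inclusive, 2 = over 80."""
    if age_years < 70:
        return 0
    if age_years <= 80:
        return 1
    return 2

GENDER_CODE = {"male": 0, "female": 1}
CONDITION_CODE = {"normal": 0, "AD": 1}

def encode_sample(age_years, gender, condition):
    """Return the (age, gender, condition) codes for one sample."""
    return (discretize_age(age_years), GENDER_CODE[gender], CONDITION_CODE[condition])

# Made-up samples, only to show the coding.
samples = [(65, "male", "normal"), (74, "female", "AD"), (86, "female", "AD")]
encoded = [encode_sample(*sample) for sample in samples]
```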
Attachments
NRF1100.txt (4.77 KiB)
OneDrive_1_9-14-2019.zip (129.94 MiB)

Re: Microarray Analysis GEO Alzheimer's Dataset

Post by cpere117 » Sat Sep 14, 2019 6:16 pm

Here are the results from my first three trial runs at 1 hour each. BDe scores for the top networks were as follows:
Dataset: 102 variables and 20 observations
Trial 1: -284.1756
Trial 2: -279.1758
Trial 3: -282.9458
Attachments
Screenshot from 2019-09-14 18-31-41.png: Top Graph Run 3 (276.2 KiB)
Screenshot from 2019-09-14 18-29-29.png: Top Graph Run 2 (340.3 KiB)
Screenshot from 2019-09-14 18-23-37.png: Top Graph Run 1 (237.49 KiB)

Re: Microarray Analysis GEO Alzheimer's Dataset

Post by cpere117 » Mon Sep 16, 2019 7:03 pm

Following the triplicate runs conducted at 1 hour for my 102 variables of interest in BANJO, I conducted a 2-hour run with the same dataset to see if any causal networks were repeated. Attached below is the output from my first 2-hour trial run. Two genes of interest that have consistently been connected to Condition (Alzheimer's disease) are RPL23AP7 and ALOX5AP. I will post a write-up of my next results as I move along with my project. Note that the BDe score has improved to -272.2309.
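One way to check for repeated connections across runs is to write each top graph down as a set of (parent, child) edges and count how many runs contain each edge. The sketch below assumes the edges have already been copied out of the BANJO output by hand; it does not parse BANJO's report format, and the node names are placeholders rather than results.

```python
from collections import Counter

def recurring_edges(runs, min_runs):
    """Return the edges that appear in at least min_runs of the given runs."""
    counts = Counter(edge for run in runs for edge in set(run))
    return {edge for edge, n in counts.items() if n >= min_runs}

# Placeholder edge lists for three runs (not actual BANJO output).
run1 = {("geneA", "Condition"), ("geneB", "geneC")}
run2 = {("geneA", "Condition")}
run3 = {("geneA", "Condition"), ("geneC", "geneD")}

in_every_run = recurring_edges([run1, run2, run3], min_runs=3)
in_any_run = recurring_edges([run1, run2, run3], min_runs=1)
```

Edges are treated as directed here, so an edge found in the opposite direction in another run counts as a different edge. Whether that is the right choice depends on how the graphs are being interpreted.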
Attachments
consensus.graphNRF1_AD2trial1.2019.09.14.17.32.29.txt (4.64 KiB)
static.report.AD_2trial1.2019.09.14.17.32.29.txt (152.86 KiB)
ConsensusGraph2hourstrial1.png (390.63 KiB)
2HoursTrial1Topgraph.png (407.59 KiB)

Re: Microarray Analysis GEO Alzheimer's Dataset

Post by cpere117 » Thu Sep 19, 2019 1:11 pm

At our recent SMLG meeting, Dr. Yoo and Dr. Felty advised me to make two separate lists for the Pearson correlation rankings of the top 417 genes reported in the supplementary file of GSE45596. I have made the appropriate corrections to my data and begun two separate 1-hour trial runs in BANJO, one for each correlation table (AD-correlated genes and NRF1 target genes). Attached to this message is the file I used for my correlation calculations, along with the raw intensity values, z-scores, and discretization calculations for each of the top correlated genes used in BANJO. The Excel file is still a little unorganized in terms of labeling, so I intend to clean up each sheet over the next week to make my output easier for the reader to understand. The key sheet to view is the one labeled "correlation tables". There you have two tables representing the NRF1 target gene and AD gene rankings according to Pearson correlation values (derived from the Excel function "CORREL"): the top 50 NRF1 target genes are highlighted in green on the left, and the top 50 AD-correlated genes are highlighted in grey on the right. I've also attached two text files representing my discretized input into BANJO for each of my trial runs. I'll post the results beginning tomorrow.
Attachments
ADTOP.txt (2.41 KiB)
NRF1Targets.txt (2.5 KiB)
BANJO.xlsx (132.59 MiB)

Re: Microarray Analysis GEO Alzheimer's Dataset

Post by cpere117 » Wed Sep 25, 2019 3:37 pm

To focus our machine learning analysis on genes known to be involved in endothelial senescence and pathological angiogenesis, I've refocused my analysis on ID3 and its gene targets, which we have validated primarily from ChIP-seq analysis conducted in our own lab. We have two primary questions: 1) Do ID3 and/or its target genes connect into a causal pathway that is reported to be involved in endothelial senescence in the endothelial cells at the vessel wall? If so, 2) are there any recurrent causal pathways that are replicated across several probabilistic graphical models produced by BANJO at several trial run durations? Today I've initiated the first of the trial runs for ID3 and its target genes found to be significant according to a fold change criterion of 2.0 and an asymptotic t-test, following the same protocol as the researchers in the attached paper. In total, I have 39 variables of interest (including clinical variables such as age, gender, and condition) and 20 observations, which I'm currently running for 1 hour in BANJO. I will post the results on the hour, and will follow up by feeding the best structure into a 2-hour trial run. Thank you for your attention.
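Selecting the target genes that pass the fold change criterion amounts to a lookup and a threshold check. The sketch below shows only the fold change part (the t-test is not included), assumes FC is expressed as a case/control ratio so that "below 2.0" means 0.5 or less, and uses placeholder gene names and values.

```python
def significant_targets(target_genes, fold_changes, threshold=2.0):
    """Keep the targets whose FC ratio is >= threshold or <= 1/threshold."""
    selected = []
    for gene in target_genes:
        fc = fold_changes.get(gene)
        if fc is None:
            continue  # target gene not present in the expression table
        if fc >= threshold or fc <= 1.0 / threshold:
            selected.append(gene)
    return selected

# Placeholder targets and case/control fold changes (not real values).
targets = ["geneA", "geneB", "geneC", "geneD"]
fold_changes = {"geneA": 2.5, "geneB": 1.2, "geneC": 0.4, "geneE": 3.0}

selected = significant_targets(targets, fold_changes)
```

Here geneA passes as upregulated, geneC passes as downregulated, geneB changes too little, and geneD is skipped because it has no fold change entry.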
Attachments
ID3TAR417.txt: Target file of ID3 genes found among the 417 genes determined to have above or below 2.0 FC (1.8 KiB)
qaisar2012.pdf: Published paper on brain angiogenic vessels involved in AD (613.64 KiB)