SMLG (Statistical Machine Learning Group) Discussion Forum

Posted: **Thu Mar 11, 2021 1:50 pm**

Dear SMLG Members,

I'm working on a Mayo Clinic RNA-Seq cohort in which the datasets I've compiled come from three separate labs. The attachment contains the raw data from each of the three labs, annotated by Ensembl ID. I would greatly appreciate help normalizing the data across the three datasets.

The attached Dr. Taner dataset contains cerebellum data from 200 cerebral amyloid angiopathy cases, taken from this Synapse repository: https://www.synapse.org/#!Synapse:syn10930230. The attached Dr. Bu dataset contains temporal cortex data from 75 cerebral amyloid angiopathy cases. Lastly, the attached Control dataset contains data from the Mayo Clinic AD cohort and can be found in this Synapse repository: https://www.synapse.org/#!Synapse:syn5550404.

So far I've normalized the data with the trimmed mean normalization method across all three datasets. I'd like to know which approach would be best here, and I'm interested in any other normalization methods that would help me compare cases and controls across these three studies. Thank you.

CP

Posted: **Tue Sep 28, 2021 6:13 pm**

Dr. Yoo,

Attached is the Excel file showing how I cleaned the Mayo Clinic data for Banjo. The 313 genes represent ID3 targets of interest and clinical variables derived from the Mayo Clinic data from my original post. The leftmost matrix in the spreadsheet holds the raw counts. In the second step, I log2-transformed the data to bring its distribution closer to normal. I then calculated the mean and standard deviation of the log2-transformed data. Using (log2-transformed expression value - mean log2 expression value) / standard deviation of the log2-transformed expression values, I calculated Z scores for each of my genes across 73 samples. The last data matrix, on the right of the sheet, contains the discretized values based on the Z scores: 0 (Z score < -1), 1 (Z score between -1 and 1), 2 (Z score > 1).
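For anyone who would rather reproduce these steps outside Excel, here is a minimal sketch in Python with NumPy. It is not the spreadsheet itself, and it makes several assumptions that are not stated above: genes are rows and samples are columns, a pseudocount of 1 is added before the log2 transform so that zero counts stay finite, the standard deviation is the sample standard deviation (`ddof=1`), and Z scores of exactly -1 or 1 fall into the middle bin. The function name `discretize_counts` is hypothetical.

```python
import numpy as np


def discretize_counts(counts, pseudocount=1.0, ddof=1):
    """Log2-transform, Z-score per gene, and bin into three states.

    counts: 2-D array, genes in rows and samples in columns (assumed layout).
    pseudocount: added before log2 so zero counts stay finite (assumption).
    ddof: 1 gives the sample standard deviation (assumption).
    """
    counts = np.asarray(counts, dtype=float)

    # Step 2: log2 transform of the raw counts.
    log2_expr = np.log2(counts + pseudocount)

    # Step 3: per-gene mean and standard deviation across samples.
    mean = log2_expr.mean(axis=1, keepdims=True)
    sd = log2_expr.std(axis=1, ddof=ddof, keepdims=True)

    # Step 4: Z score = (value - mean) / standard deviation.
    # A gene with identical values in every sample has sd == 0 and yields NaN.
    z = (log2_expr - mean) / sd

    # Step 5: discretize: 0 if Z < -1, 2 if Z > 1, otherwise 1.
    states = np.ones(z.shape, dtype=int)
    states[z < -1] = 0
    states[z > 1] = 2
    return z, states


# Toy example: one gene measured in five samples (made-up counts).
z, states = discretize_counts([[0, 1, 3, 7, 15]])
print(states)  # [[0 1 1 1 2]]
```

Setting `pseudocount=0` applies a plain log2 transform, which only works when the count matrix contains no zeros.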

Christian

@cwyoo

Posted: **Tue Oct 19, 2021 1:05 pm**

Dr. Yoo,

Attached is the Excel file showing how I cleaned the Mayo Clinic data for Banjo. The 290 genes represent NRF1 targets of interest and clinical variables derived from the Mayo Clinic data from my original post. The leftmost matrix in the spreadsheet holds the raw counts. In the second step, I log2-transformed the data to bring its distribution closer to normal. I then calculated the mean and standard deviation of the log2-transformed data. Using (log2-transformed expression value - mean log2 expression value) / standard deviation of the log2-transformed expression values, I calculated Z scores for each of my genes across 80 male samples. The last data matrix, on the right of the sheet, contains the discretized values based on the Z scores: 0 (Z score < -1), 1 (Z score between -1 and 1), 2 (Z score > 1).

Christian

Mayo Clinic RNA-Seq Analysis