cwyoo wrote: Query Series GSE198411 (RNAseq)
Experimental condition (Filename)
===========================================
control #1 (h-cont-1.txt)
control #2 (h-cont-2.txt)
shRNA67 #1 (h-sh67-1.txt)
shRNA67 #2 (h-sh67-2.txt)
shRNA103 #1 (h-sh103-1.txt)
shRNA103 #2 (h-sh103-2.txt)
Each tab-delimited text file contains a Transcript ID, a Gene Symbol, and a raw count on each line.
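As a rough illustration of reading one of these files, here is a minimal Python sketch using only the standard library. The post does not say whether the files have a header row, so the `has_header` flag, the column order (Transcript ID, Gene Symbol, count), and the choice to sum counts for transcripts that share a gene symbol are all assumptions. The function name `read_counts` and the sample values are made up for the example.

```python
import csv
import io

def read_counts(handle, has_header=True):
    """Parse one tab-delimited file into {gene_symbol: raw_count}.

    Assumes three columns in the order Transcript ID, Gene Symbol, raw count.
    Counts for transcripts sharing a gene symbol are summed (an assumption).
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if there is one
    counts = {}
    for row in reader:
        if len(row) < 3 or not row[1].strip():
            continue  # skip blank or malformed lines
        symbol = row[1].strip()
        counts[symbol] = counts.get(symbol, 0.0) + float(row[2])
    return counts

# Tiny in-memory stand-in for a file such as h-cont-1.txt (values are made up)
example = io.StringIO(
    "Transcript ID\tGene Symbol\tCount\n"
    "T1\tTFEB\t10\n"
    "T2\tTFEB\t5\n"
    "T3\tLAMP1\t200\n"
)
demo_counts = read_counts(example)
```

For a real file, the same function could be called as `read_counts(open("h-cont-1.txt", newline=""))`.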
Genes of interest:
TFEB, TFE3, LAMP1, LAMP2, HSPA8, SLC39A14, VAMP7, TRPML1, SNAP23, GABARAPL1, ATP6V0E1, ATP6V0D1, ATP6V1G1, ATP6V0C, ATP6V0A4, cathepsin D, cathepsin L, cathepsin S, cathepsin H, MT1E, SQSTM1, MAP1LC3B, ATG7, LMNA, LMNB, LMNB2, CLDN7, CLDN1, CLDN12, and CDKN1A.
Create one dataset that combines all six tab-delimited files listed above (referred to as the "six files") using the following steps:
Step 1. Find the set of genes that are present in all six files (referred to as the "common genes").
Step 2. For each of the common genes, record the raw count from each of the six experiments.
Step 3. Normalize the raw counts using the "DESeq2-normalized counts: Median of ratios method" presented in
https://hbctraining.github.io/DGE_workshop_salmon/lessons/02_DGE_count_normalization.html
Step 4. Create a variable called "ZIP11 Knock Down" and assign "0" to the two controls, "1" to the two shRNA67 samples, and "2" to the two shRNA103 samples.
Step 5. Using the dataset file from Step 4, create a dataset with roughly 200 genes that show the highest fold change in controls compared to shRNA67 (roughly 100 genes) and controls compared to shRNA103 (roughly 100 genes). Add expression levels of Genes of Interest and include "ZIP11 Knock Down" variable in this file.
Step 6. Using the dataset file from Step 5, discretize each gene's expression to 0, 1, 2 using Z-score.
Step 7. Perform Banjo analysis on the file created in Step 6.
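Steps 1 to 3 and Step 6 could be sketched roughly as below, in plain Python with no external packages. This is a reimplementation of the median-of-ratios idea for illustration, not a call into DESeq2 itself. Genes with a zero count in any sample are skipped when estimating size factors, because their geometric mean is zero. The post does not give Z-score cutoffs for Step 6, so the `cutoff=0.5` threshold is purely an assumption, as are the function names and the made-up counts.

```python
import math
import statistics

def common_genes(samples):
    """Step 1: gene symbols present in every sample."""
    gene_sets = [set(counts) for counts in samples.values()]
    return sorted(set.intersection(*gene_sets))

def size_factors(samples, genes):
    """Step 3: median-of-ratios size factor for each sample."""
    names = list(samples)
    log_ratios = {name: [] for name in names}
    for gene in genes:
        values = [samples[name][gene] for name in names]
        if min(values) <= 0:
            continue  # geometric mean would be zero, so skip this gene
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for name, value in zip(names, values):
            log_ratios[name].append(math.log(value) - log_geo_mean)
    # Median of the per-gene ratios, computed on the log scale
    return {name: math.exp(statistics.median(log_ratios[name])) for name in names}

def normalize(samples, genes):
    """Steps 2-3: normalized counts as {gene: {sample: value}}."""
    factors = size_factors(samples, genes)
    return {g: {n: samples[n][g] / factors[n] for n in samples} for g in genes}

def discretize(values, cutoff=0.5):
    """Step 6: map one gene's values to 0/1/2 by Z-score (cutoff is assumed)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [1] * len(values)  # no variation: everything is "middle"
    levels = []
    for value in values:
        z = (value - mean) / sd
        levels.append(0 if z < -cutoff else 2 if z > cutoff else 1)
    return levels

# Made-up counts: sample "B" has exactly twice the depth of sample "A".
samples = {
    "A": {"g1": 10, "g2": 100, "g3": 5, "only_in_A": 7},
    "B": {"g1": 20, "g2": 200, "g3": 10},
}
genes = common_genes(samples)
factors = size_factors(samples, genes)
normalized = normalize(samples, genes)
levels = discretize([1, 1, 5, 5, 9, 9])
```

In this toy case the size factor of "B" comes out at twice that of "A", so the normalized counts of the two samples agree. With the real data, `samples` would hold six entries, one per file, and the "ZIP11 Knock Down" values from Step 4 would be attached per sample as a separate 0/1/2 label.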