Differential Gene Expression Analysis (DGEA) Packages in R

The R Project for Statistical Computing (available in Windows, Mac OS, and Linux)

Differential Gene Expression Analysis (DGEA) Packages in R

Postby Kaumudi » Fri Feb 23, 2018 9:59 am

In this thread, I will be posting about my experiences implementing DGEA packages for the data I am working with, and the types of questions we are interested in asking.

This past week I spent time understanding a little about the various packages that are out there. It is very important to understand the nature of your input data (RNASeq vs microarray, which technology platform was used eg Illumina, Affy, etc. and whether or not and how the data was normalized).

I am working with level 3 RNASeqv2RSEM data from TCGA. Upon scourging the various confusing forums and literature out there, I decided to go with the EdgeR package in R for DGEA (see attached slides
Some Useful Info.pptx
(255.93 KiB) Downloaded 149 times
).

I modified and ran script in RStudio and I was able to get the DGEA lists (log2FC, logCPM, p-values) for the two groups I was comparing. I then used Excel/OfficeLibre to generate a volcano plot of the data. Next week I will post more about the actual script I used.
Kaumudi
 
Posts: 76
Joined: Tue Feb 14, 2017 12:38 pm

Re: Differential Gene Expression Analysis (DGEA) Packages in

Postby Kaumudi » Sat Mar 31, 2018 9:22 pm

Basic R script for EdgeR (to be modified with your file names, locations etc.):

You have to first download EdgeR from Bioconductor into RStudio, and then import your table from either an excel or .csv file - you can do read.delim or read.table for that.

Then:

LGT <- read.delim("~/Desktop/LGT.csv", row.names=1)
View(LGT)
condition=c(rep('g', 154), rep('l',513))
dge<-DGEList(counts=LGT,group=condition)
design<-model.matrix(~condition+0,data=dge$samples)
colnames(design)=gsub("condition","",colnames(design))
disp <- estimateGLMCommonDisp(dge, design)
disp <- estimateGLMTrendedDisp(disp, design)
disp <- estimateGLMTagwiseDisp(disp, design)
fit <- glmFit(disp, design)
>fit <- glmFit(disp, design)
avsb=makeContrasts(a-b,levels=design)
lrt.avsb = glmLRT(fit, contrast=avsb)
gvsl=makeContrasts(g-l,levels=design)
lrt.gvsl=glmLRT(fit, contrast=gvsl)
res=as.data.frame(lrt.gvsl)
setwd("/mnt/disk/home/kaumudib/kaumudi")
attributes(lrt.gvsl)
write.table(lrt.gvsl$table,"gvslTable.txt",sep = "\t",col.names = TRUE,row.names = TRUE)

Of course you will have to modify file names/locations etc based on your data and file locations...
Kaumudi
 
Posts: 76
Joined: Tue Feb 14, 2017 12:38 pm

About limma and EdgeR

Postby Kaumudi » Sun Apr 29, 2018 4:10 pm

EdgeR and limma differ in the statistical models they use for the DGEA (I will post more on this next week)

In terms of R script...
For limma, you can start off just like for EdgeR, with the following commands:
condition=c(rep('g', 154), rep('l',513))
dge<-DGEList(counts=LGT,group=condition)
design<-model.matrix(~condition+0,data=dge$samples)
colnames(design)=gsub("condition","",colnames(design))

Then, you follow up with (of course modify as per your filename etc.):
# calculate normalization factors between libraries
nf <- calcNormFactors(mycounts)
#
# normalise the read counts with 'voom' function
y <- voom(mycounts,design,lib.size=colSums(mycounts)*nf)
# extract the normalised read counts
counts.voom <- y$E

# save normalised expression data into output dir
write.table(counts.voom,file="~/counts.voom.txt",row.names=T,quote=F,sep="\t")

# fit linear model for each gene given a series of libraries
fit <- lmFit(y,design)
# construct the contrast matrix corresponding to specified contrasts of a set of parameters
cont.matrix <- makeContrasts(liver-brain,levels=design)
cont.matrix
# compute estimated coefficients and standard errors for a given set of contrasts
fit <- contrasts.fit(fit, cont.matrix)

# compute moderated t-statistics of differential expression by empirical Bayes moderation of the standard errors
fit <- eBayes(fit)
options(digits=3)

# check the output fit
dim(fit)

# get the full table ("n = number of genes in the fit")
limma.res <- topTable(fit,coef=1,n=dim(fit)[1])

***Here you can 'setwd' first and then...

# write limma output table for significant genes into a tab delimited file
filename = paste("~/DEgenes_","_pval","_logFC",".txt",sep="")
write.table(limma.res,file=filename,row.names=T,quote=F,sep="\t")
Kaumudi
 
Posts: 76
Joined: Tue Feb 14, 2017 12:38 pm


Return to R

Who is online

Users browsing this forum: No registered users and 4 guests