SMLG (Statistical Machine Learning Group) Discussion Forum

Posted: **Wed Jun 20, 2018 1:43 pm**

I'd like to investigate which genes, and which gene networks, affect mesenchymal stem cells (MSCs) during ageing.
To do this, I have been building a causal Bayesian network from the gene expression of MSCs from young and old persons.
From now on, I will post the data and results that come from this study.

A brief outline of the study is as follows.
1. Search for public data in GEO and download the raw data
2. Cleaning and read counting
3. Investigating the differential expression of genes
4. Normalization of raw data
5. Banjo analysis
6. Application of the new order code and structure code

Posted: **Wed Jun 20, 2018 1:48 pm**

[Collecting RNA sequencing data from GEO]

Searching data
Keyword: osteoblast AND Homo sapiens
Study type: “Expression profiling by high throughput sequencing” and “Genome binding/occupancy profiling by high throughput sequencing”
Result: one study
Global transcriptional profiling using RNA sequencing and DNA methylation patterns in highly enriched mesenchymal cells from young versus elderly women. (GSE94736)
PMID and published article: Bone 2015 Jul;76:49-57. PMID: 25827254

Overall design
- examination of gene expression and DNA methylation patterns from a highly enriched bone marrow mesenchymal cell population from young (mean age, 28.7 years) versus old (mean age, 73.3 years) women
Data processing
- RNA-sequencing, transcriptomic, cDNA
- Paired-end reads from the raw RNAseq data were aligned using TopHat (version 2.0.6)
- Quality control assessments were made using RSeQC software
- Gene counts were generated using HTSeq software and gene annotation files were obtained from Illumina
- Genome_build: hg19
- Supplementary_files_format_and_content: tab-delimited text files include RPKM and count values for each Sample

Number of samples
- 28 (young person samples: 15, old person samples: 13)

Posted: **Wed Jun 20, 2018 1:53 pm**

1. Downloaded the raw data from GEO and aligned the reads with HISAT2.0
2. Ran Samtools to clean the SAM files produced in step 1
3. Counted the reads of each gene per sample using HTSeq-count

The final count files are attached.
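A note on the count files: htseq-count writes one line per gene (gene ID, tab, count) and then appends summary lines whose IDs start with `__`, such as `__no_feature` and `__ambiguous`. These are not genes, so they need to be dropped when a count table is built by hand (as far as I know, DESeqDataSetFromHTSeqCount() already removes them). Below is a minimal base-R sketch of reading one such file; the function name `read_htseq_counts` and the example gene IDs and counts are hypothetical.

```r
## Read one htseq-count output file (two tab-separated columns: gene ID, count)
## and separate the "__" summary rows (e.g. __no_feature) from the gene counts.
read_htseq_counts <- function(path) {
  tab <- read.table(path, sep = "\t", header = FALSE,
                    col.names = c("gene_id", "count"),
                    colClasses = c("character", "numeric"),
                    quote = "", comment.char = "")
  is_summary <- startsWith(tab$gene_id, "__")
  list(
    counts  = setNames(tab$count[!is_summary], tab$gene_id[!is_summary]),
    summary = setNames(tab$count[is_summary],  tab$gene_id[is_summary])
  )
}

## Usage with a small hypothetical file
tmp <- tempfile(fileext = ".txt")
writeLines(c("ENSG00000000003\t120", "ENSG00000000005\t0",
             "ENSG00000000419\t845", "__no_feature\t3021",
             "__ambiguous\t57"), tmp)
res <- read_htseq_counts(tmp)
print(res$counts)        # gene counts only
print(sum(res$summary))  # reads not assigned to any single gene
```

Keeping the summary rows in a separate vector, rather than discarding them, makes it easy to check how many reads were not assigned to genes in each sample.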

Posted: **Wed Jun 20, 2018 1:59 pm**

I analysed the read counts using DESeq2.
After installing the DESeq2 package in RStudio, I performed the DESeq2 analysis in R.

The DESeq2 R code is as follows; the results are attached.

#Efrain Gonzalez
#January 23, 2018
#R code for DESeq2 Analysis Dr. Hooi
library(DESeq2)
##Choose directory with files in it
##set directory before this step
mydirectory <- getwd()

##Grab the HTSeq count files (file names containing "Count")
myFiles <- grep("Count",list.files(mydirectory),value=TRUE)

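##Assign condition status to each file
##NOTE: this assumes list.files() returns the 13 OOB (old donor) files first,
##followed by the 15 YOB (young donor) files; check myFiles before running
mycondition <- c(rep("OOB",13),rep("YOB",15))
stopifnot(length(myFiles) == length(mycondition))
myTable <- data.frame(sampleName = myFiles, fileName = myFiles, condition = factor(mycondition, levels = c("YOB","OOB")))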

##Build DESeqDataSet
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = myTable, directory = mydirectory, design = ~ condition)

ddsHTSeq

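##No technical replicates to collapse: the 28 samples (15 young, 13 old) are
##treated as separate biological replicates. collapseReplicates() was never
##called, and this 'sample' factor would merge all donors of a group into one,
##so these template lines are disabled.
#ddsHTSeq$sample <- factor(c(rep("OOB",13),rep("YOB",15)))
#ddsHTSeq$run <- paste0("run",1:28)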

##Analysis of Differential Expression for Biological Replicates
DESR <- DESeq(ddsHTSeq)

##Old (OOB) vs young (YOB): log2FoldChange = log2(OOB / YOB)
RES1 <- results(DESR,contrast = c("condition","OOB","YOB"))
##Sort genes by p-value (genes with NA p-values are placed last)
RES1ordered <- RES1[order(RES1$pvalue),]

##Write the sorted results; col.names = NA keeps the header aligned with the gene-ID column
write.table(as.data.frame(RES1ordered),sep="\t",col.names = NA, row.names= TRUE,file="OOB_vs_YOB_results1.txt")
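Before testing, DESeq(ddsHTSeq) estimates one size factor per sample to correct for sequencing depth, using the median-of-ratios method. To make that step less of a black box, here is a simplified base-R sketch of the idea: build a pseudo-reference from each gene's geometric mean across samples, then take each sample's median ratio to it. This is an illustration under my own assumptions (the function name and toy matrix are hypothetical), not DESeq2's own code; on the real data the values come from sizeFactors(DESR) and counts(DESR, normalized = TRUE).

```r
## Base-R sketch of median-of-ratios size factors (illustration only)
## counts: genes x samples matrix of raw counts
median_of_ratios <- function(counts) {
  log_counts <- log(counts)                 # log(0) = -Inf for zero counts
  log_geo_means <- rowMeans(log_counts)     # log geometric mean per gene
  use <- is.finite(log_geo_means)           # skip genes with a zero in any sample
  apply(log_counts[use, , drop = FALSE], 2,
        function(x) exp(median(x - log_geo_means[use])))
}

## Toy matrix (hypothetical): s2 is sequenced 2x deeper than s1, s3 4x deeper
cts <- matrix(c( 10,  20,  40,
                  5,  10,  20,
                  0,   3,   6,
                100, 200, 400),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))
sf <- median_of_ratios(cts)
norm <- sweep(cts, 2, sf, "/")   # normalized counts = raw counts / size factor
print(sf)
print(norm)
```

Because the median is taken over many genes, a handful of strongly changed genes does not shift the size factors much, which is the main reason this method is preferred over scaling by total counts.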

Posted: **Wed Jun 20, 2018 2:06 pm**

RPKM, FPKM and TPM: when RNA-seq first came into use, results were usually reported as RPKM (Reads Per Kilobase of transcript per Million mapped reads) or FPKM (Fragments Per Kilobase of transcript per Million mapped reads), and later as TPM (Transcripts Per Million). However, among these normalization methods, DESeq2's normalization seems to be the most appropriate (BMC Genomics. 2016 Jan 5;17:28. doi: 10.1186/s12864-015-2353-z.
Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster.)
Before moving on to the normalization steps, I have summarized the differences between these methods.
Download the ppt file.
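To make the difference concrete, here is a small base-R sketch of the RPKM and TPM formulas. Assumptions: library size is taken as the sum of the gene counts (total counted reads) rather than total mapped reads, each gene has a single length in bp, and the function names and toy numbers are hypothetical. FPKM uses the same formula as RPKM, with fragments (read pairs) counted instead of reads.

```r
## RPKM and TPM from a raw count matrix (base-R sketch)
## counts: genes x samples; gene_length_bp: one length (bp) per gene, same row order
rpkm <- function(counts, gene_length_bp) {
  lib_millions <- colSums(counts) / 1e6              # library size in millions of reads
  rpm <- sweep(counts, 2, lib_millions, "/")         # reads per million
  rpm / (gene_length_bp / 1e3)                       # then per kilobase (row-wise)
}

tpm <- function(counts, gene_length_bp) {
  rate <- counts / (gene_length_bp / 1e3)            # reads per kilobase (row-wise)
  sweep(rate, 2, colSums(rate), "/") * 1e6           # scale each sample to sum to 1e6
}

## Toy data (hypothetical)
cts <- matrix(c(400000,  900000,
                600000, 1100000),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
len <- c(geneA = 2000, geneB = 3000)
rp <- rpkm(cts, len)
tp <- tpm(cts, len)
print(colSums(rp))   # differs between samples
print(colSums(tp))   # one million in every sample
```

The only difference is the order of the two divisions: TPM normalizes for gene length first, so every sample sums to the same total, while RPKM/FPKM column sums vary between samples. Neither corrects for composition effects the way DESeq2's size factors do.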

Posted: **Thu Jun 21, 2018 2:13 pm**

I made a master file containing the read counts of all samples.
The library contains 57954 genes.
I attached the file.
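One way to build such a master file is sketched below in base R. It assumes each sample's counts are already available as a numeric vector named by gene ID (for example, read from its HTSeq-count file with the `__` summary lines removed); the function name `make_master_table` and the sample names are hypothetical. The check matters because merging files whose gene lists differ would silently misalign rows.

```r
## Combine per-sample count vectors (named by gene ID) into one genes x samples table
make_master_table <- function(count_list) {
  genes <- names(count_list[[1]])
  for (s in names(count_list)) {
    ids <- names(count_list[[s]])
    if (length(ids) != length(genes) || !setequal(ids, genes) || anyDuplicated(ids) > 0) {
      stop("gene IDs in sample ", s, " are duplicated or differ from the first sample")
    }
  }
  ## Reorder every sample to the gene order of the first one, then bind as columns
  mat <- do.call(cbind, lapply(count_list, function(v) v[genes]))
  rownames(mat) <- genes
  mat
}

## Hypothetical example: two samples listing the same genes in different orders
samples <- list(
  OOB_1 = c(geneA = 10, geneB = 0, geneC = 7),
  YOB_1 = c(geneB = 4, geneA = 12, geneC = 1)
)
master <- make_master_table(samples)
print(master)
out_file <- tempfile(fileext = ".txt")
write.table(master, file = out_file, sep = "\t", quote = FALSE, col.names = NA)
```

Aligning by gene ID instead of by line position means the result stays correct even if two count files list the genes in a different order.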

RNAseq in mesenchymal stem cells from young and old persons
