RNAseq in mesenchymal stem cells from young and old persons

RNAseq in mesenchymal stem cells from young and old persons

Post by ddolbae01 » Wed Jun 20, 2018 1:43 pm

I would like to investigate which genes and gene networks affect mesenchymal stem cells (MSCs) during ageing.
To that end, I have been building a causal Bayesian network from the genes of mesenchymal stem cells from young and old persons.
From now on, I will post the data that come out of this study.

A brief outline of the study follows.
1. Search for public data in GEO and downloading raw data
2. Cleaning and read counting
3. Investigating the differential expression of genes
4. Normalization of raw data
5. Banjo analysis
6. Application of new order code and structure code
ddolbae01
 
Posts: 47
Joined: Wed Jul 19, 2017 2:14 am

1. Search for public data in GEO and downloading raw data

Post by ddolbae01 » Wed Jun 20, 2018 1:48 pm

[Collecting RNA sequencing data from GEO]

Searching data
Keywords: osteoblast AND Homo sapiens
Study type: “Expression profiling by high throughput sequencing” and “Genome binding/occupancy profiling by high throughput sequencing”
Result: one study
Global transcriptional profiling using RNA sequencing and DNA methylation patterns in highly enriched mesenchymal cells from young versus elderly women. (GSE94736)
PMID and published article: Bone 2015 Jul;76:49-57. PMID: 25827254

Overall design
- examination of gene expression and DNA methylation patterns from a highly enriched bone marrow mesenchymal cell population from young (mean age, 28.7 years) versus old (mean age, 73.3 years) women
Data processing
- Library strategy, source and selection: RNA-Seq, transcriptomic, cDNA
- Paired-end reads from the raw RNAseq data were aligned using TopHat (version 2.0.6)
- Quality control assessments were made using RSeQC software
- Gene counts were generated using HTSeq software and gene annotation files were obtained from Illumina
- Genome_build: hg19
- Supplementary_files_format_and_content: tab-delimited text files include RPKM and count values for each Sample

Number of samples
- 28 (young person samples: 15, old person samples: 13)
Last edited by ddolbae01 on Wed Jun 20, 2018 1:55 pm, edited 1 time in total.

2. Cleaning and read counting

Post by ddolbae01 » Wed Jun 20, 2018 1:53 pm

1. Downloaded the raw data from GEO and aligned the reads using HISAT2.0, which produces the SAM files
2. Ran Samtools to clean the SAM files made in step 1
3. Counted the reads per gene for each sample using HTSeq-count

The final count files are attached.
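
The three steps above can be sketched as a small R helper that only assembles the command lines for one paired-end sample; it does not run them. The file names, the index name, the accession `SRR0000001`, the choice of dropping unmapped reads as the "cleaning" filter, and the unstranded setting `-s no` are all assumptions for illustration. The post does not state which Samtools or HTSeq-count options were actually used.

```r
# Build (but do not run) the alignment, cleaning and counting commands for one sample.
# All paths and option values below are hypothetical placeholders.
build_commands <- function(sample, index = "genome_index", gtf = "annotation.gtf") {
  sam   <- paste0(sample, ".sam")
  bam   <- paste0(sample, ".clean.bam")
  count <- paste0(sample, "_Count.txt")
  c(
    # Paired-end alignment with HISAT2, written as a SAM file
    align = paste("hisat2 -x", index,
                  "-1", paste0(sample, "_1.fastq"),
                  "-2", paste0(sample, "_2.fastq"),
                  "-S", sam),
    # Assumed cleaning: drop unmapped reads (-F 4), then sort by read name
    clean = paste("samtools view -b -F 4", sam,
                  "| samtools sort -n -o", bam, "-"),
    # Name-sorted BAM in, one count per gene out
    count = paste("htseq-count -f bam -r name -s no", bam, gtf, ">", count)
  )
}

cmds <- build_commands("SRR0000001")
cat(cmds, sep = "\n")
```

Each string could then be passed to `system()`. The `_Count.txt` suffix is chosen here only so that a pattern such as `grep("Count", ...)` would pick the files up later.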
Attachments
HTseqRNAcount.tar.gz
(4.91 MiB) Downloaded 166 times

3. Investigating the differential expression of genes

Post by ddolbae01 » Wed Jun 20, 2018 1:59 pm

I analysed the read counts for differential expression using DESeq2.
After installing the DESeq2 library in RStudio, I ran the DESeq2 analysis in R.

The DESeq2 R code is as follows, and I attached the results.

#Efrain Gonzalez
#January 23, 2018
#R code for DESeq2 Analysis Dr. Hooi
library(DESeq2)

## Set the working directory to the folder with the HTSeq-count files before this step
mydirectory <- getwd()

## Collect the HTSeq-count files (file names containing "Count")
myFiles <- grep("Count", list.files(mydirectory), value = TRUE)

## Assign the condition of each file: OOB = old group (13 samples), YOB = young group (15 samples)
## This relies on list.files() returning the 13 old samples before the 15 young samples
mycondition <- c(rep("OOB", 13), rep("YOB", 15))
stopifnot(length(myFiles) == length(mycondition))
myTable <- data.frame(sampleName = myFiles, fileName = myFiles, condition = mycondition)

## Build DESeqDataSet
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = myTable, directory = mydirectory, design = ~ condition)

ddsHTSeq

## Record group and run labels for each file
## (collapseReplicates() is not called, so each of the 28 files stays a separate sample)
ddsHTSeq$sample <- factor(c(rep("OOB", 13), rep("YOB", 15)))
ddsHTSeq$run <- paste0("run", 1:28)

## Analysis of Differential Expression for Biological Replicates
DESR <- DESeq(ddsHTSeq)

## OOB vs YOB: log2 fold changes are reported as OOB relative to YOB
RES1 <- results(DESR, contrast = c("condition", "OOB", "YOB"))
## Order genes by p-value, smallest first
RES1ordered <- RES1[order(RES1$pvalue), ]

## Write the ordered results as a tab-delimited table
write.table(as.data.frame(RES1ordered), sep = "\t", col.names = TRUE, row.names = TRUE, file = "OOB_vs_YOB_results1.txt")
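
As a follow-up to the script above, here is a minimal base-R sketch of how the written results table could be filtered for differentially expressed genes. The cutoffs (adjusted p-value below 0.05, absolute log2 fold change of at least 1) and the function name `select_de_genes` are assumptions for illustration and are not part of the original analysis. The column names `padj` and `log2FoldChange` are the ones produced by DESeq2's `results()`.

```r
# Keep genes passing assumed significance and effect-size cutoffs.
# Genes with padj = NA (e.g. filtered out by DESeq2) are dropped.
select_de_genes <- function(res, padj_cutoff = 0.05, lfc_cutoff = 1) {
  keep <- !is.na(res$padj) &
    res$padj < padj_cutoff &
    abs(res$log2FoldChange) >= lfc_cutoff
  res[keep, , drop = FALSE]
}

# Toy table with two of the DESeq2 output columns (values are made up)
toy <- data.frame(
  log2FoldChange = c(2.1, -1.5, 0.3, 1.8),
  padj           = c(0.001, 0.02, 0.0001, NA),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)
print(select_de_genes(toy))

# With the file written by the script above, this might look like:
# res <- read.table("OOB_vs_YOB_results1.txt", header = TRUE, sep = "\t")
# sig <- select_de_genes(res)
```

With the contrast used in the script, a positive log2 fold change means higher expression in OOB than in YOB.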
Attachments
OOB_vs_YOB_results.txt
(4.45 MiB) Downloaded 177 times

Differences between RPKM, FPKM, TPM and DESeq2

Post by ddolbae01 » Wed Jun 20, 2018 2:06 pm

RNA-seq results used to be reported in RPKM (Reads Per Kilobase Million), FPKM (Fragments Per Kilobase Million) or TPM (Transcripts Per Million). However, of these normalization methods, DESeq2 appears to be the most appropriate (Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster. BMC Genomics. 2016 Jan 5;17:28. doi: 10.1186/s12864-015-2353-z).
Before moving on to the normalization step, I am posting a summary of the differences between them.
Please download the attached ppt file.
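
To make the difference concrete, here is a minimal base-R sketch of the calculations on a made-up count matrix. The counts and gene lengths are invented, and `size_factors` is a simplified re-implementation of the median-of-ratios idea behind DESeq2's normalization, written here for illustration rather than taken from the package. FPKM follows the same formula as RPKM, but counts fragments (read pairs) instead of single reads.

```r
# Toy data: 3 genes x 2 samples (made-up counts and gene lengths in base pairs)
counts <- matrix(c(10, 20, 30,
                   20, 40, 60),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
gene_length <- c(g1 = 1000, g2 = 2000, g3 = 3000)

# RPKM: scale by library size (per million reads), then by gene length (per kb)
rpkm <- function(counts, len) {
  per_million <- colSums(counts) / 1e6
  sweep(counts, 2, per_million, "/") / (len / 1e3)
}

# TPM: scale by gene length first, then make every sample sum to one million
tpm <- function(counts, len) {
  rate <- counts / (len / 1e3)
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

# Median-of-ratios size factors: compare each sample with a
# per-gene geometric-mean reference and take the median ratio
size_factors <- function(counts) {
  log_geo_mean <- rowMeans(log(counts))
  usable <- is.finite(log_geo_mean)  # skip genes with a zero count in any sample
  apply(counts, 2, function(x) exp(median(log(x[usable]) - log_geo_mean[usable])))
}

print(rpkm(counts, gene_length))
print(colSums(tpm(counts, gene_length)))
print(size_factors(counts))
```

RPKM and TPM correct for gene length and sequencing depth within each sample, whereas the size factors only rescale samples against each other and leave the values on the count scale, which is what a count-based test such as DESeq2 expects.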
Attachments
RPKM vs FPKM vs TPM vs DESeq2.ppt
(18.74 MiB) Downloaded 184 times
Last edited by ddolbae01 on Tue Jun 26, 2018 1:01 am, edited 1 time in total.

Master file of read count

Post by ddolbae01 » Thu Jun 21, 2018 2:13 pm

I made a master file with the read counts of all samples.
The number of genes in the library is 57954.
The file is attached.
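
A minimal sketch of how such a master table could be assembled in base R from per-sample HTSeq-count files. It assumes that each file has two tab-separated columns (gene ID, count) with no header, that all files list the same genes in the same order, and that the summary rows HTSeq-count appends (names starting with `__`, such as `__no_feature`) should be dropped. The function name `build_master` and the demo files are hypothetical.

```r
# Combine per-sample count files into one gene x sample matrix.
build_master <- function(files) {
  tables <- lapply(files, read.table, sep = "\t", header = FALSE,
                   col.names = c("gene", "count"), stringsAsFactors = FALSE)
  genes <- tables[[1]]$gene
  # Every file must list the same genes in the same order
  same_genes <- vapply(tables, function(t) identical(t$gene, genes), logical(1))
  stopifnot(all(same_genes))
  master <- do.call(cbind, lapply(tables, function(t) t$count))
  dimnames(master) <- list(genes, basename(files))
  # Drop the summary rows such as __no_feature
  master[!startsWith(genes, "__"), , drop = FALSE]
}

# Demo with two tiny made-up files in a temporary directory
tmp <- tempdir()
f1 <- file.path(tmp, "sampleA_Count.txt")
f2 <- file.path(tmp, "sampleB_Count.txt")
writeLines(c("ENSG1\t5", "ENSG2\t0", "__no_feature\t7"), f1)
writeLines(c("ENSG1\t3", "ENSG2\t9", "__no_feature\t2"), f2)
master <- build_master(c(f1, f2))
print(master)
```

On the real files, `nrow(master)` can be checked against the gene number reported above, and `write.csv(master, ...)` would save the table as plain text; writing an xlsx file like the attachment would need an additional package.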
Attachments
MasterfileForReadCount.xlsx
(5.69 MiB) Downloaded 189 times