Paired-end RNA-seq

Posted: Thu Nov 16, 2017 11:37 am
by efrain.gonzalez0
Hello,

Yesterday I got a warning on almost every line when running htseq-count on the alignment output produced by HISAT2. The warning read:
Read (read#) claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
After some searching I found that this is a known issue when using htseq-count on paired-end data. The fix is to sort the HISAT2 alignment output by read name with samtools sort before running htseq-count. This can be done with the following command:
Code: Select all
./samtools sort -n -O SAM ~/Desktop/HISAT2_HOME/DrHooi/R1.4_Hisat2.sam -o ~/Desktop/HISAT2_HOME/DrHooi/R1.4_sortHIS.sam

Here "~/Desktop/HISAT2_HOME/DrHooi/R1.4_Hisat2.sam" is the alignment file produced by HISAT2 and "~/Desktop/HISAT2_HOME/DrHooi/R1.4_sortHIS.sam" is the new name-sorted file that will be created.
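One quick way to confirm that the sort took effect is to look at the SO field of the @HD header line, which samtools sort -n should set to "queryname". The sketch below uses only awk and assumes the file carries an @HD line; the function name sam_sort_order is made up for illustration.

```shell
# Hypothetical helper: print the sort order declared in the @HD header
# line of a SAM file, or "unknown" if no SO field is found.
sam_sort_order() {
    awk -F'\t' '
        /^@HD/ {
            # Scan the tab-separated fields for the SO:<order> tag.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
            exit
        }
        # Stop at the first alignment record; the header is over.
        !/^@/ { exit }
        END { if (!found) print "unknown" }
    ' "$1"
}

# Example (path taken from the post above):
# sam_sort_order ~/Desktop/HISAT2_HOME/DrHooi/R1.4_sortHIS.sam
```

This only reads what the header claims, so it is a sanity check rather than proof that the records are actually in name order.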
You can find out how to download and install samtools at this link: http://www.htslib.org/
You may be able to use the command
Code: Select all
sudo apt-get install samtools
in order to install samtools, but this may not be the latest version, so it may lack features you need. For example, in my case the samtools version available from the Ubuntu repositories only accepted BAM files as input. While it is fairly simple to convert between SAM and BAM, it is easier and less time-consuming if no conversion has to be done at all.
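If you are stuck with a version that only takes BAM input, the conversion in both directions can be sketched with samtools view as below. The two wrapper names are hypothetical, and the sketch assumes samtools 1.x, which detects the input format on its own; older 0.1.x releases may also need -S when the input is SAM.

```shell
# Hypothetical wrappers; the function names are illustrative only.
# Assumes samtools 1.x, which detects the input format automatically.

# SAM -> BAM: -b writes BAM output, -o names the output file.
sam_to_bam() {
    samtools view -b -o "$2" "$1"
}

# BAM -> SAM: -h keeps the header lines in the text output.
bam_to_sam() {
    samtools view -h -o "$2" "$1"
}

# Example:
# sam_to_bam R1.4_Hisat2.sam R1.4_Hisat2.bam
```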

Furthermore, for paired-end data you will need the latest version of htseq-count. Although you may be able to update htseq-count in the following way
Code: Select all
sudo apt-get install python-htseq
this will not necessarily be the latest version of HTSeq, so I recommend using the following command instead:
Code: Select all
pip install HTSeq --user
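This command installs the latest version of HTSeq available through pip. If an older HTSeq is already installed, you may need to add --upgrade so that pip replaces it. I needed the newer version of htseq-count because it includes the "-r" option.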
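To check whether the htseq-count on your PATH already knows the "-r" option, one rough approach is to search its help text for --order, the long form of "-r". The function name below is hypothetical, and the check assumes the help output lists the long option name.

```shell
# Hypothetical helper: succeed if the help text of the given command
# (default: htseq-count) mentions --order, the long form of -r.
supports_order_option() {
    "${1:-htseq-count}" --help 2>&1 | grep -q -- '--order'
}

# Example:
# supports_order_option && echo "this htseq-count accepts -r"
#
# The installed package version can also be listed with:
# pip show HTSeq
```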
Afterwards you will be able to run htseq-count with the following command:
Code: Select all
htseq-count -m intersection-nonempty -s no -i gene_name -r name ~/Desktop/HISAT2_HOME/DrHooi/R1.4_sortHIS.sam ~/Desktop/Homo_sapiens.GRCh38.89.chr_patch_hapl_scaff.gtf -o R1.4_COUNT.sam > R1.4_Count.txt

Now when you run this command you should see few or no warnings.
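If many warnings remain, it can help to check directly that all alignments of a read sit on consecutive lines, which is what "-r name" expects. Here is a minimal sketch using awk. The function name is hypothetical, and it keeps every read name in memory, so it may be slow on very large files (piping the first few hundred thousand lines through it is one workaround).

```shell
# Hypothetical helper: count how many times a read name shows up again
# after a different read name has intervened. A file that is grouped by
# read name should give 0.
count_split_read_names() {
    awk -F'\t' '
        /^@/ { next }                      # skip header lines
        $1 != prev {
            if ($1 in seen) split_count++  # name seen before, but not adjacent
            seen[$1] = 1
            prev = $1
        }
        END { print split_count + 0 }
    ' "${1:--}"
}

# Example (path taken from the post above):
# count_split_read_names ~/Desktop/HISAT2_HOME/DrHooi/R1.4_sortHIS.sam
```

With no argument the function reads from standard input, so it can sit at the end of a pipe.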

Re: RNA-seq

Posted: Wed Mar 13, 2019 12:38 am
by cpere117
Results from mapping "exon proteins" only, for the RNA-Seq analysis of Dr. Felty's three samples. Btw, this is single-stranded RNA sequencing data, not paired-end sequencing data.

Re: RNA-seq

Posted: Mon Mar 18, 2019 4:05 pm
by cpere117
Note that these results primarily display exon proteins. I plan to gather the results for intron proteins and mRNA proteins once I've completed the analysis.

Re: RNA-seq

Posted: Thu Apr 18, 2019 10:09 pm
by cpere117
Here is an updated list of annotated IDs and normalized counts produced for the three RNA-Seq samples of ID3.