Yesterday I got several warnings when running htseq-count on the alignment output produced by HISAT2. The warning appeared for almost every line and stated the following:
Read (read#) claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
After some searching I discovered that this issue occurs when using htseq-count on paired-end data that is not sorted the way it expects. The solution is to sort the HISAT2 alignment output by read name with samtools sort before counting. This can be done with the following command:
./samtools sort -n -O SAM ~/Desktop/HISAT2_HOME/DrHooi/R1.4_Hisat2.sam -o ~/Desktop/HISAT2_HOME/DrHooi/R1.4_sortHIS.sam
Where "~/Desktop/HISAT2_HOME/DrHooi/R1.4_Hisat2.sam" represents the alignment file produced by HISAT2 and "~/Desktop/HISAT2_HOME/DrHooi/R1.4_sortHIS.sam" represents the new sorted file that will be created.
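If you want to confirm that the sort took effect before running htseq-count, one option is to look at the sort order recorded in the SAM header. Below is a minimal sketch, assuming the file has an @HD header line with an SO: tag (samtools sort -n should record SO:queryname there). The function name sam_sort_order is my own and not part of samtools.

```shell
# Hypothetical helper (not part of samtools): print the sort order
# recorded in the @HD header line of a plain-text SAM file.
sam_sort_order() {
    # Take the first @HD line, split its tab-separated fields onto
    # separate lines, and print the value of the SO: field if present.
    grep -m1 '^@HD' "$1" | tr '\t' '\n' | sed -n 's/^SO://p'
}

# Example call on the sorted file from this post:
#   sam_sort_order ~/Desktop/HISAT2_HOME/DrHooi/R1.4_sortHIS.sam
```

If this prints nothing, the file has no @HD line or no SO: tag, and the header alone cannot tell you how the file is sorted.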
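You can find out how to download and install samtools at this link: http://www.htslib.org/
On Debian-based systems such as Ubuntu you may also be able to install it from the package manager, after which it is invoked as samtools rather than ./samtools: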
sudo apt-get install samtools
Furthermore, for paired-end data you will need an up-to-date version of htseq-count. You may be able to install or update it through the package manager:
sudo apt-get install python-htseq
The packaged version may lag behind the current release, so alternatively you can install it with pip:
pip install HTSeq --user
Afterwards you can run htseq-count with the following command, where -r name tells it that the input is sorted by read name:
htseq-count -m intersection-nonempty -s no -i gene_name -r name ~/Desktop/HISAT2_HOME/DrHooi/R1.4_sortHIS.sam ~/Desktop/Homo_sapiens.GRCh38.89.chr_patch_hapl_scaff.gtf -o R1.4_COUNT.sam > R1.4_Count.txt
Now when you run this command you should see few or no warnings.
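To check this without scrolling through the terminal, you can save the messages from htseq-count and count the mate warnings afterwards. This is only a sketch: it assumes the warnings are written to standard error and that you captured them by adding something like 2> R1.4_htseq.log to the end of the htseq-count command (that log file name is made up for this example, as is the function name).

```shell
# Hypothetical helper: count the "aligned mate" warnings in a log file
# that holds the messages captured from an htseq-count run.
count_mate_warnings() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are no matches, so "|| true" keeps the function usable
    # in scripts that run with "set -e".
    grep -c 'claims to have an aligned mate' "$1" || true
}

# Example call on the assumed log file:
#   count_mate_warnings R1.4_htseq.log
```

A count of 0, or a handful compared with the number of reads, would match the "few or no warnings" result described above.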