RNA-seq

Posted: Tue Apr 12, 2016 1:34 pm
by Feltyq
These are 3 samples with RNA sequencing data. You will need to combine all the fastq files for each sample before alignment.
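A minimal sketch of the combining step, assuming each sample's files share a common prefix followed by a lane tag, as in the hypothetical names SampleA_L001_R1.fastq and SampleA_L002_R1.fastq. The function name combine_fastq and the naming pattern are illustrative and are not taken from the attached data. Plain concatenation works because FASTQ is a simple record-per-read text format.

```shell
# combine_fastq SAMPLE DIR OUT
# Concatenate every FASTQ file in DIR whose name starts with SAMPLE_ into OUT.
# The SAMPLE_*.fastq naming pattern is an assumption; adjust it to your files.
combine_fastq() (
    sample=$1
    dir=$2
    out=$3
    # The shell expands the glob in sorted order, so lanes are appended
    # in a reproducible sequence (L001, L002, ...).
    set -- "$dir"/"$sample"_*.fastq
    if [ ! -e "$1" ]; then
        echo "no FASTQ files found for sample $sample in $dir" >&2
        exit 1
    fi
    cat "$@" > "$out"
)
```

Example: combine_fastq SampleA ./fastq ./combined/SampleA.fastq. Write the output outside DIR, or give it a name that does not match the pattern, so that a second run does not pick up the combined file as input.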

Re: RNA-seq

Posted: Thu Apr 21, 2016 2:00 pm
by cwyoo
Feltyq wrote: These are 3 samples with RNA sequencing data. You will need to combine all the fastq files for each sample before alignment.


Here are the tools that have been used to analyze the next-generation sequencing reads (RNA or DNA):

HISAT2 (see https://ccb.jhu.edu/software/hisat2/index.shtml)
htseq-count (module from HTSeq 0.6.1p2; see http://www-huber.embl.de/HTSeq/doc/count.html)
Gene names (GTF file) downloaded from ftp://ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh38.84.chr_patch_hapl_scaff.gtf.gz

Re: RNA-seq

Posted: Mon Apr 25, 2016 10:24 am
by cwyoo
Attached are the sorted counts.
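For anyone reproducing this step, here is a sketch of one way to sort htseq-count output by count, highest first. It assumes the two-column, tab-separated gene/count layout that htseq-count prints, with summary rows (such as __no_feature) prefixed by two underscores. The function name sort_counts is illustrative, and this is not necessarily how the attached file was produced.

```shell
# sort_counts COUNTS_FILE
# Print gene rows sorted by count (column 2), largest first, breaking ties
# by gene name, and dropping the summary rows that start with "__".
sort_counts() {
    grep -v '^__' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

Example: sort_counts Outputfile_Counts.txt > Outputfile_Counts.sorted.txt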

Re: RNA-seq

Posted: Tue Apr 26, 2016 11:58 pm
by cwyoo
These are the genes sorted by p-value and fold change between the experimental conditions:

Re: RNA-seq

Posted: Tue Feb 14, 2017 11:55 am
by cwyoo
For the new analysis, the following versions were used:
HISAT2 2.0.5 (release 11/4/2016)
htseq-count HTSeq 0.6.1p1
Gene Name (GTF file) downloaded from ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh38.87.chr_patch_hapl_scaff.gtf.gz

Re: RNA-seq

Posted: Thu Feb 16, 2017 4:58 pm
by cwyoo
These are the DNA sequence analysis results.

Re: RNA-seq

Posted: Sat Mar 11, 2017 9:39 pm
by cwyoo
These are the ID3chipseq analysis results, obtained with the above settings.

Re: Using SRA ToolKit, HISAT2, and htseq-count

Posted: Tue Aug 01, 2017 3:30 pm
by efrain.gonzalez0
As those who saw my posts in the Alzheimer's forum know, I have been converting .sra files to .fastq with the SRA Toolkit. The commands are as follows:
  1. Check that the accession resolves to a path for the .sra file:
    Code: Select all
    ./srapath SRR######
  2. Read in the .sra file and convert it to a .fastq file:
    Code: Select all
    ./fastq-dump SRR######
Here SRR###### stands for the SRA run accession of the file.

Recently I have been working with the HISAT2 alignment tool, and I want to make sure everyone understands the commands I used, so I am posting them here:
  1. After installing HISAT2, I first needed a genome index. One can build an index with the hisat2-build command, which accepts a comma-separated list of FASTA files. An example for several FASTA files is included below. The files were obtained from the Ensembl website http://useast.ensembl.org/info/data/ftp/index.html. You should also download the GTF file, which you will need later in the process.
    • Code: Select all
      ./hisat2-build `ls *fasta | awk '{printf("%s,", $1)}' | sed -e 's/,$//'` HT2_IDX

      Where fasta represents a folder that contains the fasta files that you downloaded from the Ensembl website and then unzipped.
    However, from my reading, it is better to use pre-built indexes where possible. These are listed on the right-hand side of the HISAT2 website https://ccb.jhu.edu/software/hisat2/manual.shtml. I downloaded the one named grch38 and used the make script that came with it to build a fresh copy of the index. The grch38_snp_tran index gives a better overall alignment percentage, but we cannot build it on any of our servers because it requires 200GB of RAM.
  2. With the index in place, I aligned the data using the following code:
    Code: Select all
    ./hisat2 -x ./grch38/genome -U SRR######.fastq -S alignedfile.sam
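
To compare alignment percentages between indexes, it helps to keep the summary that HISAT2 prints to standard error; its last line has the form "NN.NN% overall alignment rate" (Bowtie2 prints the same kind of summary). The sketch below pulls that number out of a saved summary. The function name alignment_rate and the file name hisat2_summary.txt are placeholders of my own, not part of HISAT2.

```shell
# alignment_rate LOGFILE
# Print the overall alignment rate (a percentage, without the % sign)
# from a saved HISAT2 or Bowtie2 summary.
alignment_rate() {
    awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$1"
}
```

Example: append 2> hisat2_summary.txt to the hisat2 command above to save the summary, then run alignment_rate hisat2_summary.txt.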

The command for htseq-count is as follows:
  1. Code: Select all
    python -m HTSeq.scripts.count -m intersection-nonempty -s no -i gene_name ~/Desktop/HISAT2/alignedfile.sam ~/Desktop/genome.gtf -o Outputfile.samout > Outputfile_Counts.txt
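
The file written by -o is a copy of the alignments in which htseq-count annotates each record with an XF tag naming the feature (or special counter) the read was assigned to. A rough sketch for pulling out the reads assigned to one gene follows. It assumes the tag is written as XF:Z:<name> in the optional fields, and the function name reads_for_gene is my own. It compares the whole tag, so APOE will not also match a longer name that merely contains APOE.

```shell
# reads_for_gene GENE SAMOUT
# Print the SAM records in SAMOUT whose XF tag is exactly GENE.
reads_for_gene() {
    awk -F '\t' -v tag="XF:Z:$1" '
        /^@/ { next }                      # skip SAM header lines, if any
        { for (i = 12; i <= NF; i++)       # optional fields start at column 12
              if ($i == tag) { print; next } }
    ' "$2"
}
```

Example: reads_for_gene APOE Outputfile.samout > APOE_reads.txt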

Access to controlled Data on GDC

Posted: Wed Sep 27, 2017 2:20 pm
by efrain.gonzalez0

Using HISAT2 and Bowtie2 on Dr. Roy's Data

Posted: Fri Oct 20, 2017 11:58 am
by efrain.gonzalez0
Hello everyone,

I am posting some results of the HISAT2 and Bowtie2 analyses of Professor Roy's data. I will also compare the outputs for particular genes to see where the alignment results differ. For this I use the HISAT2 commands I posted earlier, plus the Bowtie2 commands included in this post. To get per-gene counts I run htseq-count on both the HISAT2 and Bowtie2 alignments. I also describe a problem I initially ran into when using Bowtie2 with htseq-count, and how to resolve it.

  1. The command to run Bowtie2 is similar to the one for HISAT2. All of the code was run from within the Bowtie2 directory. The code looked like this:
    Code: Select all
    ./bowtie2 -x ./Bowtie2Index/genome -U ~/Sample-A-Input_S43_L007_R1_001.fastq -S BowTie_Sample_A_Input2.sam

  2. The htseq-count command was the same as the one I used for the HISAT2 files. After obtaining the htseq-count outputs for both the Bowtie2 and HISAT2 files, we wanted to compare the outputs for certain genes (NRF1, APOE, etc.). I did so with the following commands:
    1. This command extracts the records for a given gene from any htseq-count SAM output file and writes them to a new file (grep matches substrings, so a short gene name may also match longer names that contain it; adding -w restricts the match to whole words):
      Code: Select all
      grep GENE_NAME htseq_count_output.sam > gene_info.txt

      Ex:
      Code: Select all
      grep APOE COUNTDrRoy_Bowtie.sam > DrRoy_APOE_Bowtie.txt
    2. This command sorts the file above in ascending numerical order by the leftmost mapping position of each read (POS, column four of a SAM record) and writes the result to a new file:
      Code: Select all
      sort -t$'\t' -k4,4g gene_info.txt > sorted_gene_info.txt

      Ex:
      Code: Select all
      sort -t$'\t' -k4,4g DrRoy_APOE_Bowtie.txt > sort_DrRoy_APOE_Bowtie.txt
  3. The problem: htseq-count was reporting a count of 0 for every gene.
    The why: This seems to happen most often when the index used in Bowtie2/HISAT2 does not match the annotation used in htseq-count, for instance when the two name their chromosomes differently (e.g. chr1 versus 1).
    The solution: Make sure the index used in HISAT2/Bowtie2 comes from the same source as the GTF file used in htseq-count.
    Ex: In my case I was using an NCBI index for my Bowtie2 analysis, but my htseq-count annotation was from Ensembl. After switching to an Ensembl index, I no longer obtained only zeros in my htseq-count output. Several different indexes are available from Illumina's iGenomes collection at https://support.illumina.com/sequencing/sequencing_software/igenome.html
    Here are some references that may help you resolve this issue in the future:
    1. https://galaxyproject.org/support/chrom-identifiers/
    2. https://www.biostars.org/p/220756/
    3. https://biostar.usegalaxy.org/p/19790/
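
A quick way to check for this kind of mismatch before running htseq-count is to compare the sequence names in the SAM header with those in the GTF. The sketch below is only an illustration: the function name unmatched_seqnames is made up, and it assumes the aligner wrote @SQ header lines into the SAM file. It reads nothing beyond the header of the SAM file.

```shell
# unmatched_seqnames SAM GTF
# Print reference names that appear in the SAM header (@SQ SN: fields)
# but never in column 1 of the GTF. Reads aligned to those sequences
# cannot overlap any annotated feature.
unmatched_seqnames() (
    sam=$1
    gtf=$2
    tmp=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp"' EXIT
    # Sequence names declared by the aligner index; stop at the first read.
    awk -F '\t' '
        /^@SQ/ { for (i = 2; i <= NF; i++)
                     if ($i ~ /^SN:/) print substr($i, 4) }
        !/^@/  { exit }
    ' "$sam" | sort -u > "$tmp/sam_names"
    # Sequence names used by the annotation; GTF comment lines start with "#".
    awk -F '\t' '!/^#/ { print $1 }' "$gtf" | sort -u > "$tmp/gtf_names"
    # comm -23 keeps the lines found only in the first (SAM) list.
    comm -23 "$tmp/sam_names" "$tmp/gtf_names"
)
```

Example: unmatched_seqnames BowTie_Sample_A_Input2.sam ~/Desktop/genome.gtf. If this lists every chromosome, the index and the GTF almost certainly use different naming schemes.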