RNA-seq

Other projects currently in progress. Once a project gains significant traction, we will create a new forum for its analyses and results.

RNA-seq

Post by Feltyq » Tue Apr 12, 2016 1:34 pm

These are RNA sequencing data for three samples (201603252, 201603253 and 201603254). You will need to combine all the fastq files for each sample before alignment.
Attachments
qFelty_qRNASeq1_201603254-01_S_1_1.txt.bz2
(561.01 MiB) Downloaded 208 times
qFelty_qRNASeq1_201603253-01_S_1_1.txt.bz2
(624.98 MiB) Downloaded 203 times
qFelty_qRNASeq1_201603252-01_S_1_1.txt.bz2
(555.04 MiB) Downloaded 198 times
qFelty_qRNASeq1_201603254-01_S_0_1.txt.bz2
(357.63 MiB) Downloaded 221 times
qFelty_qRNASeq1_201603253-01_S_0_1.txt.bz2
(317.12 MiB) Downloaded 190 times
qFelty_qRNASeq1_201603252-01_S_0_1.txt.bz2
(282.35 MiB) Downloaded 213 times
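A minimal sketch of the combining step, assuming the six attachments above have been downloaded into the current directory and contain bzip2-compressed FASTQ text with standard four-line records. The helper name `combine_sample` and the `combined_<id>.fastq` output names are hypothetical. Because each FASTQ record stands on its own, decompressing the parts with `bzcat` and writing them one after another gives a single FASTQ per sample:

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge every fastq part of one sample into one file.
# Assumes the attachment naming from the post, e.g.
#   qFelty_qRNASeq1_201603252-01_S_0_1.txt.bz2
#   qFelty_qRNASeq1_201603252-01_S_1_1.txt.bz2
combine_sample() {
  local id="$1"                                  # sample ID, e.g. 201603252
  local parts=( qFelty_qRNASeq1_"${id}"-01_S_*_1.txt.bz2 )
  # Without nullglob an unmatched glob stays literal, so test the first entry.
  if [ ! -e "${parts[0]}" ]; then
    echo "no fastq parts found for sample ${id}" >&2
    return 1
  fi
  # Decompress the parts in glob (sorted) order and append their records.
  bzcat "${parts[@]}" > "combined_${id}.fastq"
}

# Usage, from the download directory:
#   for id in 201603252 201603253 201603254; do combine_sample "$id"; done
```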
Feltyq
 
Posts: 1
Joined: Tue Oct 13, 2015 12:53 pm

Re: RNA-seq

Post by cwyoo » Thu Apr 21, 2016 2:00 pm

Here are the tools that have been used to analyze the next-generation sequencing reads (RNA or DNA):

HISAT2 (see https://ccb.jhu.edu/software/hisat2/index.shtml)
htseq-count (module from HTSeq 0.6.1p2; see http://www-huber.embl.de/HTSeq/doc/count.html)
Gene names (GTF annotation file) downloaded from ftp://ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh38.84.chr_patch_hapl_scaff.gtf.gz
cwyoo
Site Admin
 
Posts: 377
Joined: Sun Jun 22, 2014 2:38 pm

Re: RNA-seq

Post by cwyoo » Mon Apr 25, 2016 10:24 am



Attached are the sorted gene counts for each sample.
Attachments
out-qRNASeq1_201603252-grch38.samout.sorted.txt
Sorted counts for combined 201603252 files (updated with gene names)
(756.5 KiB) Downloaded 217 times
out-qRNASeq1_201603253-grch38.samout.sorted.txt
Sorted counts for combined 201603253 files (updated with gene names)
(757.42 KiB) Downloaded 197 times
out-qRNASeq1_201603254-grch38.samout.sorted.txt
Sorted counts for combined 201603254 files (updated with gene names)
(760.05 KiB) Downloaded 218 times
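The post does not say how the count files were sorted. As a hedged sketch, assuming tab-separated htseq-count output (gene name, then count), the hypothetical `rank_counts` helper below drops htseq-count's summary rows, whose names begin with `__`, and orders the genes by count, highest first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank the genes of an htseq-count table
# (tab-separated: gene name, read count) by count, highest first.
# Rows starting with "__" are htseq-count summary counters, not genes.
rank_counts() {
  local counts="$1"
  # -k2,2nr: numeric descending on the count; -k1,1 breaks ties by gene name.
  grep -v '^__' "$counts" | sort -t $'\t' -k2,2nr -k1,1
}

# Usage: rank_counts Outputfile_Counts.txt > Outputfile_Counts.ranked.txt
```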

Re: RNA-seq

Post by cwyoo » Tue Apr 26, 2016 11:58 pm



These are the genes sorted by p-value and fold change for each pair of experimental conditions:
Attachments
zero.txt
Fold change with zero counts
(276.48 KiB) Downloaded 230 times
p-value-non-zero-34.txt
Fold change between 20160325-3 (reference) and 20160325-4 (comparison)
(1.7 MiB) Downloaded 214 times
p-value-non-zero-24.txt
Fold change between 20160325-2 (reference) and 20160325-4 (comparison)
(1.65 MiB) Downloaded 214 times
p-value-non-zero-23.txt
Fold change between 20160325-2 (reference) and 20160325-3 (comparison)
(1.61 MiB) Downloaded 194 times
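The method behind the p-values is not stated, so the sketch below covers only the fold-change part. It assumes two tab-separated htseq-count tables and takes fold change as the comparison count divided by the reference count on raw counts, without library-size normalisation. Genes with a zero count in either sample go to a separate file, since the ratio (or its logarithm) is undefined there; this may be roughly what zero.txt holds, but that is a guess. The function name `fold_change` and its output columns are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-gene fold change between two htseq-count tables
# (tab-separated: gene name, read count). Fold change = comparison / reference
# on raw counts (no library-size normalisation, no p-values).
# Genes missing from either table are skipped by join; genes with a zero
# count in either sample are written to the separate "zero" file.
fold_change() {
  local ref="$1" cmp="$2" out="$3" zero="$4"
  : > "$zero"   # start with an empty zero-count file
  # join needs both inputs sorted on the gene name in the same (C) collation.
  LC_ALL=C join -t $'\t' \
    <(grep -v '^__' "$ref" | LC_ALL=C sort -t $'\t' -k1,1) \
    <(grep -v '^__' "$cmp" | LC_ALL=C sort -t $'\t' -k1,1) |
  awk -F '\t' -v OFS='\t' -v zero="$zero" '
    # Columns after join: gene, reference count, comparison count.
    $2 == 0 || $3 == 0 { print $1, $2, $3 > zero; next }
    { print $1, $2, $3, $3 / $2, log($3 / $2) / log(2) }   # ratio, log2 ratio
  ' > "$out"
}

# Usage (reference table first):
#   fold_change counts-3.txt counts-4.txt fc-34.txt zero-34.txt
```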

Re: RNA-seq

Post by cwyoo » Tue Feb 14, 2017 11:55 am



For the new analysis, the following versions were used:
HISAT2 2.0.5 (released 11/4/2016)
htseq-count from HTSeq 0.6.1p1
Gene names (GTF annotation file) downloaded from ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh38.87.chr_patch_hapl_scaff.gtf.gz
Attachments
out-Sample-E_S52_L008_R1_001-grch38.txt
Sample E
(672.12 KiB) Downloaded 188 times
out-Sample-D_S51_L008_R1_001-grch38.txt
Sample D
(672.91 KiB) Downloaded 212 times
out-Sample-C_S50_L008_R1_001-grch38.txt
Sample C
(662.42 KiB) Downloaded 202 times
out-Sample-B_S49_L008_R1_001-grch38.txt
Sample B
(663.36 KiB) Downloaded 206 times
out-Sample-A_S48_L008_R1_001-grch38.txt
Sample A
(663.94 KiB) Downloaded 208 times

Re: RNA-seq

Post by cwyoo » Thu Feb 16, 2017 4:58 pm



These are the DNA sequencing analysis results (ChIP and input samples).
Attachments
out-Sample-ChIP-C_S46_L007_R1_001-grch38.txt
Sample C ChIP
(658.15 KiB) Downloaded 193 times
out-Sample-C-Input_S47_L007_R1_001-grch38.txt
Sample C Input
(654.96 KiB) Downloaded 190 times
out-Sample-ChIP-B_S44_L007_R1_001-grch38.txt
Sample B ChIP
(653.55 KiB) Downloaded 184 times
out-Sample-B-Input_S45_L007_R1_001-grch38.txt
Sample B Input
(656.17 KiB) Downloaded 206 times
out-Sample-ChIP-A_S42_L007_R1_001-grch38.txt
Sample A ChIP
(654.58 KiB) Downloaded 196 times
out-Sample-A-Input_S43_L007_R1_001-grch38.txt
Sample A Input
(659.94 KiB) Downloaded 195 times

Re: RNA-seq

Post by cwyoo » Sat Mar 11, 2017 9:39 pm



Using the same settings, these are the ID3chipseq analysis results.
Attachments
out-qFelty_ID3chipseq_201700236-01_S_8-grch38.txt
ID3chipseq_201700236
(641.51 KiB) Downloaded 192 times
out-qFelty_ID3chipseq_201700235-01_S_8-grch38.txt
ID3chipseq_201700235
(649.26 KiB) Downloaded 195 times
out-qFelty_ID3chipseq_201700234-01_S_8-grch38.txt
ID3chipseq_201700234
(667.68 KiB) Downloaded 197 times
out-qFelty_ID3chipseq_201700233-01_S_8-grch38.txt
ID3chipseq_201700233
(656.34 KiB) Downloaded 201 times
out-qFelty_ID3chipseq_201700232-01_S_8-grch38.txt
ID3chipseq_201700232
(653.85 KiB) Downloaded 190 times
out-qFelty_ID3chipseq_201700231-01_S_8-grch38.txt
ID3chipseq_201700231
(648.47 KiB) Downloaded 201 times

Re: Using SRA ToolKit, HISAT2, and htseq-count

Post by efrain.gonzalez0 » Tue Aug 01, 2017 3:30 pm

If you saw my posts in the Alzheimer's forum, you know that I have been able to convert .sra files to .fastq with the SRA Toolkit. The commands for doing so are as follows:
  1. Check that a path to the .sra file in question exists:
    Code:
    ./srapath SRR######
  2. Read in the .sra file and convert it to a .fastq file:
    Code:
    ./fastq-dump SRR######
Here SRR###### stands for the SRA accession number of the file. Recently I have been working with the HISAT2 alignment tool, and I want to make sure that everyone understands the commands I used, so I am posting them here:
  1. After installing HISAT2, I first had to set up the required index. You can build an index with the hisat2-build command; an example for several FASTA files is included below. The files were obtained from the Ensembl website http://useast.ensembl.org/info/data/ftp/index.html. You must also download the GTF file, which you will need later in the process.
    • Code:
      ./hisat2-build `ls *fasta | awk '{printf("%s,", $1)}' | sed -e 's/,$//'` HT2_IDX

      Here *fasta matches the FASTA files in the current directory (downloaded from the Ensembl website and unzipped) whose names end in "fasta"; the pipeline joins them into the comma-separated list that hisat2-build expects. Adjust the pattern to match your file names.
    However, I have read that it is better to use pre-built indexes. I found them on the right-hand side of the HISAT2 manual page https://ccb.jhu.edu/software/hisat2/manual.shtml. I used the one named grch38 and ran the make script that came with it to build a fresh copy of the index. Although I found that the grch38_snp_tran index gives a better overall alignment percentage, it cannot be built on any of our servers because building it requires 200 GB of RAM.
  2. With the index built, I aligned the data using the following command:
    Code:
    ./hisat2 -x ./grch38/genome -U SRR######.fastq -S alignedfile.sam
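Parsing `ls` output works for plain file names, but the shell can also build the comma-separated FASTA list for hisat2-build directly from a glob. A minimal bash sketch, where `fasta_list` is a hypothetical helper and the default `*.fa` pattern is an assumption about how the unzipped Ensembl files are named:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the files matching a glob as one comma-separated
# list, the form hisat2-build accepts for several FASTA inputs.
fasta_list() {
  local pattern="${1:-*.fa}"   # assumption: unzipped Ensembl FASTA files end in .fa
  local matches
  # compgen -G prints one matching path per line and fails if nothing matches.
  matches=$(compgen -G "$pattern") || {
    echo "no files match ${pattern}" >&2
    return 1
  }
  # Sort for a stable order, then join the lines with commas.
  printf '%s\n' "$matches" | sort | paste -sd, -
}

# Usage, in the directory with the unzipped FASTA files:
#   ./hisat2-build "$(fasta_list '*.fa')" HT2_IDX
```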

The command for htseq-count is as follows:
  1. Code:
    python -m HTSeq.scripts.count -m intersection-nonempty -s no -i gene_name -o Outputfile.samout ~/Desktop/HISAT2/alignedfile.sam ~/Desktop/genome.gtf > Outputfile_Counts.txt
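Besides one row per gene, htseq-count appends special counters (`__no_feature`, `__ambiguous`, `__too_low_aQual`, `__not_aligned`, `__alignment_not_unique`) to the counts file. Summarising those rows shows how many reads were actually assigned to genes; if nearly everything lands in `__no_feature`, the gene counts will be close to zero. A sketch, with the hypothetical helper name `count_summary` and assuming the tab-separated counts file produced by the command above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise an htseq-count table (gene<TAB>count).
# Rows whose name starts with "__" are htseq-count's special counters;
# every other row is a gene.
count_summary() {
  awk -F '\t' '
    /^__/ { special[$1] = $2; total += $2; next }        # special counters
    { assigned += $2; total += $2; if ($2 > 0) genes++ } # gene rows
    END {
      printf "assigned\t%d\n", assigned
      printf "genes_with_reads\t%d\n", genes
      for (name in special) printf "%s\t%d\n", name, special[name]
      if (total > 0) printf "assigned_fraction\t%.3f\n", assigned / total
    }
  ' "$1"
}

# Usage: count_summary Outputfile_Counts.txt
```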
efrain.gonzalez0
 
Posts: 138
Joined: Tue May 02, 2017 12:29 pm

Access to controlled Data on GDC

Post by efrain.gonzalez0 » Wed Sep 27, 2017 2:20 pm


Using HISAT2 and Bowtie2 on Dr. Roy's Data

Post by efrain.gonzalez0 » Fri Oct 20, 2017 11:58 am

Hello everyone,

I am posting some of the results of the HISAT2 and Bowtie2 analyses of Professor Roy's data. I will also compare the outputs for certain genes to see where the two aligners' results differ. For this I use the HISAT2 commands I posted earlier, plus the Bowtie2 commands included in this post. To obtain per-gene counts, I run htseq-count on both the HISAT2 and the Bowtie2 output files. Finally, I discuss the problem I first ran into when using Bowtie2 with htseq-count and how to resolve it.

  1. The Bowtie2 command is similar to the HISAT2 one. All of the code was run from within the Bowtie2 directory:
    Code:
    ./bowtie2 -x ./Bowtie2Index/genome -U ~/Sample-A-Input_S43_L007_R1_001.fastq -S BowTie_Sample_A_Input2.sam

  2. I used the same htseq-count command as for the HISAT2 files. After obtaining the htseq-count outputs for both the Bowtie2 and HISAT2 alignments, we wanted to compare them for certain genes (NRF1, APOE, etc.), which I did with the following commands:
    1. This command extracts every line that mentions a given gene from an htseq-count SAM output file (the file written with -o) and saves those lines to a new file:
      Code:
      grep GENE_NAME htseq_count_output.sam > gene_info.txt

      Ex:
      Code:
      grep APOE COUNTDrRoy_Bowtie.sam > DrRoy_APOE_Bowtie.txt
    2. This command sorts that file in ascending numerical order of alignment start position (column 4 of a SAM record) and writes the result to a new file:
      Code:
      sort -t$'\t' -k4,4g gene_info.txt > sorted_gene_info.txt

      Ex:
      Code:
      sort -t$'\t' -k4,4g DrRoy_APOE_Bowtie.txt > sort_DrRoy_APOE_Bowtie.txt
  3. The problem: htseq-count reported 0 counts for every gene.
    The cause: This usually happens when the index used in Bowtie2/HISAT2 does not match the reference annotation used in htseq-count, for example because the two name the chromosomes differently.
    The solution: Make sure that the index you use in HISAT2/Bowtie2 comes from the same source as the reference annotation you use in htseq-count.
    Ex: In my case, I was using an NCBI index for the Bowtie2 alignment, but the annotation for htseq-count came from Ensembl. After switching to an Ensembl index, I no longer got only zeros in the htseq-count output. Several different indexes are available from Illumina's iGenomes collection at https://support.illumina.com/sequencing/sequencing_software/igenome.html
    Here are some references that may help you resolve this issue in the future:
    1. https://galaxyproject.org/support/chrom-identifiers/
    2. https://www.biostars.org/p/220756/
    3. https://biostar.usegalaxy.org/p/19790/
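One quick way to spot this mismatch before running htseq-count is to compare the sequence names in the alignment's SAM header (`@SQ` lines, `SN:` field) with the first column of the GTF; an index and an annotation from different sources may name chromosomes differently (for example `chr1` versus `1`). A sketch, assuming an uncompressed SAM file that includes its header and a plain-text GTF; `check_chrom_names` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the sequence names shared by a SAM header and a
# GTF. Zero shared names means htseq-count cannot place any read on a gene.
check_chrom_names() {
  local sam="$1" gtf="$2" shared
  shared=$(LC_ALL=C comm -12 \
    <(awk -F '\t' '
        !/^@/ { exit }                                   # header is over
        /^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }
      ' "$sam" | LC_ALL=C sort -u) \
    <(grep -v '^#' "$gtf" | cut -f1 | LC_ALL=C sort -u) | wc -l | tr -d ' ')
  echo "shared sequence names: ${shared}"
  [ "$shared" -gt 0 ]   # exit status 1 signals a naming mismatch
}

# Usage: check_chrom_names alignedfile.sam genome.gtf
```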
Attachments
Roy_Chip_Data_HISAT2.ods
(13.15 KiB) Downloaded 180 times
