transdecoder

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

TransDecoder identifies likely coding sequences based on the following criteria:

  • a minimum length open reading frame (ORF) is found in a transcript sequence
  • a log-likelihood score similar to what is computed by the GeneID software is > 0.
  • the above coding score is greatest when the ORF is scored in the 1st reading frame as compared to scores in the other 2 forward reading frames.
  • if a candidate ORF is found fully encapsulated by the coordinates of another candidate ORF, the longer one is reported. However, a single transcript can report multiple ORFs (allowing for operons, chimeras, etc).
  • a PSSM is built/trained/used to refine the start codon prediction.
  • optionally, the putative peptide has a match to a Pfam domain above the noise cutoff score.
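
Two of these criteria correspond to command-line options. The sketch below assumes the `-m` option of TransDecoder.LongOrfs (minimum protein length in amino acids) and the `--retain_pfam_hits` option of TransDecoder.Predict, as described in the TransDecoder wiki. The file names `transcripts.fasta` and `pfam.domtblout` are placeholders, and the Pfam hit table is assumed to have been produced separately (for example by an hmmscan search against Pfam-A), which is not covered here.

```shell
# Placeholder inputs (assumptions for illustration only).
transcripts="transcripts.fasta"
pfam_hits="pfam.domtblout"   # hit table from a separate Pfam search
min_prot_len=100             # minimum ORF length, in amino acids

# Extra arguments for TransDecoder.Predict; empty unless a Pfam hit table exists.
predict_extra=""
if [ -s "$pfam_hits" ]; then
    predict_extra="--retain_pfam_hits $pfam_hits"
fi

# Run only when the module is loaded and the input file is present.
if command -v TransDecoder.LongOrfs >/dev/null 2>&1 && [ -s "$transcripts" ]; then
    TransDecoder.LongOrfs -t "$transcripts" -m "$min_prot_len"
    # $predict_extra is left unquoted on purpose so it splits into two arguments.
    TransDecoder.Predict -t "$transcripts" $predict_extra
else
    echo "skipping: load the transdecoder module and provide $transcripts"
fi
```

Raising `min_prot_len` reports fewer, longer ORFs; lowering it tends to increase the number of short, possibly spurious candidates.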

Detailed usage can be found here: https://github.com/TransDecoder/TransDecoder/wiki#running-transdecoder

Versions

  • 5.5.0

Commands

  • TransDecoder.LongOrfs
  • TransDecoder.Predict
  • cdna_alignment_orf_to_genome_orf.pl
  • compute_base_probs.pl
  • exclude_similar_proteins.pl
  • fasta_prot_checker.pl
  • ffindex_resume.pl
  • gene_list_to_gff.pl
  • get_FL_accs.pl
  • get_longest_ORF_per_transcript.pl
  • get_top_longest_fasta_entries.pl
  • gff3_file_to_bed.pl
  • gff3_file_to_proteins.pl
  • gff3_gene_to_gtf_format.pl
  • gtf_genome_to_cdna_fasta.pl
  • gtf_to_alignment_gff3.pl
  • gtf_to_bed.pl
  • nr_ORFs_gff3.pl
  • pfam_runner.pl
  • refine_gff3_group_iso_strip_utrs.pl
  • refine_hexamer_scores.pl
  • remove_eclipsed_ORFs.pl
  • score_CDS_likelihood_all_6_frames.pl
  • select_best_ORFs_per_transcript.pl
  • seq_n_baseprobs_to_loglikelihood_vals.pl
  • start_codon_refinement.pl
  • train_start_PWM.pl
  • uri_unescape.pl

Module

Load the modules with:

module load biocontainers
module load transdecoder

Example job

Using #!/bin/sh -l as the shebang in a Slurm job script causes some biocontainer modules to fail. Use #!/bin/bash instead.
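
As a guard against the shebang problem, a job script can be checked before submission. The helper below is a minimal sketch; the function name `check_shebang` and the file name `myjob.sh` are hypothetical and not part of the cluster tooling.

```shell
# check_shebang FILE: succeed only if FILE starts with "#!/bin/bash".
check_shebang() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        '#!/bin/bash'*) return 0 ;;
        *)
            echo "warning: $1 starts with '$first_line', expected #!/bin/bash" >&2
            return 1
            ;;
    esac
}
```

For example, `check_shebang myjob.sh && sbatch myjob.sh` submits the job only when the first line is a bash shebang.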

To run TransDecoder on our clusters:

#!/bin/bash
#SBATCH -A myallocation     # Allocation name 
#SBATCH -t 20:00:00
#SBATCH -N 1
#SBATCH -n 24
#SBATCH --job-name=transdecoder
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out

module --force purge
ml biocontainers transdecoder

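# Build transcript sequences from the GTF annotation and the genome
gtf_genome_to_cdna_fasta.pl transcripts.gtf test.genome.fasta > transcripts.fasta
# Convert the transcript GTF into an alignment-style GFF3 file
gtf_to_alignment_gff3.pl transcripts.gtf > transcripts.gff3
# Extract long open reading frames
TransDecoder.LongOrfs -t transcripts.fasta
# Predict the likely coding regions
TransDecoder.Predict -t transcripts.fasta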
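
The job above writes transcripts.gff3 but does not use it afterwards. In the upstream workflow, that file feeds a final step that projects the predicted ORFs back onto genome coordinates. The sketch below assumes that TransDecoder.Predict has written `transcripts.fasta.transdecoder.gff3` to the working directory (alongside the `.pep`, `.cds`, and `.bed` outputs) and that the argument order follows the TransDecoder wiki; the output file name is an illustrative choice.

```shell
orf_gff3="transcripts.fasta.transdecoder.gff3"   # written by TransDecoder.Predict
aln_gff3="transcripts.gff3"                      # written by gtf_to_alignment_gff3.pl
cdna="transcripts.fasta"                         # written by gtf_genome_to_cdna_fasta.pl
genome_gff3="transcripts.fasta.transdecoder.genome.gff3"

# Count the inputs that are absent or empty before attempting the conversion.
missing=0
for f in "$orf_gff3" "$aln_gff3" "$cdna"; do
    if [ ! -s "$f" ]; then
        echo "missing input: $f" >&2
        missing=$((missing + 1))
    fi
done

# Convert transcript-based ORF coordinates to genome-based coordinates.
if [ "$missing" -eq 0 ] && command -v cdna_alignment_orf_to_genome_orf.pl >/dev/null 2>&1; then
    cdna_alignment_orf_to_genome_orf.pl "$orf_gff3" "$aln_gff3" "$cdna" > "$genome_gff3"
fi
```

Checking the inputs first avoids writing an empty genome-level GFF3 file when an earlier step has failed.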