Software: ProtHint
Description
ProtHint is a pipeline for predicting and scoring hints (in the form of introns, start and stop codons) in the genome of interest by mapping and spliced aligning predicted genes to a database of reference protein sequences.
Available Versions
Unspecified: | 2.6.0 |
---|
Commands - cds_with_upstream_support.py - combine_gff_records.pl - count_cds_overlaps.py - flag_top_proteins.py - gff_from_region_to_contig.pl - make_chains.py - nucseq_for_selected_genes.pl - print_high_confidence.py - print_longest_isoform.py - proteins_from_gtf.pl - prothint.py - prothint2augustus.py - run_spliced_alignment.pl - run_spliced_alignment_pbs.pl - select_best_proteins.py - select_for_next_iteration.py - spalnBatch.sh - spaln_to_gff.py
Academic license
ProtHint depends on GenMark. To use GeneMark, users need to download license files by yourself.
Go to the GeneMark web site: http://exon.gatech.edu/GeneMark/license_download.cgi. Check the boxes for GeneMark-ES/ET/EP ver 4.69_lic
and LINUX 64
next to it, fill out the form, then click "I agree". In the next page, right click and copy the link addresses for 64 bit
licenss. Paste the link addresses in the commands below:
cd $HOME
wget "replace with license URL"
zcat gm_key_64.gz > .gm_key
Module
You can load the modules by:
module load biocontainers
module load prothint
Example job
Using
#!/bin/sh -l
as shebang in the slurm job script will cause the failure of some biocontainer modules. Please use#!/bin/bash
instead.
To run ProtHint on our cluster:
#!/bin/bash
#SBATCH -A myallocation # Allocation name
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --job-name=prothint
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml biocontainers prothint
prothint.py --threads 4 input/genome.fasta input/proteins.fasta --geneSeeds input/genemark.gtf --workdir test