RCAC - Software: Meraculous

Software: Meraculous

Type

Resource

Description

Meraculous is a whole genome assembler for Next Generation Sequencing data, geared for large genomes. It is hybrid k-mer/read-based approach capitalizes on the high accuracy of Illumina sequence by eschewing an explicit error correction step which we argue to be redundant with the assembly process.

Available Versions

Available Versions
Unspecified:	2.2.6

Meraculous achieves high performance with large datasets by utilizing lightweight data structures and multi-threaded parallelization, allowing to assemble human-sized genomes on a high-cpu cluster in under a day. The process pipeline implements a highly transparent and portable model of job control and monitoring where different assembly stages can be executed and re-executed separately or in unison on a wide variety of architectures.

Home page: https://jgi.doe.gov/data-and-tools/software-tools/meraculous/

Commands

run_meraculous.sh
blastMapAnalyzer2.pl
bmaToLinks.pl
_bubbleFinder2.pl
bubblePopper.pl
bubbleScout.pl
contigBias.pl
divide_it.pl
fasta_splitter.pl
findDMin2.pl
gapDivider.pl
gapPlacer.pl
haplotyper.Naive.pl
haplotyper.pl
histogram2.pl
kmerHistAnalyzer.pl
loadBalanceMers.pl
meraculous4h.pl
meraculous.pl
N50.pl
_oNo4.pl
oNo7.pl
optimize2.pl
randomList2.pl
scaffold2contig.pl
scaffReportToFasta.pl
screen_list2.pl
spanner.pl
splinter.pl
splinter_scaffolds.pl
split_and_validate_reads.pl
test_dependencies.pl
unique.pl

Module

You can load the modules by:

module load biocontainers
module load meraculous

Example job

Using #!/bin/sh -l as shebang in the slurm job script will cause the failure of some biocontainer modules. Please use #!/bin/bash instead.

To run meraculous on our clusters:

#!/bin/bash
#SBATCH -A myallocation     # Allocation name
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --job-name=meraculous
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out

module --force purge
ml biocontainers meraculous