This script simply finalizes the processing aspect by adding a leading "chr" for compatibility with many tools and the UCSC Genome Browser to bam files. Works for both paired-end and single-end data.



sample directory where bam files are located in either the" shift folder for ATAC-seq or samtools folder for ChIP-seq."


allocated threads for the job; note that memory amt" proportional to number of threads / 20 * 128G (Ruddle)" which ends up being about 6GB per thread"


This script requires the following:

  • samtools 1.3.1


#' Preset parameters
MEMPERTHREAD=6                 # GB; 128 GB of RAM / 20 cores per node

#' 0. Read in Parameters
SDIR=$(readlink -f $1)
PREFIX=$(basename $SDIR)

#' 0. Echo parameters back
echo ""
echo "Beginning to process your paired/signle-end ChIP-seq/ATAC-seq data.."
echo ".. Memory per thread is: $MEMPERTHREAD GB"
echo ""
echo "Working on: $SDIR"
echo ".. Using prefix: $PREFIX"
echo ".. Using $THREADS threads"
echo ""

#' 1. Part 1.
mkdir -p $SDIR/bam_clean
cd $SDIR/bam_clean

if [ -e $SDIR/shift ]; then
    echo "Working with ATAC-seq data.."
    echo "Working with ChIP-seq data.."
echo ".. Using bam file: $(basename $DAT)"
echo ""

#' 2. Adding "chr" to bam header
echo "1. Adding 'chr' to bam header.."
samtools view -H $DAT | \
    awk '{FS = "\t"; OFS = "\t"} { gsub(/SN:/, "SN:chr"); print $0}' > tmp_header.sam

#' 3. Adding "chr" to third field (RNAME) and seventh (RMNM - mate name)
echo "2. Adding 'chr' to third field (RNAME) and seventh (RMNM - mate name).."
samtools view $DAT | awk '{FS = "\t"; OFS = "\t"} 
  { $3="chr"$3; if ($7 != "=") { $7="chr"$7 }; print $0 }' > tmp_body.sam

#' 4. Concatenating header and body together and converting to bam and indexing
echo "3. Concatenating header and body together into bam.."
cat tmp_header.sam tmp_body.sam | samtools view -b - > ${PREFIX}.bam

echo "4. Indexing.."
samtools index ${PREFIX}.bam

#' 5. Clean up..
echo "5. Clean up.."
rm tmp*

echo ""
echo "Bam cleaned up and finalized."
echo ""