Working from a data directory in which processing occurs on a sample-by-sample basis, this script groups samples by condition. It reads a condition-sample-input relational table and creates symbolic links to the processed data, for downstream condition-dependent processing such as replicate-aware peak calling.

D0_combine_by_condition.sh

Arguments

DAT

Data directory containing subdirectories of batches of samples, e.g. data > batch1 > sample1, sample2, etc.

TABLE

Tab-delimited table with CONDITION_ID, SAMPLE_ID, and INPUT_ID as the first three columns, and a fourth TYPE column for punctate vs. broad peak calling. Any additional columns are ignored. Note that the script as written does not skip a header line (see Details).
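
As an illustration of the layout described above, the following sketch writes a small table with printf and prints the first three columns. The file name conditions.tsv, all IDs (condA, sample1, input1, ...), and the TYPE values "punctate" and "broad" are made up for the example; the table has no header line.

```shell
# Hypothetical example table; every ID and the file name are placeholders.
TABLE=$(mktemp -d)/conditions.tsv

# Columns: CONDITION_ID, SAMPLE_ID, INPUT_ID, TYPE (tab-delimited, no header).
printf 'condA\tsample1\tinput1\tpunctate\n'  > "$TABLE"
printf 'condA\tsample2\tinput1\tpunctate\n' >> "$TABLE"
printf 'condB\tsample3\tNA\tbroad\n'        >> "$TABLE"

# Print the three columns the script reads.
awk -F'\t' '{print "condition=" $1, "sample=" $2, "input=" $3}' "$TABLE"
```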

OUTDIR

Output directory in which to place the condition-based data repository. The directory is created if it does not exist.

Details

Note that all input samples for a condition are placed together in an inputs directory, which loses the sample:input relationship. Keep track of that relationship (using the database supplied) when running peak callers.
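
One way to keep track of the sample:input relationship is to read the pairs for a condition straight from the table when building peak-caller commands. This is a minimal sketch, assuming the column order given under Arguments; the function name list_pairs and the demo table are hypothetical.

```shell
# list_pairs TABLE CONDITION
# Print "sample<TAB>input" for every row of CONDITION (hypothetical helper).
list_pairs() {
    awk -F'\t' -v cond="$2" '$1 == cond {print $2 "\t" $3}' "$1"
}

# Demo table with made-up IDs.
DEMO=$(mktemp)
printf 'condA\tsample1\tinput1\ncondA\tsample2\tinput2\ncondB\tsample3\tNA\n' > "$DEMO"

# Each output line pairs one sample with its input (or NA).
list_pairs "$DEMO" condA
```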

Note that samples without an input must have the value "NA" in the INPUT_ID column to prevent errors.
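
A quick pre-flight check can catch rows whose INPUT_ID is empty rather than "NA". The sketch below is one possible check and is not part of the script; the function name check_inputs and the demo tables are hypothetical.

```shell
# check_inputs TABLE
# Report rows with fewer than three fields or an empty INPUT_ID (column 3).
# Returns non-zero if any such row is found (hypothetical helper).
check_inputs() {
    awk -F'\t' '
        NF < 3 || $3 == "" { printf "line %d: missing INPUT_ID (use NA)\n", NR; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Demo tables with made-up IDs: one valid, one with an empty third column.
GOOD=$(mktemp); BAD=$(mktemp)
printf 'condA\tsample1\tinput1\ncondB\tsample3\tNA\n' > "$GOOD"
printf 'condA\tsample1\tinput1\ncondB\tsample3\t\n'   > "$BAD"

check_inputs "$GOOD" && echo "good table: ok"
check_inputs "$BAD"  || echo "bad table: fix the rows listed above"
```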

Ideally, the database has no header. If it does, simply delete the bad directory that is created, which is named after the header's first-column value.
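
Alternatively, the header can be removed before the table is passed to the script. This is a minimal sketch, assuming the header's first field is literally CONDITION_ID as named under Arguments; the file names and IDs are placeholders.

```shell
# Build a demo table that starts with a header line (made-up content).
WORK=$(mktemp -d)
printf 'CONDITION_ID\tSAMPLE_ID\tINPUT_ID\tTYPE\n'  > "$WORK/with_header.tsv"
printf 'condA\tsample1\tinput1\tpunctate\n'        >> "$WORK/with_header.tsv"

# Drop line 1 only when its first field is the header name; otherwise copy as-is.
awk -F'\t' 'NR == 1 && $1 == "CONDITION_ID" {next} {print}' \
    "$WORK/with_header.tsv" > "$WORK/no_header.tsv"

cat "$WORK/no_header.tsv"
```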

This script requires the following:

  • tab-separated database specifying condition, sample, and input (NA if none)

Examples
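
A possible end-to-end run on toy data, as a sketch. The directory and sample names are invented, and the script is assumed to sit in the current directory; the invocation is skipped if it is not found there.

```shell
# Toy layout: data > batch1 > sample1, sample2, input1 (all names are made up).
WORK=$(mktemp -d)
mkdir -p "$WORK/data/batch1/sample1" "$WORK/data/batch1/sample2" "$WORK/data/batch1/input1"

# Table without a header: CONDITION_ID, SAMPLE_ID, INPUT_ID, TYPE.
printf 'condA\tsample1\tinput1\tpunctate\n'  > "$WORK/table.tsv"
printf 'condA\tsample2\tNA\tpunctate\n'     >> "$WORK/table.tsv"

# Positional arguments: DAT TABLE OUTDIR.
SCRIPT=./D0_combine_by_condition.sh   # assumed location of the script
if [ -f "$SCRIPT" ]; then
    bash "$SCRIPT" "$WORK/data" "$WORK/table.tsv" "$WORK/by_condition"
    ls -l "$WORK/by_condition"/*/*
else
    echo "Script not found at $SCRIPT; adjust the path and rerun."
fi
```

If the run succeeds, condA/samples should hold links to sample1 and sample2, and condA/inputs a link to input1.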


#' 0. Read in Parameters
DAT=$(readlink -f "$1")
TAB=$(readlink -f "$2")
OUTDAT=$(readlink -f "$3")

#' 0. Echo parameters back
echo ""
echo "Beginning to process your data table.."
echo ".. Data directory: $DAT"
echo ".. Table: $TAB"
echo ".. Output directory: $OUTDAT"
echo ""

#' 1. Create the output directory if it does not exist
mkdir -p "$OUTDAT"

#' Read in table line by line (rows are processed in file order, no sorting)
IFS=$'\n'      # change "Internal Field Separator" to make for-loop split on newlines
for LINE in $(cat "$TAB"); do
    #' Split line into variables (tab-delimited fields)
    CONDITIONID=$(echo "$LINE" | awk -F'\t' '{print $1}')
    SAMPLEID=$(echo "$LINE" | awk -F'\t' '{print $2}')
    INPUTID=$(echo "$LINE" | awk -F'\t' '{print $3}')

    echo "Writing condition: $CONDITIONID"
    echo ".. Sample: $SAMPLEID"

    #' Make all relevant condition and sample specific dirs
    #' (mkdir -p also creates the parent condition directory)
    mkdir -p "$OUTDAT/$CONDITIONID/samples"

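    #' Grab sample absolute paths: every $DAT/<batch>/<entry> path containing the sample ID
    cd "$OUTDAT/$CONDITIONID/samples"
    SPATH=$(ls -d "$DAT"/*/* | grep -F -- "$SAMPLEID")
    ln -s $SPATH .    # left unquoted so that each matching path (one per line) gets its own link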

    #' Process input id if not NA (string comparison; -ne is an integer test)
    if [[ "$INPUTID" != "NA" ]]; then
        echo ".. Input: $INPUTID"
        mkdir -p "$OUTDAT/$CONDITIONID/inputs"
        cd "$OUTDAT/$CONDITIONID/inputs"
        IPATH=$(ls -d "$DAT"/*/* | grep -F -- "$INPUTID")
        ln -s $IPATH .    # left unquoted so that each matching path gets its own link
    fi
done

echo ""
echo "Condition-based repository written."
echo ""