Retrieve sample directories and arrange them in the organization described below.

A0_get_samples.sh

Arguments

FILE

Sample database with pointers to data locations, tab- or space-delimited. The first field must be the path to the top-level directory containing the sample folders.
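To make the expected input concrete, the sketch below builds a small sample database and extracts the unique top-level directories from its first field. The paths and the second column are invented for illustration, and the use of awk's default whitespace splitting is an assumption about how the file is parsed.

```shell
# Hypothetical sample database: first field is the top-level directory,
# the second field is free-form. All paths and names are invented.
db=$(mktemp)
printf '%s\t%s\n' \
    /data/run1 sampleA \
    /data/run1 sampleB \
    /data/run2 sampleC > "$db"

# Unique first-field values; awk's default splitting accepts tabs or spaces.
top_dirs=$(awk '{print $1}' "$db" | sort -u)
echo "$top_dirs"
rm -f "$db"
```

Because fields are split on whitespace, directory paths containing spaces would not survive this step.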

PATH

Directory in which to create the sample directories; defaults to the current directory.

Details

Samples will be arranged as follows:

  • Sample_X

    • fastq

      • .fastq.gz files

Raw fastq files on the server from which data are acquired should be organized to resemble the following:

  • [directory]

    • Sample_XX

    • Sample_YY

      • yy1_R1.fastq.gz

      • ..

      • yyN_R1.fastq.gz
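The server-side layout above can be mocked in a temporary directory to check that a source tree has the expected shape before running the script. This is only a sketch: the sample and file names are placeholders taken from the listing, and the per-sample count is an illustrative check, not part of the script.

```shell
# Mock of the expected server-side layout; names are placeholders, not real data.
src=$(mktemp -d)
mkdir -p "$src/Sample_XX" "$src/Sample_YY"
touch "$src/Sample_YY/yy1_R1.fastq.gz" "$src/Sample_YY/yy2_R1.fastq.gz"

# Count the fastq files sitting directly inside each sample folder.
report=$(for sample in "$src"/*/; do
    n=$(find "$sample" -maxdepth 1 -name '*.fastq.gz' | wc -l | tr -d ' ')
    echo "$(basename "$sample"): $n fastq file(s)"
done)
echo "$report"
rm -rf "$src"
```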

In the _RQ (requisition) version of A0_get_samples.sh, the fastq files are expected to live in a subfolder (Unaligned) beneath each individual sample folder.
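The _RQ script itself is not shown here, so the following is only a sketch of the layout difference under that assumption: the fastq files are looked up one level deeper, under Unaligned. Sample and file names are placeholders.

```shell
# Mock requisition-style layout: fastqs sit under Sample_*/Unaligned/.
src=$(mktemp -d)
mkdir -p "$src/Sample_YY/Unaligned"
touch "$src/Sample_YY/Unaligned/yy1_R1.fastq.gz"

# Relative to the flat layout, the only change is the extra path component.
found=$(find "$src"/Sample_*/Unaligned -maxdepth 1 -name '*.fastq*' -exec basename {} \;)
echo "$found"
rm -rf "$src"
```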

Examples
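A self-contained dry run, as a sketch: it builds a mock server directory, a mock SampleSheet.csv, and a one-line sample database in a temporary location, then applies the same linking steps the script performs. All names (run1, Sample_YY, FC001, samples.txt) and the SampleSheet.csv header are invented for illustration; only the rule "row 2, field 1 is the flowcell" comes from the script.

```shell
# Self-contained dry run with invented names; nothing here touches real data.
work=$(mktemp -d)
src="$work/server/run1"     # hypothetical top-level directory
dest="$work/workdir"        # hypothetical PATH argument
mkdir -p "$src/Sample_YY" "$dest"
touch "$src/Sample_YY/yy1_R1.fastq.gz"
# Mock SampleSheet.csv: row 2, field 1 is read as the flowcell ID.
printf 'FCID,Lane,SampleID\nFC001,1,Sample_YY\n' > "$src/Sample_YY/SampleSheet.csv"
# Mock sample database: first field is the top-level directory.
printf '%s\t%s\n' "$src" Sample_YY > "$work/samples.txt"

# Same steps as the script, inlined so the example runs on its own.
for top in $(awk '{print $1}' "$work/samples.txt" | sort -u); do
    for sample in "$top"/*/; do
        name=$(basename "$sample")
        fc=$(awk -F, 'NR==2 {print $1}' "$sample/SampleSheet.csv")
        mkdir -p "$dest/$name/fastq"
        for fq in "$sample"*.fastq*; do
            [ -e "$fq" ] || continue   # no fastq files in this sample
            ln -s "$fq" "$dest/$name/fastq/${fc}_$(basename "$fq")"
        done
    done
done

linked=$(ls "$dest/Sample_YY/fastq")
echo "$linked"
rm -rf "$work"
```

With the real script, the equivalent call would pass the database file as the first argument and the destination directory as the second.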



shopt -s nullglob

## Process args
FILE=$1
LOC=$2
if [ ${#LOC} -eq 0 ]; then
    LOC=$(readlink -f .)
fi

## Process and get the overarching, unique directories containing multiple samples
YCGA_ALL=$(awk '{print $1}' "$FILE" | sort -u) # default FS splits on tabs or spaces

## Create directories for each sample; place links into appropriate sample folder with uniq name
cd "$LOC" || exit 1
echo ""
for I in $YCGA_ALL; do # I=each panfs location (unquoted on purpose: one path per word)
    echo "Writing files in $(basename "$I") .."
    for J in "$I"/*; do # J=each sample folder in panfs
        [ -d "$J" ] || continue # skip stray files at the top level
        od=$(basename "$J")
        echo ".. Writing $od"
        mkdir -p "$od/fastq" # create sample folder in workdir and the fastq dir while we're at it.
        fc=$(awk -F, 'NR==2{print $1}' "$J/SampleSheet.csv") # use the flowcell to distinguish fastqs that would have the same name.
        for K in "$J"/*.fastq*; do # K=each fastq within panfs/sample/
            ln -s -T "$K" "$od/fastq/${fc}_$(basename "$K")"
##          mv "$K" "$od/fastq/${fc}_$(basename "$K")"
        done
    done
done

echo "Samples acquired."
echo ""