Genotype VCF File Quality Control#
This implements some recommendations from UK Biobank on sequence data quality control.
Description#
A major challenge in biomedical research is the quality control (QC) of sequencing data. False positive variant calls can hinder the ability to detect disease-associated variants or introduce spurious associations, hence the need for rigorous QC. Our pipeline focuses on QC after the variant calling stage and requires project Variant Call Format (pVCF) files as input. We have defined default thresholds for genotype and variant-level hard filtering based on recommendations from the UK Biobank team and a thorough review of the literature [cf. Carson et al. BMC Bioinformatics (2014), cf. Lek et al. Nature (2016), cf. Szustakowski et al. Nature Genetics (2021)]. Bcftools is used in our QC steps. We first handle multi-allelic sites by splitting them into bi-allelic records. We include an optional workflow to keep only bi-allelic sites in the data. Variants are then annotated based on dbSNP data. Genotypes are kept if they have a Genotype Depth (DP) >= 10 and a Genotype Quality (GQ) >= 20. Variants are included if at least one sample has an allelic balance (AB) >= 0.15 for Single Nucleotide Variants (SNVs) and AB >= 0.2 for indels, variant missingness is below 20% and the Hardy-Weinberg Equilibrium p-value is > 1e-08. Allele balance is calculated for heterozygotes as the number of bases supporting the least-represented allele over the total number of base observations. Output summary statistics, such as transition/transversion ratios (TS/TV ratio), are calculated to determine the effectiveness of QC.
Default Parameters: VCF QC Filters#
Genotype depth filters: For WES data, UK Biobank recommends DP>10 for both SNPs and indels. However, we think that for WGS we can be less stringent, or simply rely on GQ. Users can set both depth thresholds to 1, e.g.,
--DP-snp 1 --DP-indel 1
Genotype quality GQ>=20.
At least one sample per site passed the allele balance threshold >= 0.15 for SNPs and >=0.20 for indels (heterozygous variants).
Allele balance is calculated for heterozygotes as the number of bases supporting the least-represented allele over the total number of base observations.
This module also allows filtering by HWE and missingness. By default, variants with a missingness of 0.2 or higher are removed (--geno-filter 0.2) and no HWE filter is applied (--hwe-filter 0.0).
Filtering is done with bcftools. Here is a useful cheatsheet from GitHub user @elowy01.
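As a minimal sketch of the allele balance definition above (not the pipeline's own code, which applies its filters through bcftools), the following Python functions compute the fraction of reads supporting the least-represented allele of a heterozygous genotype and compare it with the default thresholds. The function names and the read counts are illustrative assumptions.

```python
def allele_balance(ref_reads, alt_reads):
    """Fraction of reads supporting the least-represented allele of a het genotype."""
    total = ref_reads + alt_reads
    if total == 0:
        return None  # no read observations, so allele balance is undefined
    return min(ref_reads, alt_reads) / total


def passes_allele_balance(ref_reads, alt_reads, is_indel=False,
                          ab_snp=0.15, ab_indel=0.2):
    """Apply the default thresholds: >= 0.15 for SNPs, >= 0.20 for indels."""
    ab = allele_balance(ref_reads, alt_reads)
    if ab is None:
        return False
    return ab >= (ab_indel if is_indel else ab_snp)


# Hypothetical het genotype with 17 reference reads and 3 alternate reads
print(allele_balance(17, 3))                        # 3 / 20
print(passes_allele_balance(17, 3))                 # checked against the SNP threshold
print(passes_allele_balance(17, 3, is_indel=True))  # checked against the indel threshold
```

In the workflow itself, a site is retained when at least one sample passes the threshold, so a check like this would be applied per sample and combined across samples.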
A note on TS/TV summary from VCF genotype data#
The bcftools stats command provides useful summary statistics including the TS/TV ratio, which is routinely used as a quality measure of variant calls. With dbSNP-based annotation of novel and known variants, bcftools can compute TS/TV for novel and known variants at the variant level and at the sample level. It should be noted that variant-level TS/TV does not take sample genotypes into consideration: it simply counts the TS and TV events for SNPs observed in the data. Other tools, such as SnpSift, implement variant-level TS/TV by counting TS and TV events in sample genotypes and computing the ratio after summing TS and TV across all samples. See here for some discussion of this issue. We provide these TS/TV calculations before and after QC, but users should be aware of the difference when interpreting the results.
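To make the difference between the two counting schemes concrete, here is a small Python sketch on made-up data. It is only an illustration: the genotype-level function counts one event per sample carrying the alternate allele, which is an assumption and may not match the exact counting rule used by SnpSift. All names and the toy SNP list are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine with purine or pyrimidine with pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ratio(ts, tv):
    return ts / tv if tv else float("nan")


def variant_level_tstv(snps):
    """Count each observed SNP once, ignoring sample genotypes."""
    ts = sum(1 for ref, alt, _ in snps if is_transition(ref, alt))
    return ratio(ts, len(snps) - ts)


def genotype_level_tstv(snps):
    """Count one event per carrier sample, summed over all samples (assumed rule)."""
    ts = tv = 0
    for ref, alt, alt_counts in snps:
        carriers = sum(1 for n_alt in alt_counts if n_alt > 0)
        if is_transition(ref, alt):
            ts += carriers
        else:
            tv += carriers
    return ratio(ts, tv)


# Toy data: (REF, ALT, alternate allele count per sample)
snps = [
    ("A", "G", [1, 1, 2, 1]),  # transition carried by 4 samples
    ("C", "T", [1, 0, 0, 0]),  # transition carried by 1 sample
    ("A", "C", [1, 1, 0, 0]),  # transversion carried by 2 samples
]
print(variant_level_tstv(snps))   # 2 TS / 1 TV
print(genotype_level_tstv(snps))  # 5 TS / 2 TV
```

The same three SNPs give different ratios under the two schemes, which is why the before/after QC numbers should only be compared within one scheme.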
Default Parameters: Resource Usage#
Memory: A whole-genome VCF.gz file is usually 200+GB in size; in our tests, a minimum of 60GB of memory is required.
Walltime: qc_1 or qc_2 can process ~14GB of data per hour. The default is 24h, corresponding to ~300GB of input. Please set the --walltime parameter according to the size of your input files.
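As a rough aid for choosing a walltime, the throughput figure above can be turned into a back-of-the-envelope estimate. The helper below is a hypothetical sketch, not part of the pipeline; it assumes the ~14GB per hour rate holds for your data and cluster, which may not be the case.

```python
import math


def estimate_walltime_hours(input_size_gb, gb_per_hour=14):
    """Round up the hours needed to process the input at the assumed rate."""
    return math.ceil(input_size_gb / gb_per_hour)


# Hypothetical input sizes in GB
for size_gb in (50, 200, 300):
    print(f"{size_gb}GB input: --walltime {estimate_walltime_hours(size_gb)}h")
```

Adding some headroom on top of the estimate is sensible, since the default of 24h for ~300GB already includes a small margin.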
Input#
The target vcf.gz file.
If its chromosome names do not have the chr prefix and you need them to match the reference fasta file, please run the rename_chrs workflow to add chr.
The vcf.gz file needs to be compressed by bgzip, instead of simple gzip.
It should have an index file accompanying it. The index file can be generated by tabix.
It must be a valid vcf.gz file that can pass the bcftools sanity check, i.e. all tags are defined properly.
It must contain the following fields:
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=AB,Number=1,Type=Float,Description="Allele balance for each het genotype">
A dbSNP database in VCF format.
A reference sequence fasta file.
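Before launching the workflow, it can be handy to confirm that the required FORMAT fields listed above are declared in the VCF header. The following Python sketch is one way to do that; the function names are hypothetical, and it only inspects header meta lines rather than running the full bcftools sanity check.

```python
import gzip

REQUIRED_FORMAT_FIELDS = ("DP", "GQ", "AB")
FORMAT_PREFIX = "##FORMAT=<ID="


def read_vcf_header(vcf_gz_path):
    """Collect the '##' meta lines of a bgzip- or gzip-compressed VCF."""
    lines = []
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # reached the '#CHROM' line or the records
            lines.append(line.rstrip("\n"))
    return lines


def missing_format_fields(header_lines, required=REQUIRED_FORMAT_FIELDS):
    """Return the required FORMAT IDs that the header does not declare."""
    declared = set()
    for line in header_lines:
        if line.startswith(FORMAT_PREFIX):
            declared.add(line[len(FORMAT_PREFIX):].split(",", 1)[0].rstrip(">"))
    return [field for field in required if field not in declared]


# Example on an in-memory header that lacks the AB field
example_header = [
    "##fileformat=VCFv4.2",
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth">',
    '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">',
]
print(missing_format_fields(example_header))
```

For a real file, `missing_format_fields(read_vcf_header("your_file.vcf.gz"))` would return an empty list when all three fields are declared; the file name here is a placeholder.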
Output#
QC-ed genotype data in VCF format. You can use vcf_to_plink implemented in genotype_formatting.ipynb to further convert it to PLINK format.
A set of summary statistics to help evaluate the quality of genotypes before and after QC. Particularly useful is the TS/TV ratio.
Minimal Working Example#
The MWE is generated via
bcftools query -l get-dosage.ALL.vcf.gz | head -40 > MWE_sample_list
bcftools view -S MWE_sample_list get-dosage.ALL.vcf.gz > sample_filtered.vcf
bgzip -c sample_filtered.vcf > sample_filtered.vcf.gz
tabix -p vcf sample_filtered.vcf.gz
bcftools view --regions chr1 sample_filtered.vcf.gz > chr1_sample_filtered.vcf
head -20000 chr1_sample_filtered.vcf > MWE_genotype.vcf
and was stored here: https://drive.google.com/file/d/1sxxPdPIyKma0mAl8TKwhgyRHlOh0Oyrc/view?usp=sharing
FIXME: point this to the synapse folder.
(Optional) Rename Chromosomes#
Timing: 40 min
sos run VCF_QC.ipynb rename_chrs \
--genoFile /restricted/projectnb/xqtl/xqtl_protocol/reference_data/00-All.vcf.gz \
--cwd /restricted/projectnb/xqtl/xqtl_protocol/reference_data \
--container oras://ghcr.io/cumc/bioinfo_apptainer:latest \
-c /restricted/projectnb/xqtl/xqtl_protocol/scripts/csg.yml -q neurology
INFO: Running rename_chrs:
INFO: t0bb13a5cc2cae479 submitted to neurology with job id Your job 8313589 ("job_t0bb13a5cc2cae479") has been submitted
INFO: Waiting for the completion of 1 task.
INFO: rename_chrs output: /restricted/projectnb/xqtl/xqtl_protocol/reference_data/00-All.add_chr.vcf.gz
INFO: Workflow rename_chrs (ID=w4f0deb3b5b4ae8ce) is executed successfully with 1 completed step and 1 completed task.
(Optional) dbSNP Annotation#
Timing: 4 min
sos run VCF_QC.ipynb dbsnp_annotate \
--genoFile /restricted/projectnb/xqtl/xqtl_protocol/reference_data/00-All.add_chr.vcf.gz \
--cwd /restricted/projectnb/xqtl/xqtl_protocol/reference_data \
--container oras://ghcr.io/cumc/bioinfo_apptainer:latest \
-c /restricted/projectnb/xqtl/xqtl_protocol/scripts/csg.yml -q neurology
INFO: Running dbsnp_annotate:
INFO: t6a0e22783b2666a3 submitted to neurology with job id Your job 8314421 ("job_t6a0e22783b2666a3") has been submitted
INFO: Waiting for the completion of 1 task.
INFO: dbsnp_annotate output: /restricted/projectnb/xqtl/xqtl_protocol/reference_data/00-All.add_chr.variants.gz
INFO: Workflow dbsnp_annotate (ID=wf0ebf0af2d3d81ad) is executed successfully with 1 completed step and 1 completed task.
i. Quality Control#
Timing: X min
sos run VCF_QC.ipynb qc \
--genoFile data/MWE/MWE_genotype.vcf \
--dbsnp-variants data/reference_data/00-All.add_chr.variants.gz \
--reference-genome data/reference_data/GRCh38_full_analysis_set_plus_decoy_hla.noALT_noHLA_noDecoy_ERCC.fasta \
--cwd MWE/output/genotype_1 \
--container oras://ghcr.io/cumc/bioinfo_apptainer:latest
To run in parallel for all genotype data listed in mwe_genotype_list:
sos run VCF_QC.ipynb qc \
--genoFile data/mwe/mwe_genotype_list \
--dbsnp-variants data/reference_data/00-All.add_chr.variants.gz \
--reference-genome data/reference_data/GRCh38_full_analysis_set_plus_decoy_hla.noALT_noHLA_noDecoy_ERCC.fasta \
--cwd MWE/output/genotype_4 \
--container oras://ghcr.io/cumc/bioinfo_apptainer:latest
Producing the following results:
Total TS/TV for 19639 known variants before QC: 2.599
Total TS/TV for 19573 known variants after QC: 2.600
There are no novel variants included in the MWE.
The total TS/TV is extracted from the last step of QC. For known variants before QC:
grep Ts/Tv MWE_genotype.leftnorm.known_variant.snipsift_tstv | rev | cut -d',' -f1 | rev
2.599
For known variants after QC:
grep Ts/Tv MWE_genotype.leftnorm.filtered.*_variant.snipsift_tstv | rev | cut -d',' -f1 | rev
2.600
For novel variants before/after QC, TS/TV is not available since no novel variants are present in the MWE:
grep Ts/Tv MWE_genotype.leftnorm.novel_variant.snipsift_tstv | rev | cut -d',' -f1 | rev
grep Ts/Tv MWE_genotype.leftnorm.filtered.novel_variant.snipsift_tstv | rev | cut -d',' -f1 | rev
Command Interface#
sos run VCF_QC.ipynb -h
usage: sos run VCF_QC.ipynb [workflow_name | -t targets] [options] [workflow_options]
workflow_name: Single or combined workflows defined in this script
targets: One or more targets to generate
options: Single-hyphen sos parameters (see "sos run -h" for details)
workflow_options: Double-hyphen workflow-specific parameters
Workflows:
rename_chrs
dbsnp_annotate
qc
Global Workflow Options:
--genoFile paths
input can either be 1 vcf genoFile, or a list of vcf
genoFile.
--remove-samples . (as path)
The path to the file that contains the list of samples
to remove (format FID, IID)
--keep-samples . (as path)
The path to the file that contains the list of samples
to keep (format FID, IID)
--cwd output (as path)
Workdir
--numThreads 1 (as int)
Number of threads
--job-size 1 (as int)
For cluster jobs, number commands to run per job
--walltime 24h
Walltime
--mem 60G
Usually a whole genome VCF.gz file has the size of
200+GB, after testing, a minimum of 60GB of mem is
required.
--container ''
Software container option
--entrypoint ('micromamba run -n' + ' ' + container.split('/')[-1][:-4] + " --no-capture-output") if container.endswith('.sif') else ""
--[no-]add-chr (default to False)
use this function to edit memory string for PLINK input
Sections
rename_chrs:
dbsnp_annotate:
qc_1: Handle multi-allelic sites, left normalization of indels
and add variant ID
Workflow Options:
--dbsnp-variants VAL (as path, required)
Path to dbSNP variants generated previously
--reference-genome VAL (as path, required)
Path to fasta file for HG reference genome, eg
GRCh38_full_analysis_set_plus_decoy_hla.fa
--[no-]bi-allelic (default to False)
--[no-]snp-only (default to False)
qc_2: genotype QC
Workflow Options:
--geno-filter 0.2 (as float)
Maximum missingness per-variant, default to 0.2
--DP-snp 10 (as int)
Sample level QC - read depth (DP) to filter out SNPs
below this value Default to 10, with WES data in mind
But for WGS, setting it to 2 may be fine considering the
WGS may have low DP but the GQ filter should be good
enough
--GQ 20 (as int)
Sample level QC - genotype quality (GQ) of specific
sample. This measure tells you how confident we are that
the genotype we assigned to a particular sample is
correct
--DP-indel 10 (as int)
Sample level QC - read depth (DP) to filter out indels
below this value
--AB-snp 0.15 (as float)
Allele balance for snps
--AB-indel 0.2 (as float)
Allele balance for indels
--hwe-filter 0.0 (as float)
HWE filter, default to 0.0 which means no HWE filter is
applied
qc_3:
Global parameters#
[global]
# input can either be 1 vcf genoFile, or a list of vcf genoFile.
parameter: genoFile = paths
# The path to the file that contains the list of samples to remove (format FID, IID)
parameter: remove_samples = path('.')
# The path to the file that contains the list of samples to keep (format FID, IID)
parameter: keep_samples = path('.')
# Workdir
parameter: cwd = path("output")
# Number of threads
parameter: numThreads = 1
# For cluster jobs, number commands to run per job
parameter: job_size = 1
# Walltime
parameter: walltime = '24h'
# Usually a whole genome VCF.gz file has the size of 200+GB, after testing, a minimum of 60GB of mem is required.
parameter: mem = '60G'
# Software container option
parameter: container = ""
import re
parameter: entrypoint= ('micromamba run -a "" -n' + ' ' + re.sub(r'(_apptainer:latest|_docker:latest|\.sif)$', '', container.split('/')[-1])) if container else ""
# use this function to edit memory string for PLINK input
from sos.utils import expand_size
cwd = path(f"{cwd:a}")
import os
def get_genotype_file(geno_file_paths):
    #
    def valid_geno_file(x):
        suffixes = path(x).suffixes
        # A file without any suffix (e.g. a plain-text list of genotype files) is not a genotype file
        if not suffixes:
            return False
        if suffixes[-1] == '.bed':
            return True
        elif suffixes[-1] == '.vcf':
            return True
        elif len(suffixes)>1 and ''.join(suffixes[-2:]) == ".vcf.gz" :
            return True
        return False
    #
    def complete_geno_path(x, geno_file):
        if not valid_geno_file(x):
            raise ValueError(f"Genotype file {x} should be VCF (end with .vcf or .vcf.gz) or PLINK bed file (end with .bed)")
        if not os.path.isfile(x):
            # relative path: resolve against the directory of the list file
            if not os.path.isfile(f'{geno_file:ad}/' + x):
                raise ValueError(f"Cannot find genotype file {x}")
            else:
                x = f'{geno_file:ad}/' + x
        return x
    #
    def format_chrom(chrom):
        if chrom.startswith('chr'):
            chrom = chrom[3:]
        return chrom
    # Inputs are either VCF or bed, or a vector of them
    if len(geno_file_paths) > 1:
        if all([valid_geno_file(x) for x in geno_file_paths]):
            return paths(geno_file_paths)
        else:
            raise ValueError(f"Invalid input {geno_file_paths}")
    # Input is one genotype file or text list of genotype files
    geno_file = geno_file_paths[0]
    if valid_geno_file(geno_file):
        return paths(geno_file)
    else:
        units = [x.strip().split() for x in open(geno_file).readlines() if x.strip() and not x.strip().startswith('#')]
        if all([len(x) == 1 for x in units]):
            return paths([complete_geno_path(x[0], geno_file) for x in units])
        elif all([len(x) == 2 for x in units]):
            genos = dict([(format_chrom(x[0]), path(complete_geno_path(x[1], geno_file))) for x in units])
        else:
            raise ValueError(f"{geno_file} should contain one column of file names, or two columns of chrom number and corresponding file name")
        return genos
genoFile = get_genotype_file(genoFile)
Annotate known and novel variants#
You can download the known variant reference from this link.
For a detailed explanation of the procedure and its rationale, please refer to this post.
[rename_chrs: provides = '{genoFile:nn}.add_chr.vcf.gz']
# This file can be downloaded from https://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/VCF/00-All.vcf.gz.
input: genoFile
output: f'{_input:nn}.add_chr.vcf.gz'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: container = container, expand= "${ }", stderr = f'{_output:nn}.stderr', stdout = f'{_output:nn}.stdout', entrypoint=entrypoint
for i in {1..22} X Y MT; do echo "$i chr$i"; done > ${_output:nn}.chr_name_conv.txt
bcftools annotate --rename-chrs ${_output:nn}.chr_name_conv.txt ${_input} -Oz -o ${_output}
tabix -p vcf ${_output}
rm -f ${_output:nn}.chr_name_conv.txt
[dbsnp_annotate]
input: genoFile
output: f"{cwd}/{_input:bnn}.variants.gz"
task: trunk_workers = 1, trunk_size=job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: container = container, expand= "${ }", stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout', entrypoint=entrypoint
# Extract specific fields from the VCF file using bcftools
bcftools query -f'%CHROM\t%POS\t%ID\t%REF\t%ALT\n' ${_input} | \
awk 'BEGIN {
OFS="\t"
}
{
# Calculate end position based on the length of REF or ALT
if (length($4) > length($5)) {
end_pos = $2 + (length($4) - 1)
} else {
end_pos = $2 + (length($5) - 1)
}
print $1, $2, end_pos, $3
}' | \
# Compress the output using bgzip
bgzip -c > ${_output}
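The awk step above turns each dbSNP record into a four-column interval (chromosome, start, end, rsID), where the end position is the start plus the length of the longer of REF and ALT, minus one. A Python sketch of the same arithmetic, for illustration only, is shown below; the function name and the example records (including the rsID strings) are made up.

```python
def annotation_interval(chrom, pos, rsid, ref, alt):
    """Mirror the awk logic: end = pos + max(len(REF), len(ALT)) - 1.

    As in the awk script, ALT is treated as one string, so a comma-separated
    multi-allelic ALT field contributes its full string length.
    """
    end_pos = pos + max(len(ref), len(alt)) - 1
    return (chrom, pos, end_pos, rsid)


# Hypothetical records: a SNV, a deletion and an insertion
print(annotation_interval("chr1", 100, "rs_example_1", "A", "T"))
print(annotation_interval("chr1", 200, "rs_example_2", "ATG", "A"))
print(annotation_interval("chr1", 300, "rs_example_3", "C", "CTTA"))
```

These intervals are what the later annotation step reads through the CHROM, FROM, TO and INFO/RSID columns.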
Genotype QC#
This step handles multi-allelic sites and annotates variants as known or novel. We add an rsID to variants found in dbSNP. Variants without an rsID are considered novel. This step can process ~14GB of data per hour; please set the --walltime parameter according to the size of your input files.
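The record-level effect of this step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a replacement for bcftools: it ignores genotype columns, left normalization and reference checks, and all function names and example values are hypothetical.

```python
def split_multiallelic(chrom, pos, ref, alts):
    """Split one multi-allelic site into one bi-allelic record per ALT allele."""
    return [(chrom, pos, ref, alt) for alt in alts]


def variant_id(chrom, pos, ref, alt):
    """Build the CHROM:POS:REF:ALT identifier assigned to every record."""
    return f"{chrom}:{pos}:{ref}:{alt}"


def is_novel(rsid):
    """A record is treated as novel when it carries no rsID (missing value '.')."""
    return rsid in (None, "", ".")


# Hypothetical tri-allelic site and a made-up dbSNP lookup table
records = split_multiallelic("chr1", 12345, "A", ["G", "T"])
dbsnp = {"chr1:12345:A:G": "rs_example_1"}

for record in records:
    vid = variant_id(*record)
    rsid = dbsnp.get(vid, ".")
    print(vid, rsid, "novel" if is_novel(rsid) else "known")
```

Note that the dictionary lookup by identifier is only a stand-in; the workflow matches dbSNP entries by genomic interval.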
# Handle multi-allelic sites, left normalization of indels and add variant ID
[qc_1 (variant preprocessing)]
# Path to dbSNP variants generated previously
parameter: dbsnp_variants = path
# Path to fasta file for HG reference genome, eg GRCh38_full_analysis_set_plus_decoy_hla.fa
parameter: reference_genome = path
parameter: bi_allelic = False
parameter: snp_only = False
input: genoFile, group_by = 1
output: f'{cwd}/{_input:bnn}.{"leftnorm" if not bi_allelic else "biallelic"}{".snp" if snp_only else ""}.vcf.gz'
task: trunk_workers = 1, trunk_size=job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: container = container, expand= "${ }", stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout', entrypoint=entrypoint
# split multiallelic sites into biallelic records
${'bcftools norm -m-any' if not bi_allelic else 'bcftools view -m2 -M2'} ${'-v snps' if snp_only else ""} ${_input} |\
# Fix incorrect or missing REF alleles and warn about them
bcftools norm -d exact -N --check-ref ws -f ${reference_genome} --threads ${numThreads} |\
# Fill missing tags
bcftools +fill-tags -- -t all,F_MISSING,'VD=sum(FMT/DP)' | \
# Remove ID and replace with CHROM:POS:REF:ALT format
bcftools annotate -x ID -I +'%CHROM:%POS:%REF:%ALT' | \
# Annotate with dbSNP rsID
bcftools annotate -a ${dbsnp_variants} -h <(echo '##INFO=<ID=RSID,Number=1,Type=String,Description="dbSNP rsID">') \
-c CHROM,FROM,TO,INFO/RSID -Oz --threads ${numThreads} -o ${_output}
bash: expand= "${ }", stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout', container = container, entrypoint=entrypoint
for i in ${_output} ; do
# Capture file metadata
output_info="$i"
output_size=$(ls -lh "$i" | awk '{print $5}')
output_rows=$(zcat "$i" | wc -l)
output_column=$(zcat "$i" | grep -v "##" | head -1 | wc -w)
output_header_row=$(zcat "$i" | grep "##" | wc -l)
output_preview=$(zcat "$i" | grep -v "##" | head | cut -f 1-11)
# Write captured information to the stdout file
printf "output_info: %s\noutput_size: %s\noutput_rows: %d\noutput_column: %d\noutput_header_row: %d\noutput_preview:\n%s\n" \
"$output_info" "$output_size" "$output_rows" "$output_column" "$output_header_row" "$output_preview" >> ${_output:n}.stdout
done
This step filters variants based on FILTER PASS, DP and GQ, fraction of missing genotypes (all samples), and HWE, for SNPs and indels. It will also remove monomorphic sites using bcftools view -c1.
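The genotype-level part of this filtering can be illustrated with a short Python sketch: genotypes failing the depth or quality threshold are set to missing, and a variant is kept only if its missing fraction stays below the missingness threshold. The helper names and sample values are hypothetical, and the sketch does not reproduce how or when the pipeline computes F_MISSING, nor the PASS, allele balance and HWE filters.

```python
MISSING = "./."


def mask_genotype(gt, dp, gq, is_indel=False, dp_snp=10, dp_indel=10, gq_min=20):
    """Set a genotype to missing unless DP and GQ both meet their thresholds."""
    min_dp = dp_indel if is_indel else dp_snp
    if dp is None or gq is None or dp < min_dp or gq < gq_min:
        return MISSING
    return gt


def missing_fraction(genotypes):
    """Fraction of samples with a missing genotype."""
    return sum(1 for gt in genotypes if gt == MISSING) / len(genotypes)


def keep_variant(genotypes, geno_filter=0.2):
    """Keep the variant when its missing fraction is below the threshold."""
    return missing_fraction(genotypes) < geno_filter


# Hypothetical SNP with five samples given as (GT, DP, GQ)
samples = [("0/1", 25, 99), ("0/0", 12, 40), ("1/1", 8, 60),
           ("0/1", 30, 99), ("0/0", 15, 35)]
masked = [mask_genotype(gt, dp, gq) for gt, dp, gq in samples]
print(masked)                # the third genotype is masked because DP < 10
print(keep_variant(masked))  # missing fraction is 1/5, which is not below 0.2
```

With a lower depth threshold, for example `dp_snp=2` as suggested for WGS data, the third genotype would be retained and the variant would pass the missingness check.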
# genotype QC
[qc_2 (variant level QC)]
# Maximum missingness per-variant, default to 0.2
parameter: geno_filter = 0.2
# Sample level QC - read depth (DP) to filter out SNPs below this value
# Default to 10, with WES data in mind
# But for WGS, setting it to 2 may be fine considering the WGS may have low DP but the GQ filter should be good enough
parameter: DP_snp = 10
# Sample level QC - genotype quality (GQ) of specific sample. This measure tells you how confident we are that the genotype we assigned to a particular sample is correct
parameter: GQ = 20
# Sample level QC - read depth (DP) to filter out indels below this value
parameter: DP_indel = 10
# Allele balance for snps
parameter: AB_snp = 0.15
# Allele balance for indels
parameter: AB_indel = 0.2
# HWE filter, default to 0.0 which means no HWE filter is applied
parameter: hwe_filter = 0.0
output: f"{_input:nn}.bcftools_qc.vcf.gz"
task: trunk_workers = 1, trunk_size=job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: container = container, expand= "${ }", stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout', entrypoint=entrypoint
# Initial filtering based on depth and genotype quality for SNPs and INDELs. Genotypes that do not meet both conditions (DP>=DP_snp & GQ>=GQ) are set to missing; the same applies to indels with DP_indel
bcftools filter -S . -i \
'(TYPE="SNP" & (FMT/DP)>=${DP_snp} & (FMT/GQ)>=${GQ}) |
(TYPE="INDEL" & (FMT/DP)>=${DP_indel} & (FMT/GQ)>=${GQ})' ${_input} | \
# Further filtering to retain only variants that are PASS and have at least one non-reference allele
bcftools view -c1 | \
bcftools view -f PASS | \
# Filter based on genotype (hom/het) and allelic balance for SNPs and INDELs
bcftools filter -i \
'GT="hom" |
TYPE="snp" & GT="het" & (FORMAT/AD[*:1])/(FORMAT/AD[*:0] + FORMAT/AD[*:1]) >= ${AB_snp} |
TYPE="indel" & GT="het" & (FORMAT/AD[*:1])/(FORMAT/AD[*:0] + FORMAT/AD[*:1]) >= ${AB_indel}' | \
# Filter based on missingness and Hardy-Weinberg equilibrium
bcftools filter -i 'F_MISSING<${geno_filter} & HWE>${hwe_filter}' \
-Oz --threads ${numThreads} -o ${_output}
bash: expand= "${ }", stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout', container = container, entrypoint=entrypoint
for i in ${_output} ; do
# Capture file metadata
output_info="$i"
output_size=$(ls -lh "$i" | awk '{print $5}')
output_rows=$(zcat "$i" | wc -l)
output_column=$(zcat "$i" | grep -v "##" | head -1 | wc -w)
output_header_row=$(zcat "$i" | grep "##" | wc -l)
output_preview=$(zcat "$i" | grep -v "##" | head | cut -f 1-11)
# Write captured information to the stdout file
printf "output_info: %s\noutput_size: %s\noutput_rows: %d\noutput_column: %d\noutput_header_row: %d\noutput_preview:\n%s\n" \
"$output_info" "$output_size" "$output_rows" "$output_column" "$output_header_row" "$output_preview" >> ${_output:n}.stdout
done
[qc_3 (genotype data summary statistics)]
input: output_from('qc_1'), output_from('qc_2'), group_by = 1
output: f"{cwd}/{_input:bnn}.novel_variant_sumstats",
f"{cwd}/{_input:bnn}.known_variant_sumstats",
f"{cwd}/{_input:bnn}.novel_variant.snipsift_tstv",
f"{cwd}/{_input:bnn}.known_variant.snipsift_tstv"
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output[0]:bn}'
bash: container = container, expand= "${ }", stderr = f'{_output[0]:n}.stderr', stdout = f'{_output[0]:n}.stdout', entrypoint=entrypoint
# Compute statistics for novel variants (RSID is missing)
bcftools stats -i 'RSID="."' -v ${_input} > ${_output[0]}
# Compute statistics for known variants (RSID is present)
bcftools stats -i 'RSID!="."' -v ${_input} > ${_output[1]}
# Compute TS/TV for novel variants
bcftools filter -i 'RSID="."' ${_input} | \
SnpSift tstv - > ${_output[2]}
# Compute TS/TV for known variants
bcftools filter -i 'RSID!="."' ${_input} | \
SnpSift tstv - > ${_output[3]}
bash: expand= "${ }", stderr = f'{_output[0]:n}.stderr', stdout = f'{_output[0]:n}.stdout', container = container, entrypoint=entrypoint
for i in ${_output} ; do
# Capture file metadata
output_info="$i"
output_size=$(ls -lh "$i" | awk '{print $5}')
output_rows=$(zcat "$i" | wc -l)
output_column=$(zcat "$i" | grep -v "##" | head -1 | wc -w)
output_header_row=$(zcat "$i" | grep "##" | wc -l)
output_preview=$(zcat "$i" | grep -v "##" | head | cut -f 1-11)
# Write captured information to the stdout file
printf "output_info: %s\noutput_size: %s\noutput_rows: %d\noutput_column: %d\noutput_header_row: %d\noutput_preview:\n%s\n" \
"$output_info" "$output_size" "$output_rows" "$output_column" "$output_header_row" "$output_preview" >> ${_output[0]:n}.stdout
done