QTL Association Analysis

This notebook contains the workflow to perform QTL association analysis.

Miniprotocol Timing

This is the total duration across all miniprotocol phases. Module-specific timings are given on their respective pages and are also counted in this overall estimate.

Timing < X minutes

Overview

This workflow is an application of the QTL association analysis from the xQTL project pipeline.

  1. TensorQTL.ipynb (step i, ii): run cis-QTL and trans-QTL analyses

Input

  • output/genotype_by_chrom/protocol_example.genotype.chr21_22.genotype_by_chrom_files.txt: Generated from genotype_preprocessing

  • output/phenotype_by_chrom/protocol_example.protein.bed.phenotype_by_chrom_files.txt: Generated from phenotype_preprocessing

  • output/covariate/protocol_example.protein.protocol_example.samples.protocol_example.genotype.chr21_22.pQTL.unrelated.plink_qc.prune.pca.Marchenko_PC.gz: Generated from covariates_preprocessing

  • prototype_example/protocol_example/protocol_example.protein.enhanced_cis_chr21_chr22.bed: a customized cis-window list for protein data, derived from the TADB list TADB_enhanced_cis.bed. The code that generates it is in create_protocol_example_data. Note that every molecular_trait_id in the phenotype data is expected to have a corresponding customized cis window.
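The requirement that every molecular_trait_id has a customized cis window can be checked before running the association step. The following is a minimal sketch, not part of the pipeline: the function name `missing_cis_windows` is hypothetical, and it assumes the phenotype BED is gzip-compressed with the ID in column 4, and that the cis-window file is tab-delimited with the same ID in column 4 (the column can be overridden with a third argument).

```shell
# Sketch: print phenotype IDs that have no entry in the cis-window file.
# Assumptions (not confirmed by the protocol): ID is column 4 of the gzipped
# phenotype BED; the window file is tab-delimited with the ID in column 4
# unless a different column is given. Lines starting with '#' are headers.
missing_cis_windows() {
    pheno_gz=$1
    windows=$2
    id_col=${3:-4}
    gzip -dc "$pheno_gz" | awk -F'\t' -v col="$id_col" -v win="$windows" '
        BEGIN {
            # Load every window ID into a lookup table.
            while ((getline line < win) > 0) {
                if (line !~ /^#/) {
                    split(line, f, "\t")
                    seen[f[col]] = 1
                }
            }
            close(win)
        }
        # Report phenotype rows whose ID was never seen in the window file.
        $0 !~ /^#/ && !($4 in seen) { print $4 }
    '
}
```

For example, `missing_cis_windows output/phenotype/protocol_example.protein.bed.gz prototype_example/protocol_example/protocol_example.protein.enhanced_cis_chr21_chr22.bed` would print one ID per line for any trait that lacks a window; empty output means every trait is covered.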

Output

  • Empirical cis results: /mnt/vast/hpc/csg/molecular_phenotype_calling/pQTL_cis/rosmap

  • Standardized cis results: /mnt/vast/hpc/csg/molecular_phenotype_calling/pQTL_cis/rosmap_stad/pQTL.#

Steps

i. Cis TensorQTL Command

sos run xqtl-protocol/pipeline/TensorQTL.ipynb cis \
    --genotype-file output/genotype_by_chrom/protocol_example.genotype.chr21_22.genotype_by_chrom_files.txt \
    --phenotype-file output/phenotype_by_chrom/protocol_example.protein.bed.phenotype_by_chrom_files.txt \
    --covariate-file output/covariate/protocol_example.protein.protocol_example.samples.protocol_example.genotype.chr21_22.pQTL.unrelated.plink_qc.prune.pca.Marchenko_PC.gz \
    --customized-cis-windows xqtl_association/protocol_example.protein.enhanced_cis_chr21_chr22.bed \
    --cwd output/cis_association/ \
    --MAC 5 --numThreads 2 -J 22 \
    --container oras://ghcr.io/cumc/tensorqtl_apptainer:latest
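Because the command above depends on files produced by earlier preprocessing modules, a quick pre-flight check can catch a wrong path before the job is submitted. This is a minimal sketch; `check_inputs` is a hypothetical helper, not part of the pipeline, and it only tests that each path exists and is non-empty.

```shell
# Sketch: fail early if any expected input file is missing or empty.
# check_inputs is a hypothetical helper, not an xqtl-protocol command.
check_inputs() {
    status=0
    for f in "$@"; do
        # -s is true only for an existing file with size greater than zero.
        if [ ! -s "$f" ]; then
            echo "Missing or empty input: $f" >&2
            status=1
        fi
    done
    return $status
}
```

It could be called with the genotype list, phenotype list, covariate file, and cis-window file from the Input section, for example `check_inputs output/phenotype_by_chrom/protocol_example.protein.bed.phenotype_by_chrom_files.txt || exit 1`.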

ii. Trans TensorQTL Command

Some proteins are not in the customized cis-window list, so they must be removed from the analysis by creating a region_list. Note that the region list must be an actual file on disk; a process substitution such as <() is not accepted.

zcat output/phenotype/protocol_example.protein.bed.gz | cut -f 1,2,3,4 | grep -v -e ENSG00000163554 \
    -e ENSG00000171564 -e ENSG00000171560 -e ENSG00000171557 > output/protocol_example.protein.region_list
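The command above hard-codes the IDs to exclude. As an alternative, the region list could be derived from the cis-window file itself, keeping only phenotype rows whose ID has a window. This is a sketch under assumptions: `make_region_list` is a hypothetical helper, and the ID is assumed to be in column 4 of both the phenotype BED and the tab-delimited cis-window file. Whether it reproduces the same exclusions as the command above depends on the contents of the window file.

```shell
# Sketch: write columns 1-4 of the phenotype BED for IDs that have a cis window.
# Assumptions (not confirmed by the protocol): ID is column 4 in both files and
# the window file is tab-delimited. Header lines starting with '#' are kept.
make_region_list() {
    pheno_gz=$1
    windows=$2
    out=$3
    gzip -dc "$pheno_gz" | cut -f 1-4 | awk -F'\t' -v win="$windows" '
        BEGIN {
            # Collect the IDs that have a customized cis window.
            while ((getline line < win) > 0) {
                split(line, f, "\t")
                keep[f[4]] = 1
            }
            close(win)
        }
        # Pass the header through, then only rows with a matching window.
        /^#/ || ($4 in keep)
    ' > "$out"
}
```

The result is written to a regular file, which satisfies the requirement that the region list be an actual file, for example `make_region_list output/phenotype/protocol_example.protein.bed.gz xqtl_association/protocol_example.protein.enhanced_cis_chr21_chr22.bed output/protocol_example.protein.region_list`.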

Running the following command takes more than 180 GB of memory.

sos run xqtl-protocol/pipeline/TensorQTL.ipynb trans \
    --genotype-file output/genotype_by_chrom/protocol_example.genotype.chr21_22.genotype_by_chrom_files.txt \
    --phenotype-file output/phenotype_by_chrom/protocol_example.protein.bed.phenotype_by_chrom_files.txt \
    --region-list output/protocol_example.protein.region_list \
    --covariate-file output/covariate/protocol_example.protein.protocol_example.samples.protocol_example.genotype.chr21_22.pQTL.unrelated.plink_qc.prune.pca.Marchenko_PC.gz \
    --customized-cis-windows xqtl_association/protocol_example.protein.enhanced_cis_chr21_chr22.bed \
    --cwd output/trans_association/ \
    --MAC 5 --numThreads 2 -J 22 \
    --container oras://ghcr.io/cumc/tensorqtl_apptainer:latest
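Given the memory requirement stated for the trans step, it may be worth checking the machine before launching the command above. The following is a minimal sketch assuming a Linux host where /proc/meminfo is available; `mem_total_gb` and `require_mem_gb` are hypothetical helpers. It reports total physical memory on the node, which is not the same as the memory granted to a job by a cluster scheduler.

```shell
# Sketch: refuse to continue on a machine with too little total RAM.
# Assumes Linux (/proc/meminfo reports MemTotal in kB). These helpers are
# hypothetical and are not part of xqtl-protocol or TensorQTL.
mem_total_gb() {
    meminfo=${1:-/proc/meminfo}
    # Convert kB to whole GB (binary units), truncating the fraction.
    awk '$1 == "MemTotal:" { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo"
}

require_mem_gb() {
    need=$1
    have=$(mem_total_gb "${2:-/proc/meminfo}")
    if [ -z "$have" ] || [ "$have" -lt "$need" ]; then
        echo "Need at least ${need} GB of RAM, found ${have:-unknown} GB" >&2
        return 1
    fi
}
```

A call such as `require_mem_gb 180 || exit 1` placed before the `sos run` line would stop early on an undersized machine.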

Anticipated Results

TensorQTL will produce empirical and standardized cis/trans results.