QTL Association Analysis#
This notebook contains the workflow to perform QTL association analysis.
Miniprotocol Timing#
This represents the total duration for all miniprotocol phases. While module-specific timings are provided separately on their respective pages, they are also included in this overall estimate.
Timing < X minutes
Overview#
This workflow is an application of the QTL association analysis from the xQTL project pipeline.
TensorQTL.ipynb (steps i and ii): run cis-QTL and trans-QTL analyses
Input#
output/genotype_by_chrom/protocol_example.genotype.chr21_22.genotype_by_chrom_files.txt: generated from genotype_preprocessing.
output/phenotype_by_chrom/protocol_example.protein.bed.phenotype_by_chrom_files.txt: generated from phenotype_preprocessing.
output/covariate/protocol_example.protein.protocol_example.samples.protocol_example.genotype.chr21_22.pQTL.unrelated.plink_qc.prune.pca.Marchenko_PC.gz: generated from covariates_preprocessing.
prototype_example/protocol_example/protocol_example.protein.enhanced_cis_chr21_chr22.bed: a customized cis window list, generated from the TADB list TADB_enhanced_cis.bed to handle protein data. The code to generate it can be found in create_protocol_example_data. Note that every molecular_trait_id in the phenotype data is supposed to have a corresponding customized cis window.
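The requirement that every molecular_trait_id has a customized cis window can be checked before running the analysis. The following is a minimal sketch, not part of the pipeline: the function name missing_cis_windows is hypothetical, and it assumes that the ID sits in column 4 of both the phenotype BED file and the window list, and that header lines start with `#`.

```shell
# Sketch: list molecular_trait_id values that have no customized cis window.
# Assumptions (not confirmed by the protocol): the ID is in column 4 of both
# files, both files are tab-delimited, and header lines start with '#'.

# missing_cis_windows PHENOTYPE_BED_GZ WINDOWS_BED
# Prints one ID per line for every phenotype row without a window entry.
missing_cis_windows() {
    gzip -dc "$1" | awk -F'\t' '
        /^#/ { next }                       # skip header lines in either input
        pass == 1 { seen[$4] = 1; next }    # first input: remember window IDs
        !($4 in seen) { print $4 }          # second input: report unmatched IDs
    ' pass=1 "$2" pass=2 -
}

# Example call with the files used in this protocol:
# missing_cis_windows output/phenotype/protocol_example.protein.bed.gz \
#     xqtl_association/protocol_example.protein.enhanced_cis_chr21_chr22.bed
```

An empty result means every phenotype row has a window. Any ID printed here would have to be excluded or given a window before the analysis.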
Output#
Empirical cis results: /mnt/vast/hpc/csg/molecular_phenotype_calling/pQTL_cis/rosmap
Standardized cis results: /mnt/vast/hpc/csg/molecular_phenotype_calling/pQTL_cis/rosmap_stad/pQTL.#
Steps#
i. Cis TensorQTL Command#
sos run xqtl-protocol/pipeline/TensorQTL.ipynb cis \
--genotype-file output/protocol_example.genotype.chr21_22.genotype_by_chrom_files.txt \
--phenotype-file output/phenotype_by_chrom/protocol_example.protein.bed.phenotype_by_chrom_files.txt \
--covariate-file output/covariate/protocol_example.protein.protocol_example.samples.protocol_example.genotype.chr21_22.pQTL.plink_qc.prune.pca.Marchenko_PC.gz \
--customized-cis-windows xqtl_association/protocol_example.protein.enhanced_cis_chr21_chr22.bed \
--cwd output/cis_association/ \
--MAC 5 --numThreads 2 -J 22 \
--container oras://ghcr.io/cumc/tensorqtl_apptainer:latest
ii. Trans TensorQTL Command#
Some proteins are not in the customized cis windows list, so we need to remove them from the analysis by creating a region list. Note that the region list needs to be an actual file, so a <() process substitution is not acceptable.
zcat output/phenotype/protocol_example.protein.bed.gz | cut -f 1,2,3,4 | grep -v -e ENSG00000163554 \
-e ENSG00000171564 -e ENSG00000171560 -e ENSG00000171557 > output/phenotype/protocol_example.protein.region_list
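Instead of hard-coding the excluded IDs, the region list could be derived from the window list itself. This is a hedged sketch under assumptions: the function name make_region_list is hypothetical, and the ID is assumed to be in column 4 of both tab-delimited files. Unlike the grep above, which drops any line containing one of the listed IDs, it keeps a row only when column 4 matches a window ID exactly.

```shell
# Sketch: write a region list containing only phenotypes that have a
# customized cis window. Assumptions (not confirmed by the protocol): the ID
# is in column 4 of both files, and header lines start with '#'.

# make_region_list PHENOTYPE_BED_GZ WINDOWS_BED OUT_FILE
make_region_list() {
    gzip -dc "$1" | awk -F'\t' -v OFS='\t' '
        pass == 1 { if ($0 !~ /^#/) keep[$4] = 1; next }    # window IDs to keep
        /^#/ || ($4 in keep) { print $1, $2, $3, $4 }       # header and matches
    ' pass=1 "$2" pass=2 - > "$3"
}

# Example call with the files used in this protocol:
# make_region_list output/phenotype/protocol_example.protein.bed.gz \
#     xqtl_association/protocol_example.protein.enhanced_cis_chr21_chr22.bed \
#     output/phenotype/protocol_example.protein.region_list
```

The result is written to a regular file, which satisfies the requirement that the region list be an actual file. As with the cut in the original command, the header line is kept with its first four columns.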
Running the following command takes more than 180 GB of memory.
sos run xqtl-protocol/pipeline/TensorQTL.ipynb trans \
--genotype-file output/protocol_example.genotype.chr21_22.genotype_by_chrom_files.txt \
--phenotype-file output/phenotype_by_chrom/protocol_example.protein.bed.phenotype_by_chrom_files.txt \
--region-list output/phenotype/protocol_example.protein.region_list \
--covariate-file output/protocol_example.protein.protocol_example.samples.protocol_example.genotype.chr21_22.pQTL.unrelated.plink_qc.prune.pca.Marchenko_PC.gz \
--customized-cis-windows xqtl_association/protocol_example.protein.enhanced_cis_chr21_chr22.bed \
--cwd output/trans_association/ \
--MAC 5 --numThreads 2 -J 22 \
--container oras://ghcr.io/cumc/tensorqtl_apptainer:latest
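Because the trans run needs a large amount of memory, a pre-flight check can avoid a job that fails late. The sketch below is an illustration only: the function names available_mem_gib and has_enough_mem are hypothetical, it works only on Linux (it reads /proc/meminfo), and it treats the 180G figure as GiB, which is an assumption. On a cluster, the scheduler's memory limit for the job may be lower than what the node reports.

```shell
# Sketch: check available memory on a Linux host before launching the trans
# analysis. Assumption: /proc/meminfo reports MemAvailable in kB (KiB).

# available_mem_gib [MEMINFO_FILE]
# Prints MemAvailable in whole GiB, rounded down.
available_mem_gib() {
    awk '$1 == "MemAvailable:" { print int($2 / 1024 / 1024) }' "${1:-/proc/meminfo}"
}

# has_enough_mem REQUIRED_GIB [MEMINFO_FILE]
# Succeeds (exit status 0) when at least REQUIRED_GIB is available.
has_enough_mem() {
    avail=$(available_mem_gib "${2:-/proc/meminfo}")
    [ -n "$avail" ] && [ "$avail" -ge "$1" ]
}

# Example use before the trans command:
# has_enough_mem 180 || echo "Less than 180 GiB available; trans run may fail" >&2
```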
Anticipated Results#
TensorQTL will produce empirical and standardized cis/trans results.