Covariate Data Preprocessing#
This notebook contains a workflow that processes covariate files and computes PCA-derived covariates from phenotype data.
Miniprotocol Timing#
The timing below is the total duration for all miniprotocol phases. Module-specific timings are listed separately on their respective pages and are also included in this overall estimate.
Timing < 3 minutes
Overview#
This workflow applies the covariate-related sections of the xQTL project pipeline:
covariate_formatting.ipynb (step i): Merge covariates and genotype PCA
covariate_hidden_factor.ipynb (step ii): Compute residuals on merged covariates and perform hidden factor analysis
Steps#
i. Merge Covariates and Genotype PCs#
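The --k parameter sets how many genotype PCs are merged into the covariate file. In this example, it is chosen from the scree file so that the selected PCs explain just under 80% of the variance; change the 0.8 threshold in the awk expression to adjust this.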
sos run xqtl-protocol/pipeline/covariate_formatting.ipynb merge_genotype_pc \
--cwd output/covariate \
--pcaFile output/genotype_pca/protocol_example.genotype.chr21_22.pQTL.plink_qc.prune.pca.rds \
--covFile xqtl_association/protocol_example.samples.tsv \
--tol-cov 0.4 \
--k `awk '$3 < 0.8' output/genotype_pca/protocol_example.genotype.chr21_22.pQTL.plink_qc.prune.pca.scree.txt | tail -1 | cut -f 1 ` \
--container oras://ghcr.io/cumc/bioinfo_apptainer:latest
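The command substitution passed to --k can be tried in isolation to see which value it produces. The sketch below is illustrative only: it builds a small, hypothetical scree table (tab-separated, no header, with the PC index in column 1 and the cumulative variance explained in column 3) and applies the same awk, tail, and cut chain. The layout of the real scree.txt file is assumed from the command above, not confirmed here.

```shell
# Hypothetical scree table for illustration only:
# column 1 = PC index, column 2 = variance explained, column 3 = cumulative variance explained.
scree_file=$(mktemp)
printf '1\t0.40\t0.40\n2\t0.25\t0.65\n3\t0.10\t0.75\n4\t0.08\t0.83\n5\t0.05\t0.88\n' > "$scree_file"

# Keep rows whose cumulative variance (column 3) is below 0.8,
# take the last such row, and extract its PC index (column 1).
k=$(awk '$3 < 0.8' "$scree_file" | tail -1 | cut -f 1)
echo "k=$k"

# Clean up the temporary file.
rm -f "$scree_file"
```

With this toy table the chain selects 3, the last PC whose cumulative variance is still below the threshold. Running the same chain against the real scree file before launching the workflow is a quick way to check the value that --k will receive.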
Anticipated Results#
Processed covariate data includes a file with covariates and hidden factors for use in TensorQTL.