Illustration of the xQTL protocol

This notebook illustrates the computational protocols available from this repository for the detection and analysis of molecular QTLs (xQTLs). A minimal toy dataset consisting of 49 de-identified samples is used for the analysis.

Analysis

Please visit the homepage of the protocol website for general background on this resource, in particular the “How to use the resource” section. To perform a complete analysis, from molecular phenotype quantification to xQTL discovery, conduct the analyses in the order listed below; each link leads to a mini-protocol for a specific task. All commands documented in each mini-protocol should be executed in a command-line environment.

Molecular Phenotype Quantification

  1. Reference data munging & QC

  2. Quantification of gene expression

  3. Quantification of alternative splicing events

  4. Quantification of DNA methylation

Data Pre-Processing

  1. Genotype data munging & QC

  2. Phenotype data munging & QC

  3. Covariates data munging & QC

QTL Association Analysis

  1. QTL association testing

  2. QTL association postprocessing

Integrative Analysis

  1. FIXME

Multi-omics data integration

To be updated

Data

For record keeping, the preparation of the demo dataset is documented on this page, a private repository accessible to members of the FunGen-xQTL analysis working group.

The input data required by the protocols listed on this page can be downloaded from Synapse.

  • To download the data, first create a user account via Synapse Login. Your username and password will be required when downloading.

  • Downloading requires the Synapse API client. To install the Python package, run pip install synapseclient in a terminal or Command Prompt. Details are listed on this page.

  • Every folder, at every level, has a unique Synapse ID, which allows you to download only selected folders or files rather than the entire folder.

To download the test data for the section “Bulk RNA-seq molecular phenotype quantification”, use the following Python code:

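import synapseclient
import synapseutils
# Log in to Synapse with your account credentials
syn = synapseclient.Synapse()
syn.login("your username on synapse.org", "your password on synapse.org")
# Recursively download folder syn53174239 into the current directory
files = synapseutils.syncFromSynapse(syn, 'syn53174239', path="./")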

To download the test data for the section “xQTL association analysis”, use the following Python code:

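import synapseclient
import synapseutils
# Log in to Synapse with your account credentials
syn = synapseclient.Synapse()
syn.login("your username on synapse.org", "your password on synapse.org")
# Recursively download folder syn52369482 into the current directory
files = synapseutils.syncFromSynapse(syn, 'syn52369482', path="./")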
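Because the two snippets above differ only in the Synapse ID, they can be folded into one function. The sketch below is a hypothetical helper, not part of the protocol: the section keys in DEMO_DATA_IDS and the name download_demo_data are illustrative, and it assumes you pass in a synapseclient.Synapse object that has already been logged in as shown above.

```python
# Hypothetical mapping from protocol section to the Synapse folder IDs given above.
DEMO_DATA_IDS = {
    "bulk_rnaseq_quantification": "syn53174239",
    "xqtl_association": "syn52369482",
}


def download_demo_data(syn, section, path="./"):
    """Recursively download the demo folder for one protocol section into `path`.

    `syn` must be a synapseclient.Synapse object on which login() has already
    succeeded. Returns the list of File entities from synapseutils.syncFromSynapse.
    """
    # Validate the section name before touching the network.
    if section not in DEMO_DATA_IDS:
        raise KeyError(f"unknown section {section!r}; choose from {sorted(DEMO_DATA_IDS)}")
    # Imported here so the mapping above can be inspected without synapseclient installed.
    import synapseutils

    return synapseutils.syncFromSynapse(syn, DEMO_DATA_IDS[section], path=path)


# Example usage (requires network access and a logged-in client):
#   syn = synapseclient.Synapse()
#   syn.login(...)
#   files = download_demo_data(syn, "xqtl_association", path="./xqtl_demo")
```

To fetch a single file rather than a whole folder, synapseclient's syn.get(entity_id, downloadLocation=path) can be called with that file's own Synapse ID.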

Software environment: use Singularity containers

The analyses documented on this website are best performed using the containers we provide, either through Singularity (recommended) or Docker, via the --container option pointing to a container image file. For example, --container oras://ghcr.io/cumc/tensorqtl_apptainer:latest uses a Singularity image to perform QTL association mapping with TensorQTL. If you drop the --container option, the analysis will rely on software installed on your own computer.
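The option is simply appended to the command line of the workflow being run. A minimal sketch, assuming a hypothetical helper build_sos_command and placeholder notebook and workflow names, shows how the same command can be assembled with or without a container image:

```python
import shlex


def build_sos_command(notebook, workflow, container=None, **params):
    """Assemble a `sos run` command line as a list of arguments.

    Hypothetical helper: the notebook path, workflow name and parameter names
    passed in are placeholders, not the protocol's actual interface.
    """
    cmd = ["sos", "run", notebook, workflow]
    for name, value in params.items():
        # Workflow parameters are passed on the command line as --name value.
        cmd += [f"--{name}", str(value)]
    if container:
        # Run inside the given image; leaving it out falls back to locally installed software.
        cmd += ["--container", container]
    return cmd


# Hypothetical notebook and workflow names, with the TensorQTL image from the text above.
with_image = build_sos_command(
    "pipeline/association_scan.ipynb",
    "cis",
    container="oras://ghcr.io/cumc/tensorqtl_apptainer:latest",
)
print(shlex.join(with_image))
```

Keeping the command as a list of arguments avoids shell-quoting mistakes; shlex.join is only used to print a copy-pasteable version.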

Troubleshooting

If you run into errors related to R libraries while using the --container option, your locally installed R packages are likely being picked up instead of those in the container, and you may need to clear your local R library paths before running the sos commands. For example, this error:

Error in dyn.load(file, DLLpath = DLLPath, ...):
unable to load shared object '$PATH/R/x86_64-pc-linux-gnu-library/4.2/stringi/libs/stringi.so':
libicui18n.so.63: cannot open shared object file: No such file or directory

may be fixed by running the following before the sos commands:

export R_LIBS=""
export R_LIBS_USER=""
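When only some runs need the cleared variables, the same effect can be scoped to a single child process instead of the whole shell session. This is a sketch with hypothetical helper names; the command passed to run_with_clean_r_libs is whatever sos command the relevant mini-protocol documents:

```python
import os
import subprocess


def env_without_user_r_libs(base=None):
    """Return a copy of `base` (default: os.environ) with R_LIBS and R_LIBS_USER emptied.

    Mirrors `export R_LIBS=""` and `export R_LIBS_USER=""`, but only for the child
    process, so the calling shell's environment is left unchanged.
    """
    env = dict(os.environ if base is None else base)
    env["R_LIBS"] = ""
    env["R_LIBS_USER"] = ""
    return env


def run_with_clean_r_libs(cmd):
    """Run one command (a list of arguments) with the cleared R library variables."""
    return subprocess.run(cmd, env=env_without_user_r_libs(), check=True)
```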

Analyses on High Performance Computing clusters

The protocol examples shown above run on a desktop workstation for demonstration purposes. In practice, the analyses should typically be performed in an HPC cluster environment. This can be achieved via SoS Remote Tasks on configured host computers. We provide this toy example of running a SoS pipeline in a typical HPC cluster environment. First-time users are encouraged to try it out, as it helps set up the computational environment needed to run the analyses in this protocol.
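As a rough illustration, the tasks of a sos run command are sent to a configured host with SoS's -q option, and a host configuration file can be supplied with -c. The helper below is hypothetical, and the queue name and configuration file are placeholders for your own cluster setup; check sos run -h for the options available in your SoS version:

```python
def with_remote_queue(cmd, queue, config=None):
    """Return a copy of a `sos run` argument list that submits its tasks to `queue`.

    Assumes SoS's -q (task queue / host) and -c (configuration file) options; the
    queue name and config path are placeholders for your own host configuration.
    """
    remote = list(cmd) + ["-q", queue]
    if config is not None:
        remote += ["-c", config]
    return remote


# Hypothetical local command, queue name and config file.
local_cmd = ["sos", "run", "pipeline/example.ipynb", "cis"]
cluster_cmd = with_remote_queue(local_cmd, "my_cluster", config="hosts.yml")
```

Returning a new list leaves the original local command untouched, so the same workflow can be run on a workstation first and then on the cluster.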