Univariate Fine-Mapping and TWAS with SuSiE#
Description#
Our pipeline is capable of performing univariate fine-mapping with SuSiE with TWAS weights. The TWAS portion makes use of TWAS weights, linkage disequilibrium data and GWAS summary statistics. Preset variants used are taken from the linkage disequilibrium data and used only for TWAS. TWAS cross validation tell us which of the four methods (enet, lasso, mrash, SuSiE) are best to use. By default, we limit to under 5000 variants for cross validation. In cross validation, the data is split into five parts. Training is done on four parts, and prediction is done on the fifth. Linear regression is used to assess the results and get r squared and pvalues.
Fine mapping with SuSiE follows the formulat y=xb+e where x has many highly correlated variables due to linkage disequilibrium. Therefore, true effects (b), are very sparse. The SuSiE wrapper looks for five independent signals in each region to increase convergence speed. However, if five signals are found, then the the upper limit is increased. SuSiE does not allow for the inclusion of covariates. Therefore, covariates are regressed in.
Input:#
--genoFile
: path to a plink bed file containing genotypes.
--phenoFile
: at least one tab delimited file containing chr, start, end, ID and path for the regions. For example:
#chr start end ID path
chr12 389272 389273 ENSG00000120647 $PATH/Ast.log2cpm.bed.gz
chr12 389319 389320 ENSG00000073614 $PATH/Ast.log2cpm.bed.gz
--covFile
: at least one tab delimited file containing covariates in the rows and samples in the columns.
--customized-association-windows
: a tab delimited file containing chr, start, end, and gene_id. For example:
#chr start end gene_id
chr1 0 6480000 ENSG00000008128
chr1 0 6480000 ENSG00000008130
----phenotype-names
: the names of the phenotypes if multiple are included. There should be one for each phenotype file you include.
--max-cv-variants
: maximum number of variants for cross-validation.
--ld_reference_meta_file
: path to file containing chrom, start, end and path for linkage disequilibrium region information. For example:
#chrom start end path
chr1 101384274 104443097 $PATH/chr1_101384274_104443097.cor.xz.bim
chr1 104443097 106225286 $PATH/chr1_104443097_106225286.cor.xz.bim
--region-name
: if you only wish to analyze one region, then include the gene_id of a region found in the customized-association-windows
file
Minimal Working Example Steps#
i. Run the Fine-Mapping and TWAS with SuSiE#
sos run $PATH/protocol/pipeline/mnm_regression.ipynb susie_twas \
--name ROSMAP_mega_eQTL \
--genoFile $PATH/genofile/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.11.bed \
--phenoFile $PATH/phenofile/Mic/analysis_ready/phenotype_preprocessing/snuc_pseudo_bulk.Mic.mega.normalized.log2cpm.region_list.txt \
$PATH/phenofile/Ast/analysis_ready/phenotype_preprocessing/snuc_pseudo_bulk.Ast.mega.normalized.log2cpm.region_list.txt \
$PATH/phenofile/Oli/analysis_ready/phenotype_preprocessing/snuc_pseudo_bulk.Oli.mega.normalized.log2cpm.region_list.txt \
$PATH/phenofile/OPC/analysis_ready/phenotype_preprocessing/snuc_pseudo_bulk.OPC.mega.normalized.log2cpm.region_list.txt \
$PATH/phenofile/Exc/analysis_ready/phenotype_preprocessing/snuc_pseudo_bulk.Exc.mega.normalized.log2cpm.region_list.txt \
$PATH/phenofile/Inh/analysis_ready/phenotype_preprocessing/snuc_pseudo_bulk.Inh.mega.normalized.log2cpm.region_list.txt \
--covFile $PATH/phenofile/Mic/analysis_ready/covariate_preprocessing/snuc_pseudo_bulk.Mic.mega.normalized.log2cpm.rosmap_cov.ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.snuc_pseudo_bulk_mega.related.plink_qc.extracted.pca.projected.Marchenko_PC.gz \
$PATH/phenofile/Ast/analysis_ready/covariate_preprocessing/snuc_pseudo_bulk.Ast.mega.normalized.log2cpm.rosmap_cov.ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.snuc_pseudo_bulk_mega.related.plink_qc.extracted.pca.projected.Marchenko_PC.gz \
$PATH/phenofile/Oli/analysis_ready/covariate_preprocessing/snuc_pseudo_bulk.Oli.mega.normalized.log2cpm.rosmap_cov.ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.snuc_pseudo_bulk_mega.related.plink_qc.extracted.pca.projected.Marchenko_PC.gz \
$PATH/phenofile/OPC/analysis_ready/covariate_preprocessing/snuc_pseudo_bulk.OPC.mega.normalized.log2cpm.rosmap_cov.ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.snuc_pseudo_bulk_mega.related.plink_qc.extracted.pca.projected.Marchenko_PC.gz \
$PATH/phenofile/Exc/analysis_ready/covariate_preprocessing/snuc_pseudo_bulk.Exc.mega.normalized.log2cpm.rosmap_cov.ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.snuc_pseudo_bulk_mega.related.plink_qc.extracted.pca.projected.Marchenko_PC.gz \
$PATH/phenofile/Inh/analysis_ready/covariate_preprocessing/snuc_pseudo_bulk.Inh.mega.normalized.log2cpm.rosmap_cov.ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.snuc_pseudo_bulk_mega.related.plink_qc.extracted.pca.projected.Marchenko_PC.gz \
--customized-association-windows $PATH/windows/TADB_enhanced_cis.coding.bed \
--phenotype-names Mic_mega_eQTL Ast_mega_eQTL Oli_mega_eQTL OPC_mega_eQTL Exc_mega_eQTL Inh_mega_eQTL \
--max-cv-variants 5000 --ld_reference_meta_file $PATH/ldref/ld_meta_file.tsv \
--region-name ENSG00000073921 \
--save-data \
--cwd $PATH/output/ -s build
Anticipated Results#
Univariate finemapping will produce a file containing results for the top hits and a file containing twas weights produced by susie.
ROSMAP_mega_eQTL.chr11_ENSG00000073921.univariate_bvrs.rds
:
For each gene of interest and phenotype, this file contains:
susie_fitted
variant_names
analysis_script
other quantities - information on dropped samples, if any
summary statistics (beta and se) for each variant
sample names
summary statistics for top loci
susie results trimmed - includes pip, sets, cs_corr, sets_secondary, alpha, lbf_variable, V, niter, X_column_scale_factors, mu, mu2
total time elapsed
region info
preset_variants_results
ROSMAP_mega_eQTL.chr11_ENSG00000073921.univariate_data.rds
(from the –save-data argument):
For each gene of interest, contains residuals for each sample and phenotype
see pecotmr code for description at
load_regional_association_data
function
ROSMAP_mega_eQTL.chr11_ENSG00000073921.univariate_twas_weights.rds
:
For each gene of interest and phenotype, this file contains:
twas_weights - weights for enet, lasso, bayes r, mrash and susie methods
twas_predictions - twas predictions for enet, lasso, bayes r, mrash and susie methods
twas_cv_result
total_time_elapsed
variant_names
region_info