xQTL-GWAS pairwise enrichment and colocalization#
This workflow processes fine-mapping results for xQTL, generated by susie_twas
in the mnm_regression.ipynb
notebook for cis xQTL, and GWAS fine-mapping results produced by susie_rss
in the rss_analysis.ipynb
notebook. It is designed to perform enrichment and colocalization analysis, particularly when fine-mapping results originate from different regions in the case of cis-xQTL and GWAS. The pipeline is capable to integrate and analyze data across these distinct regions. Originally tailored for cis-xQTL and GWAS integration, this pipeline can be applied to other pairwise integrations. An example of such application is in trans analysis, where the fine-mapped regions might be identical between trans-xQTL and GWAS, representing a special case of this broader implementation.
Input#
Lists of SuSiE fine-mapping output objects, in RDS format, of class(susie)
in R.
For xQTL the list is meta-data of format:
chr
,start
,end
,original_data
,conditions_top_loci
,block_top_loci
whereoriginal_data
is an RDS file,conditions_top_loci
is showing which contexts have top loci table (potential signals)block_top_loci
is the blocks have overlapped top loci variant with xQTL data. That file could be output fromfine_mapping_post_processing/overlap_qtl_gwas
.For GWAS the list is meta-data of format:
chr
,start
,end
,original_data
,block_top_loci...
whereoriginal_data
is an RDS file. That file could be output fromfine_mapping_post_processing/gwas_results_export
.Context meta is a metafile that shows the analysis_names and the contained contexts. It can be used to identify the corresponding raw data for each context.
Analytical Logic#
Enrichment
Identify GWAS Blocks: Select GWAS blocks with top loci from Bellenguez data and using a single variant regression method. Locate the original filenames (
original_data
) for these blocks and map the analysis regions to identify overlapping gene analysis regions and corresponding QTL files with top loci table (condition_top_loci
).Contexts in xQTL Meta-data: Find contexts within the xQTL meta-data that include top loci results for each gene. Retrieve the corresponding original xQTL files (
original_data
) by referencing the context meta-data.Execute xQTL GWAS Enrichment: Perform
xqtl_gwas_enrichment
analysis to generate enrichment parameters (a0, a1) and calculate priors (p1, p2, p12).
Coloc
xQTL Meta-data Contexts: Within the xQTL meta-data, identify contexts that include top loci results for each gene, indicated by
condition_top_loci
. Retrieve the corresponding original xQTL files (original_data
) by referring back to the context meta-data. (Optional) Subset the QTL table with specifies gene list.Original GWAS Files: Identify GWAS blocks that contain overlapping top loci variants for each gene (
block_top_loci
) in above file. Utilize the GWAS list to locate the original filenames (original_data
) for the identified GWAS block candidates.Enrichment and Coloc Analysis: Automatically execute
xqtl_gwas_enrichment
to enable enrichment analysis, generating priors for subsequent coloc analysis through logic defined inenrichment
. Then, applysusie_coloc
for coloc analysis on the selected xQTL and GWAS original files, analyzing each gene under each identified condition.
Output#
Enrichment analysis results — this is a global enrichment estimate that combines all input data for each context.
Colocalization results for regions of interest — if
ld_meta_file_path
is provided, would also report coloc cs by coloc postprocessing.
Example#
enrichment
sos run ~/codes/xqtl-pipeline/pipeline/SuSiE_enloc.ipynb xqtl_gwas_enrichment \
--gwas_meta_data /mnt/vast/hpc/csg/rf2872/Work/pecotmr/encoloc_test/gwas.block_results_db.tsv \
--xqtl_meta_data /mnt/vast/hpc/csg/rf2872/Work/Multivariate/gwas/overlap_test/ROSMAP_eQTL.overlapped.gwas.tsv \
--xqtl_finemapping_obj preset_variants_result susie_result_trimmed \
--xqtl_varname_obj preset_variants_result variant_names \
--gwas_finemapping_obj AD_Bellenguez_2022 single_effect_regression susie_result_trimmed \
--gwas_varname_obj AD_Bellenguez_2022 single_effect_regression variant_names \
--xqtl_region_obj region_info grange \
--qtl-path /mnt/vast/hpc/csg/rf2872/Work/Multivariate/susie_2024_new/rds_files \
--gwas_path /mnt/vast/hpc/csg/hs3393/RSS_QC/GWAS_finemapping_Feb22_7dataset/SuSiE_RSS \
--context_meta /mnt/vast/hpc/homes/rf2872/codes/fungen-xqtl-analysis/resource/context_meta.tsv
coloc (would trigger enrichment automatically)
sos run ~/codes/xqtl-pipeline/pipeline/SuSiE_enloc.ipynb susie_coloc \
--gwas_meta_data /mnt/vast/hpc/csg/rf2872/Work/Multivariate/susie_2024_new/demo_gwas/demo_gwas.block_results_db.tsv \
--xqtl_meta_data /mnt/vast/hpc/csg/rf2872/Work/Multivariate/susie_2024_new/demo_overlap/demo_overlap.overlapped.gwas.tsv \
--xqtl_finemapping_obj preset_variants_result susie_result_trimmed \
--xqtl_varname_obj preset_variants_result variant_names \
--gwas_finemapping_obj AD_Bellenguez_2022 RSS_QC_RAISS_imputed susie_result_trimmed \
--gwas_varname_obj AD_Bellenguez_2022 RSS_QC_RAISS_imputed variant_names \
--xqtl_region_obj region_info grange \
--qtl-path /mnt/vast/hpc/homes/rf2872/aws/rds_files \
--gwas_path /home/hs3393/RSS_QC/GWAS_finemapping_Apr9/univariate_rss \
--context_meta /mnt/vast/hpc/homes/rf2872/codes/xqtl-analysis/resource/context_meta.tsv \
--ld_meta_file_path /mnt/vast/hpc/csg/data_public/20240120_ADSP_LD_matrix/ld_meta_file.tsv \
--cwd demo_coloc
eg:
gwas_meta_data
output fromfine_mapping_post_processing/gwas_results_export
#chr start end region_id TSS original_data combined_data combined_data_sumstats conditions conditions_top_loci
chr1 101384274 104443097 1_101384274-104443097 NA 1_101384274-104443097.susie_rss.rds gwas.1_101384274-104443097.cis_results_db.export.rds gwas.1_101384274-104443097.cis_results_db.export_sumstats.rds NA NA
chr19 44935906 46842901 19_44935906-46842901 NA 19_44935906-46842901.susie_rss.rds gwas.19_44935906-46842901.cis_results_db.export.rds gwas.19_44935906-46842901.cis_results_db.export_sumstats.rds AD_Bellenguez_2022,AD_Jansen_2021,AD_Kunkle_Stage1_2019,AD_Wightman_Full_2021,AD_Wightman_Excluding23andMe_2021,AD_Wightman_ExcludingUKBand23andME_2021 AD_Wightman_ExcludingUKBand23andME_2021.qc_impute,AD_Wightman_ExcludingUKBand23andME_2021.qc_only AD_Wightman_ExcludingUKBand23andME_2021.qc_impute
eg:
xqtl_meta_data
for enrichment: output from fine_mapping_post_processing/cis_results_export
; for coloc: output from fine_mapping_post_processing/overlap_qtl_gwas
, the difference between them is without or with last 2 columns: block_top_loci
, final_combined_data
to document overlapped block info at variant level
#chr start end region_id TSS original_data combined_data combined_data_sumstats conditions conditions_top_loci block_top_loci final_combined_data
chr1 167600000 171480000 ENSG00000000457 169894266 MSBB_eQTL.ENSG00000000457.univariate_susie_twas_weights.rds, ROSMAP_Kellis_eQTL.ENSG00000000457.univariate_susie_twas_weights.rds Fungen_xQTL.ENSG00000000457.cis_results_db.export.rds Fungen_xQTL.ENSG00000000457.cis_results_db.export_sumstats.rds BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,Ast_Kellis_eQTL BM_22_MSBB_eQTL,BM_44_MSBB_eQTL
chr19 41840000 47960000 ENSG00000130203 44905790 ROSMAP_Kellis_eQTL.ENSG00000130203.univariate_susie_twas_weights.rds, ROSMAP_Bennett_Klein_pQTL.ENSG00000130203.univariate_susie_twas_weights.rds Fungen_xQTL.ENSG00000130203.cis_results_db.export.rds Fungen_xQTL.ENSG00000130203.cis_results_db.export_sumstats.rds Mic_Kellis_eQTL,DLPFC_Bennett_pQTL Mic_Kellis_eQTL 19_44935906-46842901 Fungen_xQTL.ENSG00000130203.cis_results_db.export.overlapped.gwas.rds
chr20 49934867 53560000 ENSG00000000419 50958554 MSBB_eQTL.ENSG00000000419.univariate_susie_twas_weights.rds Fungen_xQTL.ENSG00000000419.cis_results_db.export.rds Fungen_xQTL.ENSG00000000419.cis_results_db.export_sumstats.rds BM_10_MSBB_eQTL,BM_22_MSBB_eQTL BM_22_MSBB_eQTL Fungen_xQTL.ENSG00000000419.cis_results_db.export.overlapped.gwas.rds
eg:
context_meta
should be updated as more analysis is done, we used analysis_name and context in this analysis
analysis_name cohort context
KNIGHT_pQTL KNIGHT Knight_pQTL_brain,Knight_eQTL_brain
MSBB_eQTL MSBB BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL
MIGA_eQTL MIGA MiGA_GTS_eQTL,MiGA_SVZ_eQTL,MiGA_THA_eQTL
[global]
# Workdir
parameter: cwd = path("output")
# A list of file paths for fine-mapped GWAS results.
parameter: gwas_meta_data = path
# A list of file paths for fine-mapped xQTL results.
parameter: xqtl_meta_data = path
# Optional: if a region list is provide the enrichment analysis will be focused on provided region.
# The LAST column of this list will contain the ID of regions to focus on
parameter: region_list = path()
# Optional: if a region name is provided
# the analysis would be focused on the union of provides region list and region names
parameter: region_name = []
# It is required to input the name of the analysis
parameter: name = f"{xqtl_meta_data:bnn}.{gwas_meta_data:bnn}"
parameter: container = ""
import re
parameter: entrypoint= ('micromamba run -a "" -n' + ' ' + re.sub(r'(_apptainer:latest|_docker:latest|\.sif)$', '', container.split('/')[-1])) if container else ""
# For cluster jobs, number commands to run per job
parameter: job_size = 200
# Wall clock time expected
parameter: walltime = "5m"
# Memory expected: quite large for enrichment analysis but small for xQTL colocalization
parameter: mem = "16G"
# Number of threads
parameter: numThreads = 1
#Optional table name in xQTL RDS files to get fine mapping results (eg 'susie_result_trimmed ').
parameter: xqtl_finemapping_obj = []
#Optional table name in xQTL RDS files to get variable names (eg ' variant_names ').
parameter: xqtl_varname_obj = []
#Optional table name in GWAS RDS files to get fine mapping results (eg 'AD_Bellenguez_2022 single_effect_regression susie_result_trimmed ').
parameter: gwas_finemapping_obj = []
#Optional table name in GWAS RDS files to get variable names(eg 'AD_Bellenguez_2022 single_effect_regression variant_names ').
parameter: gwas_varname_obj = []
#Optional table name in xQTL RDS files to get region info (eg 'susie_result_trimmed region_info grange ').
parameter: xqtl_region_obj = []
#Optional table name in GWAS RDS files to get region info (eg 'AD_Bellenguez_2022 single_effect_regression region_info grange ').
parameter: gwas_region_obj = []
#Directory path for GWAS orignal finemapping results
parameter: gwas_path = ''
#Directory path for xQTL orignal finemapping results
parameter: qtl_path = ''
# a meta file showing the context and corresponding analysis_name
parameter: context_meta = path()
# Optional: if a region list is provide the analysis will be focused on provided region.
# The LAST column of this list will contain the ID of regions to focus on
# Otherwise, all regions with both genotype and phenotype files will be analyzed
parameter: region_list = path()
# Optional: if a region name is provided
# the analysis would be focused on the union of provides region list and region names
parameter: region_name = []
# use conditions_top_loci_minp column or not, which can reduce a lot computing resource by pick top 1 isoform
parameter: minp = False
import os
import pandas as pd
import re
def group_by_region(lst, partition):
# from itertools import accumulate
# partition = [len(x) for x in partition]
# Compute the cumulative sums once
# cumsum_vector = list(accumulate(partition))
# Use slicing based on the cumulative sums
# return [lst[(cumsum_vector[i-1] if i > 0 else 0):cumsum_vector[i]] for i in range(len(partition))]
return partition
def make_unique_data(data):
return ','.join(set(data.split(',')))
def generate_meta_dataframe(meta_data_path):
"""Generate a new long metadata by converting to context:analysis_name (1:1)."""
meta = pd.read_csv(meta_data_path, sep='\t')
new_meta = pd.DataFrame()
contexts = pd.unique(meta['context'].str.split(',', expand=True).stack().str.strip())
for context in contexts:
mask = meta['context'].str.contains(context)
tmp = pd.DataFrame({
'context': [context],
'analysis_name': [','.join(meta.loc[mask, 'analysis_name'])]
})
new_meta = pd.concat([new_meta, tmp], ignore_index=True)
return new_meta
def generate_condition_based_dataframe(xqtl_df, minp):
"""Generate a new long table by converting to condition and region:original_data (1:1)."""
# only consider the QTL contexts that have top loci data
condition_column = 'conditions_top_loci_minp' if minp else 'conditions_top_loci'
conditions = pd.unique(xqtl_df[condition_column].str.split(',', expand=True).stack().str.strip())
new_df = pd.DataFrame()
for condition in conditions:
escaped_condition = re.escape(condition)
mask = xqtl_df[condition_column].str.contains(escaped_condition, case=False, regex=True)
# Iterate through each unique region in the filtered data
unique_regions = pd.unique(xqtl_df.loc[mask, 'region_id'])
for region_id in unique_regions:
region_mask = (xqtl_df['region_id'] == region_id) & mask
tmp = pd.DataFrame({
'condition': [condition],
'region_id': [region_id],
'QTL_original_data': [','.join(xqtl_df.loc[region_mask, 'original_data'])],
'GWAS_original_data': [','.join(xqtl_df.loc[region_mask, 'block_data'])]
})
new_df = pd.concat([new_df, tmp], ignore_index=True)
return new_df
def merge_and_filter_dfs(new_df, new_meta):
"""Merge and filter the dataframes based on condition/context where context is a substring of condition, and analysis_name to pick the corresponding original files."""
# Explode the QTL data into separate rows
new_df = new_df.set_index(['condition', 'region_id', 'GWAS_original_data'])['QTL_original_data'].str.split(',', expand=True).stack().reset_index(name='QTL_original_data').drop('level_3', axis=1)
new_df['analysis_name_prefix'] = new_df['QTL_original_data'].apply(lambda x: x.split('.')[0])
# Create a custom merge logic to match context as a substring of condition
def custom_merge(row, df_meta):
# Filter meta dataframe to find any context that is part of the condition string
sub_df = df_meta[df_meta['context'].apply(lambda x: x in row['condition'])]
if not sub_df.empty:
return sub_df
else:
return pd.DataFrame(columns=df_meta.columns) # Return an empty DataFrame with the same columns if no match found
# Apply custom merge function row-wise and concatenate results
results = [custom_merge(row, new_meta) for index, row in new_df.iterrows()]
merged_df = pd.concat(results, keys=new_df.index).reset_index(level=1, drop=True).join(new_df, how='outer')
# Filter rows where analysis_name_prefix matches analysis_name exactly
filtered_df = merged_df[merged_df['analysis_name_prefix'].str.strip() == merged_df['analysis_name'].str.strip()]
# Drop unnecessary columns and duplicates
filtered_df = filtered_df.drop(columns=['analysis_name_prefix', 'context']).drop_duplicates()
filtered_df['GWAS_original_data'] = filtered_df['GWAS_original_data'].apply(make_unique_data)
return filtered_df
def prepare_final_paths(filtered_df, qtl_path, gwas_path):
"""Prepare final paths for QTL and GWAS original data."""
filtered_df['QTL_original_data'] = filtered_df['QTL_original_data'].apply(
lambda x: ','.join(f"{qtl_path}/{file_name.strip()}" for file_name in x.split(','))
)
filtered_df['GWAS_original_data'] = filtered_df['GWAS_original_data'].apply(
lambda x: ','.join(f"{gwas_path}/{file_name.strip()}" for file_name in x.split(','))
)
return filtered_df
# get overlapped regions by gene and block region for enrichment analysis
[get_analysis_regions: shared = "regional_data"]
import pandas as pd
def process_dataframes_with_toploci(xqtl_meta_data, gwas_meta_data):
"""Process the XQTL and GWAS dataframes."""
xqtl_df = pd.read_csv(xqtl_meta_data, sep="\t")
# filter xQTL data with the ones have overlapped top loci variants with GWAS data
xqtl_df = xqtl_df[xqtl_df['conditions_top_loci_minp' if minp else 'conditions_top_loci'].notna()]
gwas_df = pd.read_csv(gwas_meta_data, sep="\t")
gwas_df = gwas_df[gwas_df['conditions_top_loci'].notna()]
# e.g. gwas_finemapping_obj = ['AD_Bellenguez_2022', 'single_effect_regression', 'susie_result_trimmed']
# filter gwas data with find variants in specified cohort 'AD_Bellenguez_2022' and method 'single_effect_regression'
gwas_finemapping_obj_filtered = [obj for obj in gwas_finemapping_obj if obj != 'susie_result_trimmed']
if len(gwas_finemapping_obj_filtered) > 0:
pattern = '|'.join(gwas_finemapping_obj_filtered)
gwas_df = gwas_df[gwas_df['conditions_top_loci'].str.contains(pattern)]
xqtl_df = overlapped_analysis_region(xqtl_df, gwas_df)
return xqtl_df, gwas_df
def overlapped_analysis_region(xqtl_df, gwas_df):
# Create an empty dataframe to store overlapping xQTL records
overlapped_xqtl_df = pd.DataFrame()
# Iterate over each row in the GWAS dataframe
for index, gwas_row in gwas_df.iterrows():
gwas_chr = gwas_row['#chr']
gwas_start = gwas_row['start']
gwas_end = gwas_row['end']
gwas_block_id = gwas_row['original_data'] # Assuming each GWAS row has a unique block ID
# Filter xQTL dataframe for the same chromosome
same_chr_xqtl_df = xqtl_df[xqtl_df['#chr'] == gwas_chr]
# Find overlapping xQTL regions with the current GWAS region
overlapping_xqtl = same_chr_xqtl_df[((same_chr_xqtl_df['start'] <= gwas_end) &
(same_chr_xqtl_df['end'] >= gwas_start))].copy() # Add .copy() to avoid SettingWithCopyWarning
# Check if there are any overlapping xQTLs
if not overlapping_xqtl.empty:
# Use .loc to safely assign 'block_data' to avoid the SettingWithCopyWarning
overlapping_xqtl.loc[:, 'block_data'] = gwas_block_id
# Append the overlapping xQTLs to the accumulated dataframe
overlapped_xqtl_df = pd.concat([overlapped_xqtl_df, overlapping_xqtl], ignore_index=True)
# Group by xQTL region and concatenate 'block_data'
overlapped_xqtl_df['block_data'] = overlapped_xqtl_df.groupby(['#chr', 'start', 'end'])['block_data'].transform(lambda x: ','.join(x))
overlapped_xqtl_df = overlapped_xqtl_df.drop_duplicates().reset_index(drop=True)
return overlapped_xqtl_df
xqtl_df, gwas_df = process_dataframes_with_toploci(xqtl_meta_data, gwas_meta_data)
region_ids=[]
# If region_list is provided, read the file and extract IDs
if not region_list.is_dir():
if region_list.is_file():
region_list_df = pd.read_csv(region_list, sep='\t', header=None, comment = "#")
region_ids = region_list_df.iloc[:, -1].unique() # Extracting the last column for IDs
else:
raise ValueError("The region_list path provided is not a file.")
if len(region_name) > 0:
region_ids = list(set(region_ids).union(set(region_name)))
if len(region_ids) > 0:
xqtl_df = xqtl_df[xqtl_df['region_id'].isin(region_ids)]
new_df = generate_condition_based_dataframe(xqtl_df, minp)
# if there is only one original data file, we don't have to map to context meta
if (new_df['QTL_original_data'].str.split(',').apply(lambda x: len(x) in [0, 1]).all() and not context_meta.is_file()):
filtered_df = new_df.copy()
filtered_df['GWAS_original_data'] = filtered_df['GWAS_original_data'].apply(make_unique_data)
else:
filtered_df = merge_and_filter_dfs(new_df, generate_meta_dataframe(context_meta))
filtered_df = prepare_final_paths(filtered_df, qtl_path, gwas_path)
filtered_df = filtered_df.groupby('condition').agg({
'GWAS_original_data': lambda x: ','.join(x),
'QTL_original_data': lambda x: ','.join(x)
}).reset_index()
regional_data = {
'data': [row['QTL_original_data'].split(',') for _, row in filtered_df.iterrows()],
'conditions': [(f"{row['condition']}", *row['GWAS_original_data'].split(',')) for _, row in filtered_df.iterrows()]
}
# get overlapped regions from flatten table, to get the regions with overlapped variants and select those regions for coloc analysis
[get_overlapped_analysis_regions: shared = "overlapped_regional_data"]
import pandas as pd
def map_blocks_to_data(blocks, mapping_dict):
"""Map blocks to data using a provided mapping dictionary."""
mapped_data = ','.join(mapping_dict.get(block.strip(), 'NA') for block in blocks.split(','))
return mapped_data
def process_dataframes(xqtl_meta_data, gwas_meta_data):
"""Process the XQTL and GWAS dataframes."""
xqtl_df = pd.read_csv(xqtl_meta_data, sep="\t")
# filter xQTL data with the ones have overlapped top loci variants with GWAS data
xqtl_df = xqtl_df[xqtl_df['block_top_loci'].notna()]
gwas_df = pd.read_csv(gwas_meta_data, sep="\t")
gwas_df = gwas_df[gwas_df['conditions_top_loci'].notna()]
# e.g. gwas_finemapping_obj = ['AD_Bellenguez_2022', 'single_effect_regression', 'RSS_QC_RAISS_imputed']
# filter gwas data with find variants in specified cohort 'AD_Bellenguez_2022' and method 'RSS_QC_RAISS_imputed'
gwas_finemapping_obj_filtered = [obj for obj in gwas_finemapping_obj if obj != 'susie_result_trimmed' and obj != 'RSS_QC_RAISS_imputed']
if len(gwas_finemapping_obj_filtered) > 0:
pattern = '|'.join(gwas_finemapping_obj_filtered)
gwas_df = gwas_df[gwas_df['conditions_top_loci'].str.contains(pattern)]
gwas_df = gwas_df[gwas_df['region_id'].isin(xqtl_df['block_top_loci'].str.split(',').explode().unique())]
region_to_combined_data = dict(zip(gwas_df['region_id'], gwas_df['original_data']))
# and only consider the overlapped GWAS blocks
xqtl_df['block_data'] = xqtl_df['block_top_loci'].apply(map_blocks_to_data, args=(region_to_combined_data,))
xqtl_df['block_data'] = xqtl_df['block_data'].str.replace(',NA', '').str.replace('NA,', '')
return xqtl_df, gwas_df
xqtl_df, gwas_df = process_dataframes(xqtl_meta_data, gwas_meta_data)
region_ids=[]
# If region_list is provided, read the file and extract IDs
if not region_list.is_dir():
if region_list.is_file():
region_list_df = pd.read_csv(region_list, sep='\t', header=None, comment = "#")
region_ids = region_list_df.iloc[:, -1].unique() # Extracting the last column for IDs
else:
raise ValueError("The region_list path provided is not a file.")
if len(region_name) > 0:
region_ids = list(set(region_ids).union(set(region_name)))
if len(region_ids) > 0:
xqtl_df = xqtl_df[xqtl_df['region_id'].isin(region_ids)]
new_df = generate_condition_based_dataframe(xqtl_df, minp)
# if there is only one original data file, we don't have to map to context meta
if (new_df['QTL_original_data'].str.split(',').apply(lambda x: len(x) in [0, 1]).all() and not context_meta.is_file()):
filtered_df = new_df.copy()
filtered_df['GWAS_original_data'] = filtered_df['GWAS_original_data'].apply(make_unique_data)
else:
filtered_df = merge_and_filter_dfs(new_df, generate_meta_dataframe(context_meta))
filtered_df = prepare_final_paths(filtered_df, qtl_path, gwas_path)
overlapped_regional_data = {
'data': [row['QTL_original_data'].split(',') for _, row in filtered_df.iterrows()],
'conditions': [(f"{row['condition']}@{row['region_id']}", *row['GWAS_original_data'].split(',')) for _, row in filtered_df.iterrows()]
}
[xqtl_gwas_enrichment]
depends: sos_variable("regional_data")
stop_if(len(regional_data['data']) == 0, f'No files left for analysis')
meta = regional_data['conditions']
input: regional_data["data"], group_by = lambda x: group_by_region(x, regional_data["data"]), group_with = "meta"
output: f'{cwd:a}/{name}.{_meta[0]}.enrichment.rds'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
R: expand = '${ }', stdout = f"{_output:n}.stdout", stderr = f"{_output:n}.stderr", container = container, entrypoint = entrypoint
library(tidyverse)
library(pecotmr)
# RDS files for GWAS data
gwas_finemapped_data = c(${paths([x for x in _meta[1:len(_meta)]]):r,})
# RDS files for xQTL data
xqtl_finemapped_data = c(${paths([x for x in _input]):r,})
enrich_result = xqtl_enrichment_wrapper(gwas_files = gwas_finemapped_data, xqtl_files = xqtl_finemapped_data,
xqtl_finemapping_obj = c("${_meta[0]}",${",".join(['"%s"' % x for x in xqtl_finemapping_obj]) if len(xqtl_finemapping_obj) != 0 else "NULL"}),
xqtl_varname_obj = c("${_meta[0]}",${",".join(['"%s"' % x for x in xqtl_varname_obj]) if len(xqtl_varname_obj) != 0 else "NULL"}),
gwas_finemapping_obj = c(${",".join(['"%s"' % x for x in gwas_finemapping_obj]) if len(gwas_finemapping_obj) != 0 else "NULL"}),
gwas_varname_obj = c(${",".join(['"%s"' % x for x in gwas_varname_obj]) if len(gwas_varname_obj) != 0 else "NULL"}))
saveRDS(enrich_result, ${_output:ar})
[susie_coloc]
depends: sos_variable("overlapped_regional_data")
# depends: sos_step("xqtl_gwas_enrichment") #changed
stop_if(len(overlapped_regional_data['data']) == 0, f'No files left for analysis')
# skip enrichment or not
parameter: skip_enrich=False
# filter lbf by cs as default in susie coloc. default is False means filtering by V > prior_tol
parameter: filter_lbf_cs = False
# ld reference path for postprocessing
parameter: ld_meta_file_path = path()
meta = overlapped_regional_data['conditions']
input: overlapped_regional_data["data"], group_by = lambda x: group_by_region(x, overlapped_regional_data["data"]), group_with = "meta"
# output: f'{cwd:a}/{step_name[:-2]}/{name}.{_meta[0]}.coloc.rds'
output: f'{cwd:a}/{step_name}/{name}.{_meta[0]}.coloc.rds'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
R: expand = '${ }', stdout = f"{_output:n}.stdout", stderr = f"{_output:n}.stderr", container = container, entrypoint = entrypoint
library(tidyverse)
library(pecotmr)
library(coloc)
# RDS files for xQTL data
# RDS files for xQTL data
xqtl_finemapped_datas = c(${paths([x for x in _input]):r,})
coloc_res <- list()
# are we doing something like this? that means results are by each context, and each one has
for(xqtl_finemapped_data in xqtl_finemapped_datas){
qtl_dat <- readRDS(xqtl_finemapped_data)
gene = names(qtl_dat)
context = "${_meta[0]}" %>% str_split(.,"@",simplify = T) %>% .[,1]
gene_region = pecotmr:::get_nested_element(qtl_dat[[1]], c(context,"region_info", "grange"))
# Step 1: find relevant GWAS regions that overlap each the xQTL region of interest
gwas_finemapped_datas <- c(${ ",".join(['"%s"' % x for x in _meta[1:]])})
gwas_finemapped_datas <- gwas_finemapped_datas[grep('rds$',gwas_finemapped_datas)]
gwas_regions <- gwas_finemapped_datas %>% basename %>% str_extract(., "chr\\d+_\\d+_\\d+")
overlap_index <- NULL
for (i in 1:length(gwas_regions)) {
region <- gwas_regions[i]
split_region <- unlist(strsplit(region, "_"))
block_chrom <- as.numeric(split_region[1] %>% gsub("chr","",.))
block_start <- as.numeric(split_region[2] %>% strsplit(., "-") %>% unlist %>% .[1])
block_end <- as.numeric(split_region[2] %>% strsplit(., "-") %>% unlist %>% .[2])
if (gene_region$chrom == block_chrom && (gene_region$start <= block_end | gene_region$end >= block_start)) {
overlap_index <- c(overlap_index, i)
}
}
if (!is.null(overlap_index)) {
gwas_finemapped_data <- gwas_finemapped_datas[overlap_index]
# Step 2: load enrichment analysis results
# Extract values for p1, p2, and p12
if(${"FALSE" if skip_enrich else "TRUE"}){
enrich_file = paste0('${cwd:a}','/', '${name}', '.' ,context,'.enrichment.rds')
p1 <- readRDS(enrich_file)[[1]]$`Alternative (coloc) p1`
p2 <- readRDS(enrich_file)[[1]]$`Alternative (coloc) p2`
p12 <- readRDS(enrich_file)[[1]]$`Alternative (coloc) p12`
message("Priors are P1:", p1, "; p2: ", p2, "; p12: ", p12)
} else {
p1 = 1e-4
p2 = 1e-4
p12 = 5e-6
message("Priors are using default, P1: 1e-4, P2: 1e-4, P12: 5e-6" )
}
# Step 3: Apply colocalization analysis between each condition and GWAS
coloc_res[[gene]] <- coloc_wrapper(xqtl_file = xqtl_finemapped_data, gwas_files = gwas_finemapped_data,
xqtl_finemapping_obj = c(context,${",".join(['"%s"' % x for x in xqtl_finemapping_obj]) if len(xqtl_finemapping_obj) != 0 else "NULL"}),
xqtl_varname_obj = c(context,${",".join(['"%s"' % x for x in xqtl_varname_obj]) if len(xqtl_varname_obj) != 0 else "NULL"}),
gwas_finemapping_obj = c(${",".join(['"%s"' % x for x in gwas_finemapping_obj]) if len(gwas_finemapping_obj) != 0 else "NULL"}),
gwas_varname_obj = c(${",".join(['"%s"' % x for x in gwas_varname_obj]) if len(gwas_varname_obj) != 0 else "NULL"}),
xqtl_region_obj = c(context,${",".join(['"%s"' % x for x in xqtl_region_obj]) if len(xqtl_region_obj) != 0 else "NULL"}),
gwas_region_obj = c(${",".join(['"%s"' % x for x in gwas_region_obj]) if len(gwas_region_obj) != 0 else "NULL"}),
p1 = p1, p2 = p2, p12 = p12, filter_lbf_cs = ${"FALSE" if skip_enrich else "TRUE"})
if (${"TRUE" if ld_meta_file_path.is_file() else "FALSE"}) {
if(length(coloc_res[[gene]]) > 2) {
coloc_res[[gene]] <- coloc_post_processor(coloc_res[[gene]], LD_meta_file_path = ${ld_meta_file_path:r}, analysis_region = coloc_res[[gene]]$analysis_region)
if(!is.null(coloc_res[[gene]]$sets$cs)) writeLines(coloc_res[[gene]]$sets$cs %>% unlist, gsub("rds$","coloc_res","${_output}"))
} else { message("No coloc results were generated.") }
}
} else {
message("No overlap was found between GWAS blocks and QTL analysis region.")
coloc_res <- "No overlap was found between GWAS blocks and QTL analysis region."
}
}
saveRDS(coloc_res, ${_output:r})