Non-Hispanic White Linkage Disequilibrium Reference Panel

Non-Hispanic White Linkage Disequilibrium Reference Panel #

LD matrices calculated from whole genome sequencing data from 16571 non-Hispanic white individuals obtained from the Genome Center for Alzheimer’s Disease (GCAD). Correlation matrices were calculated between SNPs within 1361 LD blocks which were obtained from this Github page (generated from 1000 Genomes EUR samples).

Contact #

Oluwatosin Olayinka

Output Format #

Each LD block contains two files of interest:

  • an xz-compressed file containing the correlation values, suffixed by .cor.xz
  • this file is a compressed file where the matrix is encoded in a space-separated format
  • the data is stored in the upper triangle of the matrix
  • a Plink .bim file suffixed by .cor.xz.bim containing unique IDs for each variant
ld_block_ref_file = "/path/to/matrix.cor.xz"
var_names = read.table(paste0(ld_block_ref_file, ".bim"), header = F)$V2
ld <- scan(xzfile(ld_block_ref_file))
ld <- matrix(ld, ncol = sqrt(length(ld)), byrow = TRUE)
ld <- ld + t(ld)
diag(ld) = 1
rownames(ld) = var_names
colnames(ld) = var_names

Data Availability #

The generated files can be found on Synapse.

  1. Generating LD Reference Panel: https://github.com/cumc/xqtl-pipeline/blob/main/code/reference_data/ld_reference_generation.ipynb