Mixture Multivariate Distribution Estimate#
This workflow computes a prior that is independent of the specific analysis method chosen for the data. This foundational step supports a range of techniques, including UDR, ED, TED, and initialization with FLASHier. The goal is to fit a mixture model that extracts meaningful signals from the data.
An earlier version of the approach is outlined in Urbut et al. 2019. This workflow implements several improvements, including additional EBMF methods and the new udr (Ultimate Deconvolution in R) package for fitting the mixture model.
After the priors are computed, the model is fit and posteriors are calculated for the variables of interest, so that a multivariate analysis can be carried out under the MASH model. The multivariate adaptive shrinkage (MASH) analysis used here improves on the one described in Urbut et al. 2019.
Input:
--data
: rds file containing a list. For example, calling str() on the list read from MWE.rds prints:
List of 10
$ random.z: num [1:184, 1:6] 1.527 -0.282 0.365 -1.5 0.548 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
$ strong.z: num [1:46, 1:6] -0.47 1.601 -1.351 -0.298 1.114 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
$ null.z : num [1:184, 1:6] -0.0966 -0.5293 0.854 -0.5985 -0.0601 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
$ random.b: num [1:184, 1:6] 0.1161 -0.0232 0.0248 -0.1286 0.0417 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
$ strong.b: num [1:46, 1:6] -0.0788 0.0824 -0.0904 -0.0548 0.1429 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
$ null.b : num [1:184, 1:6] -0.01337 -0.0818 0.09428 -0.04075 -0.00255 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
$ null.s : num [1:184, 1:6] 0.1384 0.1545 0.1104 0.0681 0.0424 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
$ random.s: num [1:184, 1:6] 0.076 0.0822 0.068 0.0857 0.0762 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
$ strong.s: num [1:46, 1:6] 0.1676 0.0515 0.0669 0.1837 0.1283 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
$ XtX : num [1:6, 1:6] 577.7 196.9 165.4 22.5 313.8 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
.. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
--vhat
: “identity”, “simple”, “mle”, “vhat_corshrink_xcondition”, or “vhat_simple_specific”
--cwd
: output path
--vhat-data
: for mash_fit.ipynb mash, vhat data in an rds file from the mixture_prior.ipynb step
--prior-data
: for mash_fit.ipynb mash, prior data in an rds file from the mixture_prior.ipynb step
--compute-posterior
: for mash_fit.ipynb mash, flag indicating whether the posterior should be calculated
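The leading entries printed in the example suggest that each z matrix is the elementwise ratio of the matching b and s matrices, for example random.z = random.b / random.s. Reading b as effect estimates and s as their standard errors is an assumption inferred from the printed values, not something stated by the workflow. A minimal sketch, using awk and a hypothetical helper named zscore, recomputes the first entries so they can be compared with the printed z values (agreement is only up to the rounding of the displayed digits):

```shell
# zscore B S: print B / S to three decimals (hypothetical helper, for illustration only)
zscore() {
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.3f\n", b / s }'
}

# First printed entries of (random.b, random.s), (strong.b, strong.s), (null.b, null.s)
zscore 0.1161 0.076      # compare with the first entry of random.z, 1.527
zscore -0.0788 0.1676    # compare with the first entry of strong.z, -0.47
zscore -0.01337 0.1384   # compare with the first entry of null.z, -0.0966
```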
Overview#
Compute MASH prior
MASH fit
Generate Plots
Steps#
i. Compute MASH prior#
sos run $PATH/mixture_prior.ipynb ed_bovy \
--output-prefix MWE_ed_bovy \
--data $PATH/MWE.rds \
--cwd $PATH/output/ --vhat mle
ii. MASH fit#
sos run $PATH/mash_fit.ipynb mash \
--output-prefix MWE_ed_bovy_posterior \
--data $PATH/MWE.rds \
--vhat-data $PATH/MWE_ed_bovy.EZ.V_simple.rds \
--prior-data $PATH/MWE_ed_bovy.EZ.prior.rds \
--compute-posterior \
--cwd $PATH/output/
iii. Generate Plots#
sos run $PATH/mixture_prior.ipynb plot_U \
--output-prefix protocol_example.mixture_plots \
--data $PATH/MWE_ed_bovy.EZ.prior.rds \
--cwd $PATH/output/
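In the commands above, $PATH is a placeholder for a project directory. In a real shell session, PATH is the variable the shell uses to locate executables, so reassigning it would typically prevent sos itself from being found. A minimal sketch, assuming a hypothetical variable PROJECT_DIR and a hypothetical helper plot_cmd, prints the step iii command for review instead of running it:

```shell
# PROJECT_DIR is a hypothetical stand-in for the $PATH placeholder used above
PROJECT_DIR="${PROJECT_DIR:-/path/to/project}"

# plot_cmd: print (do not run) the step iii command with PROJECT_DIR substituted
plot_cmd() {
  printf 'sos run %s/mixture_prior.ipynb plot_U' "$PROJECT_DIR"
  printf ' --output-prefix protocol_example.mixture_plots'
  printf ' --data %s/MWE_ed_bovy.EZ.prior.rds' "$PROJECT_DIR"
  printf ' --cwd %s/output/\n' "$PROJECT_DIR"
}

plot_cmd
```

Once the printed command looks right, it can be pasted into the terminal or the same substitution can be applied to the commands for steps i and ii.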
Anticipated Results#
i. Compute MASH prior
MWE_ed_bovy.EZ.prior.rds
: rds file containing U, w and loglik.
MWE_ed_bovy.EZ.V_simple.rds
: an NxN matrix
MWE_ed_bovy.canonical.rds
: rds file containing multiple NxN matrices
MWE_ed_bovy.flash.model.rds
: rds file containing a model and factors
MWE_ed_bovy.flash.rds
: rds file containing three matrices:
tFLASH_default
FLASH_default_1
FLASH_default_2
MWE_ed_bovy.flash_nonneg.model.rds
: rds file containing a model and factors
MWE_ed_bovy.flash_nonneg.rds
: rds file containing three matrices:
tFLASH_nonneg
FLASH_nonneg_1
FLASH_nonneg_2
MWE_ed_bovy.pca.rds
: rds file containing three matrices:
PCA_1
PCA_2
tPCA
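Step ii reads files written by step i, so it can help to confirm that they exist before continuing. The following is a minimal sketch and not part of the workflow: the helper name check_prior_outputs is hypothetical, and the suffix list is copied from the file names listed above.

```shell
# check_prior_outputs DIR PREFIX
# Return 0 if every step i output listed above exists in DIR, 1 otherwise.
check_prior_outputs() {
  dir=$1
  prefix=$2
  missing=0
  for suffix in EZ.prior.rds EZ.V_simple.rds canonical.rds \
                flash.model.rds flash.rds \
                flash_nonneg.model.rds flash_nonneg.rds pca.rds; do
    if [ ! -f "$dir/$prefix.$suffix" ]; then
      # Report each missing file on stderr and remember the failure
      echo "missing: $dir/$prefix.$suffix" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_prior_outputs DIR MWE_ed_bovy && echo "step i outputs are complete", where DIR is whichever directory holds the step i outputs.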
ii. MASH fit
MWE_ed_bovy_posterior.EZ.mash_model.rds
: rds file containing mash_model, the vhat_file path and the prior_file path. mash_model contains:
result - contains PosteriorMean, PosteriorSD, NegativeProb, lfsr and PosteriorCov
loglik
vloglik
null_loglik
alt_loglik
fitted_g - contains pi, Ulist, grid and usepointmass
posterior_weights
alpha
lm - contains loglik_matrix and lfactors
MWE_ed_bovy_posterior.EZ.posterior.rds
: rds file containing PosteriorMean, PosteriorSD, lfdr, NegativeProb and lfsr
iii. Generate Plots
MWE_ed_bovy.EZ.prior.pdf
: file containing heatmap plots
MWE_ed_bovy.EZ.prior.png