Alternative splicing from RNA-seq data
Miniprotocol Timing
Timing <2 hours
Overview
Several other modules must be run first to prepare the data for splicing quantification:
molecular_phenotypes/calling/RNA_calling.ipynb (step i): Generate a data quality summary with fastqc
molecular_phenotypes/calling/RNA_calling.ipynb (step ii): Trim adaptors
molecular_phenotypes/calling/RNA_calling.ipynb (step iii): Align RNA-seq reads with STAR; for splicing data, use the WASP option
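The STAR alignment step writes a sorted BAM ("Aligned.sortedByCoord.out.bam") and a splice-junction table ("SJ.out.tab") for each run, and the metadata files used in step i point to these. Below is a minimal sketch for collecting those paths, assuming STAR's default output suffixes. The exact column layout the pipeline expects in its metadata file is not described here, so the helpers list_star_bams and list_star_junctions are hypothetical and only print plain path lists.

```shell
#!/usr/bin/env bash
# Sketch: list the STAR outputs that the splicing steps take as input.
# Assumes STAR's default file-name suffixes; directory layout is hypothetical.

# Print every coordinate-sorted BAM under a directory, one path per line.
list_star_bams() {
  find "$1" -type f -name '*Aligned.sortedByCoord.out.bam' | sort
}

# Print every splice-junction table under a directory, one path per line.
list_star_junctions() {
  find "$1" -type f -name '*SJ.out.tab' | sort
}

# Example usage (hypothetical paths):
#   list_star_bams output/rnaseq > my_bam_paths.txt
#   list_star_junctions output/rnaseq > my_sj_paths.txt
```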
This miniprotocol shows how to use the modules for splicing quantification, normalization and TensorQTL post-processing:
molecular_phenotypes/calling/splicing_calling.ipynb (step i): Quantify splicing with leafcutter or psichomics
molecular_phenotypes/QC/splicing_normalization.ipynb (step ii): Quality control and normalization of splicing data
data_preprocessing/phenotype/gene_annotation.ipynb (step iii): Process splicing data for use in TensorQTL
Steps
i. Splicing Quantification
a. LeafCutter
Intron usage ratio quantification via leafCutter
Input: a metadata file listing the locations of all Aligned.sortedByCoord.out.bam files to be analyzed.
Output: a file of intron usage ratios, with a name ending in “_intron_usage_perind.counts.gz”.
sos run pipeline/splicing_calling.ipynb leafcutter \
--cwd output/leaf_cutter/ \
--samples output/rnaseq/xqtl_protocol_data_bam_list \
--container containers/leafcutter.sif
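The ratio file stores counts rather than decimals. A minimal sketch, assuming the usual leafCutter layout (a header line of sample names, then one row per intron whose cells read "intron_reads/cluster_reads"), converts those cells into fractional intron usage. The function name counts_to_fractions is hypothetical and not part of the pipeline.

```shell
#!/usr/bin/env bash
# Sketch: turn leafCutter-style "numerator/denominator" cells into fractions.
# Assumption: whitespace-delimited table; line 1 holds the sample names; each
# later row is an intron ID followed by one "intron_reads/cluster_reads" cell
# per sample.

# Read the table on stdin and print intron usage as decimal fractions.
# Cells whose cluster has zero reads (or that lack a "/") are printed as NA.
counts_to_fractions() {
  awk 'NR == 1 { print; next }
       {
         line = $1
         for (i = 2; i <= NF; i++) {
           split($i, part, "/")
           num = part[1] + 0
           den = part[2] + 0
           line = line " " (den > 0 ? num / den : "NA")
         }
         print line
       }'
}

# Example usage (path taken from the command above):
#   gzip -dc output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz \
#     | counts_to_fractions | head
```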
b. Psichomics
Percent Spliced In (PSI) quantification for alternative splicing events via Psichomics
Input: a metadata file listing the locations of all SJ.out.tab files to be analyzed.
Output: psi_raw_data.tsv, containing percent spliced in (PSI) values for each alternative splicing event.
sos run pipeline/splicing_calling.ipynb psichomics \
--cwd output/psichomics/ \
--samples output/rnaseq/xqtl_protocol_data_bam_list \
--splicing_annotation hg38_suppa.rds \
--container containers/psichomics.sif
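PSI is the fraction of reads supporting inclusion of an event. The sketch below applies the textbook definition, PSI = I / (I + E), to a hypothetical three-column table of inclusion and exclusion read counts. psichomics may combine junction reads differently for each event type, so treat this as an illustration of the quantity stored in psi_raw_data.tsv, not a reimplementation; the helper psi_from_counts is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the textbook PSI definition: PSI = I / (I + E), where I and E are
# reads supporting inclusion and exclusion of an event. The input layout is
# hypothetical; psichomics' own per-event read handling may differ.

# stdin:  tab-separated "event_id<TAB>inclusion_reads<TAB>exclusion_reads"
# stdout: tab-separated "event_id<TAB>psi", with NA when no reads support either form
psi_from_counts() {
  awk -F '\t' -v OFS='\t' '{
    inc = $2 + 0
    exc = $3 + 0
    total = inc + exc
    psi = (total > 0) ? inc / total : "NA"
    print $1, psi
  }'
}
```

PSI ranges from 0 (event always excluded) to 1 (event always included).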
ii. Splicing QC and Normalization
a. Leafcutter
QC and Normalization of leafCutter outputs
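Input: the “_intron_usage_perind.counts.gz” file from the previous step.
Output: a QC’d and normalized phenotype table with a name ending in “qqnorm.txt”.
Note that the ratio file passed to leafcutter_norm is the one without the number tag in its filename (for example, xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz rather than xqtl_protocol_data_bam_list_intron_usage_perind_numers.counts.gz).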
sos run pipeline/splicing_normalization.ipynb leafcutter_norm \
--cwd output/leaf_cutter/ \
--ratios output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz \
--container containers/leafcutter.sif
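Because the ratio file and the numerator file share the same prefix, it is easy to pass the wrong one as --ratios. A minimal sketch of a hypothetical helper, find_ratio_files, that lists only the ratio files in a directory by relying on the two file-name suffixes used in this protocol ("_perind.counts.gz" and "_perind_numers.counts.gz"):

```shell
#!/usr/bin/env bash
# Sketch: select the leafCutter ratio file(s) to pass as --ratios.
# The glob "*_perind.counts.gz" matches ..._intron_usage_perind.counts.gz but
# not ..._intron_usage_perind_numers.counts.gz, because the latter ends in
# "_numers.counts.gz" rather than "_perind.counts.gz".

find_ratio_files() {
  local f
  for f in "$1"/*_perind.counts.gz; do
    # An unmatched glob stays literal, so print only paths that really exist.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Example usage:
#   find_ratio_files output/leaf_cutter
```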
b. Psichomics
QC and Normalization of psichomics outputs
Input: the “psi_raw_data.tsv” file from the previous step.
Output: a QC’d and normalized phenotype table with a name ending in “qqnorm.txt”.
sos run pipeline/splicing_normalization.ipynb psichomics_norm \
--cwd psichomics_output \
--ratios psichomics_output/psi_raw_data.tsv \
--container containers/psichomics.sif
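The “qqnorm” suffix indicates that values are mapped onto standard normal quantiles. One common form, shown here as an assumption about what this step does rather than its confirmed implementation, is the rank-based inverse normal transform of a vector of n non-missing values. Here r_i is the rank of value i, Φ⁻¹ is the standard normal quantile function, and a is an offset (R's ppoints, which qqnorm uses, takes a = 3/8 for n ≤ 10 and a = 1/2 otherwise). This protocol does not state whether the pipelines normalize each feature across samples or each sample across features.

```latex
z_i = \Phi^{-1}\!\left( \frac{r_i - a}{n + 1 - 2a} \right), \qquad i = 1, \dots, n
```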
iii. Post-processing for TensorQTL
a. Leafcutter
Post-processing leafcutter outputs to make them TensorQTL-ready
Input: the outputs of the previous two steps and the GTF annotation file.
Output: a BED-format file with a name ending in “formated.bed.gz”.
sos run pipeline/gene_annotation.ipynb annotate_leafcutter_isoforms \
--cwd output/leaf_cutter/ \
--intron_count output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind_numers.counts.gz \
--phenoFile output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz_raw_data.qqnorm.txt \
--annotation-gtf reference_data/Homo_sapiens.GRCh38.103.chr.reformatted.collapse_only.gene.gtf \
--container containers/bioinfo.sif \
--sample_participant_lookup reference_data/sample_participant_lookup.rnaseq
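TensorQTL reads phenotypes from a BED-style table: chromosome, start, end and a phenotype ID, followed by one column per sample. A quick sanity check of a “formated.bed.gz” output, as a sketch under that assumed layout (the helper check_pheno_bed is hypothetical and does not verify the pipeline's exact column names):

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a gzipped phenotype BED before handing it to TensorQTL.
# Assumed layout: a header line starting with "#", then tab-separated rows of
# chr, start, end, phenotype ID and one value per sample.

# Usage: check_pheno_bed FILE.bed.gz
# Prints "OK", or the first problem found (and returns non-zero).
check_pheno_bed() {
  gzip -dc "$1" | awk -F '\t' '
    NR == 1 {
      if (substr($1, 1, 1) != "#") { print "header does not start with #"; bad = 1; exit 1 }
      if (NF < 5) { print "fewer than 5 columns"; bad = 1; exit 1 }
      ncol = NF
      next
    }
    NF != ncol { print "row " NR " has " NF " columns, expected " ncol; bad = 1; exit 1 }
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print "row " NR " has non-integer coordinates"; bad = 1; exit 1 }
    END {
      if (bad) exit 1
      if (NR < 2) { print "no data rows"; exit 1 }
      print "OK"
    }'
}
```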
b. Psichomics
Post-processing psichomics outputs to make them TensorQTL-ready
Input: the “qqnorm.txt” output from the previous step and the GTF annotation file.
Output: a BED-format file with a name ending in “formated.bed.gz”.
sos run pipeline/code/data_preprocessing/phenotype/gene_annotation.ipynb annotate_psichomics_isoforms \
--cwd psichomics_output \
--phenoFile psichomics_output/psichomics_raw_data_bedded.qqnorm.txt \
--annotation-gtf reference_data/Homo_sapiens.GRCh38.103.chr.reformated.ERCC.gene.gtf \
--container containers/bioinfo.sif
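Phenotype BED files for QTL mapping are usually expected to be sorted by genomic position. This protocol does not say whether the annotation step already sorts its output; if a file needs re-sorting after manual edits, a sketch like the one below keeps the header first and sorts the remaining rows by chromosome (lexically) and then numerically by start. The helper sort_pheno_bed is hypothetical, and it writes plain gzip rather than a bgzip-compressed, tabix-indexable file.

```shell
#!/usr/bin/env bash
# Sketch: re-sort a gzipped phenotype BED, keeping the header line first.
# Chromosomes are sorted lexically (chr10 before chr2); starts numerically.
# Output is plain gzip, not bgzip.

# Usage: sort_pheno_bed IN.bed.gz OUT.bed.gz
sort_pheno_bed() {
  local in=$1 out=$2
  {
    gzip -dc "$in" | head -n 1
    gzip -dc "$in" | tail -n +2 | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n
  } | gzip > "$out"
}
```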
Anticipated Results
The final outputs are QC’d and normalized splicing phenotype tables from leafcutter and psichomics, in BED format (“formated.bed.gz”) and ready for TensorQTL.