{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "Author: JL Villanueva (joseluis.villanueva@crg.eu)\n", "\n", "This report describes the differential gene expression analysis comparing wild-type (WT) and NUDIX5 shRNA (sh) samples. We use Kallisto (v0.43.0) to quantify expression at the transcript level and Sleuth (v0.30.0) to test for differential expression (DE) at the gene level, aggregating transcripts into genes.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Initial samples\n", "
| num | sample | condition | sequencing date |\n",
"|---|---|---|---|\n",
"| 1 | rw_026_01_01_rnaseq | 2D_wt | 06/27/2018 |\n",
"| 2 | rw_030_01_01_rnaseq | 2D_wt | 06/27/2018 |\n",
"| 3 | rw_031_01_01_rnaseq | 2D_sh | 06/27/2018 |\n",
"| 4 | rw_032_01_01_rnaseq | 2D_sh | 06/27/2018 |\n",
"| 5 | rw_039_01_01_rnaseq | 3D_wt | 07/24/2018 |\n",
"| 6 | rw_039_02_01_rnaseq | 3D_wt | 07/24/2018 |\n",
"| 7 | rw_040_01_01_rnaseq | 3D_sh | 07/24/2018 |\n",
"| 8 | rw_040_02_01_rnaseq | 3D_sh | 07/24/2018 |\n",
"\n",
"Kallisto mapping statistics per sample, where `frac_mapped` = `reads_mapped` / `reads_proc`:\n",
"\n",
"| num | sample | reads_mapped | reads_proc | frac_mapped | bootstraps_present | bootstraps_used | condition |\n",
"|---|---|---|---|---|---|---|---|\n",
"| 1 | rw_026_01_01_rnaseq | 44375965 | 47925975 | 0.9259 | 100 | 100 | 2D_wt |\n",
"| 2 | rw_030_01_01_rnaseq | 55177695 | 59107148 | 0.9335 | 100 | 100 | 2D_wt |\n",
"| 3 | rw_031_01_01_rnaseq | 44576517 | 48206868 | 0.9247 | 100 | 100 | 2D_sh |\n",
"| 4 | rw_032_01_01_rnaseq | 56474041 | 61169209 | 0.9232 | 100 | 100 | 2D_sh |\n",
"| 5 | rw_039_01_01_rnaseq | 52429267 | 57390135 | 0.9136 | 100 | 100 | 3D_wt |\n",
"| 6 | rw_039_02_01_rnaseq | 41912042 | 44742750 | 0.9367 | 100 | 100 | 3D_wt |\n",
"| 7 | rw_040_01_01_rnaseq | 47846169 | 51477759 | 0.9295 | 100 | 100 | 3D_sh |\n",
"| 8 | rw_040_02_01_rnaseq | 47094242 | 50976488 | 0.9238 | 100 | 100 | 3D_sh |\n",
"\n", "
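As a quick sanity check on the mapping table, `frac_mapped` can be recomputed as `reads_mapped / reads_proc`, rounded to four decimals. The Python sketch below does this for the eight samples; the dictionary layout and the 0.9 cut-off for flagging low mapping rates are illustrative assumptions, not part of the original analysis.

```python
# Kallisto mapping statistics copied from the table above,
# stored as sample -> (reads_mapped, reads_proc).
# This dictionary layout is an illustrative assumption.
mapping_stats = {
    'rw_026_01_01_rnaseq': (44375965, 47925975),
    'rw_030_01_01_rnaseq': (55177695, 59107148),
    'rw_031_01_01_rnaseq': (44576517, 48206868),
    'rw_032_01_01_rnaseq': (56474041, 61169209),
    'rw_039_01_01_rnaseq': (52429267, 57390135),
    'rw_039_02_01_rnaseq': (41912042, 44742750),
    'rw_040_01_01_rnaseq': (47846169, 51477759),
    'rw_040_02_01_rnaseq': (47094242, 50976488),
}

# Hypothetical QC cut-off, not used in the original analysis.
MIN_FRAC_MAPPED = 0.9


def frac_mapped(reads_mapped, reads_proc):
    # Fraction of processed reads that were mapped, rounded to 4 decimals
    # so it can be compared directly with the frac_mapped column.
    return round(float(reads_mapped) / reads_proc, 4)


fractions = dict((sample, frac_mapped(mapped, proc))
                 for sample, (mapped, proc) in mapping_stats.items())

# Samples whose mapping rate falls below the hypothetical cut-off.
low_mapping = sorted(s for s, f in fractions.items() if f < MIN_FRAC_MAPPED)

for sample in sorted(fractions):
    print('%s %.4f' % (sample, fractions[sample]))
print('below %.2f: %s' % (MIN_FRAC_MAPPED, low_mapping))
```

With these numbers every sample maps above 90% (the lowest is rw_039_01_01_rnaseq at 0.9136), so the mapping rate alone does not single out any replicate.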
PCAs were generated from the TPM values estimated by Kallisto. The PCA suggests that one of the 3D_wt replicates may have been swapped with a 3D_sh replicate; we explore this in the 3D_wt vs 3D_sh section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2D_wt vs 2D_sh\n", "\n", "
The PCA looks good: the 2D_wt and 2D_sh replicates cluster by condition.\n", "\n", "
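The DE analysis notes for this comparison define three additional columns: `tpm_wt` and `tpm_sh` (for each condition, the replicate-averaged transcript TPMs summed over all transcripts of a gene) and `tpm_wt_by_sh` = `log2((tpm_wt + 0.1) / (tpm_sh + 0.1))`. A minimal Python sketch of that calculation follows, assuming a hypothetical input layout of transcript -> (gene, {condition: [TPM per replicate]}); the transcript IDs, gene names and TPM values are made-up toy inputs, not data from this experiment.

```python
import math

PSEUDOCOUNT = 0.1  # pseudocount used for tpm_wt_by_sh


def mean(values):
    return sum(values) / float(len(values))


def gene_tpm(transcripts, condition):
    # For one condition: average each transcript's TPM across replicates,
    # then sum these averages over all transcripts of the same gene.
    totals = {}
    for gene, tpms_by_condition in transcripts.values():
        avg = mean(tpms_by_condition[condition])
        totals[gene] = totals.get(gene, 0.0) + avg
    return totals


def wt_by_sh(tpm_wt, tpm_sh, pseudocount=PSEUDOCOUNT):
    # log2((tpm_wt + pseudocount) / (tpm_sh + pseudocount))
    return math.log((tpm_wt + pseudocount) / (tpm_sh + pseudocount), 2)


# Toy input (hypothetical): transcript -> (gene, {condition: [TPM per replicate]})
transcripts = {
    'tx1': ('GENE_A', {'wt': [10.0, 14.0], 'sh': [3.0, 5.0]}),
    'tx2': ('GENE_A', {'wt': [2.0, 4.0], 'sh': [1.0, 1.0]}),
    'tx3': ('GENE_B', {'wt': [0.0, 0.0], 'sh': [0.0, 0.0]}),
}

tpm_wt = gene_tpm(transcripts, 'wt')
tpm_sh = gene_tpm(transcripts, 'sh')
for gene in sorted(tpm_wt):
    ratio = wt_by_sh(tpm_wt[gene], tpm_sh[gene])
    print('%s tpm_wt=%.2f tpm_sh=%.2f tpm_wt_by_sh=%.3f'
          % (gene, tpm_wt[gene], tpm_sh[gene], ratio))
```

Positive values of `tpm_wt_by_sh` indicate higher expression in wt and negative values higher expression in sh. With the toy input, GENE_A comes out at about 1.57, and GENE_B, which has zero TPM in both conditions, comes out at 0 rather than an undefined ratio because of the pseudocount.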
The summary table with mean counts per condition for each gene is `sleuth_table_wTPM_2D_wt_vs_sh.csv`.\n", "\n", "## DE analysis\n", "We perform the DE analysis at the gene level, aggregating transcripts into genes, and use the likelihood ratio test (LRT). This comparison yields 350 DE genes.\n", "\n", "The summary table for the significant genes is `significant2D_wt_vs_sh.csv`.\n", "\n", "Because Sleuth does not return a fold change or an equivalent metric for the LRT, I have calculated three additional columns. `tpm_wt` and `tpm_sh` are, for each condition, the sum over all transcripts of a gene of the transcript TPM averaged across replicates. `tpm_wt_by_sh` is `log2((tpm_wt + 0.1) / (tpm_sh + 0.1))`, i.e. the log2 ratio of wt to sh with a pseudocount of 0.1. This gives a relative measure of the change between wt and sh: positive values indicate higher expression in wt, negative values higher expression in sh.\n", "\n", "## An example of a DE gene: NUDT5\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3D_wt vs 3D_sh\n", "\n", "
The summary table with mean counts per condition for each gene is `sleuth_table_wTPM_3D_wt_vs_sh.csv`.\n", "\n", "We checked the expression patterns of ALDH1, NUDT5, OCT4 (POU5F1) and CD44, which are expected to be upregulated in *wt*. The NUDT5 profile across samples is consistent with the sample labels shown in the PCA. The other genes either cannot be found (ALDH1) or show inconsistent patterns across their transcripts. We therefore perform the DE analysis with the original labels, even though the samples do not cluster well by condition in the PCA.\n", "\n", "The LRT yields 3 DE genes.\n", "\n", "The summary table for the significant genes is `significant3D_wt_vs_sh.csv`.\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 2 }