Author: JL Villanueva (joseluis.villanueva@crg.eu)
This report describes the differential gene expression analysis comparing WT and sh samples (NUDIX5). We will use Kallisto (v 0.43.0) for quantification at the transcript level and Sleuth (v 0.30.0) for testing differential expression (DE) aggregating transcripts into genes.
num | sample | condition | sequencing |
---|---|---|---|
1 | rw_026_01_01_rnaseq | 2D_wt | 06/27/2018 |
2 | rw_030_01_01_rnaseq | 2D_wt | 06/27/2018 |
3 | rw_031_01_01_rnaseq | 2D_sh | 06/27/2018 |
4 | rw_032_01_01_rnaseq | 2D_sh | 06/27/2018 |
5 | rw_039_01_01_rnaseq | 3D_wt | 07/24/2018 |
6 | rw_039_02_01_rnaseq | 3D_wt | 07/24/2018 |
7 | rw_040_01_01_rnaseq | 3D_sh | 07/24/2018 |
8 | rw_040_02_01_rnaseq | 3D_sh | 07/24/2018 |
X | sample | reads_mapped | reads_proc | frac_mapped | bootstraps_present | bootstraps_used | condition |
---|---|---|---|---|---|---|---|
1 | rw_026_01_01_rnaseq | 44375965 | 47925975 | 0.9259 | 100 | 100 | 2D_wt |
2 | rw_030_01_01_rnaseq | 55177695 | 59107148 | 0.9335 | 100 | 100 | 2D_wt |
3 | rw_031_01_01_rnaseq | 44576517 | 48206868 | 0.9247 | 100 | 100 | 2D_sh |
4 | rw_032_01_01_rnaseq | 56474041 | 61169209 | 0.9232 | 100 | 100 | 2D_sh |
5 | rw_039_01_01_rnaseq | 52429267 | 57390135 | 0.9136 | 100 | 100 | 3D_wt |
6 | rw_039_02_01_rnaseq | 41912042 | 44742750 | 0.9367 | 100 | 100 | 3D_wt |
7 | rw_040_01_01_rnaseq | 47846169 | 51477759 | 0.9295 | 100 | 100 | 3D_sh |
8 | rw_040_02_01_rnaseq | 47094242 | 50976488 | 0.9238 | 100 | 100 | 3D_sh |
Initially we wanted to test wt vs sh in 2D and 3D and also 2Dwt vs 3Dwt / 2Dsh vs 3Dsh. However there is a batch effect as can be seen in the sequencing date, making it impossible to do the latter comparison.
Therefore we will only test wt vs sh (2D and 3D).
Code for the whole analysis is available in scripts folder. Figures in Figures and tables in tables folders respectively.
PCAs were generated using TPMs as calculated by Kallisto. It looks like one of the replicas from 3D_wt might have been swapped with 3D_sh. We will explore this later on.
PCA looks good.
Summary table with mean counts (per condition) per gene is named: sleuth_table_wTPM_2D_wt_vs_sh.csv
We do the DE analysis aggregating transcripts into genes. We use the likelihood ratio test (lrt). For this comparison we obtain 350 genes.
Summary table for significant genes is named significant2D_wt_vs_sh.csv
Because for this test sleuth does not return a fold change or equivalent metric, I have calculated 3 additional columns: tpm_wt and tpm_sh that are the sum of the average tpm (in replicas) for all transcripts that belong to a gene. After that we have tpm_wt_by_sh, that simply divides log2(tpm_wt +0.1/ tpm_sh+0.1) with a pseudocount of 0.1. Therefore we have a relative measure of the change between wt and sh.
Summary table with mean counts (per condition) per gene is named: sleuth_table_wTPM_3D_wt_vs_sh.csv
We checked the patterns of ALDH1, NUDT5, OCT4 (POU5F1), CD44 that are supposed to be upregulated in wt. NUDT5 gene has a profile in the samples compatible with the sample tags shown in the PCA. The other genes can't be found (ALDH1) or have inconsistent patterns across transcripts. Therefore we perform the DE analysis using the original tags although the clustering in the PCA is not very good.
We get 3 DE genes using lrt test.
Summary table for significant genes is named significant3D_wt_vs_sh.csv