Supplementary Figure S1
[Figure: nucleotide densities at positions -10 to +10 around the 5' and 3' fragment ends, shown in panels A to D; axis labels include Normalized Count, Density, Expected Density and Ratio.]
This plot shows nucleotide frequencies surrounding the fragment ends for the control experiment in Levin et al. 2010. Note that the 3' sequences are complemented in order to represent the nucleotides that are being primed in second-strand synthesis. See Figure 2 in the main text for more details.
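The complementing of the 3' sequences described in the Figure S1 caption can be illustrated with a minimal Python sketch. It assumes each window around a 3' fragment end is available as a plain string of bases and is complemented base by base without reversal (a complement, not a reverse complement), which is our reading of the caption; `complement_window` and `position_frequencies` are hypothetical helpers, not part of the authors' software.

```python
# Base-by-base complement table. The order of bases is NOT reversed, so
# position i of the output still lines up with position i on the plot axis.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_window(seq: str) -> str:
    """Complement a sequence window around a 3' fragment end (hypothetical helper)."""
    return seq.translate(COMPLEMENT)


def position_frequencies(windows: list[str]) -> list[dict[str, float]]:
    """Fraction of A, C, G and T at each position across equal-length windows."""
    length = len(windows[0])
    freqs = []
    for i in range(length):
        counts = {b: 0 for b in "ACGT"}
        for w in windows:
            base = w[i].upper()
            if base in counts:  # skip ambiguous bases such as N
                counts[base] += 1
        total = sum(counts.values())
        freqs.append({b: (c / total if total else 0.0) for b, c in counts.items()})
    return freqs
```

For example, `complement_window("AACGT")` gives `"TTGCA"`, and the per-position frequencies of a set of complemented 3' windows correspond to the kind of density summarized in the panels.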
Improving RNA-Seq expression estimates by correcting for fragment bias
A. Roberts, C. Trapnell, J. Donaghey, J. Rinn and L. Pachter

Supplementary Figure S2
The panels below show the inferred bias for each experiment mentioned in the main text. The first panel can be used as a legend to help interpret the meaning of each plot. Note that the interpretation of the plots in the second row of each figure is identical to that of Figure 2 (D) of the main text.
[Legend panel: Dataset Information (dataset name/accession, read type, strand specificity, sample); 5' Sequence Bias and 3' Sequence Bias (A, C, G, T); 5' Positional Bias and 3' Positional Bias, stratified by transcript length (0-1334 bp, 1335-2104 bp, 2105-2977 bp, 2978-4389 bp, > 4389 bp); Fragment Length Distribution (Empirical = learned from data; Estimated = truncated Gaussian).]
Datasets shown (read type; strand specificity where stated; sample):
SRA012427: 50bp Paired-End; MAQC HBR
SRA010153_HBR: 35bp Single-End; MAQC HBR
NSR: 34bp Single-End; MAQC HBR
SRA010153_UHR: 35bp Single-End; MAQC UHR
SOLiD4_HBR_PE_50x25: 50x25bp Paired-End; Second Strand Only; MAQC HBR
SOLiD4_UHR_PE_50x25: 50x25bp Paired-End; Second Strand Only; MAQC UHR
SRA008403: 32bp Single-End; MAQC UHR
SRA001149_dT_tech: 35bp Single-End; Yeast BY4741
SRA001149_dT_bio: 35bp Single-End; Yeast BY4741
SRA020818_RH: 75bp Paired-End; Yeast
SRA020818_dUTP: 75bp Paired-End; First Strand Only; Yeast
SRA020818_rna_ligation: 75bp Single-End; Second Strand Only; Yeast
SRA020818_ill_ligation: 75bp Single-End; Second Strand Only; Yeast
SRA020818_NNSR: 73bp Paired-End; First Strand Only; Yeast
[Per-dataset plots of sequence bias, positional bias (relative position 0.2 to 0.8) and fragment length distribution.]
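The positional-bias panels in Figure S2 stratify transcripts into five length bins. A minimal Python sketch of assigning a transcript to one of these bins follows; the bin edges are taken from the legend, while the helper names and the assumption that the horizontal axis is the fragment end's fractional position along the transcript are illustrative assumptions rather than the authors' implementation.

```python
import bisect

# Inclusive upper bounds (bp) of the legend's transcript-length bins:
# 0-1334, 1335-2104, 2105-2977, 2978-4389, > 4389.
BIN_UPPER_BOUNDS = [1334, 2104, 2977, 4389]
BIN_LABELS = ["0-1334 bp", "1335-2104 bp", "2105-2977 bp", "2978-4389 bp", "> 4389 bp"]


def length_bin(transcript_length: int) -> str:
    """Return the legend label of the bin containing transcript_length (hypothetical helper)."""
    # bisect_left returns the index of the first bound >= the length, so a
    # length equal to a bound (e.g. 1334) stays in the lower bin.
    return BIN_LABELS[bisect.bisect_left(BIN_UPPER_BOUNDS, transcript_length)]


def relative_position(offset: int, transcript_length: int) -> float:
    """Fractional position (0 to 1) of a fragment end along its transcript.

    Assumed to correspond to the 0.2-0.8 ticks on the positional-bias axes.
    """
    return offset / transcript_length
```

A fragment whose 5' end lies 700 bp into a 1400 bp transcript would fall in the "1335-2104 bp" curve at relative position 0.5. Using `bisect` keeps the lookup correct at the bin boundaries without a chain of comparisons.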
Supplementary Figure S3
[Figure: scatter plots of TaqMan qPCR data against RNA-Seq expression estimates; columns Initial Estimates and Corrected Estimates; rows Cufflinks FPKM, Genominator RPKM and mseq RPKM.]
R² with the qPCR data, initial and corrected:
Cufflinks FPKM: 0.758 initial, 0.812 corrected
Genominator RPKM: 0.711 initial, 0.715 corrected
mseq RPKM: 0.73 initial, 0.755 corrected
Plots showing the correlation between the TaqMan qPCR data and RNA-Seq expression estimates before (left) and after (right) correction with each of the three methods compared in the text.
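Each R² in Figure S3 summarizes how closely a set of RNA-Seq estimates tracks the TaqMan qPCR measurements. As a minimal sketch, assuming R² here is the squared Pearson correlation of the plotted values (the figure does not state whether any transformation was applied), it can be computed as below; `r_squared` is a hypothetical helper and the inputs in the test are toy numbers, not the study's data.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equal-length series (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered cross-product and sums of squares.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy * sxy / (sxx * syy)
```

Comparing `r_squared(qpcr, initial)` with `r_squared(qpcr, corrected)` for the same genes reproduces the before/after comparison that each row of the figure reports.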
Supplementary Figure S4
[Figure: scatter plot of normalized NanoString count against Cufflinks FPKM, R² = 0.77, with unexplained outliers circled.]
We compared our expression estimates to NanoString measurements on a set of 95 genes, where for each gene we performed a NanoString experiment (see Methods). Although the overall correlation was good (R² = 0.77), we could not explain a number of outliers (circled), and we did not find an improvement in correlation when correcting for bias (in contrast to the qRT-PCR comparison discussed in the main text and to all other validations we attempted). Furthermore, we noticed high variance between replicates (see Data). We report these data because of their value in assessing expression accuracy in conjunction with previously generated data reported in Trapnell et al. 2010.