PolyA_DB: a database for mammalian mRNA polyadenylation


D116–D120 Nucleic Acids Research, 2005, Vol. 33, Database issue
doi:10.1093/nar/gki055

PolyA_DB: a database for mammalian mRNA polyadenylation

Haibo Zhang (1,2), Jun Hu (2), Michael Recce (1) and Bin Tian (2,*)

(1) Center for Computational Biology and Bioengineering, New Jersey Institute of Technology, Newark, NJ 07102, USA and (2) Department of Biochemistry and Molecular Biology and Bioinformatics Center, New Jersey Medical School, UMDNJ, Newark, NJ 07101, USA

Received August 19, 2004; Revised and Accepted October 1, 2004

ABSTRACT

Messenger RNA polyadenylation is one of the key post-transcriptional events in eukaryotic cells. A large number of genes in mammalian species can undergo alternative polyadenylation, which leads to mRNAs with variable 3′ ends. As the 3′ end of mRNAs often contains cis elements important for mRNA stability, mRNA localization and translation, the implications of the regulation of polyadenylation can be multifold. Alternative polyadenylation is controlled by cis elements and trans factors, and is believed to occur in a tissue- or disease-specific manner. Despite the availability of many databases devoted to other aspects of mRNA metabolism, such as transcriptional initiation and splicing, systematic information on polyadenylation, including alternative polyadenylation and its regulation, is noticeably lacking. Here, we present a database named PolyA_DB, through which we strive to provide several types of information regarding polyadenylation in mammalian species: (i) polyadenylation sites and their locations with respect to the genomic structure of genes; (ii) cis elements surrounding polyadenylation sites; (iii) comparison of polyadenylation configuration between orthologous genes; and (iv) tissue/organ information for alternative polyadenylation sites. Currently, PolyA_DB contains 45 565 polyadenylation sites for 25 097 human and mouse genes, representing the most comprehensive polyadenylation database to date.
The database is accessible via the website (http://polya.umdnj.edu/polyadb).

INTRODUCTION

Polyadenylation of eukaryotic mRNAs is a two-step process, which includes a specific cleavage at the 3′ end of a nascent mRNA and addition of a poly(A) tail (1). Polyadenylation has impacts on many aspects of mRNA metabolism in the cell, including mRNA stability, mRNA localization and translation (2). Enhanced efficiency of polyadenylation can lead to diseases such as thrombophilia (3), highlighting the importance of the regulation of polyadenylation.

Both cis elements and trans factors are involved in the regulation of polyadenylation. Cis elements can be divided into two groups based on their location relative to the cleavage site, namely upstream and downstream elements. The best-known upstream element in metazoan cells is the polyadenylation signal (PAS), located 10–30 nt upstream of the cleavage site. Although the AAUAAA hexamer is the most common PAS, 11 single-nucleotide variants have been demonstrated or suggested to play similar roles in polyadenylation (4). PAS motifs are the binding sites for the cleavage and polyadenylation specificity factor (CPSF). U-rich and GU-rich elements are common downstream elements located 20–40 nt downstream of the cleavage site (5–7). They are the binding sites for the cleavage stimulatory factor (CstF) (8). In addition, sequence composition surrounding the cleavage site has been found to be important for defining the site in several bioinformatics studies (6,9–11). Other factors responsible for the polyadenylation process include cleavage factors I and II (CFI and CFII), and poly(A) polymerase (PAP) (12,13). Recently, several transcription factors and the RNA polymerase II enzyme have also been implicated in the polyadenylation process (14). Moreover, various auxiliary elements of viral or cellular origins have been shown to regulate polyadenylation (12).
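The PAS description above can be illustrated with a short Python sketch. It is not the authors' code: the function names are hypothetical, sequences are written in the DNA alphabet (AATAAA for AAUAAA), and the distance of a hit is assumed to be counted from the first base of the hexamer to the cleavage site. The 11 variants used in reference (4) are not enumerated in this text, so the sketch simply generates all 18 possible single-nucleotide variants and leaves the choice of motif set to the caller.

```python
CANONICAL_PAS = "AATAAA"  # DNA form of the AAUAAA hexamer


def single_nucleotide_variants(hexamer):
    """Return every hexamer differing from `hexamer` at exactly one position."""
    variants = set()
    for i, base in enumerate(hexamer):
        for alt in "ACGT":
            if alt != base:
                variants.add(hexamer[:i] + alt + hexamer[i + 1:])
    return variants


def find_pas(upstream, motifs, min_dist=10, max_dist=30):
    """Locate PAS motifs in the sequence immediately 5' of a cleavage site.

    `upstream` ends at the base just before the cleavage site. The distance
    of a hit is counted from the first base of the hexamer to the cleavage
    site (an assumed convention). Returns a list of (distance, motif) tuples.
    """
    hits = []
    for i in range(len(upstream) - 5):
        hexamer = upstream[i:i + 6]
        distance = len(upstream) - i
        if hexamer in motifs and min_dist <= distance <= max_dist:
            hits.append((distance, hexamer))
    return hits


# Hypothetical example: the canonical hexamer starts 20 nt upstream of the site.
upstream = "GGGGG" + "AATAAA" + "C" * 14
print(find_pas(upstream, {CANONICAL_PAS}))
```

Passing the motif set as a parameter keeps the scan independent of which variants are accepted, so a curated list can be substituted for the generated one without changing the search.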
It has been estimated that more than 29% of human genes have alternative polyadenylation sites [or poly(A) sites] (15). The choice of alternative poly(A) sites is believed to be related to biological conditions such as cell type and disease state (16). Alternative polyadenylation can lead to mRNAs with variable 3′ ends, or proteins with different C-termini. A growing number of genes have been found to be regulated by this mechanism. However, a public database systematically providing information on alternative polyadenylation is lacking.

*To whom correspondence should be addressed. Tel: +1 973 972 3615; Fax: +1 973 972 5594; Email: btian@umdnj.edu

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use permissions, please contact journals.permissions@oupjournals.org.

© 2005, the authors. Nucleic Acids Research, Vol. 33, Database issue © Oxford University Press 2005; all rights reserved

The availability of genomic sequences from several mammalian species, as well as large numbers of expressed sequence tags (ESTs), makes it feasible to comprehensively document mRNA polyadenylation configurations for genes. Although ESTs provide both sequence data and information on the biological origin of transcripts by means of the cDNA library source, they have several problems with respect to data quality, such as chimeric sequence, vector contamination and inclusion of genomic sequence. In addition, when dealing with polyadenylation, issues such as internal priming and low-quality sequences at the 5′ and 3′ ends are more pronounced. Therefore, a computational approach to studying polyadenylation must take these into consideration to ensure that poly(A) sites are accurately mapped.

We present here a computational pipeline that effectively utilizes genomic sequences and EST data to study polyadenylation. We applied it to human and mouse genes to build a database for polyadenylation (named PolyA_DB). Currently, the database documents 45 565 poly(A) sites and various information regarding the sites, including their genome locations, evidence of cDNA/EST alignments to genomes, cis elements surrounding poly(A) sites, comparison of polyadenylation configuration between orthologs and tissue/organ information for poly(A) sites. Although only human and mouse poly(A) sites are currently documented in the database, the data-processing pipeline and the structure of the database are designed to make it easy to include other species in the future. This resource can be of great value to researchers interested in studying both the mechanism of polyadenylation and gene regulation by alternative polyadenylation.

RESULTS

Methods and data statistics

In the PolyA_DB database, genes are annotated based on LocusLink IDs (17).
Figure 1. An outline of the PolyA_DB building pipeline. The data flow is indicated by arrowed lines. See the main text for details.

A computational pipeline (depicted in Figure 1) is designed to accurately map poly(A) sites on genomes:

(i) The genomic location and the structure of a gene are determined by the alignment of its RefSeq sequence(s) with genome contigs. In the current version, human genome Build 34.2, mouse genome Build 30 and the NCBI March 2004 release of RefSeq mRNAs were used. If a gene has more than one RefSeq sequence, their alignments to genomes are required to overlap, and their transcriptional orientations are required to be the same. The transcriptional orientation of a gene is determined by both its splicing and its poly(A) tail information whenever possible. If a gene does not meet these two criteria, it is discarded.

(ii) To ensure that only high-quality cDNA/EST data are used, the alignment of a cDNA/EST with the genome is required to overlap with that of its corresponding RefSeq. This resulted in discarding 7.8% of human and 10.4% of mouse cDNA/ESTs that contained poly(A) or poly(T) tails. The mapping between cDNA/EST and RefSeq was obtained from the UniGene database (18).

(iii) Only those cDNA/ESTs with poly(A) tails [or poly(T) tails if in anti-sense orientation] are used to infer poly(A) cleavage sites. A poly(A) tail is required to have eight or more consecutive As; if it contains a nucleotide other than A, another eight or more consecutive As are required after that nucleotide. Possible internal priming sites are checked by examining the genomic sequence −10 to +10 nt surrounding the cleavage site. If the sequence has six continuous As, or more than seven As in a 10 nt window, it is considered an internal priming candidate. Poly(A) cleavage sites located within a 24 nt window are considered to be generated from heterogeneous cleavage of mRNA (19), and thus are clustered together.
(iv) To further ensure the mapping quality, a genuine polyadenylation site must be either supported by more than one cDNA/EST sequence, or supported by one cDNA/EST alignment together with at least one PAS within the upstream −40 to −1 nt region. An internal priming candidate is considered a genuine site if and only if it is supported by more than one cDNA/EST and has an upstream PAS.

Data generated from the pipeline, including genomic locations of poly(A) sites, supporting cDNA/ESTs, number of cleavage sites and PAS information, are stored in a relational database using MySQL. The database also contains ortholog information for genes, obtained from the HomoloGene database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene), and tissue/organ information for ESTs, derived from cDNA library files from NCBI. Several key data statistics of the database are summarized in Table 1.
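The filtering rules in steps (iii) and (iv) can be sketched in Python as follows. This is an illustrative reading of the text, not the pipeline's actual code: all function names are hypothetical, the tail rule is interpreted as a run of at least eight As at the 3′ end that may continue through single non-A bases each followed by at least eight more As, and "within a 24 nt window" is assumed to mean that every site in a cluster lies less than 24 nt from the first site in that cluster.

```python
import re

# One reading of step (iii): a run of >= 8 As at the 3' end, optionally
# extended through single non-A bases that are each followed by >= 8 As.
POLYA_TAIL = re.compile(r"A{8,}(?:[^A]A{8,})*$")


def tail_start(read):
    """Index where the poly(A) tail begins in `read`, or None if no tail."""
    match = POLYA_TAIL.search(read.upper())
    return match.start() if match else None


def is_internal_priming_candidate(flank):
    """`flank` is the genomic sequence from -10 to +10 nt around the site."""
    flank = flank.upper()
    if "A" * 6 in flank:
        return True
    # More than seven As in any 10 nt window.
    return any(flank[i:i + 10].count("A") > 7
               for i in range(len(flank) - 9))


def cluster_sites(positions, window=24):
    """Group cleavage positions lying within `window` nt of a cluster's first site."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][0] < window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def is_genuine_site(n_supporting, has_upstream_pas, internal_priming):
    """Step (iv): decide whether a clustered site is kept."""
    if internal_priming:
        return n_supporting > 1 and has_upstream_pas
    return n_supporting > 1 or (n_supporting == 1 and has_upstream_pas)


print(tail_start("CCG" + "A" * 8 + "G" + "A" * 9))
print(cluster_sites([124, 100, 200, 105, 123]))
```

Anchoring each cluster to its first site is only one way to apply the 24 nt window; a sliding or centre-based definition would group borderline sites differently, and the text does not say which was used.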

Table 1. PolyA_DB statistics

                                        Homo sapiens   Mus musculus   Total
Aligned cDNA/ESTs                       2 103 995      1 181 194      3 285 189
Poly(A) sites                           29 283         16 282         45 565
Genes with one poly(A) site             6418           7577           13 995
Genes with alternative poly(A) sites    7524           3578           11 102
Total genes                             13 942         11 155         25 097
Orthologous pairs                       7935           7935           7935
Tissue types (a)                        331            155            455
Organ types (a)                         107            47             133

(a) It includes diseased tissues and organs. Some tissue and organ types occur in both human and mouse cases.

Data access and visualization

Data and documentation are available from the PolyA_DB web server at http://polya.umdnj.edu/polyadb. Data can be either downloaded as flat files or queried through a web interface where graphics are dynamically generated using Bioperl modules (20). The web user interface is interactive and provides five basic views:

Gene view. This view provides a summary of poly(A) site(s) for each queried gene, and their positions relative to the RefSeq(s) and the genome contig. The gene structure inferred from the RefSeq(s) and a summary table containing cDNA/EST evidence and number of cleavage sites are also provided.

Links for sequence IDs to NCBI resources are provided whenever possible. Figure 2A shows a PolyA_DB gene view of the mouse TPM2 gene (β-tropomyosin, LocusLink ID: 22004), with two poly(A) sites and their positions, inferred gene structure, start and stop codon positions, number of supporting ESTs and number of sites generated by heterogeneous cleavage.

Figure 2. Views offered at the web interface of PolyA_DB. (A) Gene view: Mouse gene TPM2 is used as an example. The output includes a pictorial representation of gene structure and poly(A) sites as well as two summary tables regarding the gene and the poly(A) sites. Numbers under each poly(A) site in the picture are the number of supporting cDNA/ESTs and the number of heterogeneous cleavage sites. (B) Ortholog view of human and mouse TPM2 genes. (C) Signal views of mouse and human TPM2 genes are shown in the upper panel and lower panel, respectively. The position of a signal is relative to the cleavage site, which is set to 0.

Evidence view. This view provides detailed alignment evidence from cDNA/EST sequences, which can be presented by various sorting options including the 3′ or 5′ position, exon number, cDNA/EST length and GenBank ID. A table is also provided, which lists all supporting cDNA/ESTs with links to NCBI (Supplementary Figure 1).

Ortholog view. This view provides a comparison of a pair of human and mouse orthologs. Figure 2B shows an ortholog view of the mouse TPM2 gene and its human ortholog (LocusLink ID: 7169). The ortholog view readily reveals that the ortholog pair is conserved with respect to both gene structure and polyadenylation configuration.

Signal view. This view provides information regarding cis elements in the region surrounding a poly(A) site. Currently, we document the PAS motif (AAUAAA and its 11 single-nucleotide variants) (4) in the 1–40 nt upstream region of a poly(A) site. Figure 2C shows signal views of the mouse and human TPM2 genes, from which the conservation of signal usage of poly(A) sites can be easily discerned.

Body view. This view provides tissue/organ information for poly(A) sites (Supplementary Figure 2).

CONCLUSIONS

We present the PolyA_DB database, a resource for mammalian mRNA polyadenylation. This database contains comprehensive information regarding polyadenylation, including poly(A) sites in the context of gene structure, cDNA/EST evidence for poly(A) sites, PASs, conservation of polyadenylation configuration between orthologs and tissue/organ information for poly(A) site usage. We believe PolyA_DB will be of great use to researchers studying both the mechanism of polyadenylation and gene regulation by alternative polyadenylation. PolyA_DB will be continuously updated (i) when new releases of human and mouse genomes and cDNA/EST data are available, and (ii) when genome and cDNA/EST data from other species are available for large-scale polyadenylation studies.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.

ACKNOWLEDGEMENTS

We thank Carol S. Lutz at UMDNJ and two anonymous reviewers for their valuable suggestions, and Stephen B. Feldman at UMDNJ for technical assistance with the web server.

REFERENCES

1. Colgan,D.F. and Manley,J.L. (1997) Mechanism and regulation of mRNA polyadenylation. Genes Dev., 11, 2755–2766.
2. Lewis,J.D., Gunderson,S.I. and Mattaj,I.W. (1995) The influence of 5′ and 3′ end structures on pre-mRNA metabolism. J. Cell Sci. Suppl., 19, 13–19.
3. Gehring,N.H., Frede,U., Neu-Yilik,G., Hundsdoerfer,P., Vetter,B., Hentze,M.W. and Kulozik,A.E. (2001) Increased efficiency of mRNA 3′ end formation: a new genetic mechanism contributing to hereditary thrombophilia. Nature Genet., 28, 389–392.
4. Beaudoing,E., Freier,S., Wyatt,J.R., Claverie,J.M. and Gautheret,D. (2000) Patterns of variant polyadenylation signal usage in human genes. Genome Res., 10, 1001–1010.
5. Zarudnaya,M.I., Kolomiets,I.M., Potyahaylo,A.L. and Hovorun,D.M. (2003) Downstream elements of mammalian pre-mRNA polyadenylation signals: primary, secondary and higher-order structures. Nucleic Acids Res., 31, 1375–1386.
6. Legendre,M. and Gautheret,D. (2003) Sequence determinants in human polyadenylation site selection. BMC Genomics, 4, 7.
7. Takagaki,Y. and Manley,J.L. (1997) RNA recognition by the human polyadenylation factor CstF. Mol. Cell Biol., 17, 3907–3914.
8. MacDonald,C.C., Wilusz,J. and Shenk,T. (1994) The 64-kilodalton subunit of the CstF polyadenylation factor binds to pre-mRNAs downstream of the cleavage site and influences cleavage site location. Mol. Cell Biol., 14, 6647–6654.
9. Tabaska,J.E. and Zhang,M.Q. (1999) Detection of polyadenylation signals in human DNA sequences. Gene, 231, 77–86.
10. Hajarnavis,A., Korf,I. and Durbin,R. (2004) A probabilistic model of 3′ end formation in Caenorhabditis elegans. Nucleic Acids Res., 32, 3392–3399.
11. Graber,J.H., Cantor,C.R., Mohr,S.C. and Smith,T.F. (1999) In silico detection of control signals: mRNA 3′-end-processing sequences in diverse species. Proc. Natl Acad. Sci. USA, 96, 14055–14060.
12. Zhao,J., Hyman,L. and Moore,C. (1999) Formation of mRNA 3′ ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiol. Mol. Biol. Rev., 63, 405–445.
13. Proudfoot,N. (1996) Ending the message is not so simple. Cell, 87, 779–781.
14. Calvo,O. and Manley,J.L. (2003) Strange bedfellows: polyadenylation factors at the promoter. Genes Dev., 17, 1321–1327.
15. Beaudoing,E. and Gautheret,D. (2001) Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data. Genome Res., 11, 1520–1526.
16. Edwalds-Gilbert,G., Veraldi,K.L. and Milcarek,C. (1997) Alternative poly(A) site selection in complex transcription units: means to an end? Nucleic Acids Res., 25, 2547–2561.
17. Pruitt,K.D. and Maglott,D.R. (2001) RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res., 29, 137–140.
18. Wheeler,D.L., Church,D.M., Federhen,S., Lash,A.E., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Sequeira,E., Tatusova,T.A. et al. (2003) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 31, 28–33.
19. Pauws,E., van Kampen,A.H., van de Graaf,S.A., de Vijlder,J.J. and Ris-Stalpers,C. (2001) Heterogeneity in polyadenylation cleavage sites in mammalian mRNA sequences: implications for SAGE analysis. Nucleic Acids Res., 29, 1690–1694.
20. Stajich,J.E., Block,D., Boulez,K., Brenner,S.E., Chervitz,S.A., Dagdigian,C., Fuellen,G., Gilbert,J.G., Korf,I., Lapp,H. et al. (2002) The Bioperl toolkit: Perl modules for the life sciences. Genome Res., 12, 1611–1618.