Information to be included in a gene/disease specific database

The ideal database should be as inclusive as possible. Collaboration between scientists and clinicians is recommended to establish a uniform approach that optimises the quality and use of the information [1].

The scope of information in any database depends on the purpose the database is designed to serve and on the availability of data.

General information to accompany the database

  • Explanation of content and purpose of database
    • Clearly define the purpose of the database and the context in which the information/data is intended to be used (clinical diagnostics, research, theranostics, etc.) [2].
  • Curator/organisation details
    • Including level of curation [2].
  • Database policy
    • Including criteria for inclusion and exclusion of data [2].
    • Information on technical and administration functions to ensure data integrity and security [2].
  • Date of last update [3]
  • Online submission information including submission instruction guide [3]
  • Registration [3]
    • Include the requirements for accessing data that are not publicly available [2].
  • Disclaimers, terms and conditions, copyright
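
As an illustration only, the items above could be kept as a small structured record so that a curator can see which ones are still missing. The sketch below is in Python; the class name `DatabaseInfo`, its field names and the `missing_items` helper are hypothetical and do not come from any of the cited guidelines.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatabaseInfo:
    """General information accompanying a gene/disease specific database."""
    purpose: str                    # intended context of use, e.g. "research"
    curator: str                    # curator or organisation details
    curation_level: str             # free-text description of the level of curation
    inclusion_criteria: str         # part of the database policy
    last_updated: date              # date of last update
    submission_guide_url: str = ""  # left empty if no guide has been published yet
    registration_required: bool = False
    disclaimer: str = ""            # disclaimers, terms and conditions, copyright

    def missing_items(self) -> list[str]:
        """Return the names of text fields that are still empty."""
        return [name for name, value in vars(self).items()
                if isinstance(value, str) and not value.strip()]


# Example with made-up values.
info = DatabaseInfo(
    purpose="research",
    curator="Example Laboratory",
    curation_level="expert reviewed",
    inclusion_criteria="published variants only",
    last_updated=date(2014, 1, 1),
)
print(info.missing_items())
```

A record of this kind is only a checklist aid; the wording of the policy, disclaimers and access requirements still has to be written by the database owner.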

General information to be provided in the database

The essential aspect of a gene/disease specific database is the description of allelic variants [4].

  • Link to reference sequence
    • All variants must be reported in relation to reference sequences: genomic, coding DNA, RNA and protein.
    • The recommended format for reference sequences is the Locus Reference Genomic (LRG) format.
      • If an LRG does not exist, a new LRG record can be requested via the LRG website.
  • Information or link to information on the gene
    • It is recommended that the database website provides links to additional locus/phenotype information, and links to gene/disease information resources for the public to access [4]
      • Links to central genetic databases (OMIM, HGMD)
      • Links to other relevant gene/disease specific databases (LSDB)
      • Links to literature collections/sources

For more information, see the following publications: [3,4,5,6].

Minimal standard gene-specific information

  • Standardised gene name and gene symbol [7].
  • Chromosomal location
  • Variant description
    • Description at DNA/RNA level
    • Description at protein level
  • Complete reference list of variations
  • A table of all variants in the database [5]
  • Links to references
    • The source of the variant information (literature, direct submission, etc.) [4]
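
The minimal items listed above map naturally onto one record per variant, from which the table of all variants can be generated. The following Python sketch shows one possible layout; the names `VariantRecord` and `variant_table`, the field names and the example values are all hypothetical placeholders, not a schema prescribed by the cited publications.

```python
from dataclasses import dataclass, field


@dataclass
class VariantRecord:
    gene_symbol: str               # standardised gene symbol
    gene_name: str                 # standardised gene name
    chromosomal_location: str
    reference_sequence: str        # identifier of the reference sequence used
    dna_description: str           # description at DNA/RNA level
    protein_description: str = ""  # description at protein level, if known
    sources: list[str] = field(default_factory=list)  # literature, direct submission, etc.


def variant_table(records: list[VariantRecord]) -> str:
    """Build a tab-separated table of all variants, one row per record.

    Rows are sorted by the DNA-level description as plain text,
    which is a lexical order and not a positional one.
    """
    header = "Gene\tDNA\tProtein\tSource"
    rows = [
        "\t".join([
            r.gene_symbol,
            r.dna_description,
            r.protein_description or "-",
            "; ".join(r.sources) or "-",
        ])
        for r in sorted(records, key=lambda r: r.dna_description)
    ]
    return "\n".join([header, *rows])


# Made-up records for illustration; these are not real variants.
records = [
    VariantRecord("GENE1", "hypothetical gene 1", "unknown", "REFSEQ_PLACEHOLDER",
                  "c.200C>T", "p.(Ala67Val)", ["direct submission"]),
    VariantRecord("GENE1", "hypothetical gene 1", "unknown", "REFSEQ_PLACEHOLDER",
                  "c.100A>G"),
]
print(variant_table(records))
```

Keeping the source of each variant on the record itself makes it straightforward to link every table row back to its reference.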


Mutalyzer is an online tool that checks variant descriptions for errors and for deviations from the HGVS nomenclature.
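
Before descriptions are sent to a full checker, a curator may want a very coarse local pre-filter. The Python sketch below recognises only the simplest case, a single-nucleotide substitution written as `c.<position><ref>><alt>` or `g.<position><ref>><alt>` without a reference sequence prefix. It is an assumption-laden illustration, not an implementation of the HGVS nomenclature and not a substitute for Mutalyzer; the function name is hypothetical, and a `False` result means only that the description is not of this simple form, not that it is invalid.

```python
import re

# Covers only simple substitutions such as "c.100A>G" or "g.12345C>T".
# Deletions, insertions, intronic offsets, UTR positions and protein-level
# descriptions are deliberately out of scope for this sketch.
SIMPLE_SUBSTITUTION = re.compile(r"[cg]\.(\d+)([ACGT])>([ACGT])")


def looks_like_simple_substitution(description: str) -> bool:
    """Return True if the text has the shape of a simple DNA substitution."""
    match = SIMPLE_SUBSTITUTION.fullmatch(description.strip())
    # The reference and the new nucleotide must differ to be a substitution.
    return match is not None and match.group(2) != match.group(3)


for text in ["c.100A>G", "c.100a>g", "100A>G", "c.100A>A"]:
    print(text, looks_like_simple_substitution(text))
```

Descriptions that fail this pre-filter should still be passed to a proper nomenclature checker rather than rejected outright.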

For more information, see the following publications: [3,4,5,6,8].

Additional information

  • Effect on protein function
  • List of known and suspected disease associations and causal links
  • Disease and phenotypic information: summary and/or detailed description [2]

When available, phenotypic and supporting information should be provided, and this may include:

  • Patient history and diagnosis
  • Inheritance information, carrier status
  • Ethnicity
  • Sex
  • Age at diagnosis
  • Relevant non-genetic pathology and medical results
  • Mutation frequency information
  • Detection methods
  • Population information


Care should be taken whenever information about individuals is included in a database. For more information, refer to the section of this guide on Ethical, Legal and Social Issues.
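
One simple precaution, sketched below in Python, is to store only an agreed list of phenotypic fields from each submission, so that anything else (for example a name or an address) is not kept by accident. The field names and the function `restrict_to_allowed` are hypothetical, and filtering fields in this way does not by itself make a record anonymous.

```python
# Hypothetical field names, loosely based on the list above.
ALLOWED_FIELDS = {
    "diagnosis", "inheritance", "carrier_status", "ethnicity", "sex",
    "age_at_diagnosis", "detection_method", "population",
}


def restrict_to_allowed(submission: dict) -> dict:
    """Keep only allowed fields that actually carry a value."""
    return {
        key: value
        for key, value in submission.items()
        if key in ALLOWED_FIELDS and value not in (None, "")
    }


# Made-up submission for illustration.
submission = {
    "patient_name": "not to be stored",
    "sex": "F",
    "age_at_diagnosis": None,
    "diagnosis": "example diagnosis",
}
print(restrict_to_allowed(submission))
```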

For more information, see the following publications: [3,8].

Data Sources

Data contained in gene/disease specific databases come from the following main sources [3]:

  • Published literature
  • Other databases
  • Direct submission
    • Unpublished laboratory/clinical data
  • HVP Country Nodes [9]


  1. Vihinen, M., Dunnen, J.T. den, Dalgleish, R., and Cotton, R.G.H. Guidelines for establishing locus specific databases. Human Mutation 33, 2 (2012), 298–305.
  2. Royal College of Pathologists of Australia. Standards for clinical databases of genetic variants. 2014.
  3. Celli, J., Dalgleish, R., Vihinen, M., Taschner, P.E.M., and Dunnen, J.T. den. Curating gene variant databases (LSDBs): Toward a universal standard. Human Mutation 33, 2 (2012), 291–297.
  4. Scriver, C.R., Nowacki, P.M., and Lehväslaiho, H. Guidelines and Recommendations for Content, Structure and Deployment of Mutation Databases. Human Mutation 13, (1999), 344–350.
  5. Scriver, C.R., Waters, P.J., Sarkissian, C., et al. PAHdb: A Locus-Specific Knowledgebase. Human Mutation 15, (2000), 99–104.
  6. Claustres, M., Horaitis, O., Vanevski, M., and Cotton, R.G.H. Time for a Unified System of Mutation Description and Reporting: A Review of Locus-Specific Mutation Databases. Genome Research 12, (2002), 680–688.
  7. Dunnen, J.T. den and Antonarakis, S.E. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Human Mutation 15, 1 (2000), 7–12.
  8. Mitropoulou, C., Webb, A.J., Mitropoulos, K., Brookes, A.J., and Patrinos, G.P. Locus-specific database domain and data content analysis: evolution and content maturation toward clinical use. Human Mutation 31, 10 (2010), 1109–1116.
  9. Human Variome Project. Project Roadmap 2012-2016. 2012.