Gene/Disease specific database curation

Curation is the activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future [1].

Curation of gene/disease-specific databases is ideally done by experts in the gene or disease of interest, rather than by a central database that collects published data from the literature. Information in these databases can inform clinicians' decisions on diagnosis, treatment and outcomes, so errors in the data, or in how the data are reported, can have serious consequences [2].

A high-quality, complete and up-to-date gene/disease-specific database is an excellent tool for evidence-based diagnostic decisions. Because it is accessible internationally, it saves clinicians the time and resources they would otherwise spend collecting the same information, and it can avoid worldwide duplication of that information [3].

Considerations for Curation

The time needed to maintain a gene/disease-specific database depends on its size: databases for more common diseases may be much larger than those for rare diseases. Minimal curation is estimated to take around one day per week [4].

Cost of maintenance is another consideration. Many diseases are rare, and funds for curation can be difficult to obtain [4].

Data collection

The curator generally starts with data generated by their own institution or collected from the published literature [5].

The database should be regularly maintained to provide the most current information.

  • During data entry it is essential to check that variant descriptions are correct and, if necessary, to standardise them to HGVS (Human Genome Variation Society) nomenclature [5].
  • Data duplication should be avoided where possible [5].
  • The source of each entry should be identified and accessible on the database [5].
  • The literature should be reviewed regularly for new variant reports [5].
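Checking descriptions against HGVS nomenclature is best done with a dedicated HGVS checker such as Mutalyzer. As a rough illustration only, the Python sketch below screens descriptions against a small, hypothetical subset of HGVS coding-DNA syntax (substitutions, deletions, duplications, insertions and deletion-insertions on a versioned RefSeq transcript) and reports a few common, easily detected problems. The pattern, the function name `check_description` and the placeholder accession `NM_000000.1` are assumptions for illustration; passing this screen says nothing about whether the variant itself is correct.

```python
from __future__ import annotations

import re

# HYPOTHETICAL, deliberately narrow pattern: a few HGVS coding-DNA forms on a
# versioned RefSeq transcript, e.g. "NM_000000.1:c.68_69del". It is a syntax
# screen only and does NOT implement the full HGVS nomenclature.
POS = r"[-*]?\d+(?:[+-]\d+)?"  # e.g. 76, -14 (5' UTR), *37 (3' UTR), 88+1, 89-2 (intronic)
HGVS_SUBSET = re.compile(
    rf"^NM_\d+\.\d+:c\."                   # transcript accession.version and "c." prefix
    rf"(?:{POS}[ACGT]>[ACGT]"              # single-nucleotide substitution
    rf"|{POS}(?:_{POS})?(?:del|dup)"       # deletion or duplication (position or range)
    rf"|{POS}_{POS}ins[ACGT]+"             # insertion between two flanking positions
    rf"|{POS}(?:_{POS})?delins[ACGT]+)$"   # deletion-insertion
)


def check_description(desc: str) -> list[str]:
    """Return problems found in one variant description; an empty list means it passed."""
    desc = desc.strip()
    if HGVS_SUBSET.match(desc):
        return []
    problems = ["not in the supported HGVS subset; check with a full HGVS tool"]
    # A few common, easily detected mistakes, reported as hints for the curator.
    if ":c." not in desc:
        problems.append("missing reference transcript or c. prefix")
    if re.search(r"[acgt]>[acgt]", desc):
        problems.append("nucleotides should be upper case")
    if "IVS" in desc:
        problems.append("legacy IVS numbering; use c.N+M or c.N-M for intronic positions")
    return problems


for d in ["NM_000000.1:c.76A>G", "NM_000000.1:c.68_69del", "NM_000000.1:c.76a>g", "IVS2+1G>A"]:
    print(d, check_description(d))
```

Descriptions that fail are flagged for manual review rather than rewritten automatically, which leaves the final standardisation decision with the curator.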

Once established, a gene/disease-specific database will usually receive data from active contributors by direct submission. Collaborators usually register through the database website (depending on the software used) or by e-mail or letter to the curator [5].

  • After contact, the new submitter is evaluated. The curator should check the submitter's contact details and credentials through previous publications, the internet or the appropriate institutions.
  • Patients may also register, providing personal, clinical and phenotypic details. The curator must verify that consent has been given to report this information publicly, and must anonymise the information so that third parties cannot identify the patient.
  • The curator should check that the submission complies with the database standards, and review the data and the variant description.
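Parts of these submission checks could be automated. The Python sketch below assumes a hypothetical record layout (a plain dict), a hypothetical set of required fields standing in for the database standards, and a hypothetical, non-exhaustive set of direct identifiers. It holds back records without documented consent, strips direct identifiers, and replaces the submitter's local patient ID with a keyed HMAC-SHA256 pseudonym so that later submissions about the same individual from the same submitter can be linked. This is pseudonymisation rather than full anonymisation: a rare phenotype combined with other details may still identify a patient, so manual review remains necessary.

```python
from __future__ import annotations

import hashlib
import hmac

# HYPOTHETICAL stand-ins for a database's standards; adapt to the real ones.
REQUIRED_FIELDS = {"gene", "transcript", "variant", "phenotype", "submitter_id", "consent_to_publish"}
# Direct identifiers that must never be published (illustrative, not exhaustive).
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "email", "phone", "local_patient_id"}


def screen_submission(record: dict, secret_key: bytes) -> tuple[dict | None, list[str]]:
    """Return (publishable_record, problems); the record is None if it must be held back."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - set(record))]
    if record.get("consent_to_publish") is not True:
        problems.append("no documented consent to publish; hold the record")
    if problems:
        return None, problems

    # Keep every field except the direct identifiers.
    public = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "local_patient_id" in record:
        # Keyed hash of submitter + local ID: the same patient from the same
        # submitter always maps to the same pseudonym, and without the secret
        # key the pseudonym cannot practically be linked back to the local ID.
        message = f"{record['submitter_id']}:{record['local_patient_id']}".encode()
        public["patient_pseudonym"] = hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]
    return public, problems


example = {
    "gene": "GENE1", "transcript": "NM_000000.1", "variant": "NM_000000.1:c.76A>G",
    "phenotype": "example phenotype", "submitter_id": "lab-01",
    "consent_to_publish": True, "name": "Jane Doe", "local_patient_id": "P-0007",
}
public, problems = screen_submission(example, secret_key=b"replace-with-a-secret-key")
print(problems, sorted(public))
```

The secret key would need to be stored outside the database and the code; anyone holding it could test candidate patient IDs against published pseudonyms.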


A curator should advertise a new database and promote direct submissions. Some suggestions include [5]:

  • Directly contacting authors who have previously published variants in the gene
  • Contacting diagnostic laboratories that screen the gene of interest and are not already submitting to their HVP (Human Variome Project) Country Node
  • Contacting patient/disease organisations and support groups
  • Searching the web for potential stakeholders
  • Developing contacts and promoting contributions through workshops, conferences and annual meetings


Locus-specific databases (LSDBs) require continual promotion to direct people to them as a resource and to encourage submissions [5].

Registrations and submissions should be curated as quickly as possible, as this encourages submitters [5].

  • Curators are encouraged to explain the submission procedures and to answer any enquiries from submitters.
  • It is suggested that the curator provide approved submitters with an easy-to-follow guide for entering their data.
  • The submitter is responsible for the correctness of a submission, but the curator should check it for inconsistencies or data-entry errors.

The curator of a database is seen as an obvious contact and expert in the field and thus may receive a variety of questions from different groups of stakeholders [5].

  • The curator should be cautious when giving expert advice and opinions on the consequences/pathogenicity of a variant.
  • For a gene of increasing diagnostic interest, it is recommended to establish a committee of experts that meets regularly to discuss, and give expert opinions on, the functional consequences of variants.


Summary of Curator Responsibilities

  1. Maintain current gene/disease data
  2. Collaborate and actively seek submissions
  3. Provide submitters with guidelines, standards and registration for the database
  4. Respond to submission, registration and database queries
  5. Check submitted data for inconsistencies, including correctness of description, nomenclature, errors and duplications
  6. Resolve and correct any inconsistencies in a timely manner
  7. Liaise with the committee of experts on any queries about interpretation for clinical purposes, or about controversial variants
  8. Maintain confidentiality and meet ethical responsibilities with respect to data records
  9. Promote the database continually
  10. Plan for the ongoing maintenance and longevity of the database
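Responsibilities 5 and 7 lend themselves to a periodic audit of the records. The Python sketch below assumes each record carries hypothetical fields `transcript`, `variant`, `patient_pseudonym`, `source` and `classification`. It flags the same individual and variant reported by more than one source (for example by a laboratory and again in a later publication), and variants that have received more than one classification, which could then be referred to the committee of experts. The field names and classification labels are illustrative assumptions, not a standard schema.

```python
from __future__ import annotations

from collections import defaultdict


def audit_records(records: list[dict]) -> tuple[list[tuple], list[tuple]]:
    """Return (duplicate_keys, conflicting_variants) for curator review."""
    sources_by_patient_variant = defaultdict(list)
    classes_by_variant = defaultdict(set)
    for r in records:
        # Strip stray whitespace so trivially different entries still match.
        variant_key = (r["transcript"], r["variant"].strip())
        sources_by_patient_variant[(*variant_key, r["patient_pseudonym"])].append(r["source"])
        classes_by_variant[variant_key].add(r["classification"])

    # Same individual and same variant reported more than once: possible duplicate.
    duplicates = [k for k, srcs in sources_by_patient_variant.items() if len(srcs) > 1]
    # Same variant with differing classifications: candidate for expert review.
    conflicts = [k for k, classes in classes_by_variant.items() if len(classes) > 1]
    return duplicates, conflicts


records = [
    {"transcript": "NM_000000.1", "variant": "c.1A>G", "patient_pseudonym": "p1",
     "source": "lab A", "classification": "pathogenic"},
    {"transcript": "NM_000000.1", "variant": "c.1A>G ", "patient_pseudonym": "p1",
     "source": "paper B", "classification": "pathogenic"},
    {"transcript": "NM_000000.1", "variant": "c.2C>T", "patient_pseudonym": "p2",
     "source": "lab A", "classification": "benign"},
    {"transcript": "NM_000000.1", "variant": "c.2C>T", "patient_pseudonym": "p3",
     "source": "lab C", "classification": "uncertain significance"},
]
print(audit_records(records))
```

Flagged items are only candidates: deciding whether two reports really describe the same individual, and how a conflicting variant should be interpreted, remains a curatorial and expert judgement.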


References

  1. Royal College of Pathologists of Australasia. Standards for clinical databases of genetic variants. 2014.
  2. Cotton, R.G.H., Auerbach, A.D., Brown, A.F., et al. A structured simple form for ordering genetic tests is needed to ensure coupling of clinical detail (phenotype) with DNA variants (genotype) to ensure utility in publication and databases. Human Mutation 28, 10 (2007), 931–932.
  3. Cotton, R.G.H. and Macrae, F.A. Reducing the burden of inherited disease: the Human Variome Project. The Medical Journal of Australia 192, 11 (2010), 628–629.
  4. Cotton, R.G.H., Auerbach, A.D., Beckmann, J.S., et al. Recommendations for locus-specific databases and their curation. Human Mutation 29, 1 (2008), 2–5.
  5. Celli, J., Dalgleish, R., Vihinen, M., Taschner, P.E.M., and den Dunnen, J.T. Curating gene variant databases (LSDBs): Toward a universal standard. Human Mutation 33, 2 (2012), 291–297.