Genomic Privacy Project

Inferring Genotype from Clinical Phenotype through a Knowledge Based Algorithm

by Bradley Malin and Latanya Sweeney


Genomic information is becoming increasingly useful for studying the origins of disease. Recent studies have focused on discovering new genetic loci and the influence of these loci upon disease. However, it is equally desirable to go in the opposite direction, that is, to infer genotype from the clinical phenotype for increased efficiency of treatment. This paper proposes a methodology for such inference. Our method constructs a simple knowledge-based model without the need for a domain expert and is useful in situations with very little data or no training data. The model relates a disease’s symptoms to particular clinical states of the disease. Clinical information is processed using the model: an appropriate weighting of the symptoms is learned from observed diagnoses and then used to identify the state of the disease presented in hospital visits. This approach applies to any simple genetic disorder that has defined clinical phenotypes. We demonstrate the use of our methods by inferring age of onset and DNA mutations for Huntington’s disease patients.
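To make the idea of a symptom-to-state model with learned weights more concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. The symptom names, state names, the `KNOWLEDGE_BASE` table, and the frequency-based weighting are all assumptions chosen for illustration; the paper's actual model and weighting scheme may differ.

```python
from collections import defaultdict

# Hypothetical knowledge base: symptoms associated with each clinical state.
# All symptom and state names are placeholders, not taken from the paper.
KNOWLEDGE_BASE = {
    "early": {"symptom_a", "symptom_b"},
    "late": {"symptom_b", "symptom_c"},
}


def learn_weights(observed_visits):
    """Weight each (state, symptom) pair by the fraction of visits diagnosed
    with that state in which the symptom was recorded (an assumed scheme)."""
    state_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for symptoms, state in observed_visits:
        state_counts[state] += 1
        # Only count symptoms the knowledge base relates to this state.
        for symptom in symptoms & KNOWLEDGE_BASE[state]:
            pair_counts[(state, symptom)] += 1
    return {
        (state, symptom): pair_counts[(state, symptom)] / state_counts[state]
        for state in state_counts
        for symptom in KNOWLEDGE_BASE[state]
    }


def infer_state(symptoms, weights):
    """Return the state whose related symptoms carry the largest total weight."""
    scores = {
        state: sum(weights.get((state, s), 0.0) for s in symptoms & related)
        for state, related in KNOWLEDGE_BASE.items()
    }
    return max(scores, key=scores.get)


# Example usage with made-up visits: (recorded symptoms, observed diagnosis).
visits = [
    ({"symptom_a"}, "early"),
    ({"symptom_a", "symptom_b"}, "early"),
    ({"symptom_c"}, "late"),
    ({"symptom_b", "symptom_c"}, "late"),
]
weights = learn_weights(visits)
predicted = infer_state({"symptom_b", "symptom_c"}, weights)
```

In this sketch the inferred clinical state would then serve as the intermediate step toward estimating quantities such as age of onset; that final mapping is not shown because the abstract does not specify it.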

Keywords: DNA privacy, genetic privacy, privacy technology

B. Malin and L. Sweeney, Inferring Genotype from Clinical Phenotype through a Knowledge Based Algorithm. Pacific Symposium on Biocomputing 2002, R.B. Altman et al. (Eds.) (World Scientific, Singapore, 2002). Also available on MEDLINE. Paper: 12 pages in PDF.

Fall 2004 [Data Privacy Lab]