Algorithms for Characterizing Copy Number Variation in Human Genome

Not long ago, it was discovered that individuals may differ in copy numbers of their genes, meaning that a segment of DNA may have more or less copies than usual in an individual's chromosome. Recent research suggests that these variations are associated with many diseases including Autism and Schizophrenia. Copy number variation (CNV) in somatic cells also underly various cancers. Copy numbers are usually identified using SNP microarrays, however, short-read sequence data is emerging as an important resource for characterizing structural variation in human genome. As an interdisciplinary group of researchers at Case Western Reserve University, we develop optimization based algorithms for fast and accurate identification of rare and de novo CNVs, as well as copy number polymorphisms (CNPs) and other forms of structural variation from these two data sources, with a view to enabling personalized genomics applications. Copy Number Variation

Software

  • COKGEN: An R package for optimization-based identification of rare and de novo copy number variants from SNP microarray data.
Please stay tuned for POLYGON, an R package for identification of copy number polymorphisms from SNP microarray data. POLYGON is designed to work with any CNV identification algorithm to consolidate and improve their findings by drawing strength from multiple samples in the population.

Publications

People

Katie Wilkins

Undergraduate Student, Computer Science/Biochemistry

Matthew M. Ruffalo

Ph.D. Student, Electrical Engineering & Computer Science

Gokhan Yavas

Ph.D. Student, Electrical Engineering & Computer Science

Thomas LaFramboise

Assistant Professor, Genetics

Mehmet Koyuturk

Assistant Professor, Electrical Engineering & Computer Science

Acknowledgments

This project is supported by National Science Foundation Award IIS-0916102.