

A Note:

Thanks for reading the detailed descriptions of our projects. If you would like to collaborate on these or other emerging topics, or need help with data analysis using our software tools, please send me an email at FirstNameLastName@Case.edu



1  Research interests and accomplishments

My research interests are in bioinformatics and computational biology, statistical genomics, personal genomics and functional genomics, with a particular emphasis on designing and applying efficient combinatorial and statistical algorithms to challenging biological problems, and on developing user-friendly software tools. More specifically, I have mainly worked on problems arising from the analysis of human genomic variation, including haplotype inference from family data and family-based association mapping, disease association mapping based on haplotype similarities, SNP (single nucleotide polymorphism) subset selection in multi-stage designs, copy number variation detection, association studies of epistatic effects, and management and visualization of genome-wide association results. Together with colleagues and students, we have also investigated computational approaches for gene co-expression data analysis, identification of gene expression patterns across cancer stages, computational analysis of non-coding RNAs, and evolutionary analysis of biological networks. Significant progress has been made in many of these areas, resulting in publications in prominent journals and leading international conferences, book chapters, numerous invited talks and conference/workshop presentations, and software tools.

2  Research Projects

2.1  Haplotype inference

Haplotypes, which reflect the correlation structure of heritable variations, hold the key to understanding the "disease genes" underlying many complex diseases. However, haplotype data are not collected directly, so efficient and accurate computational methods for reconstructing haplotypes from genotype data are greatly needed. We mainly focus on haplotype reconstruction from family data, using a combinatorial formulation that seeks the haplotype configuration with the minimum total number of recombination events (the minimum recombinant haplotype configuration, or MRHC, problem). Our group has made significant advances in understanding the complexity of this problem and in developing efficient algorithms and software tools (PedPhase) to solve it [1,2,3,4]. We are currently working on algorithms that can process genome-wide SNP data.
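To make the MRHC objective concrete, the following is a minimal sketch, under our own simplifying assumptions, of how the number of recombinants can be counted for one fixed haplotype configuration. It assumes each parent's two haplotypes are already phased and given as allele lists, and it counts the fewest crossovers needed to explain the haplotype a child received from that parent. The function names are hypothetical and this is not PedPhase code; it only evaluates the objective, while the hard part addressed in [1-4] is searching over configurations to minimize it.

```python
from typing import Iterable, Sequence, Tuple


def count_recombinants(parent_hap1: Sequence[int], parent_hap2: Sequence[int],
                       transmitted: Sequence[int]) -> int:
    """Fewest crossovers needed to explain `transmitted` as a mosaic of the
    parent's two (fixed, phased) haplotypes.

    Loci where the parent is homozygous are uninformative: the transmitted
    allele is consistent with either parental haplotype.
    """
    if not (len(parent_hap1) == len(parent_hap2) == len(transmitted)):
        raise ValueError("haplotypes must have equal length")
    recombinants = 0
    candidates = {0, 1}  # parental haplotypes consistent with the current segment
    for a1, a2, child in zip(parent_hap1, parent_hap2, transmitted):
        here = {i for i, allele in enumerate((a1, a2)) if allele == child}
        if not here:
            raise ValueError("transmitted allele not carried by parent (Mendelian error)")
        merged = candidates & here
        if merged:
            candidates = merged  # the segment can be extended without a crossover
        else:
            recombinants += 1    # no single parental haplotype fits: one crossover
            candidates = here
    return recombinants


def total_recombinants(
        transmissions: Iterable[Tuple[Sequence[int], Sequence[int], Sequence[int]]]) -> int:
    """MRHC-style objective: sum over all parent-to-child transmissions."""
    return sum(count_recombinants(h1, h2, t) for h1, h2, t in transmissions)


# Parent haplotypes 0000 / 1111, child receives 0011: one crossover.
print(count_recombinants([0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]))
```

Greedily extending each run as far as a single parental haplotype allows gives the minimum number of switches, which is why no backtracking is needed once the phase is fixed.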

2.2  Disease gene mapping based on haplotype similarities

Haplotype-based association mapping approaches usually provide higher power than single-marker analyses for identifying genes underlying complex diseases. However, the large number of distinct haplotypes may compromise the power of haplotype-based association tests because of their many degrees of freedom. We have developed an algorithmic method for haplotype mapping that uses density-based clustering, and proposed a new haplotype similarity measure [5]. The method treats haplotype segments as data points in a high-dimensional space. Haplotype segments carrying a disease susceptibility mutation, especially a mutation of recent origin, tend to be close to each other because of linkage disequilibrium, whereas other haplotype segments can be regarded as random noise sampled from the haplotype space. The algorithm is efficient and robust, and it does not require any assumptions about the evolutionary model or the inheritance pattern of the disease. It can also handle a high level of phenocopies. The approach was later extended to quantitative traits [6] and implemented in a software tool called HapMiner. Our recent experiments also show that the clustering can increase the power of the score test to detect association [7]. We are currently investigating extensions of the algorithm to family data.
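The clustering idea can be illustrated with a short sketch. It does not reproduce the similarity measure of [5]: as a hypothetical stand-in, two segments are compared by the length of the run of identical alleles they share around a focal marker, and the clustering step is plain DBSCAN-style density clustering. The names `shared_length`, `density_cluster`, `min_similarity` and `min_pts` are our own, chosen for illustration.

```python
from typing import List, Sequence


def shared_length(a: Sequence[int], b: Sequence[int], center: int) -> int:
    """Hypothetical similarity: length of the run of identical alleles that
    two haplotype segments share around the focal marker `center`."""
    if a[center] != b[center]:
        return 0
    length = 1
    i = center - 1
    while i >= 0 and a[i] == b[i]:      # extend the shared run to the left
        length += 1
        i -= 1
    i = center + 1
    while i < len(a) and a[i] == b[i]:  # extend the shared run to the right
        length += 1
        i += 1
    return length


def density_cluster(segments: List[Sequence[int]], min_similarity: int,
                    min_pts: int) -> list:
    """DBSCAN-style clustering of equal-length haplotype segments.

    Returns one label per segment: a cluster index >= 0, or -1 for noise.
    """
    if not segments:
        return []
    n = len(segments)
    center = len(segments[0]) // 2
    neighbors = [
        [j for j in range(n)
         if j != i and shared_length(segments[i], segments[j], center) >= min_similarity]
        for i in range(n)
    ]
    labels = [None] * n  # None = not yet visited, -1 = noise
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) + 1 < min_pts:  # the point itself counts toward density
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:              # former noise point becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) + 1 >= min_pts:  # core point: keep expanding
                queue.extend(neighbors[j])
        cluster += 1
    return labels


segments = [[0, 1, 1, 0, 1], [0, 1, 1, 0, 1], [1, 1, 1, 0, 1],
            [1, 0, 0, 1, 0], [0, 0, 0, 1, 1]]
print(density_cluster(segments, min_similarity=4, min_pts=3))
```

Here the first three segments form one dense cluster and the last two are treated as noise. For association mapping, one could then compare the proportion of case haplotypes in each cluster with the overall case proportion; that step is likewise only illustrative.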

2.3  Structural variation detection

Structural variations such as copy number variations (CNVs) may cause genomic disorders, and somatic CNVs play an important role in cancer. With the remarkable capacity of current technologies to assess CNVs, the research community has recently shown great interest in investigating both inherited and somatic CNVs. We have developed two approaches to identify CNVs from array comparative genomic hybridization (aCGH) data [8,9]. We are currently working on efficient algorithms for structural variation detection based on high-throughput sequencing technologies.
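For readers new to aCGH analysis, the sketch below shows the basic shape of the task: smoothing per-probe log ratios along the genome and reporting runs of probes with elevated (gain) or depressed (loss) values. It is a deliberately naive running-median-plus-threshold caller, not the log ratio triangulation method of [8] or the graphical framework of [9]; the function name and the default `window`, `threshold` and `min_probes` values are hypothetical choices for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple


def call_cnv_segments(log_ratios: Sequence[float], window: int = 3,
                      threshold: float = 0.3,
                      min_probes: int = 2) -> List[Tuple[int, int, str]]:
    """Naive aCGH caller: running-median smoothing, then thresholding.

    Returns (first_probe, last_probe, state) tuples with inclusive indices,
    where state is 'gain' or 'loss'. Probes are assumed to be in genomic order.
    """
    n = len(log_ratios)
    half = window // 2
    # A running median suppresses isolated outlier probes.
    smoothed = [median(log_ratios[max(0, i - half): i + half + 1]) for i in range(n)]
    states = ["gain" if s > threshold else "loss" if s < -threshold else "neutral"
              for s in smoothed]
    segments = []
    start = 0
    for i in range(1, n + 1):
        # Close the current run at the end of the data or when the state changes.
        if i == n or states[i] != states[start]:
            if states[start] != "neutral" and i - start >= min_probes:
                segments.append((start, i - 1, states[start]))
            start = i
    return segments


ratios = [0.0, 0.1, -0.05, 0.8, 0.9, 0.7, 0.85, 0.0,
          -0.1, 0.05, -0.9, -1.0, -0.8, 0.0, 0.1]
print(call_cnv_segments(ratios))
```

A fixed threshold ignores probe noise levels and breakpoint uncertainty, which is exactly what more principled segmentation methods are designed to handle.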

2.4  Management and visualization of genome-wide association results

Large-scale genome-wide association (GWA) studies are increasingly common. With this paradigm shift in genetic studies of complex diseases, it is vital to develop valid, powerful, and efficient tools to manage, analyze, visualize, share and integrate such data. We have recently developed a web application named MAVEN (Management, Analysis, Visualization and rEsults shariNg of GWA data) using cutting-edge technologies.

2.5  Gene-gene interactions

It is well known that gene-gene interactions may play an important role in the etiology of complex diseases. We developed an efficient strategy for detecting such interactions based on a two-stage analysis [10]. We are currently investigating machine learning approaches to this problem. As a byproduct, we have made available the program gs, which can generate simulated data under various interaction models [11].
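A generic two-stage scan can be sketched as follows, assuming genotypes coded 0/1/2 and case/control status coded 1/0. This is only an illustration of the general two-stage idea, not the specific strategy of [10]: stage 1 ranks single SNPs by a Pearson chi-square statistic on a 2 x 3 genotype table and keeps the top `keep`; stage 2 tests every pair of retained SNPs with a chi-square statistic on the 2 x 9 joint-genotype table. All function names, the `keep` parameter and both test statistics are hypothetical choices.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square statistic; cells with zero expected count are skipped."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def marginal_table(genotypes: Sequence[Sequence[int]], status: Sequence[int],
                   snp: int) -> List[List[int]]:
    """2 x 3 table: rows are controls (0) / cases (1), columns genotypes 0/1/2."""
    table = [[0] * 3 for _ in range(2)]
    for g, y in zip(genotypes, status):
        table[y][g[snp]] += 1
    return table


def pair_table(genotypes: Sequence[Sequence[int]], status: Sequence[int],
               a: int, b: int) -> List[List[int]]:
    """2 x 9 table over the joint genotype of SNPs a and b."""
    table = [[0] * 9 for _ in range(2)]
    for g, y in zip(genotypes, status):
        table[y][3 * g[a] + g[b]] += 1
    return table


def two_stage_scan(genotypes: Sequence[Sequence[int]], status: Sequence[int],
                   keep: int = 10) -> List[Tuple[Tuple[int, int], float]]:
    """Stage 1: keep the `keep` SNPs with the largest marginal statistics.
    Stage 2: jointly test every pair of retained SNPs.
    Returns ((snp_a, snp_b), statistic) sorted by decreasing statistic."""
    n_snps = len(genotypes[0])
    ranked = sorted(range(n_snps),
                    key=lambda s: chi_square(marginal_table(genotypes, status, s)),
                    reverse=True)
    survivors = sorted(ranked[:keep])
    results = [((a, b), chi_square(pair_table(genotypes, status, a, b)))
               for a, b in combinations(survivors, 2)]
    return sorted(results, key=lambda item: item[1], reverse=True)


G = [[0, 0, 0], [0, 1, 1], [1, 0, 2], [0, 0, 0],   # controls
     [2, 2, 0], [2, 2, 1], [2, 2, 2], [2, 2, 0]]   # cases
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(two_stage_scan(G, y, keep=2))
```

The first stage cuts the number of pairwise tests from all SNP pairs to `keep * (keep - 1) / 2`, which is what makes such scans feasible at genome-wide scale; the trade-off is that interacting SNPs with weak marginal effects can be dropped in stage 1.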

2.6  Other projects and collaborations

We also have several other projects in the development stage, including SNP selection in multi-stage designs [12], biological network analysis, disease gene identification based on systems biology, gene co-expression data analysis, gene signature identification from cancer progression data, and computational analysis of non-coding RNAs. We warmly welcome collaborations on these and other emerging topics, as well as on real data analyses. Please send me an email at FirstNameLastName@Case.edu.

References

[1] Li, J. & Jiang, T. Efficient inference of haplotypes from genotypes on a pedigree. J Bioinform Comput Biol 1, 41-69 (2003).
[2] Doi, K., Li, J. & Jiang, T. Minimum recombinant haplotype configuration on tree pedigrees. In Algorithms in Bioinformatics, Proceedings of the Third Annual Workshop on Algorithms in Bioinformatics, 339-353 (Springer, Budapest, Hungary, 2003).
[3] Li, J. & Jiang, T. Computing the minimum recombinant haplotype configuration from incomplete genotype data on a pedigree by integer linear programming. J Comput Biol 12, 719-39 (2005).
[4] Li, X. & Li, J. Efficient haplotype inference from pedigrees with missing data using linear systems with disjoint-set data structures. In Proceedings of the Seventh Annual International Conference on Computational Systems Bioinformatics, 297-310 (World Scientific, Palo Alto, CA, USA, 2008).
[5] Li, J. & Jiang, T. Haplotype-based linkage disequilibrium mapping via direct data mining. Bioinformatics 21, 4384-93 (2005).
[6] Li, J., Zhou, Y. & Elston, R. C. Haplotype-based quantitative trait mapping using a clustering algorithm. BMC Bioinformatics 7, 258 (2006).
[7] Igo, R. P., Jr., Li, J. & Goddard, K. A. Association mapping by generalized linear regression with density-based haplotype clustering. Genet Epidemiol 32, 1-11 (2008).
[8] Hayes, M. & Li, J. A linear-time algorithm for analyzing array CGH data using log ratio triangulation. In Proceedings of the Fifth Annual International Symposium on Bioinformatics Research and Applications (ISBRA), Lecture Notes in Bioinformatics, 248-259 (Springer, Ft. Lauderdale, FL, USA, 2009).
[9] Yin, X. L. & Li, J. A general graphical framework for detecting copy number variation. In Proceedings of the Eighth Annual International Conference on Computational Systems Bioinformatics, xxx-xxx (World Scientific, Palo Alto, CA, USA, 2009).
[10] Li, J. A novel strategy for detecting multiple loci in genome-wide association studies of complex diseases. Int J Bioinform Res Appl 4, 150-63 (2008).
[11] Li, J. & Chen, Y. Generating samples for association studies based on HapMap data. BMC Bioinformatics 9, 44 (2008).
[12] Li, J. Prioritize and select SNPs for association studies with multi-stage designs. J Comput Biol 15, 241-57 (2008).

Updated 07/28/2009.