Wireless & Networking Workshop

November 8th, CWRU's Peter B. Lewis Building, 9:00am-1:00pm

Peter B. Lewis Building Rooms 201, 258, 259 and 358.

Lossless Data Compression for Biomolecular Sequences and bioXML Documents

Prof. Cenk Sahinalp

Department of EECS, Case School of Engineering

We have developed a new compression algorithm for DNA/RNA sequences, designed to improve upon the standard “biocompress” utility by UCSB. Our goal is to beat biocompress in both compression ratio and running time. Known techniques that search for approximate repeats limit the number of mismatches to one or two in order to keep running times reasonable. We improve on the available search routines with a new algorithm for finding approximate matches for the longest prefix of the input sequence that has not yet been compressed. Our immediate goals are to produce the best possible implementation of this algorithm and to generalize it to bioXML, which incorporates annotations and other forms of text.
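To make the search problem concrete, here is a minimal sketch of the general idea of approximate-repeat compression: at each position, find the longest earlier substring that matches the not-yet-compressed prefix with at most `k` mismatches (Hamming distance), then emit either a copy token with its substitutions or a literal. This is not the authors' algorithm. It uses a naive quadratic scan, and the function names, the token format, and the `min_len` threshold are hypothetical choices for illustration. It also omits the entropy coding and reverse-complement matching that a real DNA compressor would need.

```python
def longest_approx_match(text: str, pos: int, k: int) -> tuple[int, int, list[tuple[int, str]]]:
    """Naively find the longest match for text[pos:] starting at some j < pos,
    allowing at most k substitutions. Returns (start, length, subs), where subs
    lists (offset, actual_char) pairs. Overlapping sources are allowed, as in LZ77."""
    n = len(text)
    best_start, best_len, best_subs = -1, 0, []
    for j in range(pos):
        subs: list[tuple[int, str]] = []
        length = 0
        good_len = 0   # longest length ending on an exact character match
        good_subs = 0  # number of substitutions inside good_len
        while pos + length < n:
            if text[j + length] != text[pos + length]:
                if len(subs) == k:
                    break  # one more mismatch would exceed the budget
                subs.append((length, text[pos + length]))
            else:
                good_len = length + 1
                good_subs = len(subs)
            length += 1
        # Trailing mismatches are trimmed: they cost bits and copy nothing useful.
        if good_len > best_len:
            best_start, best_len, best_subs = j, good_len, subs[:good_subs]
    return best_start, best_len, best_subs


def greedy_parse(text: str, k: int, min_len: int = 4) -> list[tuple]:
    """Greedily split text into ('copy', start, length, subs) and ('lit', char) tokens.
    min_len is a hypothetical threshold below which a literal is cheaper."""
    tokens: list[tuple] = []
    pos = 0
    while pos < len(text):
        start, length, subs = longest_approx_match(text, pos, k)
        if length >= min_len:
            tokens.append(("copy", start, length, subs))
            pos += length
        else:
            tokens.append(("lit", text[pos]))
            pos += 1
    return tokens


def decode(tokens: list[tuple]) -> str:
    """Rebuild the sequence; copies read from already-decoded output, one base at a time."""
    out: list[str] = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, start, length, subs = tok
            sub_map = dict(subs)
            for i in range(length):
                # start + i < len(out) always holds because start < current position
                out.append(sub_map.get(i, out[start + i]))
    return "".join(out)
```

For example, `longest_approx_match("ACGTACCT", 4, 1)` returns `(0, 4, [(2, "C")])`: the suffix `ACCT` copies `ACGT` with one substitution. With `k=0` the same call only finds the exact two-base match `AC`. Raising `k` lengthens matches, but it also widens the search and adds substitution costs. This is the trade-off behind limiting known techniques to one or two mismatches.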

Created: 2002-10-20. Last Modified: 2002-11-05.