Faculty Candidate Seminar

Problems and algorithms for sequence analysis

Evimaria TerziIBM Almaden ResearcherUniversity of Helsinki

Dr. Terzi is from IBM Almaden Research Center
Sequential data appear in a wide range of diverse applications like telecommunications, stock-market analysis, bioinformatics, text processing, click-stream mining and many more. In this talk, I will show how such data motivate new problems and pose novel algorithmic challenges. I will focus on some of these challenges and address them through the lens of combinatorial optimization.

The central theme of my talk is sequence segmentation, i.e., the discovery of "homogeneous" segments from input sequences. In the first part of the talk, I will present a simple and efficient approximation algorithm for the k-segmentation problem on time-series data. In the second part of the talk, I will introduce the notion of segmental groupings and use it to provide comprehensive summaries for large event sequences. Using the Minimum Description Length principle, I will define the optimization problem of finding the best segmental grouping and then show that it can be solved optimally in polynomial time using a nested dynamic-programming algorithm. For the same problem, I will also present faster heuristic algorithms that perform very well in practice. Finally, I will conclude by showing how the results of data-analysis algorithms for sequential data can be used as input to new "meta-analysis" algorithms.

Evimaria Terzi has been a research scientist at IBM
Almaden Research Center since June 2007. She obtained her PhD from
the University of Helsinki (Finland) in January 2007, under the
supervision of prof. Heikki Mannila. Before that, she got her MSc
from Purdue University (USA) and her BSc from Aristotle University
of Thessaloniki (Greece). Her research focuses on algorithmic
aspects of data mining with applications in the analysis of
sequential and graph data, ranking and clustering.

Sponsored by

EECS - CSE Division