Exploiting E-mail Structure to Improve Summarization
by Derek Lam, Steven L. Rohall, Chris Schmandt, Mia K. Stern
IBM Research, 2002
Language: English
Abstract
This paper presents the design and implementation of a system that summarizes e-mail messages. The system exploits two aspects of e-mail, thread reply chains and commonly found features, to generate summaries. It builds on existing software designed to summarize single-text documents. Such software typically performs best on well-authored, formal documents; e-mail messages, however, are typically neither well-authored nor formal. As a result, existing summarization software produces poor summaries of e-mail messages. To remedy this, our system preprocesses e-mail messages using heuristics that remove e-mail signatures, header fields, and quoted text from parent messages. We also present a heuristic-based approach to identifying and reporting names, dates, and companies found in e-mail messages. Lastly, we discuss findings from a pilot user study of the summarization system and conclude with areas for further investigation.
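The abstract names what the preprocessing removes (signatures, header fields, quoted parent text) and which entities are reported (names, dates, companies), but not the heuristics themselves. The Python sketch below is a minimal illustration under our own assumptions, not the authors' method: the regular expressions, the reliance on the conventional `-- ` signature delimiter and an Outlook-style "Original Message" marker, and the function names `preprocess` and `extract_entities` are all hypothetical. Name detection is omitted.

```python
import re

# Hypothetical patterns: the abstract names the heuristics' targets but not their rules.
HEADER_RE = re.compile(r"^(from|to|cc|bcc|subject|date|sent|reply-to):", re.IGNORECASE)
QUOTED_LINE_RE = re.compile(r"^\s*>")
ATTRIBUTION_RE = re.compile(r"^on\b.*\bwrote:$", re.IGNORECASE)
ORIGINAL_MSG_RE = re.compile(r"^-+\s*original message\s*-+$", re.IGNORECASE)
SIGNATURE_DELIM_RE = re.compile(r"^--\s*$")

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
WEEKDAYS = "Mon|Tues|Wednes|Thurs|Fri|Satur|Sun"
DATE_RE = re.compile(
    rf"\b(?:(?:{MONTHS})[a-z]*\.?\s+\d{{1,2}}(?:,\s*\d{{4}})?"  # March 15, 2002
    rf"|\d{{1,2}}/\d{{1,2}}(?:/\d{{2,4}})?"                    # 3/15 or 3/15/2002
    rf"|(?:{WEEKDAYS})day|today|tomorrow)\b"                  # Friday, tomorrow
)
# Capitalized words followed by a corporate suffix, e.g. "Acme Corp."
COMPANY_RE = re.compile(r"\b((?:[A-Z][\w&]*\s+)+(?:Inc|Corp|Ltd|LLC)\b\.?)")


def preprocess(message: str) -> str:
    """Strip header fields, quoted parent text, and a signature from one e-mail."""
    lines = message.splitlines()
    i = 0
    # 1. Skip the leading block of header fields (folded headers are not handled).
    while i < len(lines) and HEADER_RE.match(lines[i]):
        i += 1
    kept = []
    for line in lines[i:]:
        stripped = line.strip()
        # 2. A conventional "-- " line starts the signature; drop everything after it.
        if SIGNATURE_DELIM_RE.match(line):
            break
        # 3. An "Original Message" marker means the rest is the parent message.
        if ORIGINAL_MSG_RE.match(stripped):
            break
        # 4. Attribution lines and ">"-prefixed lines quote the parent message.
        if ATTRIBUTION_RE.match(stripped) or QUOTED_LINE_RE.match(line):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


def extract_entities(text: str) -> dict:
    """Report date-like phrases and company names found in cleaned text."""
    return {
        "dates": DATE_RE.findall(text),
        "companies": COMPANY_RE.findall(text),
    }
```

Only the cleaned text returned by `preprocess` would then be handed to a single-document summarizer, so quoted replies and signatures no longer compete with the new content for inclusion in the summary.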
