All Things Email

Exploiting E-mail Structure to Improve Summarization

by Derek Lam, Steven L. Rohall, Chris Schmandt, Mia K. Stern

IBM Research, 2002
Language: English

Abstract

This paper presents the design and implementation of a system to summarize e-mail messages. The system exploits two aspects of e-mail, thread reply chains and commonly found features, to generate summaries. The system uses existing software designed to summarize single-text documents. Such software typically performs best on well-authored, formal documents. E-mail messages, however, are typically neither well-authored nor formal. As a result, existing summarization software gives a poor summary of e-mail messages. To remedy this poor performance, our system preprocesses e-mail messages using heuristics to remove e-mail signatures, header fields, and quoted text from parent messages. We also present a heuristics-based approach to identifying and reporting names, dates, and companies found in e-mail messages. Lastly, we discuss conclusions from a pilot user study of the summarization system and conclude with areas for further investigation.
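
The preprocessing step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's actual heuristics, so everything below is an assumption made for illustration: the function name `preprocess_email`, the regular expressions, the "On ... wrote:" attribution pattern, and the use of a "--" line as the signature delimiter are all hypothetical.

```python
import re

# Hypothetical heuristics for illustration only; the paper's real rules
# are not given in the abstract.
HEADER_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:\s")   # e.g. "Subject: ..."
QUOTE_RE = re.compile(r"^\s*>")                        # "> quoted parent text"
ATTRIBUTION_RE = re.compile(r"^On .+ wrote:\s*$")      # "On Monday, Bob wrote:"
SIGNATURE_DELIMITER = "--"                             # conventional "-- " line


def preprocess_email(raw: str) -> str:
    """Strip header fields, quoted parent text, and a trailing signature."""
    lines = raw.splitlines()

    # 1. If the message starts with a header field, drop everything up to
    #    and including the first blank line (the end of the header block).
    body_start = 0
    if lines and HEADER_RE.match(lines[0]):
        for i, line in enumerate(lines):
            if line.strip() == "":
                body_start = i + 1
                break
        else:
            # No blank line found: the whole message is header fields.
            body_start = len(lines)

    kept = []
    for line in lines[body_start:]:
        # 2. Stop at the signature delimiter; everything after it is dropped.
        if line.rstrip() == SIGNATURE_DELIMITER:
            break
        # 3. Skip quoted lines and the attribution line that introduces them.
        if QUOTE_RE.match(line) or ATTRIBUTION_RE.match(line):
            continue
        kept.append(line)

    return "\n".join(kept).strip()


sample = (
    "From: alice@example.com\n"
    "Subject: Re: Lunch\n"
    "\n"
    "On Monday, Bob wrote:\n"
    "> Are you free at noon?\n"
    "\n"
    "Yes, noon works for me.\n"
    "\n"
    "-- \n"
    "Alice\n"
    "Example Corp\n"
)
cleaned = preprocess_email(sample)
```

In this sketch only the newly written text of the reply survives, which is the kind of input a single-document summarizer would then receive. Real messages vary widely (signatures without a delimiter, other quoting styles), so rules like these would need to be extended.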

Creative Commons. Some Rights Reserved.
Copyright © 2004 Jochen Topf
Unless otherwise noted the contents on this site are licensed under the
Creative Commons Attribution-ShareAlike License.