Natural language processing for information retrieval

David D. Lewis

AT&T Bell Laboratories

Karen Sparck Jones

Computer Laboratory, University of Cambridge

July 1993

1 Abstract

The paper summarizes the essential properties of document retrieval and reviews both conventional practice and research findings, the latter suggesting that simple statistical techniques can be effective. It then considers the new opportunities and challenges presented by the ability to search full text directly (rather than, e.g., titles and abstracts), and suggests appropriate approaches to doing this, with a focus on the role of natural language processing. The paper also comments on possible connections with data and knowledge retrieval, and concludes by emphasizing the importance of rigorous performance testing.

This paper will appear in Communications of the ACM.