Michael,
indulging the paranoia button
> US5937422: Automatically generating a topic description for text and
> searching and sorting text by topic using the same
> http://www.patents.ibm.com/details?&pn=US05937422__
> Inventor(s): Nelson; Douglas J. , Columbia, MD
> Schone; Patrick John , Elkridge, MD
> Bates; Richard Michael , Greenbelt, MD
>
> Applicant(s): The United States of America as represented by the National
> Security Agency, Washington, DC
>
> Issued/Filed Dates: Aug. 10, 1999 / April 15, 1997
>
> Application Number: US1997000834263
>
> IPC Class: G06F 017/30;
>
> Class: 707/531; 707/004; 707/532; 707/535; 707/512;
>
> Field of Search: 704/010 707/512,532,535,531,3-5,7
>
> Abstract: A method of automatically generating a topical description of
> text by receiving the text containing input words; stemming each input word
> to its root form; assigning a user-definable part-of-speech score to each
> input word; assigning a language salience score to each input word;
> assigning an input-word score to each input word; creating a tree structure
> under each input word, where each tree structure contains the definition of
> the corresponding input word; assigning a definition-word score to each
> definition word; collapsing each tree structure to a corresponding
> tree-word list; assigning a tree-word-list score to each entry in each
> tree-word list; combining the tree-word lists into a final word list;
> assigning each word in the final word list a final-word-list score; and
> choosing the top N scoring words in the final word list as the topic
> description of the input text. Document searching and sorting may be
> accomplished by performing the method described above on each document in
> a database and then comparing the similarity of the resulting topical
> descriptions.
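The abstract doesn't say how the similarity of two topical descriptions is measured. As a rough sketch only, assuming each description is kept as a word-to-score map and using cosine similarity as a stand-in measure (my choice, not necessarily what the patent uses), the search/sort step could look like this in Python. The corpus and query are made-up toy data.

```python
import math
from typing import Dict, List, Tuple

# Assumed representation: a topic description is {word: final-word-list score}
# for its top-N words. The patent abstract does not fix this format.
TopicDescription = Dict[str, float]


def cosine_similarity(a: TopicDescription, b: TopicDescription) -> float:
    """Cosine similarity between two sparse word->score vectors (assumed measure)."""
    dot = sum(score * b[word] for word, score in a.items() if word in b)
    norm_a = math.sqrt(sum(s * s for s in a.values()))
    norm_b = math.sqrt(sum(s * s for s in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query: TopicDescription,
                   corpus: Dict[str, TopicDescription]) -> List[Tuple[str, float]]:
    """Sort document IDs by how similar their topic descriptions are to the query's."""
    scored = [(doc_id, cosine_similarity(query, topics))
              for doc_id, topics in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data.
corpus = {
    "doc1": {"patent": 3.0, "search": 2.0, "topic": 1.0},
    "doc2": {"recipe": 4.0, "bread": 2.0},
    "doc3": {"search": 2.5, "text": 1.5},
}
query = {"search": 1.0, "topic": 1.0}
for doc_id, sim in rank_documents(query, corpus):
    print(doc_id, round(sim, 3))
```

doc2 shares no words with the query, so it sorts last with a similarity of 0; any other vector-space measure could be dropped in place of the cosine.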
>
> Attorney, Agent, or Firm: Morelli; Robert D.;
>
> Primary/Assistant Examiners: Amsbury; Wayne; Channavajjala; Srirama
>
> U.S. References (no patents reference this one):
> (patent, issue date, inventor(s): title)
> US4965763, 10/1990, Zamora: Computer method for automatic extraction of
>   commonly specified information from business correspondence
> US5371673, 12/1994, Fan: Information processing analysis system for
>   sorting and scoring text
> US5384703, 1/1995, Withgott et al.: Method and apparatus for summarizing
>   documents according to theme
> US5434962, 7/1995, Kyojima et al.: Method and system for automatically
>   generating logical structures of electronic documents
> US5619410, 4/1997, Emori et al.: Keyword extraction apparatus for
>   Japanese texts
> US5845278, 12/1998, Kirsch et al.: Method for automatically selecting
>   collections to search in full text searches
> US5873660, 2/1999, Walsh et al.: Morphological search and replace
>
> First claim (of 31):
> What is claimed is:
> 1. A method of automatically generating a topical description of text,
> comprising the steps of:
> a) receiving the text, where the text consists of one or more input words;
> b) stemming each input word to its root form;
> c) assigning a user-definable part-of-speech score β_i to each input word;
> d) assigning a language salience score S_i to each input word;
> e) assigning an input-word score to each input word that is a function of
> the corresponding input word's part-of-speech score β_i, language salience
> score S_i, and the number of times the corresponding input word appears in
> the text;
> f) creating a tree structure under each input word, where each tree
> structure contains the definition of the corresponding input word, where
> each definition word may be further defined to a user-definable number of
> levels;
> g) assigning a definition-word score A_{i,t}[j] to each definition word in
> each tree structure based on the definition word's part-of-speech score
> β_j, the language salience score of the word the definition word defines,
> a relational salience score R_{k,j}, and a user-definable factor W;
> h) collapsing each tree structure to a corresponding tree-word list, where
> each tree-word list contains the unique words contained in the
> corresponding tree structure;
> i) assigning a tree-word-list score to each word in each tree-word list,
> where each tree-word-list score is a function of the scores of the
> corresponding word that existed in the corresponding uncollapsed tree
> structure;
> j) combining the tree-word lists into a final word list, where the final
> word list contains the unique words contained in the tree-word lists;
> k) assigning a final-word-list score A_{fi}[j] to each word in the final
> word list, where A_{fi}[j] is a function of the corresponding word's
> dictionary salience and tree-word-list scores; and
> l) choosing the top N scoring words in the final word list as the topic
> description of the input text, where the value N may be defined by the
> user.
>
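Claim 1 says each score is "a function of" its inputs but, in this first claim at least, never spells out the functions. Here is a toy Python sketch of steps a) through l) under loudly stated assumptions: the stemmer is a crude suffix stripper, every score is a plain product, each definition-word score is also scaled by its parent's score so deeper levels fade, tree-word lists are collapsed by summing, and the language-salience table doubles as "dictionary salience". All tables (POS, SALIENCE, DICTIONARY, RELATIONAL, W, LEVELS) are hypothetical illustration data. This is my guess at the shape of the thing, not the NSA's implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy data; the patent text above does not publish these tables.
POS_SCORE = {"noun": 1.0, "verb": 0.6, "adj": 0.4}  # user-definable beta per part of speech
POS = {"patent": "noun", "search": "verb", "text": "noun", "document": "noun",
       "find": "verb", "write": "verb", "legal": "adj"}
SALIENCE = {"patent": 0.9, "search": 0.7, "text": 0.5, "document": 0.6,
            "find": 0.4, "write": 0.3, "legal": 0.8}  # language salience S
DICTIONARY = {"patent": ["legal", "document"], "search": ["find"],
              "text": ["write", "document"], "document": ["text"]}
RELATIONAL: Dict[Tuple[str, str], float] = {("patent", "document"): 0.8}  # R_{k,j}, default 1.0
W = 0.5     # user-definable factor W
LEVELS = 2  # user-definable number of definition levels


def stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (step b)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def beta(word: str) -> float:
    return POS_SCORE.get(POS.get(word, ""), 0.1)  # unknown words get a small default


def salience(word: str) -> float:
    return SALIENCE.get(word, 0.1)


def topic_description(text: str, n: int = 3) -> List[Tuple[str, float]]:
    counts = Counter(stem(w) for w in text.lower().split())  # steps a-b
    final: Dict[str, float] = defaultdict(float)
    for word, count in counts.items():
        input_score = beta(word) * salience(word) * count  # steps c-e (product assumed)
        tree = [(word, input_score)]  # step f: root of this word's definition tree
        frontier = [(word, input_score)]
        for _ in range(LEVELS):  # expand definitions one level at a time
            next_frontier = []
            for parent, parent_score in frontier:
                for child in DICTIONARY.get(parent, []):
                    # step g: W * beta_j * S_parent * R_{k,j}, times parent score (assumed)
                    score = (W * beta(child) * salience(parent)
                             * RELATIONAL.get((parent, child), 1.0) * parent_score)
                    tree.append((child, score))
                    next_frontier.append((child, score))
            frontier = next_frontier
        tree_list: Dict[str, float] = defaultdict(float)  # steps h-i: collapse by summing
        for node, score in tree:
            tree_list[node] += score
        for node, score in tree_list.items():  # step j: merge into the final word list
            final[node] += score
    # step k: weight by salience (stand-in for dictionary salience); step l: top N
    scored = {w: salience(w) * s for w, s in final.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:n]


print(topic_description("searching patents for patent text"))
```

With these toy tables the top three words come out as patent, document and text. "document" never appears in the input; it makes the list because it turns up in the definition trees of both "patent" and "text", which seems to be the point of building the trees.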