A Theory of indexing by Gerald Salton PDF

By Gerald Salton


Best programming languages books

New PDF release: CMMI - Guidelines for Process Integration and Product

CMMI® for Development (CMMI-DEV) describes best practices for the development and maintenance of products and services across their lifecycle. By integrating essential bodies of knowledge, CMMI-DEV provides a single, comprehensive framework for organizations to assess their development and maintenance processes and improve performance.

Get Executive Guide to Speech-Driven Computer Systems PDF

A new generation of speech-driven computer systems promises to transform the business use of information technology. This is not just a question of discarding the keyboard, but of rethinking business processes to exploit the increased productivity that speech-driven systems can bring.

New PDF release: A guide to experimental algorithmics

Computational experiments on algorithms can supplement theoretical analysis by showing what algorithms, implementations, and speed-up methods work best for specific machines or problems. This book guides the reader through the nuts and bolts of the major experimental questions: What should I measure?

Extra info for A Theory of indexing

Sample text

No simple answer can be given to question (a) above concerning the superiority of binary or term frequency weighting. The curly line in the binary and term frequency columns of Table 9 designates the better precision values in each case. It may be seen that for the CRAN and MED collections the binary weights are normally superior, whereas for the Time collection the term frequency weighting is preferable. However, the differences in performance are large only for the Time collection, as may be ascertained from column 1 of Table 10, which contains statistical significance test results for certain pairs of weighting methods.
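The two weighting schemes compared above can be illustrated with a minimal sketch. It assumes a document is already reduced to a list of index terms and that a query is matched by a plain inner product; the function names and the toy data are hypothetical and are not taken from the book.

```python
from collections import Counter


def binary_weights(tokens):
    """Binary weighting: 1 if the term occurs in the document at all."""
    return {term: 1 for term in set(tokens)}


def term_frequency_weights(tokens):
    """Term frequency weighting: number of occurrences of each term."""
    return dict(Counter(tokens))


def inner_product(doc_vector, query_vector):
    """Unnormalized query-document match (an assumption of this sketch)."""
    return sum(w * query_vector.get(t, 0) for t, w in doc_vector.items())


# Hypothetical toy document and query.
tokens = ["wing", "flow", "wing", "drag", "wing"]
query = {"wing": 1}

binary_score = inner_product(binary_weights(tokens), query)
tf_score = inner_product(term_frequency_weights(tokens), query)
```

Under binary weighting the repeated term counts once, while under term frequency weighting it counts once per occurrence, which is the difference the comparison in Table 9 is probing.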

When the terms are arranged in increasing order according to their document frequencies in a collection, the first set of terms with very low document frequency B_k exhibits a discrimination value near zero. Next follow the terms with medium B_k and positive discrimination values; finally, the terms along the right-hand edge of Fig. 11 exhibit the poorest discrimination values and the highest document frequencies. The document-frequency picture of Fig. 11 then suggests a model for the construction of good indexing vocabularies: the terms used for indexing purposes should as much as possible fall into the middle of the range of values represented in Fig.
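As a rough sketch of the quantities discussed above, the code below computes document frequencies and a discrimination value for each term. The excerpt does not define the discrimination value, so the sketch assumes the common formulation: the average pairwise cosine similarity of the collection with the term removed, minus the same average with the term present. All names and the toy corpus are hypothetical, and a corpus this small is not expected to reproduce the shape of Fig. 11.

```python
import math
from collections import Counter
from itertools import combinations


def document_frequency(docs):
    """Number of documents in which each term occurs."""
    return Counter(term for doc in docs for term in doc)


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(docs):
    """Average pairwise similarity; needs at least two documents."""
    pairs = list(combinations(docs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def discrimination_value(docs, term):
    """Assumed definition: density without the term minus density with it.

    A positive value means the term spreads the documents apart.
    """
    without = [{t: w for t, w in d.items() if t != term} for d in docs]
    return density(without) - density(docs)


# Hypothetical toy corpus with binary weights.
docs = [
    {"the": 1, "wing": 1, "flow": 1},
    {"the": 1, "wing": 1, "drag": 1},
    {"the": 1, "cell": 1, "drug": 1},
    {"the": 1, "cell": 1, "dose": 1},
]

df = document_frequency(docs)
dv_common = discrimination_value(docs, "the")   # occurs in every document
dv_medium = discrimination_value(docs, "wing")  # occurs in half of them
```

In this toy corpus the term that occurs in every document receives a negative discrimination value and the medium-frequency term a positive one.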

The document frequency cutoff actually used for deciding on inclusion of a given term in the experimental thesauruses was 19, 15, and 19 for the CRAN, MED, and Time collections respectively; that is, terms with document frequencies smaller than or equal to the stated frequencies were included. For the three test collections, the process creates 19, 60, and 26 thesaurus classes, respectively. The document frequency distributions of the rare terms included in the thesauruses and of the corresponding thesaurus classes are shown in Table 23.
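The cutoff rule described above can be sketched as follows. The excerpt does not say how the selected rare terms are grouped into classes, so the class membership here is simply given as input, and the document frequency of a class is assumed to be the number of documents containing at least one of its members. Function names and the toy data are hypothetical.

```python
from collections import Counter


def rare_terms(docs, cutoff):
    """Terms whose document frequency is smaller than or equal to cutoff."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term for term, count in df.items() if count <= cutoff}


def class_document_frequency(docs, members):
    """Documents containing at least one member of a thesaurus class.

    Treating the class frequency as this union is an assumption.
    """
    return sum(1 for doc in docs if members & set(doc))


# Hypothetical toy collection; each document is a list of index terms.
docs = [
    ["wing", "flutter"],
    ["wing", "buffet"],
    ["wing", "lift"],
    ["lift", "drag"],
]

candidates = rare_terms(docs, cutoff=1)
# Hypothetical class built by hand from two of the rare terms.
vibration_class = {"flutter", "buffet"}
class_df = class_document_frequency(docs, vibration_class)
```

Replacing the two rare terms by their class raises the document frequency from 1 each to 2 for the class, which moves the indexing unit toward the middle of the frequency range.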
