By Gerald Salton
A Theory of Indexing
Excerpts from A Theory of Indexing
No simple answer can be given to question (a) above concerning the superiority of binary or term frequency weighting. The curly line in the binary (b) and term frequency (f) columns of Table 9 designates the better precision values in each case. For the CRAN and MED collections, the binary weights are normally superior, whereas for the Time collection the term frequency weighting is preferable. However, the differences in performance are large only for the Time collection. This may be ascertained by consulting column 1 of Table 10, which contains statistical significance test results for certain pairs of weighting methods.
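To make the two weighting schemes concrete, the following is a minimal Python sketch. It assumes whitespace tokenization and cosine similarity as the matching function; the names `tf_weights`, `binary_weights`, and `cosine`, as well as the sample texts, are illustrative and are not taken from the book.

```python
from collections import Counter
from math import sqrt


def tf_weights(text):
    """Term frequency weighting: a term's weight is its occurrence count."""
    return dict(Counter(text.lower().split()))


def binary_weights(text):
    """Binary weighting: weight 1 for every term present, whatever its count."""
    return {term: 1 for term in set(text.lower().split())}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical document and query, for illustration only.
doc = "wing flow wing pressure wing"
query = "wing pressure"

sim_binary = cosine(binary_weights(doc), binary_weights(query))
sim_tf = cosine(tf_weights(doc), tf_weights(query))
print(f"binary: {sim_binary:.4f}  term frequency: {sim_tf:.4f}")
```

The two schemes can rank the same document differently because repeated terms raise the term frequency weight but leave the binary weight unchanged. Which scheme yields better precision depends on the collection, as the passage above reports.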
When the terms are arranged in increasing order according to their document frequencies in a collection, the first set of terms, with very low document frequency B_k, exhibits discrimination values near zero. Next follow the terms with medium B_k and positive discrimination values; finally, the terms along the right-hand edge of Fig. 11 exhibit the poorest discrimination values and the highest document frequencies. The document-frequency picture of Fig. 11 then suggests a model for the construction of good indexing vocabularies: the terms used for indexing purposes should as far as possible fall into the middle of the range of values represented in Fig.
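The selection rule suggested by this model can be sketched in Python as follows. The sketch assumes whitespace tokenization, and the band boundaries `low_cutoff` and `high_cutoff` are hypothetical values chosen for a toy collection; the excerpt does not specify where the boundaries lie.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, the number of documents in which it occurs."""
    df = Counter()
    for text in docs:
        df.update(set(text.lower().split()))
    return df


def frequency_band(freq, low_cutoff, high_cutoff):
    """Classify a term by document frequency (cutoffs are hypothetical)."""
    if freq <= low_cutoff:
        return "low"
    if freq >= high_cutoff:
        return "high"
    return "medium"


# Toy collection, for illustration only.
docs = [
    "the wing flow",
    "the wing pressure",
    "the boundary layer flow",
    "the shock wave",
]
df = document_frequencies(docs)
bands = {t: frequency_band(n, low_cutoff=1, high_cutoff=4) for t, n in df.items()}

# Keep only the medium-frequency terms as direct index terms.
index_terms = sorted(t for t, band in bands.items() if band == "medium")
print(index_terms)
```

In this toy collection the term occurring in every document lands in the high band and the terms occurring once land in the low band, leaving only the mid-range terms as index terms.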
The document frequency cutoff actually used for deciding on inclusion of a given term in the experimental thesauruses was 19, 15, and 19 for the CRAN, MED, and Time collections respectively; that is, terms with document frequencies smaller than or equal to the stated frequencies were included. For the three test collections, the process creates 19, 60, and 26 thesaurus classes, respectively. The document frequency distributions of the rare terms included in the thesauruses and of the corresponding thesaurus classes are shown in Table 23.
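The inclusion test described here amounts to a simple threshold on document frequency. The Python sketch below uses the cutoffs quoted in the text; the function name `thesaurus_candidates` and the sample frequencies are hypothetical, and the sketch covers only the selection of rare terms, not the grouping of those terms into thesaurus classes, which the excerpt does not describe.

```python
# Cutoffs quoted in the text for the three test collections.
CUTOFFS = {"CRAN": 19, "MED": 15, "Time": 19}


def thesaurus_candidates(doc_freq, cutoff):
    """Return the terms whose document frequency is at most `cutoff`."""
    return sorted(term for term, n in doc_freq.items() if n <= cutoff)


# Hypothetical document frequencies, for illustration only.
sample_df = {"flutter": 4, "ablation": 19, "flow": 612, "wing": 20}

candidates = thesaurus_candidates(sample_df, CUTOFFS["CRAN"])
print(candidates)
```

The comparison is inclusive, matching the "smaller than or equal to" wording, so a term whose frequency equals the cutoff is still selected.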