Automatic Text Retrieval Systems Design Questions
Which term-weighting scheme distinguishes content value most effectively?
- Term frequency (tf) – A measure of the frequency with which a term appears in a document or query.
- Inverse document frequency (idf) – A collection-dependent measure that varies inversely with the number of documents to which a term is assigned.
- Term discrimination – Term frequency multiplied by inverse document frequency, or tf × idf.
- Normalization factor – A factor that normalizes the term weights so that shorter documents, which tend to have fewer descriptors, are not harder to retrieve than longer documents, which tend to have more.
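
The four components above can be combined into a single weighting routine. The following is a minimal sketch, not a definitive implementation: the slide does not give exact formulas, so the logarithmic form idf = log(N / df) and cosine (unit-length) normalization are assumptions, and the function name `tf_idf_vectors` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Weights are tf x idf, then divided by
    the vector length (cosine normalization, an assumed choice).
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed idf form: log(N / df). A term found in every document gets 0.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights = {term: freq * idf[term] for term, freq in tf.items()}
        # Normalization factor: Euclidean length of the weight vector.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {term: w / norm for term, w in weights.items()}
        vectors.append(weights)
    return vectors


# Hypothetical three-document collection of different lengths.
docs = [
    ["text", "retrieval", "retrieval", "system"],
    ["text", "weighting"],
    ["text", "retrieval", "query", "query", "query", "system", "design", "design"],
]
vectors = tf_idf_vectors(docs)
```

In this sketch, "text" occurs in every document, so its idf and therefore its weight is zero, while the short second document and the long third document both end up with vectors of length 1, which is the effect the normalization factor is meant to achieve.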