Web Search & Text Retrieval (CRN:91672)
Concept lists
- Week 1:
- Information retrieval; unstructured; structured; semi-structured; information need.
- Term; token; document; collection; corpus.
- Dictionary; vocabulary; lexicon.
- Incidence matrix.
- Inverted index; postings; postings list.
- Query; Boolean retrieval model; merge algorithm.
- Ad hoc retrieval; ranked query; relevance; effectiveness; precision; recall.
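To make the Week 1 terms concrete, here is a minimal sketch of an inverted index and the merge algorithm used to answer a Boolean AND query. The toy documents, the whitespace tokenizer, and the function names `build_inverted_index` and `intersect` are illustrative assumptions, not part of the course material.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted postings list of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        for token in text.lower().split():
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping doc IDs present in both."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer at the smaller doc ID
        else:
            j += 1
    return answer


# Hypothetical four-document collection.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
index = build_inverted_index(docs)

# Boolean query: new AND july
result = intersect(index["new"], index["july"])
print(result)
```

Because both postings lists are sorted, the merge visits each posting at most once, so the cost is linear in the combined length of the two lists.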
- Week 2:
- Stop words, stop list;
- token normalization, equivalence classes;
- case folding, truecasing;
- stemming, Porter stemmer;
- lemmatization, lemma, lemmatizer;
- skip list;
- phrase queries, biword index, phrase index;
- positional index;
- binary tree, B-tree;
- wildcard query, permuterm index, k-gram index;
- edit distance, Levenshtein distance, dynamic programming, k-gram index filtering, Jaccard coefficient;
- context-sensitive spelling correction;
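The spelling-correction terms in the Week 2 list can be illustrated with a short sketch: Levenshtein edit distance computed by dynamic programming, and the Jaccard coefficient over character k-gram sets as a cheap filter for candidate corrections. The function names, the choice of k = 2, and the `$` boundary marker are assumptions made for this example.

```python
def edit_distance(s, t):
    """Levenshtein distance via dynamic programming (unit-cost insert/delete/replace)."""
    m, n = len(s), len(t)
    # dist[i][j] = edit distance between s[:i] and t[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or replacement
            )
    return dist[m][n]


def kgrams(term, k=2):
    """Set of character k-grams, with '$' marking the term boundaries."""
    padded = f"${term}$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two non-empty sets."""
    return len(a & b) / len(a | b)


print(edit_distance("kitten", "sitting"))
print(jaccard(kgrams("bord"), kgrams("board")))
```

In a spelling corrector, a k-gram index would typically retrieve vocabulary terms whose Jaccard overlap with the query term exceeds some threshold, and the more expensive edit distance would then be computed only for those candidates.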
- Week 3:
- External sorting, blocked sort-based indexing (BSBI) algorithm, single-pass in-memory indexing (SPIMI);
- MapReduce, key-value pairs;
- Dynamic indexing, auxiliary index, logarithmic merging;
- Rule of 30;
- Lossless compression, lossy compression;
- Heaps' law;
- Zipf's law, power law;
- Dictionary-as-a-string, blocked storage, front coding;
- Variable byte encoding, continuation bit;
- gamma codes, optimal gap encoding;
- This is a growing list.
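The two postings-compression schemes named in the Week 3 list can be sketched as follows. The example assumes the common textbook convention in which postings are stored as gaps between successive document IDs, and the continuation bit is set to 1 on the last byte of each variable-byte code; the function names and the sample postings list are illustrative.

```python
def to_gaps(postings):
    """Turn a sorted postings list into its first doc ID followed by gaps."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def vb_encode_number(n):
    """Variable byte code: 7 payload bits per byte, high bit set on the final byte."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128  # continuation bit marks the last byte of this number
    return out


def vb_encode(numbers):
    """Concatenate the variable byte codes of a list of numbers."""
    stream = []
    for n in numbers:
        stream.extend(vb_encode_number(n))
    return stream


def vb_decode(stream):
    """Invert vb_encode by accumulating payload bits until a terminating byte."""
    numbers, n = [], 0
    for byte in stream:
        if byte < 128:
            n = 128 * n + byte
        else:
            numbers.append(128 * n + (byte - 128))
            n = 0
    return numbers


def gamma_encode(n):
    """Gamma code for n >= 1: unary length of the offset, then the offset bits."""
    if n < 1:
        raise ValueError("gamma codes are defined for positive integers only")
    offset = bin(n)[3:]  # binary representation with the leading 1 removed
    return "1" * len(offset) + "0" + offset


postings = [824, 829, 215406]  # hypothetical postings list
gaps = to_gaps(postings)
encoded = vb_encode(gaps)
print(gaps, encoded, vb_decode(encoded))
print(gamma_encode(13))
```

Variable byte codes are byte-aligned and simple to decode, while gamma codes work at the bit level and spend fewer bits on small gaps, which is where most of the mass lies for frequent terms.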