PageRank
Goal: determine which Web pages are "important"
Approach: ignore page contents; focus on hyperlinks
- treat Web as graph: page = vertex, hyperlink = di-edge
- pages with many incoming hyperlinks are important
- need to computing "incoming degree" for vertices
Problem: the Web is a very large directed graph
- approx. 1011 pages, 1013 hyperlinks
Assume for the moment that we could build a graph ...
Most frequent operation in algorithm "Does edge (v,w) exist?"
|