PageRank
Goal: determine which Web pages are "important"
Approach: ignore page contents; focus on hyperlinks
- treat Web as graph: page = vertex, hyperlink = directed edge
- pages with many incoming hyperlinks are important
- need to compute "incoming degree" for vertices
Problem: the Web is a very large graph
- approx. 1014 pages, 1015 hyperlinks
Assume for the moment that we could build a graph …
Most frequent operation in algorithm "Does edge (v,w) exist?"
|