Sunday, August 29, 2004

Block analysis of web documents

Current link analysis techniques such as HITS and PageRank usually treat a web page as a single semantic unit. A portal page, however, typically mixes navigation, advertisements and several unrelated topic areas, so the single-unit assumption is too idealized to fit real pages. Recent work instead divides each web page into several semantic blocks (or layout blocks, on the assumption that a given area of the page belongs to one semantic class) and analyzes the hyperlinks among blocks.

There are three papers on block analysis in SIGIR 2004. In Block-level Link Analysis, the authors state that dividing web pages into blocks improves ranking accuracy remarkably. The idea is quite simple: classic HITS finds hub and authority nodes in a bipartite graph in which each side is the set of pages to be ranked, whereas this method builds a bipartite graph of blocks and pages.
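To make the block-page idea concrete, here is a minimal sketch of a HITS-style iteration on a bipartite graph whose one side is blocks and whose other side is pages. It assumes that blocks play the hub role and pages the authority role, which is one plausible reading of the description above and not necessarily the paper's exact formulation. The function name `block_hits`, the `(block_id, page_id)` input format and the example identifiers are all made up for illustration.

```python
import math
from collections import defaultdict


def block_hits(links, iterations=50):
    """HITS-style iteration on a bipartite block -> page graph.

    links: iterable of (block_id, page_id) pairs, one per hyperlink
           found inside a block (hypothetical input format).
    Returns (hub, auth): hub scores for blocks, authority scores for pages.
    """
    out_pages = defaultdict(list)  # block -> pages it links to
    in_blocks = defaultdict(list)  # page  -> blocks that link to it
    for block, page in links:
        out_pages[block].append(page)
        in_blocks[page].append(block)

    hub = {b: 1.0 for b in out_pages}
    auth = {p: 1.0 for p in in_blocks}

    def normalise(scores):
        # Scale to unit L2 norm so the scores stay bounded.
        norm = math.sqrt(sum(v * v for v in scores.values()))
        if norm > 0:
            for key in scores:
                scores[key] /= norm

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of blocks linking to it.
        for p in auth:
            auth[p] = sum(hub[b] for b in in_blocks[p])
        normalise(auth)
        # A block's hub score is the sum of the authorities of pages it links to.
        for b in hub:
            hub[b] = sum(auth[p] for p in out_pages[b])
        normalise(hub)
    return hub, auth


# Hypothetical example: one portal page split into a navigation block
# and an advertisement block, plus the main block of a second page.
links = [
    ("a.html#nav", "b.html"),
    ("a.html#nav", "c.html"),
    ("a.html#ads", "d.html"),
    ("b.html#main", "c.html"),
]
hub, auth = block_hits(links)
```

In this toy graph the navigation block and the advertisement block of `a.html` get separate hub scores, so a link from the advertisement block does not inherit the weight earned by the navigation block. That separation is what page-level HITS cannot express.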

