img

Notice détaillée

Trawling the Web for emerging cyber-communities

Article Ecrit par: Kumar, R. ; Mendelzon, A. ; Raghavan, P. ; Rajagopalan, S. ; Tomkins, A. ;

Résumé: The Web harbors a large number of communities - groups of content-creators sharing a common interest - each of which manifests itself as a set of interlinked Web pages. Newgroups and commercial Web directories together contain of the order of 20,000 such communities; out particular interest here is on emerging communities - those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call out process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms, and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of out experiment.


Langue: Anglais