Operationalising "Websites": lexically, semantically or topologically?

  1. Aguillo Caño, Isidro F. 2
  2. Arroyo Vázquez, Natalia 1
  3. Cothey, Viv 3

  1. 1 Fundación Germán Sánchez Ruipérez, Madrid, Spain
  2. 2 Consejo Superior de Investigaciones Científicas, Madrid, Spain. ROR https://ror.org/02gfc7t72
  3. 3 University of Wolverhampton, Wolverhampton, United Kingdom. ROR https://ror.org/01k2y1055

Journal:
Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics

ISSN: 1137-5019

Year of publication: 2006

Issue title: What does the Web represent? From virtual ethnography to web indicators

Issue: 10

Type: Article


Abstract

Methods for investigating the structure of the Web graph, in order to better understand its properties, are of interest to many researchers. The scale and complexity of the Web-page digraph are typically managed by aggregating, or clustering, individual Web-pages to form "Websites". It is the properties of these Websites that then become the focus of research. The most popular Web-page clustering technique is "lexical": it uses URL syntax to assign Web-pages to "Websites". Semantic clustering, that is, clustering Web-pages according to the similarity of their content, has also been proposed. In this paper we consider a third approach to Web-page clustering, which is based on the topological properties of the Web-page within the Web-page digraph. We present the technique and report the results of an experiment comparing URL-lexically and topologically determined Websites in two sub-domains, one within the Spanish country-level domain and the other within the UK country-level domain of the Web.
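
To make the contrast between lexical and topological clustering concrete, the following Python sketch groups a handful of page URLs in two ways. It is illustrative only. The function names, the example URLs (under the reserved `.example` top-level domain) and the choice of hostname as the lexical key are assumptions. The topological grouping shown (sets of mutually reachable pages, that is, strongly connected components) is just one simple topological criterion and is not necessarily the technique presented in the paper.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def cluster_lexically(urls):
    """Group page URLs into 'Websites' keyed by lower-cased hostname.

    Assumption: the hostname is the lexical unit; other URL-based rules
    (directory prefixes, domain suffixes) are equally possible.
    """
    sites = defaultdict(list)
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        sites[host].append(url)
    return dict(sites)


def _reachable(start, adjacency):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def cluster_topologically(pages, links):
    """Group pages that can each reach the other via directed links.

    Assumption: mutual reachability stands in for 'topological' similarity.
    This quadratic approach is only suitable for small illustrative graphs.
    """
    adjacency = defaultdict(set)
    for src, dst in links:
        adjacency[src].add(dst)
    reach = {page: _reachable(page, adjacency) for page in pages}
    clusters, assigned = [], set()
    for page in pages:
        if page in assigned:
            continue
        component = {q for q in reach[page] if q in reach and page in reach[q]}
        clusters.append(component)
        assigned |= component
    return clusters


# Hypothetical pages and links, for illustration only.
a = "http://www.uni-a.example/index.html"
b = "http://lib.uni-a.example/catalogue.html"
c = "http://www.uni-b.example/index.html"
pages = [a, b, c]
links = [(a, b), (b, a), (a, c)]

lexical_sites = cluster_lexically(pages)
topological_sites = cluster_topologically(pages, links)
```

In this toy graph the lexical rule places `a` and `b` in different "Websites" because their hostnames differ, whereas the mutual-reachability rule places them together because each links to the other. The page `c` forms its own group under both rules.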

References

  • Ayan N. F., Li W. and Kolak O. (2002). Automating extraction of logical domains in a web site. Data & Knowledge Engineering, 43(2), 179-205.
  • Batagelj V. and Mrvar A. (2003). Pajek - Analysis and Visualization of Large Networks. In Juenger, M., and Mutzel, P. (eds.), Graph drawing software, pp. 77-103. Berlin: Springer.
  • Bharat K., Chang B., Henzinger M. and Ruhl M. (2001). Who links to whom: mining linkage between Web sites. In Proceedings of the 2001 IEEE International Conference on Data Mining, pp. 51-58.
  • Björneborn L. and Ingwersen P. (2001). Perspectives of webometrics. Scientometrics, 50(1), 65-82.
  • Cothey V. (2004). Web-crawling reliability. Journal of the American Society for Information Science and Technology, 55(14), 1228-1238.
  • Deo N. and Gupta P. (2001). World wide web: a graph-theoretic perspective. Technical report CS-TR-01-001, School of Computer Science, University of Central Florida.
  • Egghe L. (2000). New informetric aspects of the Internet: some reflections - many problems. Journal of Information Science, 26(5), 329-335.
  • Garfield E. (1999). Journal impact factor: a brief review. Canadian Medical Association Journal, 161(8), 979-980.
  • Ingwersen P. (1998). The calculation of Web impact factors. Journal of Documentation, 54(2), 236-243.
  • Leydesdorff L. (2004). Clusters and maps of science journals based on biconnected graphs in the Journal Citation Reports. Journal of Documentation, 60(4), 371-427.
  • Menczer F. (2004). Lexical and semantic clustering by Web links. Journal of the American Society for Information Science and Technology, 55(14), 1261-1269.
  • Thelwall M. (2002). Conceptualizing documentation on the Web: an evaluation of different heuristic-based models for counting links between university Web sites. Journal of the American Society for Information Science and Technology, 53(12), 995-1005.
  • Thelwall M. (2004). Methods for reporting on the targets of links from national systems of university Web sites. Information Processing and Management, 40(1), 125-144.
  • Watts D. J. and Strogatz S. H. (1998). Collective dynamics of 'small-world' networks. Nature, 393(6684), 440-442.