Internet

Internet and Knowledge Organization (KO)

The Internet has challenged traditional approaches to KO. Investigations such as Brophy & Bawden (2005) suggest that the Internet is already a serious challenger to the specialist research library.

When considering the differences, we have to distinguish between a) the material that exists on the Internet compared with what is represented in traditional libraries and databases, and b) the methods and principles used to organize that material.

In general, a library or a database is a careful selection of documents organized in ways that correspond to its content as well as to its purpose and potential use. The Internet, on the other hand, is a much more arbitrary collection in which many of the most valuable documents are not freely available because of copyright concerns. At the same time, much material is available which has not been edited or passed peer review. What can be found also depends much more on what individual authors or organizations have chosen to make available on the Internet.

Most documents on the Internet are full-text representations. Although traditional databases are also increasingly adding full-text representations, the norm is still bibliographical databases with abstracts, descriptors and so on. In this way the Internet has some of the same attributes as full-text databases (e.g., a much higher recall compared to bibliographical databases).
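
The difference in recall can be illustrated with a minimal sketch. The toy collection, the relevance judgments and the helper functions `search` and `recall` below are hypothetical and serve only as illustration: a search in the bibliographic records sees only the short descriptions, while a full-text search sees every word of the documents.

```python
# Toy collection (hypothetical): each document has a short bibliographic
# record and a full text. Three documents are judged relevant to the query.
documents = {
    "d1": {"record": "thesaurus construction standards",
           "fulltext": "thesaurus construction standards and facet analysis in practice"},
    "d2": {"record": "web search engines",
           "fulltext": "web search engines rarely apply facet analysis"},
    "d3": {"record": "facet analysis for web resources",
           "fulltext": "facet analysis for web resources"},
    "d4": {"record": "library buildings",
           "fulltext": "library buildings and their architecture"},
}
relevant = {"d1", "d2", "d3"}


def search(query, field):
    """Return ids of documents whose chosen field contains every query word."""
    words = query.lower().split()
    return {doc_id for doc_id, doc in documents.items()
            if all(w in doc[field].lower().split() for w in words)}


def recall(retrieved, relevant_ids):
    """Recall = relevant documents retrieved / all relevant documents."""
    return len(retrieved & relevant_ids) / len(relevant_ids)


bibliographic_recall = recall(search("facet analysis", "record"), relevant)
fulltext_recall = recall(search("facet analysis", "fulltext"), relevant)
print(bibliographic_recall, fulltext_recall)
```

In this constructed example the record search finds one of the three relevant documents, the full-text search all three. In real collections the higher recall of full-text searching is usually paid for with lower precision.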

"The explosive growth of Web search engines, with their primitive algorithms, has had some rather unfortunate effects, to my mind. Some of these engines appear to have been developed by people who saw a need, but who had not the vaguest idea that there was already a history of development of tools to fulfill similar needs. There is little evidence that some of these developers had ever used either Dialog or a library catalog." (Milstead, 1998).

"A thesaurus can become the basis of a more extensive semantic network, providing information not just on what terms are used in indexing, but on how they are used within the system. Most often a semantic network includes richer relationships than a thesaurus, but there is no reason not to build the less sophisticated system, using it as a resource when it becomes feasible to develop the more powerful system." (Milstead, 1998).

"We believe that it is wrong to reproach the developers of Internet search engines that they have not considers the theory of library classification. There is not doubt, in our minds, that the search engines are gigantic successes and that it is us that have to proof that traditional KOS have a role to play in the digital environment. In other words the search engines must be considered one approach to KO among others, and the relative benefits and drawbacks of different approaches have to be demonstrated scientifically, not by professional wishful thinking." (Broughton et al., 2005).

In order to consider whether knowledge organization has a future or not, it is important to analyze both the assumptions on which Internet search engines (and related information retrieval technologies) are based and the assumptions behind different approaches to KO. Should KOS be designed on the basis of experts, literatures, users, algorithms, standards, combinations of some of those things, or something else?

There have been attempts to apply techniques and systems from the library tradition to the Internet.

Saeed & Chaudry (2001) is an example of research trying to apply the Dewey Decimal Classification (DDC), a traditional library classification system, to the Internet.

Devadason et al. (2002) and Ellis & Vasconcelos (1999) are among a group of researchers who try to use the principles of facet analysis for organizing resources on the Internet.

Milstead, on the other hand, examines the use of thesauri and related semantic tools. She concludes:

"Given all the problems and limitations, how is it possible to remain positive about the need for continued use of thesauri? There are two fundamental reasons, one philosophical and one pragmatic:

Philosophically, just as thesauri built on subject heading lists, providing more structured relationships and terms better fitted to the current searching environment, thesauri can be built on to develop vocabulary tools which meet the needs of users in the search environment of the near future.
Pragmatically, there is increasing evidence of a realization on the part of text analysis system developers of the need to include a semantic component in their software. Whether this semantic component is a formal ANSI/NISO standard thesaurus is not as important as the fact that a rich semantic tool is embedded in the system." (Milstead, 1998).

There is no doubt, in my mind, that it is extremely useful to put the words of full texts in databases and make every word searchable in any combination, with proximity operators and so on (i.e. information retrieval as a challenger to KO). This does not exclude, however, that KO also has important roles to play, and that specific types of questions, for example, are badly served by pure IR technologies.
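
What it means to make every word searchable in any combination can be sketched with a small inverted index. This is a minimal illustration only: the three texts and the functions `search_and` and `search_or` are hypothetical, and real IR systems add ranking, stemming and much more.

```python
from collections import defaultdict

# Hypothetical mini-collection of full texts.
texts = {
    1: "Thesauri support indexing and searching",
    2: "Search engines index every word of the full text",
    3: "Facet analysis supports indexing of web resources",
}

# Inverted index: each word points to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in texts.items():
    for word in text.lower().split():
        index[word].add(doc_id)


def search_and(*words):
    """Documents containing all of the given words (Boolean AND)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


def search_or(*words):
    """Documents containing at least one of the given words (Boolean OR)."""
    return set().union(*(index.get(w.lower(), set()) for w in words))


and_hits = search_and("indexing", "of")
or_hits = search_or("thesauri", "facet")
print(and_hits, or_hits)
```

Note that this kind of matching is purely formal: a search for "index" retrieves only document 2, although documents 1 and 3 are about indexing. Such cases are one reason why a semantic component of the kind Milstead describes may still be needed.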

One important difference between traditional online databases and Internet search engines is that in traditional databases the professional searcher has full control and the result is a well-defined set of references/documents. On the Internet, on the other hand, secret search algorithms are used, and the search set is not well defined or easy to understand and modify.

Serious research and teaching of KO must be based on some kind of vision, or should be given up. That vision should of course be reflected in the educational programs within LIS.

As shown above, different people have different visions. Some have, for example, the vision of applying facet analysis to the Internet. It is important, however, that such visions are not just professional wishful thinking but are based on serious scientific studies.

The idea that I am using as a basis is that traditional IR approaches have been atomistic, mostly considering word frequencies in texts. The study of broader contexts such as disciplines, discourses, literatures, genres, etc. corresponds better to the knowledge that good human indexers and searchers use, which is why such holistic concepts may prove useful for a theory of KO that can supplement IR techniques. The meaning of a word is partly determined by its context. It is well known, for example, that "work" has different meanings in economics and in physics. When words are collected in databases, their context is lost, but it may be re-established, e.g. by proximity operators. The study of how contexts influence meaning, and of how meanings can be preserved during KO by considering the broader concepts mentioned above, could be a way forward for KO. Besides, such studies fit better with a specific LIS approach than with a computer-science approach; they also fit, I believe, the expectations that many people have concerning the qualifications of librarians.
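
How a proximity operator can re-establish some of the lost context may be shown with a minimal sketch. The two sentences and the function `near` are hypothetical illustrations; they do not reproduce the syntax of any particular retrieval system.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text, term_a, term_b, max_distance):
    """True if term_a and term_b occur within max_distance words of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, t in enumerate(tokens) if t == term_a]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)


physics = "The work done by a force equals the force times the displacement."
economy = "Unpaid work in the household is rarely counted; the police force is."

# A plain Boolean AND matches both sentences, whatever "work" means in them.
both_contain = all("work" in tokenize(t) and "force" in tokenize(t)
                   for t in (physics, economy))

# Requiring the two words to be close together keeps only the physics sense.
physics_hit = near(physics, "work", "force", 5)
economy_hit = near(economy, "work", "force", 5)
print(both_contain, physics_hit, economy_hit)
```

The proximity condition is still a formal, atomistic device. It only approximates the kind of context that a discipline, a literature or a genre provides, which is the point of the argument above.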

When studying the performance of different kinds of IR systems and KOS, approaches to evaluation become important. In this connection it is worth considering that interpretative, qualitative approaches seem very neglected in information science.

Literature:

Brophy, J. & Bawden, D. (2005). Is Google enough? Comparison of an internet search engine with academic library resources. ASLIB Proceedings, 57(6), 498-512.

Broughton, V. (2002). Faceted classification as a basis for knowledge organization in a digital environment: the Bliss Bibliographic Classification as a model for vocabulary management and the creation of multidimensional knowledge structures. The New Review of Hypermedia and Multimedia, 7(1), 67-102.

Broughton, V.; Hansson, J.; Hjørland, B. & López-Huertas, M. J. (2005). Knowledge Organization. Report of working group. IN: LIS-education in Europe. Working seminar held in Copenhagen 11-12 August 2005 at the Royal School of Library and Information Science.

Devadason, F. J.; Intaraksa, N.; Patamawongjariya, P. & Desai, K. (2002). Faceted indexing based system for organizing and accessing Internet resources. Knowledge Organization, 29(2), 65-77.

Ellis, D. & Vasconcelos, A. (1999). Ranganathan and the Net: using facet analysis to search and organise the World Wide Web. ASLIB Proceedings, 51(1), 3-10.

Finnemann, N. O. (2001). The Internet - A New Communicational Infrastructure. Manuscript for the 15th Nordic Conference on Media and Communication Research, "New Media, New Opportunities, New Societies", University of Iceland in Reykjavik, Iceland, August 11th-13th, 2001. 43 pp. Available at: http://cfi.imv.au.dk/pub/skriftserie/002_finnemann.pdf

Fortunato, S.; Flammini, A.; Menczer, F. & Vespignani, A. (2005). The egalitarian effect of search engines. arXiv:cs.CY/0511005 v1. http://www.arxiv.org/PS_cache/cs/pdf/0511/0511005.pdf

Hahn, T. B. (1998). Text Retrieval Online: Historical Perspective on Web Search Engines. Bulletin of the American Society for Information Science, 24(4). http://www.asis.org/Bulletin/Apr-98/hahn.html

Introna, L. & Nissenbaum, H. (2000). Shaping the Web: Why the Politics of Search Engines Matters. The Information Society, 16(3), pp. 1-17. Retrieved 2007-06-07 from: http://www.nyu.edu/projects/nissenbaum/papers/searchengines.pdf

Koch, T. (1997). The role of classification schemes in Internet resource description and discovery; work package 3 of Telematics for Research project DESIRE (RE 1004). www.ukoln.ac.uk/metadata/desire/classification

Milstead, J. L. (1998). Use of Thesauri in the Full-Text Environment. Based on a paper presented at the 34th Clinic on Library Applications of Data Processing. http://www.bayside-indexing.com/Milstead/useof.htm

Oppenheim, C.; Morris, A. & McKnight, C. (2000). The evaluation of WWW search engines. Journal of Documentation, 56(1), 71-90.

Saeed, H. & Chaudry, A. S. (2001). Potential of bibliographic tools to organize knowledge on the Internet: The use of Dewey Decimal Classification scheme for organizing Web-based information resources. Knowledge Organization, 28(1), 17-26.

Tang, M.-C. & Sun, Y. (2000). Evaluation of Web-Based Search Engines Using User-Effort Measures. Libres: Library and Information Science Research Electronic Journal [online], 19(2). URL: http://libres.curtin.edu.au/libres13n2/tang.htm

Birger Hjørland

Last updated: 08-06-2007
