Internet and Knowledge Organization (KO)

The Internet has challenged traditional approaches to KO. Investigations such as Brophy & Bawden (2005) suggest that the Internet is already a serious challenger to the specialist research library.


When considering the differences, we have to distinguish between (a) the material that exists on the Internet compared with what is represented in traditional libraries and databases, and (b) the methods and principles used to organize that material.


In general, a library or a database is a careful selection of documents organized in ways that correspond to the content of the database as well as to its purpose and potential use. The Internet, on the other hand, is a much more arbitrary collection in which many of the most valuable documents are not freely available because of copyright concerns. At the same time, much material is available that has not been edited or peer reviewed. What can be found also depends much more on what individual authors or organizations have chosen to make available on the Internet.


Most documents on the Internet are full-text representations. Although traditional databases are increasingly adding full text as well, the norm is still bibliographical databases with abstracts, descriptors and so on. In this respect the Internet has some of the same attributes as full-text databases (e.g., a much higher recall compared with bibliographical databases).
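
To make the notion of recall concrete, the following minimal sketch computes recall and precision for two searches: one that matches only bibliographical records and one that matches full text. The document identifiers, the relevance judgments and the function name recall_precision are assumptions introduced purely for illustration; they are not taken from any study cited here.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a collection of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Recall: share of the relevant documents that were actually found.
    recall = len(hits) / len(relevant) if relevant else 0.0
    # Precision: share of the retrieved documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical relevance judgments for one query.
relevant = {"d1", "d2", "d3", "d4"}

# Hypothetical result sets: the bibliographical search matches only titles,
# abstracts and descriptors; the full-text search matches every word.
bibliographic_hits = {"d1", "d2"}
full_text_hits = {"d1", "d2", "d3", "d5", "d6", "d7"}

bib = recall_precision(bibliographic_hits, relevant)
full = recall_precision(full_text_hits, relevant)
print("bibliographical:", bib)
print("full text:", full)
```

In this invented case the full-text search finds more of the relevant documents (higher recall) but also retrieves more irrelevant ones (lower precision), which is the trade-off usually associated with full-text searching.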

"The explosive growth of Web search engines, with their primitive algorithms, has had some rather unfortunate effects, to my mind. Some of these engines appear to have been developed by people who saw a need, but who had not the vaguest idea that there was already a history of development of tools to fulfill similar needs. There is little evidence that some of these developers had ever used either Dialog or a library catalog." (Milstead, 1998).

"A thesaurus can become the basis of a more extensive semantic network, providing information not just on what terms are used in indexing, but on how they are used within the system. Most often a semantic network includes richer relationships than a thesaurus, but there is no reason not to build the less sophisticated system, using it as a resource when it becomes feasible to develop the more powerful system." (Milstead, 1998).

"We believe that it is wrong to reproach the developers of Internet search engines that they have not considered the theory of library classification. There is no doubt, in our minds, that the search engines are gigantic successes and that it is us that have to prove that traditional KOS have a role to play in the digital environment. In other words the search engines must be considered one approach to KO among others, and the relative benefits and drawbacks of different approaches have to be demonstrated scientifically, not by professional wishful thinking." (Broughton et al., 2005).

In order to consider whether knowledge organization has a future, it is important to analyze both the assumptions on which Internet search engines (and related information retrieval technologies) are based and the assumptions behind the different approaches to KO. Should KOS be designed on the basis of experts, literatures, users, algorithms, standards, combinations of these, or something else?

There have been attempts to apply techniques and systems from the library tradition to the Internet.

Saeed & Chaudry (2001) is an example of research trying to apply the Dewey Decimal Classification (DDC), a traditional library classification system, to the Internet.

Devadason et al. (2002) and Ellis & Vasconcelos (1999) are among a group of researchers who try to use the principles of facet analysis for organizing resources on the Internet.

Milstead, on the other hand, examines the use of thesauri and related semantic tools. She concludes:

"Given all the problems and limitations, how is it possible to remain positive about the need for continued use of thesauri? There are two fundamental reasons, one philosophical and one pragmatic:


There is no doubt, in my mind, that it is extremely useful to put the words of full texts in databases and make every word searchable in any combination, with proximity operators and so on (i.e., information retrieval as a challenger to KO). This does not exclude, however, that KO also has important roles to play, or that specific types of questions, for example, are badly served by pure IR technologies.
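
As an illustration of what a proximity operator adds to plain word matching, here is a minimal sketch in Python. The function names (tokenize, contains_all, near), the two sample sentences and the distance of five words are assumptions made for this example; real retrieval systems implement such operators over inverted indexes and with their own syntax.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_all(text, *terms):
    """Boolean AND: True if every term occurs somewhere in the text."""
    tokens = set(tokenize(text))
    return all(term.lower() in tokens for term in terms)


def near(text, term_a, term_b, max_distance):
    """Proximity: True if the two terms occur within max_distance words of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)


# Two hypothetical full-text records that contain the same two words.
physics = "Work is done when a force moves an object over a distance."
economics = ("Many people work long hours in the service sector, "
             "and the labour force keeps growing.")

for name, text in [("physics", physics), ("economics", economics)]:
    print(name,
          "AND:", contains_all(text, "work", "force"),
          "NEAR/5:", near(text, "work", "force", 5))
```

Both sample records satisfy the plain AND query, but only the first satisfies the proximity condition. The operator narrows the result using word positions alone, with no knowledge of subject, genre or discipline.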


One important difference between traditional online databases and Internet search engines is that in traditional databases the professional searcher has full control and the result is a well-defined set of references or documents. On the Internet, on the other hand, secret search algorithms are used, and the result set is neither well defined nor easy to understand and modify.


Serious research and teaching in KO must be based on some kind of vision, or should be given up. That vision should of course be reflected in the educational programs within LIS.


As shown above, different people have different visions. Some have, for example, the vision of applying facet analysis to the Internet. It is important, however, that such visions are not just professional wishful thinking but are based on serious scientific studies.


The idea that I am using as a basis is that traditional IR approaches have been atomistic, mostly considering word frequencies in texts. The study of broader contexts such as disciplines, discourses, literatures and genres corresponds better to the knowledge that good human indexers and searchers use, which is why such holistic concepts may prove useful for a theory of KO that can supplement IR techniques. The meaning of a word is partly determined by its context. It is well known that, for example, "work" has different meanings in economics and in physics. When words are collected in databases, their context is lost, but it may be re-established, e.g. by proximity operators. The study of how contexts influence meaning, and of how meanings can be preserved during KO by considering the broader concepts mentioned above, could be a way forward for KO. Besides, such concepts fit better with a specific LIS approach than with a computer-science approach, and they fit, I believe, the expectations that many people have concerning the qualifications of librarians.


When studying the performance of different kinds of IR systems and KOS, approaches to evaluation become important. In this connection it is worth considering that interpretative, qualitative approaches seem very neglected in information science.






Birger Hjørland

Last updated: 08-06-2007