As online information grows ever more abundant, finding useful information becomes increasingly difficult. Currently, most web data is coded to instruct browsers how to display it, without regard for what it means. By adding metadata that denotes what data means, search engines can offer more useful results. While every metadata approach has limitations, user tagging appears to offer the greatest potential for improving online searching.
Metadata can be split into two main types: 1) tagging, the act of appending keywords to web resources (e.g. del.icio.us, Flickr, StumbleUpon), and 2) semantic web data, coding conventions that describe the data (e.g. RDF and microformats). Metadata can be added by, or in conjunction with, four groups: 1) creators, 2) programmers, 3) information specialists, and 4) users. Metadata is useful for retrieving resources already found (e.g. del.icio.us' use of tagging) or for filtering content to surface pertinent areas. Yet it is the use of metadata, specifically tagging by users, to aid searching that offers the most potential.
The main hurdle for semantic web data is its highly technical nature, which limits its use primarily to advanced web developers. The two XML/XHTML solutions, RDF and microformats, are too advanced for many web developers, let alone lay web users. As a veteran web developer, I found the syntax of both intimidating. Until web-authoring software simplifies adding semantic data, it will remain a good idea left largely unimplemented.
Metadata added by information specialists faces two main limitations. First, there is far too much web content for specialists to address. Yahoo, for example, abandoned its manual process as too time-consuming and costly. Second, while information specialists are good at classifying resources according to official schemas or taxonomies, which work well in smaller environments such as an intranet, their work may not address the needs and lexicon of users in a wider community.
Tagging by content creators faces a similar limitation: although creators know their content well, they do not fully know users' needs and lexicon. In addition, tagging by content creators opens the door to misuse, both unintentional and intentional. Unintentional misuse can stem from ignorance of standards, inaccurate self-appraisal, or cultural and language differences. Misuse can also be deliberate, as typified by spammers and phishers who use false metadata to lure people to their sites.
Tagging by users offers the most potential for web searchers. Group tagging draws on the "wisdom of crowds" theory, in which the collective judgment of a diverse group produces better information than even specialists could provide. When enough users tag a resource, minor discrepancies are outweighed, and the aggregate tags represent what the resource actually means to most users. Users know the terms they associate with a resource and can identify it in their own words, and they have little incentive to mislabel it for selfish gain. In addition, most tagging services are free and, compared with semantic web data, quick and easy to use.
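To make the idea of minor discrepancies being outweighed more concrete, here is a minimal Python sketch of one possible aggregation rule, assuming a simple threshold: each user's tags are normalized and counted once, and only tags applied by at least a given share of users are kept. The function name `aggregate_tags`, the `min_share` threshold, and the sample data are hypothetical illustrations, not how del.icio.us, Flickr, or any other tagging service actually works.

```python
from collections import Counter

def aggregate_tags(user_tags, min_share=0.5):
    """Return the tags applied by at least `min_share` of users, most popular first.

    `user_tags` has one entry per user; each entry is that user's list of tags
    for the same resource. (Hypothetical rule, for illustration only.)
    """
    if not user_tags:
        return []
    counts = Counter()
    for tags in user_tags:
        # Normalize case and whitespace, and count each tag once per user.
        counts.update({tag.strip().lower() for tag in tags})
    n_users = len(user_tags)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, count in ranked if count / n_users >= min_share]

# Hypothetical tags that five users gave the same tutorial page.
tags_by_user = [
    ["python", "tutorial"],
    ["Python", "programming"],
    ["python", "tutorial", "beginner"],
    ["pyhton", "tutorial"],               # one user's typo
    ["python", "programming", "toread"],  # a personal reminder tag
]

print(aggregate_tags(tags_by_user, min_share=0.4))
# → ['python', 'tutorial', 'programming']
```

The typo and the personal reminder tag drop out because few other users share them, while the terms most people associate with the page survive. Raising `min_share` trades coverage for stronger agreement.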
Tagging services are not yet standard features of browsers, and they take time to learn, which may explain why they have not been widely adopted. However, as more users tag web resources, adoption should grow, and the growing pool of tags will offer even more help to web searchers.