Friday, July 13, 2007

Caught Up In the Semantic Web

I, Glen Farrelly, went for the second time to the Greater Toronto Web Centric Meetup Group Wednesday evening (the organization brings out a diverse, interesting group of people working in the various components of the Web).

I met a colleague there, Richard, and we got to talking about how the term Web 3.0, groaner that it is, is already becoming more popular.

Web 3.0 is often used to describe the Semantic Web, which has been around since Tim Berners-Lee started the ball rolling on it in 1999. I'd heard of and was excited by the Semantic Web a few years ago, but since it was a theoretical concept, it didn't seem to get a lot of popular attention until the last few months.

Recently, I read an article in Business 2.0 (still one of the best magazines covering Internet topics) by Michael V. Copeland called "Weaving the [Semantic] Web". Since then I found a great, but very long, article on the topic by John Borland called "A Smarter Web" in MIT's Technology Review. Richard also forwarded me a recent conversation with Berners-Lee on the issue in ITWorld Canada.

I don't claim to understand all the complicated science that would enable the Semantic Web, but I do get the need and the possible advantages it promises. Copeland describes the current limitations of webpages and existing search technology:

Services like Google do a great job of sifting through all those webpages, but it's up to people to recognize the things they want when they see them in the results... The Web just isn't very smart yet; one webpage is the same as any other. It might have a higher Google ranking, but there's no distinction based on meaning. The semantic Web in the Berners-Lee vision acts more like a series of connected databases, where all information resides in a structured form. Within that structure is a layer of description that adds meaning that the computer can understand.

Borland expands on how the semantic web would work and the benefits:

[it] would provide a way to classify individual bits of online data such as pictures, text, or database entries but would define relationships between classification categories as well. Dictionaries and thesauruses called "ontologies" would translate between different ways of describing the same types of data, such as "post code" and "zip code." All this would help computers start to interpret Web content more efficiently. In this vision, the Web would take on aspects of a database, or a web of databases. Databases are good at providing simple answers to queries because their software understands the context of each entry. "One Main Street" is understood as an address, not just random text. Defining the context of online data just as clearly--labeling a cat as an animal, and a veterinarian as an animal doctor, for example--could result in a Web that computers could browse and understand much as humans do...
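To make the ontology idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the names `SYNONYMS`, `IS_A`, `canonical`, and `is_a` are invented for this example, the extra "animal doctor is a doctor" link is mine, and real Semantic Web vocabularies are far richer than two dictionaries.

```python
# Toy "ontology". Every name here is illustrative, not a real vocabulary.

# Different labels for the same kind of data map to one shared term.
SYNONYMS = {
    "post code": "postal_code",
    "zip code": "postal_code",
}

# "is a" relationships between categories.
# The last link (animal doctor -> doctor) is an assumption added for the demo.
IS_A = {
    "cat": "animal",
    "veterinarian": "animal doctor",
    "animal doctor": "doctor",
}


def canonical(term: str) -> str:
    """Return the shared name for a term, or the lowercased term if unknown."""
    key = term.lower()
    return SYNONYMS.get(key, key)


def is_a(thing: str, category: str) -> bool:
    """Follow "is a" links upward until the category is found or links run out."""
    current = thing
    while current in IS_A:
        current = IS_A[current]
        if current == category:
            return True
    return False


print(canonical("Zip Code") == canonical("post code"))  # True
print(is_a("cat", "animal"))                            # True
print(is_a("veterinarian", "doctor"))                   # True
print(is_a("cat", "doctor"))                            # False
```

The point is simply that once labels and categories are spelled out as data, a program can answer "are these the same field?" or "is this a kind of that?" without a person reading the page.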

With computers able to read webpages more effectively, they'll be able to automate things for us such as finding the cheapest price on something or organizing an evening out with friends.

But if it only results in a simple search-engine query not returning a gazillion results and making me scour through pages to find good information, then the Semantic Web is a winner to me.


Stephen Fetter said...

Hmmm ... interesting concept

I have two immediate reactions:

(1) This is likely to make the creation of webpages that are readable by this sort of software much more complicated to produce and maintain -- thus moving the whole process away from small-time retail outlets and amateurs. This may be good for pros like you, but one of the things I really like about the Web is its egalitarian nature -- the way that even really small outlets (Mom & Pop stores, little churches, etc.) can have a reasonable web-presence. I think this would likely be lost in the Web 3 scenario.

(2) To some extent this kind of idea drives the search engines that offer to sort news articles according to your own personal tastes. There's a huge debate among journalists and editors about the value and limitations of that sort of presentation ... and the need still for human editors. It sort of sounds like the Web 3 you're talking about builds on this idea ... and comes with all the limitations that Google News currently still has.

Glen Farrelly said...

Interesting points you raised!

Regarding your first point, I think webpages could become much more structured as a result. Webpages now are essentially just text (and graphics) with instructions on how that text should look (and behave). While adding some sort of structure would make webpages much more readable to machines, it could also make them harder to create, though software would no doubt arise to help us out (e.g. there have been good XML editors out there for a while, and something similar could work for this). In all likelihood, though, it won't be all the text on a page that needs this structure, just certain fields, e.g. prices, event info, names, keywords, etc.: the type of info we'd be most apt to search for.
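As a rough sketch of what "just certain fields" could look like, here is a small Python example using the standard library's `html.parser`. The `data-field` attribute, the sample page, and the `FieldExtractor` class are all invented for illustration; this is not how any actual Semantic Web standard marks up a page.

```python
from html.parser import HTMLParser

# Hypothetical markup: the data-field attribute is a made-up convention for this sketch.
PAGE = """
<p>Join us for <span data-field="event">Summer Picnic</span>.
Tickets are <span data-field="price">12.50</span> dollars.</p>
"""


class FieldExtractor(HTMLParser):
    """Collect the text of every element that carries a data-field attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # name of the labeled field we are inside, if any

    def handle_starttag(self, tag, attrs):
        # Remember the field name (None when the tag has no data-field attribute).
        self._current = dict(attrs).get("data-field")

    def handle_data(self, data):
        # Only keep text that sits inside a labeled element.
        if self._current:
            self.fields[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


extractor = FieldExtractor()
extractor.feed(PAGE)
extractor.close()
print(extractor.fields)  # {'event': 'Summer Picnic', 'price': '12.50'}
```

Most of the page stays ordinary text; only the two labeled spans would need any extra effort from whoever writes it.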

Any solution like this would probably not be forced on website creators, but would take years and years to phase in on an optional basis.

Due to these and other limitations, I believe the artificial-intelligence-type solutions currently being pursued are more likely to deliver.

Regarding your second point, it is a bit frightening when robots start choosing the info for us that is relevant and thus seen, whether news or otherwise (visions of The Matrix). But then newscasters have been picking the news for us on TV, and they somehow feel endless coverage of Anna Nicole, Paris Hilton, and Britney Spears is most relevant. Considering this, I'll take my chances with the robots!