I read this article last night and did not think much of it until later, when I checked my referrers through Radio. I noticed that several things I have written (some recent, some older) had been picked up by Google. Unfortunately, the URLs linked to my main page, not to the article being referenced, and once someone arrives at the front page of the average blog there is no easy way to track down the article or story. While blogs can use things like trackbacks and permalinks, the bots from the search engines are not yet able to distinguish them. Hopefully soon…

Google loves blogs. Blogs love Google. But is there trouble in paradise? When items slip off the front page of most blogs, there is an anecdotal two- to three-week delay before archived items are reindexed. As Dylan Tweney points out, this is an artifact of the fact that Google’s basic unit of indexing is the web page URL, whereas blogs are more fine-grained: the basic unit is the post, and there are usually multiple posts on a single page.

Permalinks arose to address this same issue, allowing post-level targeting of links to web posts. This is generally implemented with named anchors within pages, although it’s also possible to assign each entry its own page in the archives, even if several entries are aggregated at any one time on the blog’s index page.
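To make the mismatch concrete, here is a minimal Python sketch of the named-anchor style of permalink. It assumes an indexer that keys on the page URL with the fragment dropped, which is my simplification of "the web page URL is the basic unit", not a description of how Google actually works. The archive URL, the post ids, and the helper names are all made up for illustration.

```python
from urllib.parse import urldefrag


def anchor_permalink(page_url: str, post_id: str) -> str:
    """Build a permalink that points at a named anchor inside a shared page."""
    return f"{page_url}#{post_id}"


def page_key(url: str) -> str:
    """Return the page-level key (fragment removed) that a hypothetical
    URL-based indexer would store for this link."""
    return urldefrag(url).url


# Hypothetical archive page holding two posts.
archive = "http://example.com/2002/10/archive.html"
links = [anchor_permalink(archive, pid) for pid in ("a101", "a102")]

# Both permalinks collapse to the same page once the anchor is ignored.
keys = {page_key(link) for link in links}
```

Under that assumption, two distinct posts end up as a single entry, which is the loss of granularity described above. Giving each entry its own archive page avoids the collapse because every post then has a distinct page URL.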

Dylan has a suggestion, though, to help the Googlesphere catch up with the blogosphere:

As it turns out, we do have a couple of data formats that understand the difference between a post and a page, include useful summary data, and even include handy pointers back to the exact archive location of a post. They’re called RSS and RDF.

These syndication formats are used to aggregate news, but they could be useful indexing tools too. What if Google (or Daypop, once they can afford to buy a few new hard drives) collected RSS and RDF feeds — and then archived them in a searchable index?

Instead of news stories scrolling off into oblivion when they get to the bottom of a feed, they’d enter a permanent index where they could be used for information retrieval later.
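As a rough illustration of the idea, here is a small Python sketch of a feed archive that indexes posts rather than pages. It is only a sketch under stated assumptions: the sample feed, its URLs, and the `FeedIndex` class are invented for this example, it handles plain RSS 2.0 `item` elements only (RDF/RSS 1.0 feeds use XML namespaces and would need extra handling), and a real search engine would do far more than match whole words.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical RSS 2.0 feed; every item carries a link back to its own post.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example k-log</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>Permalinks and search</title>
      <link>http://example.com/2002/10/archive.html#a101</link>
      <description>Why search engines index pages, not posts.</description>
    </item>
    <item>
      <title>Indexing RSS feeds</title>
      <link>http://example.com/2002/10/archive.html#a102</link>
      <description>Feeds carry a link back to each post.</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Extract title, link and description from each RSS 2.0 item."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


class FeedIndex:
    """Permanent, post-level archive: items stay after leaving the feed."""

    def __init__(self):
        self.posts = {}                # permalink -> item fields
        self.words = defaultdict(set)  # word -> set of permalinks

    def add_feed(self, feed_xml):
        for item in parse_items(feed_xml):
            link = item["link"]
            if not link or link in self.posts:
                continue  # skip items without a link or already archived
            self.posts[link] = item
            text = f"{item['title']} {item['description']}".lower()
            for word in re.findall(r"[a-z0-9]+", text):
                self.words[word].add(link)

    def search(self, query):
        """Return permalinks of posts containing every word in the query."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self.words.get(t, set()) for t in terms))
        return sorted(hits)


index = FeedIndex()
index.add_feed(SAMPLE_FEED)
index.add_feed(SAMPLE_FEED)  # polling the same feed again adds nothing new
```

Calling `index.search("feeds")` here returns the permalink of the second post rather than the blog's front page, and because the archive is keyed by permalink, repeated polling of a feed does not create duplicates.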

It seems that the same approach would work when indexing an intranet or enterprise portal. Maybe part of the solution for turning k-logs into a true knowledge sharing system is to make sure the search implementation indexes RSS feeds from k-logs, making knowledge retrieval possible without discontinuities.
