Distributing the Zope knowledgebase

by paul last modified 2006-01-20 19:06

There are many good Zope community sites providing useful content. This document proposes a system to stitch them together into one searchable, navigable knowledgebase.

Zope is a great platform for building content management systems, and the Zope community is the jewel in the crown, giving help and contributing content and software.

Right now, though, the Zope content universe is self-contained in zope.org. Little use is made of the other good sites in the Zope universe. This also means that, in some respects, the world of Zope content is only as good as Zope.org.

This presents us all with a challenge. For instance, it puts pressure on Zope.org to handle all needs, when in fact it is getting narrower in scope (e.g. nzo, the new zope.org, will not allow DTML or ZPT for members).

This document proposes a system for de-centralizing content authoring for the world of Zope, while increasing the sharing of content between Zope community sites. The goal: better serve the Zope community by:

  • Aggregating useful information
  • Allowing innovation (the bazaar) through new sites while maintaining the official approach (the cathedral)
  • Eliminating single points of failure and bottlenecks

Proposal

The world of blogs has shown that systems can be decentralized while still working together. As George Donnelly put it on the zope-web mailing list:

    this is a fascinating idea. maybe we could look at some examples
    from the weblog community as was mentioned, e.g, weblogs.com and
    their list of recently updated blogs and their free hosting of
    blogs, the trackback/backlinking feature of movabletype (
    http://movabletype.org/trackback/ ), metafilter and slashdot,
    weblog search engines and indices.

I propose this kind of approach, one that uses existing patterns and accumulated wisdom. Jeffrey, a prominent Zope blogger and current zope.org webmaster, suggested:

I think it would be interesting to bring in RSS feeds from other Zope community sites. Something even more interesting would be to read in RSS feeds and catalog them, so that you could search for help on Zope.org and get a ZopeLabs recipe or "Ask ZopeZen" discussion.

George added:

how about this: a central site that links to the zope site universe, has an integrated xml-based listing of all recent zope content that is out there (weblog-ish look, or like the xml news feed thing in radio userland), the ability to set up your zope space/weblog/etc (like zope.org/Members but with more access to neat zope stuff) and some integration with google...

I propose an RSS aggregator that pulls metadata listings from various Zope sites. The format will be RSS 1.0, allowing us to extend the metadata scheme if we want. (RSS 2.0 is ok as well, but this is a detail better left to Jeffrey :^) )
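To make the extensibility point concrete, here is a minimal sketch in Python of reading an RSS 1.0 feed that carries extra Dublin Core metadata. The sample feed, its example.org URLs, and the function name `parse_rss10` are made up for illustration, and the feed is abbreviated (a full RSS 1.0 channel also lists its items in an rdf:Seq). Treat it as one possible shape, not a decided format.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, RSS 1.0 and the Dublin Core module.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, abbreviated feed used only for this example.
SAMPLE_FEED = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.org/feed">
    <title>Example Zope site</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed</description>
  </channel>
  <item rdf:about="http://example.org/recipes/1">
    <title>Caching a page template</title>
    <link>http://example.org/recipes/1</link>
    <dc:date>2003-01-15T10:00:00+00:00</dc:date>
    <dc:subject>caching</dc:subject>
  </item>
</rdf:RDF>
"""


def parse_rss10(xml_text):
    """Return one dict per <item>, including optional Dublin Core fields."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("rss:item", NS):
        items.append({
            "about": item.get("{%s}about" % NS["rdf"]),
            "title": item.findtext("rss:title", default="", namespaces=NS),
            "link": item.findtext("rss:link", default="", namespaces=NS),
            # Extension fields are None when a site does not supply them.
            "date": item.findtext("dc:date", default=None, namespaces=NS),
            "subject": item.findtext("dc:subject", default=None, namespaces=NS),
        })
    return items


for entry in parse_rss10(SAMPLE_FEED):
    print(entry["title"], entry["link"], entry["subject"])
```

Because the extra fields live in their own namespace, a site that only emits title and link still parses; the aggregator just sees None for the fields it did not get.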

To avoid increasing the sysadmin burden on zope.org, this aggregator will run somewhere out in the world. Once every N minutes, it will wake up and:

  • Start a logfile that is retrievable from a well-known URL
  • Retrieve the RSS listing from each site
  • Apply post-processing to the links
  • Write warnings to the logfile
  • On rejections, email the owner a URL to the error log
  • Save the linkbase data in a file with a name keyed for that server (or perhaps
  • Check it into CVS
  • Using XML-RPC, update the Link objects in the appropriate folder on CZO (current zope.org)
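The middle of that cycle (retrieve, post-process, log warnings, save a per-server file) can be sketched in Python as follows. This is a rough sketch under assumptions of my own: the function names, the JSON file format, and the rule that only http(s) links with a title are accepted are all illustrative. Emailing owners, the CVS check-in, and the XML-RPC update on czo are deliberately left out, since their details are not settled here.

```python
import json
import logging
import os
from urllib.parse import urljoin, urlsplit
import xml.etree.ElementTree as ET

RSS = "{http://purl.org/rss/1.0/}"  # RSS 1.0 namespace in ElementTree notation
log = logging.getLogger("harvester")  # hypothetical logger name


def extract_links(xml_text):
    """Return (title, link) pairs from an RSS 1.0 document."""
    root = ET.fromstring(xml_text)
    return [(item.findtext(RSS + "title", ""), item.findtext(RSS + "link", ""))
            for item in root.iter(RSS + "item")]


def postprocess(base_url, links):
    """Resolve relative links; reject untitled or non-http(s) entries."""
    kept, rejected = [], []
    for title, link in links:
        absolute = urljoin(base_url, link.strip())
        if urlsplit(absolute).scheme in ("http", "https") and title.strip():
            kept.append({"title": title.strip(), "url": absolute})
        else:
            rejected.append(link)
    return kept, rejected


def harvest(sites, fetch, out_dir):
    """Run one cycle; return {site: problems} for a later notification step.

    `fetch` is any callable that takes a URL and returns the feed text,
    so the network layer can be swapped out or faked.
    """
    problems = {}
    for site_url in sites:
        try:
            kept, rejected = postprocess(site_url, extract_links(fetch(site_url)))
        except (OSError, ET.ParseError) as exc:
            log.warning("%s: feed skipped: %s", site_url, exc)
            problems[site_url] = [str(exc)]
            continue
        for link in rejected:
            log.warning("%s: rejected link %r", site_url, link)
        if rejected:
            problems[site_url] = rejected
        # One file per server, named after its host; sorted keys keep
        # version-control diffs small between runs.
        path = os.path.join(out_dir, urlsplit(site_url).hostname + ".json")
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(kept, fh, indent=2, sort_keys=True)
    return problems
```

A real run might pass something like `lambda url: urllib.request.urlopen(url).read()` as `fetch`; passing the fetcher in keeps the cycle testable without touching the network.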

Constraints

  • No login access on czo, so no external methods or new products. As best we can, we need to move the work outside zope.org and leverage already-available content objects.
  • Needs to work with czo. There isn't a need to wait for nzo. If we can add value and start the external harvester now, we should.
  • No migration issues for nzo. Of course we should also make sure this work can fit in nzo. There should be no content migration issues and we shouldn't have to develop new software for nzo.
  • Don't impact nzo schedule. The nzo pope has said, rightfully, that no new work can be assigned to existing nzo people. We need to get an alpha up for nzo.
  • Logically separated from the "cathedral" content. I think there's a loose consensus amongst ZC folks, the zope-web folks, and the community that "official" docs and content are different than community content.
  • None of us have more than a manweek of time to program this. We should do our best to avoid writing specifications for the space shuttle.

Challenges

  • How to handle changes. RSS isn't the best for indicating that something has changed.
  • How to handle deletes. RSS, AFAIK, has no ability to indicate that a resource was removed. Further, most Zope sites/CMSs have no such ability either. To mitigate this challenge, I propose that at first we just ignore it. Nobody sends Google a message when a page disappears. Later we can investigate some script to check existence.
  • Expressing date searches. RSS is geared towards telling you the N most recent resources. If N is 10 and there are 12 changes since the crawler looked, you'll lose 2 changes. We need some way to say, "I last checked you yesterday, what's changed since then?"
  • Long URLs. How do we organize the linkbase resources on czo? I'd hate to see zope.org/linkbase/www.zopezen.org/somechannel/2003/01/somereallylongid. Even if that is shorter than Zope 3 Wiki urls. :^)
  • Where to send (and how to handle) errors. Systems like this encounter exceptional cases. If time isn't invested in speeding up the processing of exceptions, the system will become tedious to use.
  • Categories. Of course the linkbase would serve more use if people could browse it by topic. And of course, creating such a topic hierarchy is a jihad that has never been successful on zope.org. I vote that Jeffrey arbitrarily decide for nzo and shove it down our throats.
  • Optimizations. To scale, and to avoid ZODB bloat, we might be tempted to go down the road of optimization. The challenge, though, is that this defeats simplicity and imposes a burden all around -- on the Zope sites providing content, on the linkbases presenting them, and on the person writing the script.
  • Terms of content licenses on various sites. Would we need to get agreement from the sites to release content under some license? I think we can stay silent on this, as have czo and nzo.
  • Is metadata from RSS enough for meaningful czo searches, or will we ultimately want to retrieve/syndicate the body?
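The change, delete, and date-search challenges can be partly worked around on the harvester side by comparing each new listing with the last saved one. The Python sketch below is one possible approach, not a settled design: the function names and the dict-per-URL snapshot layout are assumptions, and it expects ISO 8601 dates with an explicit offset such as +00:00.

```python
from datetime import datetime, timezone


def parse_date(text):
    """Parse an ISO 8601 timestamp; treat naive values as UTC."""
    stamp = datetime.fromisoformat(text)
    return stamp if stamp.tzinfo else stamp.replace(tzinfo=timezone.utc)


def changed_since(items, last_checked):
    """Items dated after the last crawl ("what's changed since then?")."""
    return [i for i in items if parse_date(i["date"]) > last_checked]


def diff_snapshots(previous, current):
    """Compare two {url: metadata} snapshots of the same site.

    Returns (added, changed, dropped). "dropped" only means the entry is
    no longer in the feed window; it may still exist on the site, so it
    is a candidate for a later existence check, not a confirmed delete.
    """
    added = sorted(set(current) - set(previous))
    changed = sorted(u for u in current
                     if u in previous and current[u] != previous[u])
    dropped = sorted(set(previous) - set(current))
    return added, changed, dropped


def window_may_have_overflowed(previous, current):
    """True when no URL overlaps between two non-empty snapshots.

    With a feed limited to N items, zero overlap suggests that more than
    N changes happened between crawls and some may have been missed.
    """
    return bool(previous) and bool(current) and not (set(previous) & set(current))


prev = {"http://example.org/a": {"title": "A"},
        "http://example.org/b": {"title": "B"}}
cur = {"http://example.org/b": {"title": "B (revised)"},
       "http://example.org/c": {"title": "C"}}
print(diff_snapshots(prev, cur))
print(window_may_have_overflowed(prev, cur))
```

This keeps the burden off the content sites: they publish the same plain feed, and the harvester infers changes from its own history. It cannot recover items that scrolled out of the feed between crawls; it can only flag that this probably happened.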

Benefits

  • All the good work done in the community is better leveraged.
  • We can experiment with fresh ideas, without adding risks to the NZO schedule or process.
  • The linkbase can be "presented" not just by the zope.org site, but by many sites, thus eliminating a single point of failure and improving response time by moving retrievals closer to global users.
  • By checking the content into a CVS repository in a neutral format, we get a safe historical archive of metadata. We also decouple the content from the content application, allowing other people to do interesting things with the content.

Source

http://www.zeapartners.org/articles/200301/czolinkbase