Semantic annotation in collaborative content systems (such as wikis)

Collaborative content systems have accumulated, in a short time, a huge amount of good-quality, well-edited data, and in some cases they are a real alternative to traditional reference works. The natural next step is to exploit this human-language data programmatically: to make it machine-accessible so that the "machine" can understand it. Although some information can be extracted with the general techniques used for information extraction from web data, enriching the texts with some kind of semantic annotation would certainly lead to a new level of utility. This annotation would be done manually by the editors. One important constraint is therefore ease of use (simple editing, simple syntax), which is arguably an important factor in the success of wiki systems. More generally, the annotation should integrate seamlessly into the existing system, so that users are not required to learn much that is new or to readapt to a changed environment. The structured information, entered by the user in some simple syntax, has to be extractable and convertible to standard formats, particularly RDF (Resource Description Framework). The third step is to provide this information as a data service.
1 answer

Semantic MediaWiki

Multiple solutions have been proposed for extending wiki systems with the ability to manage semantic annotations.
One that has reached production state is Semantic MediaWiki (SMW).

One dedicated wiki-system employing SMW being

SMW is a free extension to the widely used (standard) MediaWiki system. It stands out for its simplicity.
The basic concepts of semantic modelling (RDF) are mapped to elements of the wiki system as follows:

  • Page = Resource (or Subject in a PL statement)
  • Named Link = Property (or Predicate)
  • Linked Page = Resource (or Object)

Together these form one RDF statement.
The only real (and very intuitive) addition to the syntax is the Named Link, which describes the kind of relation between the subject and the object; the predicate name is constrained by a controlled vocabulary.
Thus there is no need for new input elements or significant changes to the editing syntax.
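
To make the mapping concrete, here is a minimal Python sketch that reads Named Links out of a page's wikitext and turns them into (subject, predicate, object) triples. It assumes the `[[property::target]]` form that SMW uses for annotated links; the regular expression, the function name `extract_triples`, and the sample text are illustrative only and are not SMW's own parser.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Matches [[property::target]] and [[property::target|label]].
# A plain link such as [[Brandenburg]] has no "::" and is ignored.
ANNOTATION = re.compile(r"\[\[([^\[\]|:]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_triples(page_title: str, wikitext: str) -> List[Triple]:
    """Return one triple per Named Link; the page itself is always the subject."""
    return [
        (page_title, prop.strip(), target.strip())
        for prop, target in ANNOTATION.findall(wikitext)
    ]


# Hypothetical content of a page titled "Berlin".
text = (
    "Berlin is the [[capital of::Germany]] and lies on the "
    "[[located on::Spree|river Spree]]. See also [[Brandenburg]]."
)
triples = extract_triples("Berlin", text)
for t in triples:
    print(t)
```

The ordinary link to Brandenburg yields no statement, which reflects the point above: only the naming of a link adds semantics, and everything else about editing stays the same.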
One drawback is that a page can only contain statements whose Subject is the topic of that page. Eliminating this constraint would probably not complicate editing much, but it would introduce serious complexity both for operations on the data and for human understanding of the data structure. This leaves us with a trade-off between expressivity and complexity.
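
The constraint can be seen in a small export sketch: when a page's annotations are written out as N-Triples, the subject is fixed by the page and only the predicate and object vary. The base URI `http://example.org/wiki/`, the `Property:` prefix for predicate URIs, and the helper names are assumptions made for illustration; they do not describe SMW's actual RDF export.

```python
from typing import Iterable, List, Tuple
from urllib.parse import quote

BASE = "http://example.org/wiki/"  # hypothetical base URI, not a real wiki


def to_uri(name: str) -> str:
    """Build an N-Triples URI reference from a page or property name."""
    return "<" + BASE + quote(name.replace(" ", "_"), safe=":") + ">"


def page_to_ntriples(page_title: str,
                     annotations: Iterable[Tuple[str, str]]) -> List[str]:
    """Serialise (property, target) pairs found on one page.

    The subject is computed once from the page title: a page cannot
    make statements about any resource other than itself.
    """
    subject = to_uri(page_title)
    return [
        f"{subject} {to_uri('Property:' + prop)} {to_uri(target)} ."
        for prop, target in annotations
    ]


lines = page_to_ntriples(
    "Berlin",
    [("capital of", "Germany"), ("located on", "Spree")],
)
print("\n".join(lines))
```

Under this model, a statement with a different subject, for example one about Germany, has to be entered on the Germany page.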