Introduction
By D we denote a collection of documents over a finite alphabet A. Consider a single property that may take multiple values, i.e. a property that is NOT of type owl:FunctionalProperty [OWLReference04].
We consider three categories of objects: a potentially infinite domain of documents or knowledge items; a finite and extensible list of their properties; and, for each property, its possible values, called "tags", which also come from a finite and extensible list (i.e. the tag vocabulary of each property is controlled). For a given property, documents and tags are linked by a many-to-many relationship, which can be viewed as a bipartite graph. A typical search problem is: given a sublist of properties and tag values for them, find a matching document. A typical annotation problem is the dual one: given a document and a sublist of properties, find appropriate tags for it.
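The model above can be sketched as a small data structure. This is only an illustration under assumptions: the class name TagIndex, its method names, and the sample properties "subject" and "level" are hypothetical and do not come from the text. The sketch covers the search problem as an intersection over the per-property bipartite relations. For the annotation problem it covers only the lookup of tags already assigned to a document; how appropriate new tags would be suggested is not specified here.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical sketch: per-property many-to-many relation between documents and tags."""

    def __init__(self, vocabulary):
        # vocabulary: property name -> allowed tags (controlled, but extensible)
        self.vocabulary = {prop: set(tags) for prop, tags in vocabulary.items()}
        # property -> tag -> set of document ids (one side of the bipartite graph)
        self.docs_by_tag = defaultdict(lambda: defaultdict(set))
        # property -> document id -> set of tags (the other side)
        self.tags_by_doc = defaultdict(lambda: defaultdict(set))

    def annotate(self, doc_id, prop, tag):
        """Add one document-tag edge; reject tags outside the controlled vocabulary."""
        if tag not in self.vocabulary.get(prop, set()):
            raise ValueError(f"{tag!r} is not in the vocabulary of {prop!r}")
        self.docs_by_tag[prop][tag].add(doc_id)
        self.tags_by_doc[prop][doc_id].add(tag)

    def search(self, query):
        """query: property -> tag. Return the documents that carry every requested tag."""
        result = None
        for prop, tag in query.items():
            docs = self.docs_by_tag.get(prop, {}).get(tag, set())
            result = set(docs) if result is None else result & docs
        return result if result is not None else set()

    def tags_of(self, doc_id, props):
        """Return the tags already assigned to a document for the given properties."""
        return {
            prop: set(self.tags_by_doc.get(prop, {}).get(doc_id, set()))
            for prop in props
        }


# Made-up sample data, for illustration only.
index = TagIndex({"subject": {"algebra", "geometry"}, "level": {"basic", "advanced"}})
index.annotate("doc1", "subject", "algebra")
index.annotate("doc1", "subject", "geometry")  # a property may take several values
index.annotate("doc1", "level", "basic")
index.annotate("doc2", "subject", "algebra")
index.annotate("doc2", "level", "advanced")

matches = index.search({"subject": "algebra", "level": "basic"})
```

Keeping both directions of the relation makes the search lookup and the per-document lookup equally cheap, at the cost of storing every edge twice.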
Holding a middle ground
This section should explain that learning materials and [...]. Usually this means that a reasonable middle path has to be kept; many of the border cases or extreme assumptions studied so far do not fit.
Some border cases have been well studied and incorporated into existing applications:
- A very large number of indexed documents (Google, Del.icio.us) vs. a small number of indexed documents (manual annotations, ontology reasoning for certain subject areas). We are interested in a collection that is large (10-100 thousand documents): it cannot be processed with full-text search and something like Google rank alone, but it cannot be quickly and consistently annotated by a single team either.
- Very many information servers (the global Web with documents and their keywords) vs. just one server to store annotations. We are looking at several servers (e.g. one per educational establishment) and model their collaboration.
- Almost no collaboration between the content developers (beyond what is guaranteed by common Web protocols and standards) vs. content developers complying with a certain ontology (e.g. developed for some ambitious project like IMS).
- Very many possible values for tags (many millions or more possible keywords and phrases, as in full-text search) vs. very few values for tags (e.g. naive Bayes, which can classify resources as either "spam" or "non-spam"). We are interested in tags that come from a controlled vocabulary, where there is some barrier to adding new values, but adding them is not made impossible by overly drastic user rights management.
- Tags that are truly global and the same for everyone (using a fixed taxonomy) vs. tags that are user-specific (as in Del.icio.us). We are interested in partly overlapping tag spaces, which could at some point become mature enough to be merged between several institutions. On the other hand, institutions and even their branch offices and teams should have some autonomy w.r.t. properties and their value ranges.
- Very few properties (author, date, generic tags), as in Del.icio.us or Flickr.com, vs. very many or even an unlimited number of properties (as in a full-fledged Semantic Web application, where ). We want a large but limited number of properties, as is appropriate for e.g. a faceted browse interface.
- No intelligence at all (simple operations on datatypes, as in an SQL query) vs. Description Logic reasoning.
- All items being tagged in a centralized location (like books being categorized by a national library and the records then replicated all over the country) vs. complete absence of a central hierarchy (semantic wikis, full-text search). We want a flat user space, in which users can make meaningful contributions to those properties they are most interested in (and, presumably, most qualified to use).