Status: gathering input (please provide)

Background

There is a wide range of interpretations of how to define, use, evolve and reuse vocabularies and constraints, and industry adopters are left to decide what works best for their needs. Various communities, such as OSLC, various W3C groups and OASIS TCs, have published guidelines based on the knowledge of their contributing members, reflecting needs specific to their level of experience and industry segment. These guidelines are available in the References section. Those publications represent a point-in-time view, which may or may not be valid for the OSLC Core TC.

Purpose

The purpose of this page is to better understand the needs for developing, evolving and using vocabularies. Gathering a common set of use cases and requirements will help determine how much new material is needed and how much existing material can be leveraged. The term "vocabulary" in this context is intended not to be too limiting, and even allows expansion into areas such as understanding which direction a property should be defined in, or whether to use redundant properties (even for the inverse direction).

Organization of this document follows the guidance set out in OSLC Core-TC Editor's Handbook.

Use Cases

The following use cases are grouped into common categories. TC members are encouraged to add their own use cases, using their own judgement about the level of detail needed, and to tag them as appropriate. If some time has passed since these were actively discussed, it is recommended to notify the TC by sending a summary of what has changed to the TC mailing list.

Use case tags:

3.0 - proposed 3.0 content

backlog - proposed backlog content

TODO - outstanding action or question

Defining New

As new specifications are developed, new vocabulary terms and resource shapes need to be defined and expressed in both machine-readable and human-readable forms.

Details/Requirements:

  1. Consistent naming conventions and common data types to use to assist with comprehension and consistent reuse 3.0

  2. Need for clients to be able to find the authoritative document(s) governing a vocabulary. 3.0

  3. As specifications evolve and new versions are made, existing terms need to work compatibly with current and newly added terms. 3.0

  4. Clients want to know when the service has been upgraded and to what version; this may not be a vocabulary concern.
    • TODO: Move to Discovery
  5. Are RDFS assertions like domain/range things to encourage or discourage?
    • Need guidance on range so we don't limit reuse. Need guidance on domain so it doesn't infer the wrong thing. TODO: Consider removing this as just follows best-practices (or common usage) of these terms. Has problems with inferencing.
  6. Need to use vocabularies already defined but not expressed as RDF e.g. those in XML namespaces. 3.0

  7. Vocabulary definitions need to be programmatically consumable, preferably RDFS/OWL expressed in Turtle 3.0

  8. Need to publish higher-level shapes for resource types to be shared with other specs and applications 3.0
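
As a minimal sketch of requirements 2 and 7 above, the following Python function renders one property definition as Turtle and points to its governing document with rdfs:isDefinedBy. The namespace http://example.org/ns/demo#, the prefix ex and the term validatedBy are placeholders invented for illustration; they are not OSLC terms, and this is not a prescribed publication format.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"


def term_to_turtle(namespace, prefix, local_name, label, comment):
    """Render one property definition as a Turtle snippet.

    Assumes single-line label and comment text.
    """

    def lit(text):
        # Escape backslashes and double quotes so the literal stays valid.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        f"@prefix rdf: <{RDF_NS}> .",
        f"@prefix rdfs: <{RDFS_NS}> .",
        f"@prefix {prefix}: <{namespace}> .",
        "",
        f"{prefix}:{local_name} a rdf:Property ;",
        f"    rdfs:label {lit(label)} ;",
        f"    rdfs:comment {lit(comment)} ;",
        # Assumption: the vocabulary document lives at the namespace URI
        # without its trailing '#'.
        f"    rdfs:isDefinedBy <{namespace.rstrip('#')}> .",
    ]
    return "\n".join(lines)


# Hypothetical term used only for illustration.
example = term_to_turtle(
    "http://example.org/ns/demo#",
    "ex",
    "validatedBy",
    "validatedBy",
    "Hypothetical example term.",
)
```

A client that reads the rdfs:isDefinedBy value can then fetch the authoritative document for the term, provided the publisher keeps that URL resolvable.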

Updating and evolution

As new specifications are created or existing ones updated, vocabulary terms are created, modified or deprecated in response to new needs.

Details/Requirements:

  1. Evolution of individual vocabulary terms: draft -> standard -> deprecated 3.0

    • The primary case is how to take a new vocabulary term, such as a new property, and introduce it into an existing namespace (say http://open-services.net/ns/cm#) without invalidating other existing terms. The secondary case is how to take a standardized term and mark it as deprecated.

  2. Evolve resource shapes for OSLC defined resources per draft or standard, evolving revisions and versions. 3.0

  3. Are vocabulary documents part of what OSLC would call domain specifications, or are they separate? If they are separate, what is the relationship and lifecycle of each? What approvals are required to update a vocabulary, e.g. to add a new term?
  4. Transition from proprietary vocabulary to standards-based backlog

    1. (jc) Various versions from different (standards) bodies
      • Not every organization has created vocabularies for the domains, standards and terms it has defined. As a result, implementations that want and need to use linked data vocabularies for these standards will have created vocabularies for themselves. For example, the OMG owns and defines the Unified Modeling Language (UML), which has already gone through several versions and has related specifications and standards (e.g. SysML). The Eclipse project has created an implementation of UML (UML2) and, depending on the version, uses different namespaces in its file-based resources (e.g. http://www.eclipse.org/uml2/3.0.0/UML or http://www.eclipse.org/uml2/4.0.0/UML). IBM alone has two (or more) different UML vocabularies that it uses internally for UML models managed in OSLC-based servers, depending upon the originating tool (RSA or Rhapsody). Because of the history of these two tools, and the incomplete overlap in the parts of the UML specification they support, it is not a simple matter of picking one vocabulary and using it. A clear idea or path of evolution is necessary first.
      • Ideally the standard or specification owner would be the best authority to define (and maintain) a vocabulary for a standard they define and own. The OMG is now considering the definition of a formal RDF vocabulary (and LPD style shapes) for SysML, with a long-term expectation of doing so for UML (a specification they own) in general. The focus on UML is just a single real-life example of this issue. There are similar examples on the horizon for other domains/specifications (SQL, BPMN, ...).
      • It would be useful to know what the expectations are on service providers and clients for handling the evolution of vocabularies in shared domains. How are implementations and clients expected to migrate and adopt a new vocabulary for old data?
      • Do we just not support this, and create big bang migration events for the entire system when a new vocabulary is adopted?
      • Do we only support the original vocabulary at the time of the resource's creation (and presumably all the related and linked resources)?
      • Do we combine vocabularies in the same resource for a period of time during which both vocabularies are supported (doubling the size of the resource and potentially creating integrity conflicts)?

    2. Customer defines own vocabulary, then moves to using a standards-based one. The customer will want to use their own existing vocabulary with terms like acme:validates. Some time later, OSLC or some other organization defines the term itself, in a more open way, like oslc:validates. Some questions one might expect to come of this would be:
      • Do we expect all service providers to instantly change their resource properties from acme:validates to oslc:validates?
      • Do we expect service providers to support both properties, or perform on the fly transformations/conversions?
      • Do we require shapes (MAY -> MUST) so that clients know which properties are supported?
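
The term lifecycle (draft -> standard -> deprecated) and the "support both properties, or transform on the fly" options above can be sketched in Python as follows. This is only an illustration under stated assumptions: the Term class, the advance and migrate helpers, the strictly linear lifecycle, and both example URIs are hypothetical, and the acme/oslc URIs are placeholders rather than real namespaces.

```python
from dataclasses import dataclass
from typing import Optional

DRAFT, STANDARD, DEPRECATED = "draft", "standard", "deprecated"

# Assumption: the lifecycle is strictly linear, as in draft -> standard -> deprecated.
ALLOWED = {DRAFT: {STANDARD}, STANDARD: {DEPRECATED}, DEPRECATED: set()}


@dataclass
class Term:
    uri: str
    status: str = DRAFT
    replaced_by: Optional[str] = None  # URI of the successor term, if any


def advance(term, new_status, replaced_by=None):
    """Move a term to its next lifecycle state, rejecting illegal jumps."""
    if new_status not in ALLOWED[term.status]:
        raise ValueError(f"cannot go from {term.status} to {new_status}")
    term.status = new_status
    term.replaced_by = replaced_by


def migrate(triples, registry, keep_old=False):
    """Rewrite deprecated properties to their replacements.

    With keep_old=True both the old and the new property are emitted,
    which models a transition period where both are supported.
    """
    out = []
    for s, p, o in triples:
        term = registry.get(p)
        if term and term.status == DEPRECATED and term.replaced_by:
            out.append((s, term.replaced_by, o))
            if keep_old:
                out.append((s, p, o))
        else:
            out.append((s, p, o))
    return out


# Placeholder URIs standing in for acme:validates and oslc:validates.
ACME_VALIDATES = "http://example.org/acme#validates"
OSLC_VALIDATES = "http://example.org/oslc#validates"

acme = Term(ACME_VALIDATES, STANDARD)
advance(acme, DEPRECATED, replaced_by=OSLC_VALIDATES)
registry = {ACME_VALIDATES: acme}

data = [("test1", ACME_VALIDATES, "req1"), ("test1", "http://example.org/other#p", "x")]
migrated = migrate(data, registry)
both = migrate(data, registry, keep_old=True)
```

The keep_old flag makes the trade-off from the questions above concrete: emitting both properties keeps old clients working but grows each resource, while rewriting alone forces clients to adopt the new term at once.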

Reuse

When solving a problem with a new specification, the ability to reuse vocabularies helps with data correlation between applications. Vocabularies and shapes should also be developed so that they can themselves be reused easily, by domain specs or general implementations.

Details/Requirements

  1. (img) DRY (don't repeat yourself or someone else). This supports reuse and evolvability and helps reduce dependencies on other specifications. 3.0

  2. One of the best practices is to reuse, not invent new, various vocabulary terms. There are a number of factors to consider when deciding if a vocabulary is stable.

    1. Does the vocabulary URI have to be URL addressable (i.e. XML datatypes vs Dublin Core terms) to be considered stable? 3.0

    2. What are the acceptable licenses for reuse? 3.0

    3. Need to understand how stable the vocabulary is and what the governance is for maintaining it. 3.0

    4. Is 'vocabulary' a sensible/the right granularity, or is it really 'vocabulary term'? If a single vocabulary document mixes mature, immature, and deprecated terms, it's not obvious that 'vocabulary' is anything more than a coarse-grained 'set of terms' aggregation. 3.0

  3. Need to extend the capability of a resource shape without having to restate the original shape backlog

Linking

(these are a bit more like plain old requirements)

  1. All links should be modeled as simple links, as a single triple (subject resource, link property, target resource) 3.0

  2. Links should only have one direction; there should not be an equivalent inverse of the link property 3.0

  3. To prevent data duplication, there should only be one authority for a link. 3.0
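
To illustrate the three linking requirements above, here is a small Python sketch in which each link is stored once as a single (subject, property, target) triple, and the "inverse" view is obtained by querying rather than by storing a second, inverse property. The validates property and all resource URIs are hypothetical placeholders, and a real service would typically answer such lookups through its query capability rather than an in-memory list.

```python
# A link is one triple: (subject resource, link property, target resource).
VALIDATES = "http://example.org/ns/demo#validates"  # hypothetical link property

triples = [
    ("http://example.org/tests/1", VALIDATES, "http://example.org/reqs/7"),
    ("http://example.org/tests/2", VALIDATES, "http://example.org/reqs/7"),
]


def outgoing(triples, source, link_property):
    """Targets that `source` links to via `link_property`."""
    return [o for s, p, o in triples if s == source and p == link_property]


def incoming(triples, target, link_property):
    """Sources linking to `target`; stands in for a stored inverse property."""
    return [s for s, p, o in triples if o == target and p == link_property]
```

Because only the test resources assert the link, there is a single authority for it, and the requirement side never needs a duplicated "validated by" statement that could drift out of sync.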

Miscellaneous

  1. (img) It should be feasible/reasonable to implement and deploy a vocabulary; for example, it should not require inferencing.

To keep clients simple, basic use cases should have minimal software requirements for processing: non-inferencing 3.0

  1. Need to query for the same vocabulary terms used across multiple domains. Limiting the range (object) of a term makes queries more precise but limits the term's reuse (likewise with domain).

Security

  1. (img) In addition to HTTP-level security, finer-grained control is needed, e.g. property-level read access control. backlog

Requirements

3.0

  1. Ensure vocabularies are always URL addressable (GETable).
  2. Naming conventions for vocabulary terms
  3. Vocabulary publication rules: URI assignment, template and location
  4. Links should be modeled as simple links, as a single triple
  5. ...TODO: complete the list of reqs

backlog

  1. Enable use in "off grid" environments in which, e.g., OASIS, W3C and Dublin Core vocabularies are not reachable.

References

Change History

VocabUseCases (last edited 2014-03-20 13:00:26 by sspeiche)