Forwarded by Don Day/Austin/IBM on 08/01/2006 09:28 AM
Don Day/Austin/IBM 08/01/2006 08:06 AM
To: DITA TC list
cc:
Subject: Fw: indexterm issues
All,
For our main discussion at today's meeting, I'm forwarding a summary of Paul's observations and proposals that he sent at my request. The first part of his note basically repeats what he sent to the list just previously.
1. His interpretation of the nested indexterm processing seems to match my expectation as well.
2. His alternate index-range-start proposal would change the currently-approved model, but again, this is a last chance to agree on a design we think will be robust. I happen to think that Paul makes several compelling suggestions here, so I would like to know what you think about this proposal, which features clearly unnested start and end markers as well as an unambiguous ID-like link between the start and end (versus translatable text, which invites mismatch).
3.1, 3.2, 3.3. Finally, he lists the remaining issues for discussion, most of which could be cleared up by a clear set of decisions on the above issues. I expect anything unanswered after today's meeting will have to be deferred to the DITA 1.2 timeframe.
So that we do not further delay our goal of getting 1.1 finalized in this calendar year, we may, if necessary, have to back out the index range design entirely until we have time to reach full agreement.
Regards,
--
Don Day
Chair, OASIS DITA Technical Committee
IBM Lead DITA Architect
Email: dond@us.ibm.com
11501 Burnet Rd. MS9033E015, Austin TX 78758
Phone: +1 512-838-8550
T/L: 678-8550
"Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?"
--T.S. Eliot
Forwarded by Don Day/Austin/IBM on 08/01/2006 07:43 AM
"Grosso, Paul" <pgrosso@ptc.com> 07/29/2006 09:12 AM
To: Don Day/Austin/IBM@IBMUS
cc:
Subject: indexterm issues
How to interpret nested indexterms
Doing levels of indexterms via nesting is strange, but unfortunately that is already part of DITA 1.0.
But assuming we are stuck with it, we need to determine unambiguously what every combination means.
So what does
<indexterm>cheese
<indexterm>sheeps milk cheeses
<indexterm>pecorino</indexterm>
</indexterm>
</indexterm>
mean? Suppose that is the only indexterm(s) in my document--what should the resulting index look like?
Specifically, do "cheese" and "sheeps milk cheeses" here only serve to indicate that "pecorino" is at level 3? That is, does the above generate a single pointwise index reference to a level three "cheese;sheeps milk cheeses;pecorino" entry, so the resulting index might look like:
cheese
  sheeps milk cheeses
    pecorino 5
Or does the above generate three pointwise index references, one for each level, so that the resulting index might look like:
cheese 5
  sheeps milk cheeses 5
    pecorino 5
I'm assuming the former, and if you want an index reference for just "cheese", you'd have to add
<indexterm>cheese</indexterm>
to your input. I think this is the only sane way to do it, and it matches what would happen in DocBook where you'd do:
<indexterm>
<primary>cheese</primary> <secondary>sheeps milk cheeses</secondary> <tertiary>pecorino</tertiary>
</indexterm>
(which, by the way, is a much clearer markup for indexterms).
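The "former" interpretation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from the DITA spec or an existing processor: the function name leaf_paths is hypothetical, and it assumes each indexterm's level text is the text that precedes its first child element.

```python
import xml.etree.ElementTree as ET

def leaf_paths(elem, prefix=()):
    """Yield one tuple of level texts per innermost indexterm.

    Outer indexterms only contribute their text as a level name;
    they do not produce index references of their own.
    """
    label = (elem.text or "").strip()
    path = prefix + (label,)
    children = [c for c in elem if c.tag == "indexterm"]
    if not children:
        yield path
    for child in children:
        yield from leaf_paths(child, path)

src = """<indexterm>cheese
<indexterm>sheeps milk cheeses
<indexterm>pecorino</indexterm>
</indexterm>
</indexterm>"""

# A single three-level reference; "cheese" alone gets no page number.
entries = list(leaf_paths(ET.fromstring(src)))
print(entries)
```

Under this reading, a separate <indexterm>cheese</indexterm> in the input is what yields a one-level ("cheese",) reference.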
index-range-*
The currently proposed index-range-* elements are just empty "flags" that get put inside an indexterm element. But it is not necessarily clear what this means in the case of nested indexterms.
For example, per my best understanding, one way to indicate a page range for my "pecorino" example would be markup such as the following (where the comments just indicate what pages each indexterm falls on):
. . .
<!-- page 22 -->
<indexterm>cheese
<indexterm>sheeps milk cheeses
<indexterm>pecorino<index-range-start/></indexterm>
</indexterm>
</indexterm>
. . .
<!-- page 24 -->
<indexterm>cheese
<indexterm>sheeps milk cheeses
<indexterm>pecorino<index-range-end/></indexterm>
</indexterm>
</indexterm>
. . .
But what if the <index-range-start/> is placed elsewhere in the first indexterm, such as:
<!-- page 22 -->
<indexterm>cheese<index-range-start/>
<indexterm>sheeps milk cheeses
<indexterm>pecorino</indexterm>
</indexterm>
</indexterm>
Is that equivalent, does it mean something else, or is it an error? (My best guess is that it should be equivalent.)
What about the following:
<indexterm>cheese<index-range-start/></indexterm>
. . .
<indexterm>cheese<index-range-end/>
<indexterm>sheeps milk cheeses </indexterm>
</indexterm>
Since the first is an index reference for "cheese" and the second is one for "cheese;sheeps milk cheeses", my best guess is these two do not constitute a matched pair.
What about the following:
<indexterm>cheese<index-range-start/>
<indexterm>sheeps milk cheeses<index-range-end/> </indexterm>
</indexterm>
. . .
<indexterm>cheese<index-range-end/>
<indexterm>sheeps milk cheeses </indexterm>
</indexterm>
Is the first indexterm a range start or range end (or just an error)? If it is a range start, does it end immediately, or is its range-end ignored, and the range is ended by the subsequent indexterm?
None of this is made clear in the current writeup.
Also, I think this is very confusing and error-prone for users.
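To make the questions above concrete, here is a minimal Python sketch of one possible reading of the current proposal. It encodes my best guesses only: a range flag found anywhere in the nest applies to the full nested path, and two indexterms can pair up only if their paths are identical. The classify and parse helpers are hypothetical names, not part of any DITA tool.

```python
import xml.etree.ElementTree as ET

def classify(indexterm):
    """Return (path, flags) for one possibly nested indexterm.

    path  -- tuple of level texts from outermost to innermost
    flags -- set of range flag names found at any level of the nest
    """
    path, flags = [], set()
    node = indexterm
    while node is not None:
        path.append((node.text or "").strip())
        for flag in ("index-range-start", "index-range-end"):
            if node.find(flag) is not None:
                flags.add(flag)
        node = node.find("indexterm")  # descend one level, if any
    return tuple(path), flags

def parse(xml_text):
    return classify(ET.fromstring(xml_text))

# Flag on the innermost term versus flag on the outermost term.
flag_on_leaf = parse("<indexterm>cheese<indexterm>sheeps milk cheeses"
                     "<indexterm>pecorino<index-range-start/></indexterm>"
                     "</indexterm></indexterm>")
flag_on_top = parse("<indexterm>cheese<index-range-start/>"
                    "<indexterm>sheeps milk cheeses"
                    "<indexterm>pecorino</indexterm></indexterm></indexterm>")

# A start for "cheese" and an end for "cheese;sheeps milk cheeses".
start = parse("<indexterm>cheese<index-range-start/></indexterm>")
end = parse("<indexterm>cheese<index-range-end/>"
            "<indexterm>sheeps milk cheeses </indexterm></indexterm>")

# One indexterm carrying both flags.
both = parse("<indexterm>cheese<index-range-start/>"
             "<indexterm>sheeps milk cheeses<index-range-end/> </indexterm>"
             "</indexterm>")

print(flag_on_leaf == flag_on_top)  # treated as equivalent under this reading
print(start[0] == end[0])           # paths differ, so not a matched pair
print(sorted(both[1]))              # both flags present: the spec must say what this means
```

Even this small sketch has to choose an answer for each case, which is exactly what the writeup needs to state explicitly.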
Rather than having empty index-range-* elements that magically redefine their parent to have different semantics, I think it would be preferable to have a specialization of indexterm (or just another element) that can be used to indicate the start of a range--so we would write something like:
<index-range-start>cheese
<indexterm>sheeps milk cheeses
<indexterm>pecorino</indexterm>
</indexterm>
</index-range-start>
to start the "cheese--sheeps milk cheeses--pecorino" range.
While in theory we could then have an analogous index-range-end element with the identical nested indexterm content, I think that is another mistake in the current proposal. The idea of creating matching pairs by requiring identical content has already been pointed out as a translation nightmare, but when you start to consider nested indexterms, it's an even worse error-prone mess, both for the user and the implementors.
Instead, I would add an NMTOKEN attribute to both index-range-start and index-range-end, and have index-range-end be an empty element that just refers back to the start:
<index-range-start subject="pecorino">cheese
<indexterm>sheeps milk cheeses
<indexterm>pecorino</indexterm>
</indexterm>
</index-range-start>
. . .
<index-range-end subject="pecorino"/>
The "subject" attribute would act like a sort of id/idref, but I've avoided using real IDs, because IDs must be unique within a document: if you had two ranges that discuss "pecorino", you couldn't reuse id="pecorino" for the second one.
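A processor could pair these elements roughly as in the following Python sketch. The page numbers, the pair_ranges and term_path names, and the error policy (report and skip) are all assumptions for illustration; what to do on errors is still an open question for the spec.

```python
import xml.etree.ElementTree as ET

def term_path(elem):
    """Collect level texts from the range-start element down its nested indexterms."""
    path = [(elem.text or "").strip()]
    child = elem.find("indexterm")
    while child is not None:
        path.append((child.text or "").strip())
        child = child.find("indexterm")
    return tuple(path)

def pair_ranges(events):
    """events: (page, element) pairs in document order.

    Returns (ranges, errors), where each range is (path, start_page, end_page).
    """
    open_ranges = {}           # subject token -> (path, start page)
    ranges, errors = [], []
    for page, elem in events:
        subject = elem.get("subject")
        if elem.tag == "index-range-start":
            if subject in open_ranges:
                errors.append(f"'{subject}' restarted on page {page} before it ended")
                continue
            open_ranges[subject] = (term_path(elem), page)
        elif elem.tag == "index-range-end":
            if subject not in open_ranges:
                errors.append(f"'{subject}' ended on page {page} without a start")
                continue
            path, start_page = open_ranges.pop(subject)
            ranges.append((path, start_page, page))
    for subject, (_, start_page) in open_ranges.items():
        errors.append(f"'{subject}' started on page {start_page} but never ended")
    return ranges, errors

start_xml = ('<index-range-start subject="pecorino">cheese'
             '<indexterm>sheeps milk cheeses'
             '<indexterm>pecorino</indexterm></indexterm>'
             '</index-range-start>')
end_xml = '<index-range-end subject="pecorino"/>'

# Two separate "pecorino" ranges reuse the same subject token.
events = [(22, ET.fromstring(start_xml)), (24, ET.fromstring(end_xml)),
          (40, ET.fromstring(start_xml)), (41, ET.fromstring(end_xml))]
ranges, errors = pair_ranges(events)
print(ranges)
print(errors)
```

Because the token is released when its range ends, the second range can reuse "pecorino", which a true ID would not allow. Note that no translatable text takes part in the matching.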
Remaining issues
That still leaves us with several questions not addressed by the latest spec:
1. The index-see and index-see-also writeups have errors that prevent me from understanding and reviewing them effectively. I'm expecting that we'll be able to make these elements and their description work.
2. There are issues with how index-sort-as within a map's metadata is supposed to work, especially with nested indexterms; see http://lists.oasis-open.org/archives/dita/200607/msg00076.html
3. We need better wording for index-range-*, and I started with some suggestions, but we need to decide on the final model before it makes sense to go much further. But regardless of the model, we do need better wording about what forms a "properly matched pair" of index-range-start and index-range-end and what to do in the case of errors.
paul