Bookmap Proposal
We'd like to request feedback on the following proposal for DITA.
Problem
The DITA community has pointed out the need for a native DITA mechanism for producing books from DITA topics. In particular, we need
- To assemble topics into a sequence, assigning a heading level to each topic.
- To specify the details about the book such as the book title, copyrights, and so on.
- To author any listings (for instance, glossary) that cannot or might not be generated from the content of the topics.
The book model must rigorously separate all information about the layout and presentation of the book from the actual content so the content can be reused in other contexts and so multiple layout styles can be applied to the same book.
The book model should provide enough information so that an XSLT transform could generate a good XSL-FO representation of a book. An XSL-FO processor such as FOP could then convert the generated XSL-FO to PDF.
Summary of proposal
This note proposes new DITA specialization packages with the following components:
- Bookmap. A specialization of map to control the aggregation of topics. In much the same way that map currently organizes topics for an online deliverable, the bookmap would define the hierarchical structure for topics in a book.
- Bookinfo topic. A specialization of topic to provide the details about the book.
- Booklist topics. Specializations of topic to provide listings that cannot or might not be generated from the book information alone.
- Bookstyle. Defines a set of policies for book layout and presentation that can be applied to many bookmaps.
Bookmap map
The bookmap introduces specializations of the topicref element to assign a book role to the referenced topic. Most of these roles are those of the primary divisions in the book content, such as dedication, preface, chapter, part, appendix, notices, and colophon. For instance, a preface element within the bookmap indicates that the referenced topic fulfills the role of a preface division within the book.
Within a division, the first level of contained topicref elements acts as the level one headings, the second level acts as the level two headings, and so on.
In the following example, topic1.xml acts as the preface and contains topic1a.xml as a level one heading:
<bookmap id="bookmapSamp2">
...
<preface href="topic1.xml">
<topicref href="topic1a.xml"/>
</preface>
...
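The nesting rule above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function name `heading_levels`, the tuple layout, and the choice to restart the count at a chapter nested inside a part are all assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Division roles taken from the proposal's list of roles. The function name
# and the output shape are assumptions made for this sketch.
DIVISIONS = {"draftintro", "abstract", "dedication", "preface", "chapter",
             "part", "appendix", "notices", "amendments", "colophon"}

def heading_levels(bookmap_xml):
    """Return (href, division, level) tuples; level 0 marks the division itself."""
    root = ET.fromstring(bookmap_xml)
    rows = []

    def walk(elem, division, level):
        for child in elem:
            if child.tag in DIVISIONS:
                # Assumption: a nested division (a chapter in a part) restarts the count.
                rows.append((child.get("href"), child.tag, 0))
                walk(child, child.tag, 1)
            elif child.tag == "topicref":
                rows.append((child.get("href"), division, level))
                walk(child, division, level + 1)

    walk(root, None, 1)
    return rows

SAMPLE = """
<bookmap id="bookmapSamp2">
  <preface href="topic1.xml">
    <topicref href="topic1a.xml"/>
  </preface>
</bookmap>
"""

for href, division, level in heading_levels(SAMPLE):
    print(href, division, level)
```

For the sample, topic1.xml is reported as the preface division itself (level 0) and topic1a.xml as a level one heading within it.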
Instead of referring to an external topic, a division can contain an inline title and short description. In this case, the book processor would typically generate a virtual topic by emitting the short description followed by a boilerplate lead-in sentence and a generated listing of the titles and short descriptions of the top-level headings within the division. Such inline divisions can act as segues or glue when assembling topics with more significant content.
The following chapter element could refer to an external topic but, instead, provides an inline division.
<bookmap id="bookmapSamp2">
...
<chapter>
<divinfo>
<divtitle>Childhood</divtitle>
<shortdesc>It all began when I was a child. I discovered gravity.
</shortdesc>
</divinfo>
<topicref href="topic2a.xml"/>
<topicref href="topic2b.xml"/>
</chapter>
...
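As a rough sketch of how a book processor might build the virtual topic for such an inline division, the Python below emits the title, the short description, a lead-in sentence, and a listing of the top-level headings. The lead-in wording, the `virtual_topic` function, and the `topic_meta` lookup are hypothetical; a real processor would read titles and short descriptions from the referenced topic files.

```python
import xml.etree.ElementTree as ET

LEAD_IN = "This division contains the following topics:"  # hypothetical boilerplate

def virtual_topic(division, topic_meta):
    """Build the text of a generated topic for a division with inline divinfo.

    topic_meta maps an href to a (title, shortdesc) pair. In a real
    processor these would be read from the referenced topic files.
    """
    info = division.find("divinfo")
    title = info.findtext("divtitle", default="").strip()
    # Collapse the line breaks and indentation inside the short description.
    shortdesc = " ".join(info.findtext("shortdesc", default="").split())
    lines = [title, shortdesc, LEAD_IN]
    for ref in division.findall("topicref"):  # direct children only: top-level headings
        ref_title, ref_desc = topic_meta[ref.get("href")]
        lines.append(f"- {ref_title}: {ref_desc}")
    return "\n".join(lines)

chapter = ET.fromstring("""
<chapter>
  <divinfo>
    <divtitle>Childhood</divtitle>
    <shortdesc>It all began when I was a child. I discovered gravity.
    </shortdesc>
  </divinfo>
  <topicref href="topic2a.xml"/>
  <topicref href="topic2b.xml"/>
</chapter>
""")

# Placeholder titles and descriptions; the proposal does not show these topics.
meta = {
    "topic2a.xml": ("Title of topic2a", "Short description of topic2a."),
    "topic2b.xml": ("Title of topic2b", "Short description of topic2b."),
}
print(virtual_topic(chapter, meta))
```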
Here is a complete example of a possible bookmap:
<bookmap id="bookmapSamp2">
<bkinfo href="bkinfoSamp2.xml"/>
<preface href="topic1.xml">
<topicref href="topic1a.xml"/>
</preface>
<chapter>
<divinfo>
<divtitle>Childhood</divtitle>
<shortdesc>It all began when I was a child. I discovered
gravity. </shortdesc>
</divinfo>
<topicref href="topic2a.xml"/>
<topicref href="topic2b.xml"/>
</chapter>
<part>
<divinfo>
<divtitle>College</divtitle>
<shortdesc>They had things to tell me. But would I listen?</shortdesc>
</divinfo>
<chapter href="topic3a.xml"/>
<chapter href="topic3b.xml"/>
</part>
<part href="topic4.xml">
<appendix href="topic4a.xml">
<topicref href="topic4a1.xml"/>
<topicref href="topic4a2.xml"/>
</appendix>
<appendix>
<divinfo>
<divtitle>Beatles</divtitle>
<shortdesc>I have organized my audio collection according to
Linnean principles.</shortdesc>
</divinfo>
<topicref href="topic4b1.xml"/>
<topicref href="topic4b2.xml"/>
</appendix>
</part>
<booklists>
<bibliolist href="bibliolist.xml"/>
<glossarylist href="glossarylist.xml"/>
</booklists>
</bookmap>
In practice, some external top-level division topics will be written for a single book instead of reused widely, but the markup doesn't impose that constraint. Authors can reuse external top-level division topics in different books or even online navigation hierarchies whenever that makes sense.
The complete list of proposed roles consists of draftintro, abstract, dedication, preface, chapter, part, appendix, notices, amendments, and colophon as well as the book lists described below.
Finally, note that the bookmap markup could itself be extended through standard DITA specialization.
Bookinfo topic
A book has a number of attributes, some providing text for the book output and others providing metadata for managing the book.
For simple, basic cases, the author might specify this book information inline within the bookmap (much the way segue topics could be specified inline within the bookmap):
<bookmap id="bookmapSamp1">
<bkbasicinfo>
<booktitle>Wonders of the Natural World</booktitle>
<booksubtitle>From the microscope to the telescope</booksubtitle>
<bookabstract>A brief summary of all knowledge with special
attention to string theory and the big bang.</bookabstract>
<author>George Cramden</author>
<publisher>George Cramden Worldwide Industries, Inc</publisher>
<copyright type="primary">
<copyryear year="2004-02-04"/>
<copyrholder>George Cramden Worldwide Industries, Inc</copyrholder>
</copyright>
<copyright type="secondary">
<copyryear year="2004-02-04"/>
<copyrholder>Loretta Cramden Amalgamated Industries, Inc</copyrholder>
</copyright>
<critdates>
<created date="1998-04-12"/>
<revised golive="2004-02-05"/>
</critdates>
<permissions view="all"/>
</bkbasicinfo>
<preface href="topic1.xml">
...
For more detailed information, the author would specify this book information in an external topic (that is, a file outside the bookmap):
<bookmap id="bookmapSamp2">
<bkinfo href="bkinfoSamp2.xml"/>
...
Maintaining the information in an external topic has some significant benefits:
- For document type designers, the book information model and the book map model can evolve and be specialized separately.
- For authors, the book organization can continue to change after the book information has been edited and frozen.
The bookinfo topic supplements the metadata elements of the prolog (which are available in either the inline or external book information) with additional descriptive information about the book. The bookinfo topic might cover the following categories:
- Book identifier: The part number, edition, ISBN number, and so on.
- Book publisher: The person or organization as well as the publication location.
- Book rights: Copyrights as well as restrictions or entitlements.
- Book history: The publication cycle including authored, reviewed, edited, tested, approved, published, and miscellaneous events.
- Book cover: Other information supplied for the covers, title page, credits page, and so on.
The following example illustrates how a bookinfo topic might look, though the specifics are subject to revision. The example shows only a small subset of the book attributes that would need to be captured by the bookinfo topic:
<bkinfo id="bkinfoSamp2">
<title>Wonders of the Natural World</title>
<bkabstract>A brief summary of all knowledge with special
attention to string theory and the big bang.</bkabstract>
<bkinfobody>
<bkid>
<bkpartno>1234</bkpartno>
</bkid>
<bkpublisher>
<organization id="publisher_org">
<orgname>Ralph Cramden Worldwide Industries, Inc</orgname>
<address><city>Weed</city>, <stateprov>CA</stateprov> <postalcode>96094</postalcode>
<country>USA</country>
</address>
</organization>
</bkpublisher>
<bkrights>
<bkcopyrfirst>
<year>2004</year>
</bkcopyrfirst>
<bkowner>
<organization conref="publisher_org"/>
</bkowner>
</bkrights>
<bkhistory>
<bkpublished>
<person>
<firstname>Ralph</firstname>
<lastname>Cramden</lastname>
</person>
<completed>
<month>2</month>
<day>4</day>
<year>2004</year>
</completed>
<summary>The book released to a grateful world.</summary>
</bkpublished>
</bkhistory>
</bkinfobody>
</bkinfo>
The bookinfo element of DocBook provides a good example of the kind of information that might be captured by the bookinfo topic.
Booklist topics
For lists such as the table of contents, list of figures, and list of tables, the book processor would always generate the listing by scanning the content topics. For such lists, the bookmap wouldn't need to provide a representation.
Other lists such as a glossary or bibliography are often authored. Such booklists would be stored in topics outside the bookmap. Different types of booklists would require different topic specializations. In some cases, the specialized topic might contain a list or simpletable structure. In others (such as the glossary possibilities discussed previously in this forum), the specialized topic might be a container topic with a list of specialized child topics. In either approach, the topic specialization would provide semantically precise markup for the content of the booklist rather than markup for the presentation of the booklist as a division within the book.
As with bookinfo, a specialized topicref would be restricted to refer to the specialized booklist topic. For instance, if a glossarylist element referred to a bibliography topic, the bookmap processor would report an error. A DITA-sensitive editor would prevent the user from creating such a reference.
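The kind of check a bookmap processor might run could look like the following Python sketch. It reuses the glossarylist and bibliolist reference elements from the bookmap example; the root element names `glossary` and `bibliography`, the `check_booklists` function, and the in-memory topic store are assumptions made for illustration, since the proposal does not define them.

```python
import xml.etree.ElementTree as ET

# Hypothetical pairing of bookmap reference elements with the root element
# expected in the referenced topic; the proposal does not fix these names.
EXPECTED_ROOT = {
    "glossarylist": "glossary",
    "bibliolist": "bibliography",
}

def check_booklists(bookmap_xml, load_topic):
    """Return error messages for booklist references that target the wrong topic type.

    load_topic is a callable that maps an href to the topic's XML text.
    """
    errors = []
    root = ET.fromstring(bookmap_xml)
    for ref in root.iter():
        expected = EXPECTED_ROOT.get(ref.tag)
        if expected is None:
            continue  # not a booklist reference
        actual = ET.fromstring(load_topic(ref.get("href"))).tag
        if actual != expected:
            errors.append(f"{ref.tag} href={ref.get('href')!r}: "
                          f"expected <{expected}>, found <{actual}>")
    return errors

# In-memory stand-ins for topic files, used only for this illustration.
TOPICS = {
    "bibliolist.xml": "<bibliography id='bib'/>",
    "glossarylist.xml": "<bibliography id='oops'/>",  # deliberately the wrong type
}
BOOKMAP = """
<bookmap id="bookmapSamp2">
  <booklists>
    <bibliolist href="bibliolist.xml"/>
    <glossarylist href="glossarylist.xml"/>
  </booklists>
</bookmap>
"""
for message in check_booklists(BOOKMAP, TOPICS.__getitem__):
    print(message)
```

A production check would more likely inspect the DITA class attribute of the referenced topic than its element name, so that further specializations of a booklist topic still pass.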
The complete list of proposed booklists consists of abbreviations, trademarks, bibliography, glossary, and index. The book processor should also be able to generate the index from the content instead of processing an authored index.
Eventually, a process could select the subset of items from the booklist that occur in the book content. For example, an organization might set up a common list of index or glossary definitions. A process would harvest the index and glossary occurrences from the topics used in the book, collate the items, distill a unique set of items, select the definitions for these items, and finally format the definitions in the book output.
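The harvest, collate, distill, and select steps might be sketched as follows. The function name, the data shapes, and the sample terms are invented for illustration; the final formatting step is reduced to a print statement.

```python
def select_booklist_items(topic_terms, definitions):
    """Pick the definitions for terms that actually occur in the book.

    topic_terms: term occurrences harvested from each topic, in book order.
    definitions: the organization's shared mapping of term to definition.
    """
    # Harvest and collate the occurrences from every topic into one list.
    harvested = [term for terms in topic_terms for term in terms]
    # Distill a unique, alphabetically ordered set of items.
    unique = sorted(set(harvested), key=str.lower)
    # Select the definitions for the items that the shared list covers.
    return [(term, definitions[term]) for term in unique if term in definitions]

# Invented sample data, for illustration only.
shared_glossary = {
    "gravity": "The attraction between masses.",
    "string theory": "A framework that models particles as strings.",
    "big bang": "A model of the early universe.",
}
occurrences = [["gravity"], ["string theory", "gravity"]]

for term, definition in select_booklist_items(occurrences, shared_glossary):
    print(f"{term}: {definition}")
```

In this sample, "big bang" is defined in the shared list but never occurs in the topics, so it is left out of the book's glossary.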
In addition, the href for a booklist could easily invoke a query engine with parameters specified on the URL. The query engine could return an XML file conforming to the specialized booklist topic, which could be processed exactly as if the booklist had been authored as a file.
Bookstyle
The components covered thus far provide only content and data for the book:
- The topics provide the content for the body of the book
- The bookmap arranges the topics in a sequence with heading levels
- The bookinfo provides data about the book
- The book listings provide listings for interpolation into the book
A process that produces a book also requires a definition of the style policies to apply to the content and data. That is, a process might take input from both bookmap and bookstyle files.
The same book can be generated with different styles. In addition, a consistent style definition can be applied to many books, ensuring consistency for a library or even an organization. That is, the bookstyle file is intended to define a common book format instead of styling a specific book. A style designer would set up the policy file, which could then be used by any number of authors for any number of books without burdening the authors with conforming to a presentation guide.
Here's an illustration of the kinds of policies that might be set in a bookstyle file. Even more so than with the bookinfo example given earlier, the bookstyle example is an illustration rather than a detailed proposal:
<bookstyle>
<!-- Specifies basic book characteristics -->
<basic>
<style-type><final/></style-type>
<page format="8.5x11"/>
<pagination><folio/></pagination>
<cover-layout><spine/></cover-layout>
<column-layout><offset/></column-layout>
<generate-title><preface/></generate-title>
<generate-prefix><numbered-label/><appendix/><chapter/><part/></generate-prefix>
</basic>
<!-- Controls what's provided and in what order -->
<content-sequence>
<titlepage/>
<creditspage/>
<toc depth="3"/>
<dedication/>
<preface/>
<chapters/>
<appendixes/>
<notices/>
<bibliography/>
<glossary/>
<index/>
<colophon/>
</content-sequence>
<!-- Controls specific formatting of content types -->
<!-- Target type can be a topicref or topic element, the ancestor type, -->
<!-- or the output class for the topic (or topicref ideally) -->
<!-- Relative XPath expression is rewritten to use class or outputclass -->
<!-- attributes -->
<styled-target type="preface">
<page-break><odd/></page-break>
</styled-target>
<styled-target type="appendix | chapter | part/chapter">
<page-break><odd/></page-break>
<subtoc><show/></subtoc>
</styled-target>
</bookstyle>
Note that, in particular, the sequence of book components might be controlled by the bookstyle policies.
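To make that point concrete, here is a small Python sketch in which the content-sequence of a bookstyle, rather than the order of elements in the bookmap, decides the output order. The mapping between content-sequence entries and bookmap elements, the set of generated components, and the `output_plan` function are assumptions; the proposal leaves these details open.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from content-sequence entries to bookmap element names.
SOURCE_ELEMENT = {
    "preface": "preface",
    "chapters": "chapter",
    "appendixes": "appendix",
    "bibliography": "bibliolist",
    "glossary": "glossarylist",
}
# Assumed set of components that the processor generates itself.
GENERATED = {"titlepage", "creditspage", "toc"}

def output_plan(bookstyle_xml, bookmap_xml):
    """List (component, hrefs) pairs in the order given by the bookstyle."""
    style = ET.fromstring(bookstyle_xml)
    book = ET.fromstring(bookmap_xml)
    plan = []
    for entry in style.find("content-sequence"):
        if entry.tag in GENERATED:
            plan.append((entry.tag, ["(generated)"]))
        elif entry.tag in SOURCE_ELEMENT:
            source = SOURCE_ELEMENT[entry.tag]
            hrefs = [e.get("href") for e in book.iter(source) if e.get("href")]
            if hrefs:  # skip components the bookmap does not supply
                plan.append((entry.tag, hrefs))
    return plan

STYLE = """
<bookstyle>
  <content-sequence>
    <titlepage/>
    <toc depth="3"/>
    <preface/>
    <chapters/>
    <glossary/>
  </content-sequence>
</bookstyle>
"""
# The bookmap elements are deliberately listed out of output order.
BOOK = """
<bookmap id="bookmapSamp2">
  <booklists>
    <glossarylist href="glossarylist.xml"/>
  </booklists>
  <chapter href="topic3a.xml"/>
  <preface href="topic1.xml"/>
</bookmap>
"""
for component, hrefs in output_plan(STYLE, BOOK):
    print(component, hrefs)
```

The plan follows the bookstyle order (title page, table of contents, preface, chapters, glossary) even though the sample bookmap lists the glossary first.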
One approach would be to define a basic bookstyle file that represents the lowest common denominator and to extend the basic bookstyle markup to control the special capabilities of a chosen tool. That approach could maximize reuse of common processing.
Libraries and Articles
Eventually, bookmap might be supplemented to support a multi-volume library. A separate libmap specialization of the base DITA map could refer to the bookmaps that compose the library.
Here is an example of a library:
<libmap id="libmapSamp">
<libinfo href="libinfo"/>
<book mapref="bookmap1.xml"/>
<book mapref="bookmap2.xml"/>
...
<book mapref="bookmapN.xml"/>
<liblists>
<glossarylist href="glossarylist.xml"/>
</liblists>
</libmap>
Similarly, at the low end of the scale, a separate articlemap specialization of the base DITA map could wrap the topics that compose an article.
Here is an example of an article:
<articlemap id="articlemapSamp">
<articleinfo>
<articletitle>Natural Wonders</articletitle>
<articleabstract>Some people are afraid to look down. This article explains that, on one occasion,
the author looked down and discovered shoes.</articleabstract>
</articleinfo>
<topicref href="topic1.xml">
<topicref href="topic1a.xml"/>
</topicref>
<topicref href="topic2.xml">
<topicref href="topic2a.xml"/>
<topicref href="topic2b.xml"/>
</topicref>
...
<topicref href="topicN.xml"/>
<articlelists>
<bibliolist href="bibliolist.xml"/>
</articlelists>
</articlemap>
Any of the top-level topicref branches that identify the article content could be pulled into a book by conref, preserving the organization of the hierarchy fragment.
Feedback
To supplement this discussion, the attached package includes some sample files to illustrate the approach:
- dtd - contains prototype DTD modules
- sample - contains samples for the prototype DTD modules
Note that the proposal is only a sketch and that many of the details will have to be worked out. Still, I hope it gives enough detail that you feel able to comment.
If you have thoughts on how to produce books from DITA, please respond.
Dita Wiki