How to Validate an ODF Document
There are four steps to validating an ODF document:
- Extracting the XML from the ODF container file
- Determining what version of ODF your document uses
- Retrieving the schemas associated with that version of ODF
- Executing the validation tool
We take each one of these topics in turn.
Extracting the XML
The first thing to note is that a typical ODF file, with an .odt, .ods or .odp extension, is not a pure XML file. It is a container file, in ZIP format, holding several XML files along with associated binary images and other resources. So first you need to extract the XML from the container file. The method will vary according to your operating system and tools, but a typical way is to rename the file to have a .zip extension and then unzip it using your default zip utility. In most cases you will end up with the following XML files:
- content.xml
- styles.xml
- meta.xml
- settings.xml
- META-INF/manifest.xml
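If you prefer to script the extraction rather than rename and unzip by hand, the following is a minimal sketch using Python's standard zipfile module. The function name extract_odf_xml and the output directory are illustrative assumptions, not part of any ODF tool. It extracts only the XML files listed above and skips any that a given document does not contain.

```python
import zipfile
from pathlib import Path

# The XML members listed above; a given document may lack some of them.
ODF_XML_MEMBERS = [
    "content.xml",
    "styles.xml",
    "meta.xml",
    "settings.xml",
    "META-INF/manifest.xml",
]


def extract_odf_xml(odf_path, out_dir):
    """Extract the XML members found in the ODF container.

    Returns the list of paths that were written under out_dir.
    (Hypothetical helper name, used here for illustration only.)
    """
    out_dir = Path(out_dir)
    extracted = []
    with zipfile.ZipFile(odf_path) as container:
        present = set(container.namelist())
        for member in ODF_XML_MEMBERS:
            if member in present:
                # extract() recreates the member's directory structure
                container.extract(member, str(out_dir))
                extracted.append(out_dir / member)
    return extracted
```

For example, extract_odf_xml("report.odt", "report_xml") would leave the XML files in a report_xml directory, with manifest.xml under report_xml/META-INF. No renaming to .zip is needed, since the ZIP format is detected from the file contents.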
Checking the Version Attribute
Next you need to determine what version of ODF the document uses. This can be found by inspecting the office:version attribute on the root element of any of the XML files, such as content.xml. Expected values are "1.0" or "1.1".
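As a rough sketch, the attribute can also be read programmatically. The example below uses Python's standard xml.etree.ElementTree module. The helper name read_odf_version is an assumption made for illustration. The namespace URI is the one bound to the office prefix in ODF documents.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "office" prefix in ODF documents.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def read_odf_version(xml_path):
    """Return the office:version value from the root element, or None.

    (Hypothetical helper name, used here for illustration only.)
    """
    root = ET.parse(xml_path).getroot()
    # ElementTree addresses namespaced attributes as "{namespace}name"
    return root.get("{%s}version" % OFFICE_NS)
```

Calling read_odf_version("content.xml") on an extracted file returns the version string if the attribute is present, and None if the root element does not carry it.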
A quick survey of common ODF authoring applications indicates the following defaults:
- OpenOffice.org 2.4.0 writes ODF 1.1
- IBM Lotus Symphony beta 4 writes ODF 1.1
- Microsoft's Open XML / ODF Translator Add-ins for Office write ODF 1.0
- Google Docs writes ODF 1.0
Retrieving the Schemas
You can retrieve the schemas from the ODF TC's homepage. You will see three schema files listed for each published ODF version:
- The manifest schema, for validating the manifest.xml file
- The ODF schema, for validating the other XML files
- A "strict" version of the ODF schema, as described in Appendix A of the ODF standard
For most uses you will want to download the manifest schema and the ODF schema.
Running the Validation
This step will vary according to what validation tool you are using. We'll give a few examples using some common validation tools. We'll also explain some known bugs and workarounds.
Jing
Jing is a Relax NG validator written by James Clark, co-author of the Relax NG standard. If you download it and install it according to the instructions on his web site, you can validate an ODF XML file with a command line like this:
java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng content.xml
Note in particular the use of the "-i" command line flag for jing. It is needed to disable the ID/IDREF checking from the Relax NG DTD Compatibility specification, which jing enforces by default. The "ODF Validation for Dummies" page explains why this is needed and why it is acceptable from a specification viewpoint.
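If you validate many files, it can help to wrap the jing command line in a small script that picks the schema from the document's version. The sketch below is one possible way to do that in Python. The jar path is taken from the example above. The ODF 1.1 schema file name, the helper names, and the assumption that jing reports invalid input through a non-zero exit status are all assumptions to check against your own installation and downloads.

```python
import subprocess

# Path from the example command line above; adjust for your installation.
JING_JAR = "c:/jing/bin/jing.jar"

# Schema file per ODF version. The 1.1 file name is an assumption;
# use whatever name the schema has where you saved it.
SCHEMAS = {
    "1.0": "OpenDocument-schema-v1.0-os.rng",
    "1.1": "OpenDocument-schema-v1.1.rng",
}


def jing_command(version, xml_file):
    """Build the jing command line for one XML file (illustrative helper)."""
    if version not in SCHEMAS:
        raise ValueError("no schema configured for ODF version %r" % version)
    # "-i" disables the DTD Compatibility ID/IDREF checking, as noted above
    return ["java", "-jar", JING_JAR, "-i", SCHEMAS[version], xml_file]


def validate(version, xml_file):
    """Run jing and return (passed, messages).

    Assumes a non-zero exit status means validation failed.
    """
    result = subprocess.run(
        jing_command(version, xml_file),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr
```

Keeping the command construction in its own function makes it easy to print the exact command line for troubleshooting before running it.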
Frequently Asked Questions
- Question: I get this error message when trying to validate any ODF document. What is wrong? "conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0""
Answer: This error appears when you validate with jing without the "-i" flag, which disables the Relax NG DTD Compatibility checking. Try again with the -i command line flag.
- Question: When I validate my ODF document, I get many errors complaining about an "undeclared soft-page-break element". What is wrong?
Answer: It sounds like you have an ODF 1.1 document, but are trying to validate it against the ODF 1.0 schema. Check the office:version attribute on the document's root element. If it says "1.1" then you need to be using the ODF 1.1 schema.