XML validation error when inserting paragraphs into multi-line text box controls in Word 2007
Recently, whilst developing a VSTO document template, I was presented with this error when attempting to re-open a saved document:
‘The Office Open XML File
Unspecified error. Location: Part: /word/document.xml, Line: 2, Column: 1054’
Word offered to recover the unreadable content, but saving and re-opening the recovered document produced the same error. I unzipped the .docx file and opened its contents in an editor to find out what was breaking it.
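To jump straight to the spot Word complains about, you can pull /word/document.xml out of the .docx archive (a .docx is a ZIP file) and look at the text around the reported line and column. The sketch below is a minimal Python version using only the standard library. The helper name `error_context` and the assumption that Word's line and column numbers are 1-based are mine, not taken from Word's documentation.

```python
import zipfile


def error_context(docx_path, line, column, radius=60):
    """Return the text around (line, column) in word/document.xml.

    Assumes line and column are 1-based, as in Word's error message.
    docx_path may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_path) as archive:
        xml_text = archive.read("word/document.xml").decode("utf-8")
    # Split on LF only; a trailing CR (from CRLF files) is harmless here.
    target = xml_text.split("\n")[line - 1]
    start = max(0, column - 1 - radius)
    end = column - 1 + radius
    return target[start:end]
```

For the error above, `print(error_context("broken.docx", 2, 1054))` would show roughly 60 characters either side of the reported position, where "broken.docx" stands in for your own file.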
Word 2007 lets developers use text box controls to position and format specific regions of text within a document template. The control also provides a handy way of accessing that region from code-behind.
The text box control also supports multiple lines, but as line breaks rather than paragraph breaks. You can see this by turning on formatting marks (Show/Hide ¶) in Word.
Within the text box control, pressing Return inserts a line break rather than a new paragraph; outside the control, it inserts a new paragraph as usual.
However, this protection provided by the Word UI is not enforced when the text is set programmatically: text containing paragraph breaks is accepted and ends up inside the control as paragraph marks.
This has no noticeable effect while you are working in the document. Once you save and re-open it, however, you have stumbled onto a nasty bug with no obvious way of tracing it back to its cause: in the saved XML, the paragraph mark effectively acts as the closing tags for the text box's XML block, which produces the error above.
To avoid the error, I simply changed how the text string was built, separating lines with ‘\v’ (a vertical tab, which Word treats as a manual line break) rather than using StringBuilder.AppendLine(). Had that not been possible, I would have looked through the Word API for a method to format the text.
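The fix comes down to making sure the string handed to the control contains manual line breaks ('\v') rather than paragraph breaks. The sketch below shows the idea in Python purely for illustration; the helper names `join_for_control` and `normalize_for_control` are hypothetical. In a VSTO (C#) project the equivalent would be something like string.Join("\v", lines), or a Replace on the string before it is assigned to the control.

```python
import re

# Word treats a vertical tab (character 11) as a manual line break.
LINE_BREAK = "\v"


def join_for_control(lines):
    """Join lines with manual line breaks instead of newlines."""
    return LINE_BREAK.join(lines)


def normalize_for_control(text):
    """Replace CRLF, CR and LF with manual line breaks.

    StringBuilder.AppendLine() appends Environment.NewLine (CRLF on
    Windows), which Word can interpret as paragraph breaks.
    """
    # Drop the trailing newline AppendLine() leaves at the end of the text.
    text = text.rstrip("\r\n")
    return re.sub(r"\r\n|\r|\n", LINE_BREAK, text)
```

For example, `normalize_for_control("Line 1\r\nLine 2\r\n")` gives `"Line 1\vLine 2"`, so the control receives two lines within a single paragraph.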
To fix an already broken document, you can either:
- Manually fix the broken XML file, or
- Allow Word to recover the document, remove the extra paragraph breaks, and re-save it.
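If you take the manual route, it is worth confirming that the edited document.xml at least parses before you zip it back up and open it in Word. Below is a minimal Python sketch using only the standard library; `check_document_xml` is a hypothetical helper name. It only checks well-formedness, not the WordprocessingML schema, so a clean result does not guarantee that Word will accept the file.

```python
import xml.etree.ElementTree as ET
import zipfile


def check_document_xml(docx_path):
    """Return None if word/document.xml parses, else (line, column) of the error.

    Only checks XML well-formedness; schema-level problems that Word
    may still reject are not detected.
    """
    with zipfile.ZipFile(docx_path) as archive:
        data = archive.read("word/document.xml")
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        return err.position
    return None
```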
This bug appears to have been fixed in Word 2010.