XML validation error when inserting paragraphs into multi-line text box controls in Word 2007

Recently, while developing a VSTO document template, I was presented with this error when attempting to re-open a saved document:

‘The Office Open XML File .docx cannot be opened because there are problems with the contents.

Unspecified error. Location: Part: /word/document.xml, Line: 2, Column: 1054’

Word offered to recover the unreadable content, but saving the recovered document and re-opening it produced the same error. I unzipped the .docx file and opened its contents in an editor to discover what was breaking it.

Word 2007 allows developers to use text box controls to position and format specific text regions within a document template. The control also provides a handy way of accessing that region from code-behind.

The text box control also supports multiple lines, but as line breaks rather than paragraph breaks. You can see this by turning on formatting marks (Show/Hide ¶) in Word.

Notice that within the text box control, pressing Enter inserts a line break rather than a new paragraph; outside the control, it inserts a new paragraph as usual.

However, this protection is enforced only by the Word UI, not when the text is set programmatically. A string containing paragraph breaks is accepted, and the paragraph marks end up inside the text box control.

This has no visible effect while the document is open. However, once you save and re-open it, you hit a nasty bug with no obvious way of tracing it back to its cause. In the saved XML, the markup for the new paragraph effectively acts as the closing tags for the text box control's XML block, which produces the error above.

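To avoid the error, I simply changed how the text string was built, separating lines with '\v' (the vertical tab character, which Word treats as a manual line break) rather than using StringBuilder.AppendLine(), which appends a carriage return and line feed that Word interprets as a paragraph break. Had that not been possible, I would have looked in the Word API for a method to format the text.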
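The workaround above amounts to making sure no paragraph-mark characters reach the control. Below is a minimal sketch of that normalisation, written in Python for illustration rather than the C# a VSTO project would use; the function names are hypothetical. It assumes, as described above, that Word reads "\r" as a paragraph mark and "\v" as a manual line break.

```python
import re

# Matches any newline convention: CRLF (what StringBuilder.AppendLine() emits on
# Windows), a bare CR (Word's paragraph mark) or a bare LF.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def to_word_line_breaks(lines):
    """Join lines with manual line breaks ("\\v") instead of paragraph marks."""
    return "\v".join(lines)


def normalize_newlines(text):
    """Replace every newline sequence with a manual line break."""
    # Drop trailing newlines first, such as the one AppendLine() leaves after the
    # last line, so the control does not end with an empty line.
    text = text.rstrip("\r\n")
    return _NEWLINE.sub("\v", text)
```

For example, `normalize_newlines("Line one\r\nLine two\r\n")` yields `"Line one\vLine two"`, which is the kind of string you would then assign to the control from code-behind.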

To fix a broken document, you can either:

  1. Manually fix the broken XML file (word/document.xml inside the .docx)
  2. Allow Word to recover the document, remove the extra paragraph breaks from the text box control, and save the document again
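For the manual route, the error message already names the part, line and column to look at. The sketch below is one possible way to jump straight there. It assumes the part is UTF-8 encoded, and it treats the reported column as a character offset, which may only be approximate. `error_context` is a hypothetical helper, not part of any Office tooling.

```python
import zipfile


def error_context(docx, part="/word/document.xml", line=2, column=1054, radius=80):
    """Return the XML around the 1-based line/column Word reported, with a [HERE] marker."""
    with zipfile.ZipFile(docx) as archive:
        # Zip member names have no leading slash, unlike Word's part names.
        xml = archive.read(part.lstrip("/")).decode("utf-8-sig")
    # Normalise line endings, then select the 1-based line from the error message.
    target = xml.replace("\r\n", "\n").split("\n")[line - 1]
    pos = column - 1  # convert the 1-based column to a 0-based index
    before = target[max(0, pos - radius):pos]
    after = target[pos:pos + radius]
    return f"{before}[HERE]{after}"
```

With the values from the error above, `print(error_context("Broken.docx"))` (the file name is illustrative) shows the markup around the reported position. You can then edit that region of the unzipped part and re-zip the package.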

This bug appears to be fixed in MS Word 2010.