Friday, May 11, 2007

Comparing XML in a JUnit test

Today I tried to compare two XML documents in a JUnit test. One was created with Altova's MapForce, the other was built as a new XmlBeans document (BTW, both are nice products). Notice that these XML documents declare the main namespace slightly differently: the first uses a default namespace, the second a prefix.

Document one:

<?xml version="1.0" encoding="UTF-8"?>
<Message xmlns="http://www.a.nl/a10.xsd"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="C:/longpath/a10.xsd">
  <MessageHeader>....

Document two:

<?xml version="1.0" encoding="UTF-8"?>
<a:Message xmlns:a="http://www.a.nl/a10.xsd">
  <a:MessageHeader>....
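To make the "same namespace, different notation" point concrete, here is a minimal sketch that uses only the JDK's DOM parser. The class name NamespaceCheck and the helper qualifiedRootName are made up for illustration, and the two strings in main are shortened stand-ins for the real documents, which are truncated above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceCheck {

    // Hypothetical helper: parses the XML with a namespace-aware parser and
    // returns the root element's name as "{namespaceURI}localName".
    static String qualifiedRootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-ins for document one and document two.
        String one = "<Message xmlns=\"http://www.a.nl/a10.xsd\"><MessageHeader/></Message>";
        String two = "<a:Message xmlns:a=\"http://www.a.nl/a10.xsd\"><a:MessageHeader/></a:Message>";

        System.out.println(qualifiedRootName(one));
        System.out.println(qualifiedRootName(two));
        System.out.println(qualifiedRootName(one).equals(qualifiedRootName(two)));
    }
}
```

Both calls return the same string, because the prefix is only a local alias for the namespace URI.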

Here is what I tried:

1. org.w3c.dom.Document.equals. Well, that goes nowhere.

2. org.dom4j.Document.equals. Same.

3. XMLUnit's XMLAssert.assertXMLEqual. Bummer, works alright, but it says that Message and a:Message are different and they are not (they're in the same namespace!).

4. Juxy's XMLComparator.assertXMLEquals. No go, same result.

5. I took a short look at the site of XSLTunit. It says that XSLTunit is a proof of concept. Furthermore, this one is also targeted at XSLT testing. So I decided to skip it.

6. Reading a bit closer I noticed that XMLUnit 1.0, released in April 2003 (wow, that's old), has a follow-up: XMLUnit 1.1beta1, released in April 2007 (wow, that's new). The website says they fixed the namespace thing! Unfortunately they didn't (yet, I hope).

7. The final solution: with a few String.replace calls, I just removed the namespace declarations, the prefixes and the schema location from the documents. XMLUnit 1.0 now works nicely with very good diff messages.
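The string substitutions from step 7 could look roughly like the sketch below. This is not the code I actually used: the class name NamespaceStripper and the method stripNamespaces are made up for illustration, it uses String.replaceAll with regular expressions instead of plain String.replace, and it assumes the only prefix in play is "a:". It is a brittle hack that is only good enough for test fixtures like these.

```java
class NamespaceStripper {

    // Hypothetical helper: removes namespace declarations, the
    // xsi:schemaLocation attribute and the "a:" prefix from an XML string.
    static String stripNamespaces(String xml) {
        return xml
                // drop xmlns="..." and xmlns:prefix="..." declarations
                .replaceAll("\\s+xmlns(:\\w+)?=\"[^\"]*\"", "")
                // drop the schema location attribute
                .replaceAll("\\s+xsi:schemaLocation=\"[^\"]*\"", "")
                // drop the "a:" prefix from start tags and end tags
                .replaceAll("<(/?)a:", "<$1");
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the two documents.
        String one = "<Message xmlns=\"http://www.a.nl/a10.xsd\""
                + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                + " xsi:schemaLocation=\"C:/longpath/a10.xsd\"><MessageHeader/></Message>";
        String two = "<a:Message xmlns:a=\"http://www.a.nl/a10.xsd\"><a:MessageHeader/></a:Message>";

        System.out.println(stripNamespaces(one));
        System.out.println(stripNamespaces(two));
    }
}
```

After stripping, both stand-in documents reduce to the same text, so any remaining difference reported by the comparison is a difference in content.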

Update 2007-05-24: I was quite wrong. XMLUnit does notice the differences in namespace usage (and puts a message in the exception), but it does not fail until it sees a real difference. The real difference turned out to be whitespace. By adding the code below, the differences disappear.

XMLUnit.setControlParser("org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
XMLUnit.setTestParser("org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
XMLUnit.setSAXParserFactory("org.apache.xerces.jaxp.SAXParserFactoryImpl");
XMLUnit.setTransformerFactory("org.apache.xalan.processor.TransformerFactoryImpl");
XMLUnit.setIgnoreWhitespace(true);
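To see why whitespace counts as a real difference, here is a small JDK-only sketch. It is not how XMLUnit implements setIgnoreWhitespace; the class WhitespaceNodes and its helpers are made up for illustration. It shows that indentation ends up in the DOM as extra text nodes, and that removing whitespace-only text nodes makes a pretty-printed document and a compact one line up again.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceNodes {

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Recursively removes text nodes that contain nothing but whitespace.
    static void stripWhitespace(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards so that removing a child does not shift the
        // indexes of the children still to be visited.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespace(child);
            }
        }
    }

    // Number of direct children of the root element, optionally after
    // dropping whitespace-only text nodes.
    static int countRootChildren(String xml, boolean ignoreWhitespace) {
        Element root = parseRoot(xml);
        if (ignoreWhitespace) {
            stripWhitespace(root);
        }
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        String compact = "<Message><MessageHeader/></Message>";
        String pretty = "<Message>\n  <MessageHeader/>\n</Message>";

        System.out.println(countRootChildren(compact, false));
        System.out.println(countRootChildren(pretty, false));
        System.out.println(countRootChildren(pretty, true));
    }
}
```

Without stripping, the pretty-printed root has two extra text children (the line breaks and indentation around MessageHeader), which is the kind of difference a node-by-node comparison trips over.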

Comments:

  1. I once used this xsl:
    http://www.tei-c.org.uk/wiki/index.php/Remove-Namespaces.xsl

    to get rid of the namespaces. I think it would result in slightly neater code.

  2. XMLUnit 1.1 fixed namespace support in XPath queries, but didn't change the comparison logic.

    I would have expected that you'd get a NAMESPACE_PREFIX_ID difference (see the user's guide), which should be recoverable, so that assertXMLEqual would pass while assertXMLIdentical would fail. What kind of Difference do you get instead?

    If you open a bug report with XMLUnit, I'll take care of it before the XMLUnit 1.1 final, promised 8-)

  3. @Stefan, very nice of you to drop by!

    I did try to write a diffengine for another problem (comparing dates in a different format), but I was already spending way too much time on the problem. So I dropped it all and went for the String substitutions.

    Your javadoc is quite good, but I did not look there as I did not know where to look.

    I used assertXMLEqual, but it failed on the prefix. It said something like 'element n:E does not match element E'. Do you need a precise error message?

  4. Hi Erik,

    I turned your example into a Unit Test and it turns out that the schemaLocation attribute is the problem. If you remove that, XMLUnit will call the two pieces similar.

    XMLUnit is still deeply rooted in DTD land in many places. In this case I consider adding a new type of recoverable difference for schemaLocation specifically (just like there is one for the SYSTEM id of a DOCTYPE declaration already) which would make your documents similar.

  5. Stefan, perhaps your XMLUnit is already a bit further than the beta release? If I remove the schema locations, I get the same error as reported above.

    BTW is there a more suitable place to keep this discussion going? An e-mail list?

  6. XMLUnit's mailing list is at SourceForge; http://sourceforge.net/mail/?group_id=23187 holds the subscription link (it's the general list).

    I don't think svn trunk (which I'm working from) has any changes that apply to your case. Maybe my test is too simple, since I only had the very first elements of your XML files.

  7. Even though this post is very old, it helped me a lot.
    Looks like XMLUnit 2.x is very much alive and kicking.

    Regards,
    Dan
