The option is -v, which already has a meaning in traditional grep as a way of specifying that only non-matching lines should be output. But XML does not have the concept of lines, and its hierarchical nature means that the concept of non-matching nodes is probably useless. A search for elements not matching “/thing/test” would logically return the whole document (the root node), the element “thing”, and any children of “test”, but not “test” itself. It’s hard to see how that would be useful, especially when the entire contents of “test” will be included as children of the root regardless.
The -v (or --value) option takes a single XPath argument, which is evaluated relative to the matching node. The option can be repeated. For each matching node, a line is output with the results of the value XPaths separated by commas.
If no -v options are specified, the matching node is output just as in the original program.
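To make the per-node behaviour concrete, here is a minimal sketch of the idea in Python. It is not the code from xmlgrep.py: the function name grep_values and the sample document are made up for illustration, and it uses the limited path syntax of the standard library's ElementTree rather than full XPath, so expressions such as "link/text()" or string-length() would not work here.

```python
import xml.etree.ElementTree as ET


def grep_values(xml_text, match_path, value_paths):
    """Return one comma-separated line per node matching match_path.

    Hypothetical helper for illustration only; each value path is
    looked up relative to the matching node, not the document root.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for node in root.findall(match_path):
        # findtext returns the text of the first child matching the path,
        # or the default when there is no such child.
        values = [node.findtext(path, default="") for path in value_paths]
        lines.append(", ".join(values))
    return lines


# Made-up sample document, loosely shaped like a blog export.
SAMPLE = (
    "<rss>"
    "<item><title>XML grep</title><link>http://example.com/a</link></item>"
    "<item><title>Other post</title><link>http://example.com/b</link></item>"
    "</rss>"
)

result = grep_values(SAMPLE, ".//item", ["link", "title"])
for line in result:
    print(line)
```

Each value path plays the role of one -v argument, and repeating the option corresponds to adding another entry to the list.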
As an example, here are my published XML-related posts and their approximate word counts:
xmlgrep.py -h "//item[post_type = 'post' and status = 'publish' and count(category[@nicename = 'xml']) > 0]" ejrh.wordpress.2012-05-04.xml -v "link/text()" -v "title/text()" -v "string-length(normalize-space(encoded)) - string-length(translate(normalize-space(encoded), ' ', ''))"
https://ejrh.wordpress.com/2011/05/10/xml-grep/, XML grep, 651.0
https://ejrh.wordpress.com/2012/02/27/xml-viewing-and-diffing/, XML viewing and diffing, 2077.0
https://ejrh.wordpress.com/2012/04/23/xml-in-the-database/, XML in the database, 1079.0
The query is a bit messy, but that’s XPath’s fault more than it is xmlgrep.py’s. ;-) The formula for approximate word count assumes words are separated by spaces, and works by:
- Normalising the whitespace in the text, replacing each block of whitespace with a single space.
- Translating all space characters into empty strings.
- Taking the difference in text length before and after the previous step, i.e. counting how many spaces were removed.
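The same three steps can be sketched in plain Python, which makes the arithmetic easier to follow than the XPath expression. The function names below are my own, and normalize_space is only an approximation of the XPath function of the same name (it collapses runs of space, tab, carriage return and newline, and trims the ends).

```python
import re


def normalize_space(text):
    """Approximate XPath normalize-space(): collapse whitespace runs, trim ends."""
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    return collapsed.strip(" ")


def approx_word_count(text):
    """Count the single spaces left after normalisation."""
    normalized = normalize_space(text)
    # Equivalent of translate(normalized, ' ', ''): drop every space.
    without_spaces = normalized.replace(" ", "")
    return len(normalized) - len(without_spaces)


sample = "  XML  grep\n is   neat "
count = approx_word_count(sample)
print(count)
```

Because it counts the gaps between words rather than the words themselves, the result for non-empty text is one less than the true word count, which is close enough for an approximate figure.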
Although it’s a mess, it’s pretty neat that you can do that with something as limited as the XPath string functions.
The program xmlgrep.py is a standalone file hosted on Google Code. The latest version is at http://code.google.com/p/ejrh/source/browse/trunk/utils/xmlgrep.py.