XML with XmlStarlet
Introduction
XmlStarlet is a cross-platform command-line utility written by Mikhail Grushinskiy to read and modify XML files.
It requires first learning XPath (and XSLT, since XmlStarlet relies on its logic). Strangely enough, it looks like it can only read from a local file and cannot fetch a page from a remote website ("failed to load external entity"); the page must first be downloaded with eg. wget or curl and piped to XmlStarlet.
The main commands are:
- el (or elements) - Display element structure of XML document
- sel (or select) - Select data or query XML document(s) (XPATH, etc)
- ed (or edit) - Edit/Update XML document(s)
- fo (or format) - Format XML document(s)
- c14n (or canonic) - XML canonicalization
- ls (or list) - List directory as XML
- esc (or escape) - Escape special XML characters
- unesc (or unescape) - Unescape special XML characters
Studying the structure
To list the path of every element in the file: xml el table.xml
To include attributes: xml el -a table.xml
To include attributes and their values: xml el -v table.xml
To list only the unique paths, sorted: xml el -u table.xml
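As a rough illustration of what these commands print, here is a minimal sketch run against a small made-up table.xml. The sample data, the temporary directory and the XML variable (which looks the binary up as either xmlstarlet or xml, since the installed name varies by platform) are assumptions for the example, not part of XmlStarlet itself.

```shell
# Assumption: the binary is installed as "xmlstarlet" or "xml".
XML=$(command -v xmlstarlet || command -v xml)

# Hypothetical sample document, written to a temporary file.
tmp=$(mktemp -d)
cat > "$tmp/table.xml" <<'EOF'
<xml><table>
<rec id="1"><numField>123</numField></rec>
<rec id="2"><numField>346</numField></rec>
</table></xml>
EOF

# One line per element, in document order (rec and numField appear twice).
all_paths=$("$XML" el "$tmp/table.xml")
printf '%s\n' "$all_paths"

# -u keeps each path only once and sorts the result.
unique_paths=$("$XML" el -u "$tmp/table.xml")
printf '%s\n' "$unique_paths"

# -a adds attribute paths such as xml/table/rec/@id.
attr_paths=$("$XML" el -a "$tmp/table.xml")
printf '%s\n' "$attr_paths"

rm -rf "$tmp"
```

With this sample, the -u variant should list four paths: xml, xml/table, xml/table/rec and xml/table/rec/numField.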
Querying
The plain command is: xml sel -t <options>, where <options> can be:
- -m or --match <xpath> - match XPATH expression
- -c or --copy-of <xpath> - print copy of XPATH expression
- -v or --value-of <xpath> - print value of XPATH expression
- -o or --output <string> - output string literal
- -f or --inp-name - print input file name (or URL)
- --var <name> <value> --break or --var <name>=<value> - declare a variable (referenced by $name)
- -i or --if <test-xpath> - check condition <xsl:if test="test-xpath">
- --elif <test-xpath> - check condition if previous conditions failed
- --else - check if previous conditions failed
- -e or --elem <name> - print out element <xsl:element name="name">
- -a or --attr <name> - add attribute <xsl:attribute name="name">
- -b or --break - break nesting
- -n or --nl - print new line
- -s or --sort op xpath - sort in order (used after -m), where op is X:Y:Z:
- X is A - for order="ascending"
- X is D - for order="descending"
- Y is N - for data-type="numeric"
- Y is T - for data-type="text"
- Z is U - for case-order="upper-first"
- Z is L - for case-order="lower-first"
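As a minimal sketch of how -m, -s, -v, -o and -n combine, the following sorts records by their id attribute in descending numeric order. The sample document, the temporary file and the XML variable (which looks the binary up as either xmlstarlet or xml) are assumptions made for illustration.

```shell
# Assumption: the binary is installed as "xmlstarlet" or "xml".
XML=$(command -v xmlstarlet || command -v xml)

# Hypothetical sample data, deliberately out of order.
tmp=$(mktemp -d)
cat > "$tmp/table.xml" <<'EOF'
<xml><table>
<rec id="1"><numField>123</numField></rec>
<rec id="3"><numField>789</numField></rec>
<rec id="2"><numField>346</numField></rec>
</table></xml>
EOF

# -T: text output; -m: loop over each rec;
# -s D:N:- : sort descending, numeric, default case order ("-" leaves Z unset);
# -v / -o / -n: print a value, a literal ":" and a newline.
sorted=$("$XML" sel -T -t -m "/xml/table/rec" -s D:N:- "@id" \
    -v "@id" -o ":" -v "numField" -n "$tmp/table.xml")
printf '%s\n' "$sorted"

rm -rf "$tmp"
```

With this sample the records should come out in the order 3, 2, 1, each followed by its numField value.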
Each -t option is mapped to an XSLT template. The options that follow -t are mapped to XSLT elements:
- -c to <xsl:copy-of>
- -v to <xsl:value-of>
- -m to <xsl:for-each>
- -e to <xsl:element>
- -a to <xsl:attribute>
- -s to <xsl:sort>
- -i to <xsl:if>
- etc.
There can be multiple --match, --copy-of, --value-of, etc. options in a single template, eg. xml sel -t -c "xpath0" -m "xpath1" -m "xpath2" -v "xpath3" -t -m "xpath4" -c "xpath5"
In addition, some global options are available (xml sel <global-options> {<template>} [ <xml-file> ... ]):
- -Q or --quiet - do not write anything to standard output
- -C or --comp - display generated XSLT
- -R or --root - print root element <xsl-select>
- -T or --text - output is text (default is XML)
- -I or --indent - indent output
- -D or --xml-decl - do not omit xml declaration line
- -B or --noblanks - remove insignificant spaces from XML tree
- -E or --encode <encoding> - output in the given encoding (utf-8, unicode...)
- -N <name>=<value> - predefine namespaces (name without 'xmlns:'), ex: xsql=urn:oracle-xsql. Multiple -N options are allowed.
- --net - allow fetch DTDs or entities over network
- --help - display help
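To check how a given set of template options is translated, the -C option can be used to print the generated stylesheet instead of running it. The sketch below is only illustrative: the XPath expressions are made up, the XML variable is an assumed lookup of the binary name, and the exact XSLT printed differs between XmlStarlet versions.

```shell
# Assumption: the binary is installed as "xmlstarlet" or "xml".
XML=$(command -v xmlstarlet || command -v xml)

# -C displays the XSLT built from the options after -t; no input file is read.
xslt=$("$XML" sel -C -t -m "//item" -v "title" -n)
printf '%s\n' "$xslt"
```

The output should contain an xsl:for-each element for the -m option, with the instructions for -v and -n nested inside it.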
To create XML on the fly:
echo "<x/>" | xml sel -t -m / -e xml -e child -a data -o value → <xml><child data="value"/></xml>
To fetch a feed and display its titles (-m only selects the nodes; -v . prints the value of each one):
wget -qO - https://www.acme.com/feed.xml | xml sel -B -t -m "//title" -v . -n
To fetch a feed and display selected children of each item:
wget -qO - https://www.acme.com/feed.xml | xml sel -B -t -m "//item" -v "title" -n -v "guid" -n -v "enclosure/@url" -n -n
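The queries above use unprefixed element names, which only match elements that are in no namespace. For a document that declares a default namespace (Atom feeds do), the -N global option can bind a prefix to that namespace. The following is a minimal sketch with a made-up local feed; the prefix "a", the sample content and the XML variable (an assumed lookup of the binary name) are illustrative choices.

```shell
# Assumption: the binary is installed as "xmlstarlet" or "xml".
XML=$(command -v xmlstarlet || command -v xml)

# Hypothetical Atom-style document with a default namespace.
feed='<feed xmlns="http://www.w3.org/2005/Atom"><title>Example Feed</title><entry><title>First post</title></entry></feed>'

# Bind the prefix "a" to the Atom namespace and use it in every XPath step.
# An unprefixed path such as /feed/title would not match these elements.
title=$(printf '%s' "$feed" | "$XML" sel -N a=http://www.w3.org/2005/Atom \
    -T -t -v "/a:feed/a:title" -n)
printf '%s\n' "$title"
```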
Formatting
xml fo [<options>] <xml-file>
<options>:
- -n or --noindent - do not indent
- -t or --indent-tab - indent output with tabulation
- -s or --indent-spaces <num> - indent output with <num> spaces
- -o or --omit-decl - omit xml declaration <?xml version="1.0"?>
- -R or --recover - try to recover what is parsable
- -D or --dropdtd - remove the DOCTYPE of the input docs
- -C or --nocdata - replace cdata section with text nodes
- -N or --nsclean - remove redundant namespace declarations
- -e or --encode <encoding> - output in the given encoding (utf-8, unicode...)
- -H or --html - input is HTML
- -h or --help - print help
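A short sketch of the indentation options, assuming a tiny made-up input file; the file name, the sample content and the XML variable (an assumed lookup of the binary name) are only there for illustration.

```shell
# Assumption: the binary is installed as "xmlstarlet" or "xml".
XML=$(command -v xmlstarlet || command -v xml)

# Hypothetical one-line input document.
tmp=$(mktemp -d)
printf '%s\n' '<a><b>1</b><c/></a>' > "$tmp/in.xml"

# -o drops the <?xml ...?> declaration, -s 4 indents with four spaces.
pretty=$("$XML" fo -o -s 4 "$tmp/in.xml")
printf '%s\n' "$pretty"

# -t indents with tabs instead.
tabbed=$("$XML" fo -o -t "$tmp/in.xml")
printf '%s\n' "$tabbed"

rm -rf "$tmp"
```

With -s 4, each child of <a> should end up on its own line, indented by four spaces.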
Editing
xml ed <global-options> {<action>} [ <xml-file-or-uri> ... ]
<global-options>:
- -P (or --pf) - preserve original formatting
- -S (or --ps) - preserve non-significant spaces
- -O (or --omit-decl) - omit XML declaration (<?xml ...?>)
- -N <name>=<value> - predefine namespaces (name without 'xmlns:'), ex: xsql=urn:oracle-xsql. Multiple -N options are allowed. -N options must be last global options.
- --help or -h - display help
<action>:
- -d or --delete <xpath>
- -i or --insert <xpath> -t (--type) elem|text|attr -n <name> -v (--value) <value>
- -a or --append <xpath> -t (--type) elem|text|attr -n <name> -v (--value) <value>
- -s or --subnode <xpath> -t (--type) elem|text|attr -n <name> -v (--value) <value>
- -m or --move <xpath1> <xpath2>
- -r or --rename <xpath1> -v <new-name>
- -u or --update <xpath> -v (--value) <value> or -x (--expr) <xpath>
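The three node-creating actions differ only in where the new node goes: -i places it before the matched node (or, with -t attr, on it), -a after it, and -s inside it as the last child. Here is a minimal sketch on a made-up document; the element names, values and the XML variable (an assumed lookup of the binary name) are illustrative.

```shell
# Assumption: the binary is installed as "xmlstarlet" or "xml".
XML=$(command -v xmlstarlet || command -v xml)

doc='<x><a/></x>'

# -s: add <b>hello</b> as the last child of <x>.
sub=$(printf '%s' "$doc" | "$XML" ed -O -s "/x" -t elem -n b -v hello)
printf '%s\n' "$sub"

# -a: add <c>world</c> as the next sibling of <a>.
app=$(printf '%s' "$doc" | "$XML" ed -O -a "/x/a" -t elem -n c -v world)
printf '%s\n' "$app"

# -i with an element: add <z>first</z> just before <a>.
ins=$(printf '%s' "$doc" | "$XML" ed -O -i "/x/a" -t elem -n z -v first)
printf '%s\n' "$ins"

# -i with -t attr: add the attribute id="1" to <a>.
attr=$(printf '%s' "$doc" | "$XML" ed -O -i "/x/a" -t attr -n id -v 1)
printf '%s\n' "$attr"
```

The edited document is written to standard output in each case, so the four commands each start again from the same original string.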
To remove elements: xml ed -d "/xml/table/rec[@id='2']" xml/table.xml
To move an element: echo '<x id="1"><a/><b/></x>' | xml ed -m "//b" "//a"
To rename elements: xml ed -r "/xml/table/rec" -v record xml/tab-obj.xml
To rename attributes: xml ed -r "//*/@id" -v ID xml/tab-obj.xml
To update an element's value: xml ed -u "/xml/table/rec[@id=1]/numField" -v 0 xml/tab-obj.xml
To update an attribute's value: xml ed -u "/xml/table/rec[@id=3]/@id" -v 5 xml/tab-obj.xml
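Since the syntax accepts a sequence of actions, several edits can be applied in one call; they are carried out in the order given. The sketch below combines a delete, an update and a rename on a made-up file. The sample data, the temporary directory and the XML variable (an assumed lookup of the binary name) are for illustration only.

```shell
# Assumption: the binary is installed as "xmlstarlet" or "xml".
XML=$(command -v xmlstarlet || command -v xml)

# Hypothetical sample document.
tmp=$(mktemp -d)
cat > "$tmp/table.xml" <<'EOF'
<xml><table>
<rec id="1"><numField>123</numField></rec>
<rec id="2"><numField>346</numField></rec>
</table></xml>
EOF

# Delete record 2, set record 1's numField to 0, then rename rec to record.
result=$("$XML" ed -O \
    -d "/xml/table/rec[@id='2']" \
    -u "/xml/table/rec[@id='1']/numField" -v 0 \
    -r "/xml/table/rec" -v record \
    "$tmp/table.xml")
printf '%s\n' "$result"

rm -rf "$tmp"
```

The result is printed to standard output; the input file itself is left unchanged by this command.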
Resources