Xidel is a command-line tool to download and extract data from HTML/XML pages as well as JSON APIs.
form and match functions, improves the Windows command-line interface, merges the two old CGI services into a single one and fixes several interpreter bugs.
xidel http://www.google.de/search?q=test --extract "//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']"
xidel http://www.google.de/search?q=test --follow "//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']" --extract //title --download '{$host}/'
xidel http://example.org -f //a -e //title
xidel http://example.org -f "css('a')" --css title
xidel http://example.org -f "<a>{.}</a>*" -e "<title>{.}</title>"
<x><foo>ood</foo><bar>IMPORTANT!</bar></x>
xidel example.xml -e "<x><foo>ood</foo><bar>{.}</bar></x>"
xidel -e "(1 + 2 + 3) * 1000000000000 + 4 + 5 + 6 + 7.000000008"
xidel http://stackoverflow.com/feeds -e "<entry><title>{title:=.}</title><link>{uri:=@href}</link></entry>+"
xidel "http://www.reddit.com/user/username/" --extract "<t:loop><div class='usertext-body'><div>{outer-xml(.)}</div></div><ul class='flat-list buttons'><a><t:s>link:=@href</t:s>permalink</a></ul></div></div></t:loop>" --follow "<a rel='nofollow next'>{.}</a>?"
xidel http://reddit.com -f "form(css('form.login-form')[1], {'user': '$your_username', 'passwd': '$your_password'})" -e "css('#mail')/@title"
xidel -d "user=$your_username&passwd=$your_password&api_type=json" https://ssl.reddit.com/api/login --method GET 'http://www.reddit.com/api/me.json' -e '($json).data.has_mail'
xidel --xquery "<table>{for $i in 1 to 1000 return <tr><td>{$i}</td><td>{if ($i mod 2 = 0) then 'even' else 'odd'}</td></tr>}</table>" --output-format xml
xidel --xquery '<table>{for $i in 1 to 1000 return <tr><td>{$i}</td><td>{if ($i mod 2 = 0) then "even" else "odd"}</td></tr>}</table>' --output-format xml
eval "$(xidel http://site -e 'title:=//title' -e 'links:=//a/@href' --output-format bash)"
FOR /F "delims=" %%A IN ('xidel http://site -e "title:=//title" -e "links:=//a/@href" --output-format cmd') DO %%A
xidel file.json -e '$json(10)'
xidel file.json -e '$json()'
xidel file.json -e '$json("foo")("bar")'
xidel file.json -e '($json).foo.bar'
xidel file.json -e '$json/foo/bar'
xidel file.json -e '$json("abc")()().xyz/(u,v)'
{"abc": [[{"xyz": {"u": 1, "v": 2}}], [{"xyz": {"u": 3}}, {"xyz": {"u": 4}} ]]}
.()
xidel http://site -e '//tr / string-join(td, ",")'
string-join((...)) can generally be used to output some values in a single line. In the example, tr / string-join calls string-join for every row.
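The per-row behaviour can be tried on a small local file. The following is a minimal sketch, assuming xidel is installed and on the PATH; the file name rows-example.html and its contents are made up for illustration. The query is the one from the example above, so each tr should produce one line with its td values joined by commas.

```shell
# Hypothetical sample input: a table with two rows of two cells each.
cat > rows-example.html <<'EOF'
<table>
<tr><td>a</td><td>b</td></tr>
<tr><td>c</td><td>d</td></tr>
</table>
EOF

# Run the query only if xidel is available (assumption: it is on the PATH).
if command -v xidel >/dev/null 2>&1; then
  # string-join is evaluated once per tr, joining that row's td values.
  xidel rows-example.html -e '//tr / string-join(td, ",")'
else
  echo "xidel not found, skipping the query" >&2
fi
```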
Windows cmd: xidel --html your-file.html --xquery "x:replace-nodes(/, //a, function($e) {
$e/<a style='{string-join((@style, 'font-weight: bold'), '; ')}'>{@* except @style, node()}</a>
})" > your-output-file.html
Linux/Powershell: xidel --html your-file.html --xquery 'x:replace-nodes(/, //a, function($e) {
$e/<a style="{string-join((@style, "font-weight: bold"), "; ")}">{@* except @style, node()}</a>
})' > your-output-file.html
x:replace-nodes(/, //a, function($e) { .. }): This applies an anonymous function to every link a-element in the HTML document, whereby that element is stored in the variable $e and is replaced by the return value of the function.

<a>{@* except @style, node()}</a>: This creates a new a-element that has the same children, descendants and attributes as the current element, but removes the style-attribute.

style="{string-join((@style, "font-weight: bold"), "; ")}": This creates a new style-attribute by appending "font-weight: bold" to the old value of the attribute. A separating "; " is inserted, if (and only if) that attribute already existed.

Operating System | Filename | Size | SHA-256 |
---|---|---|---|
Windows: 32 Bit | xidel-0.9.8.win32.zip | 840.0 kB | 96854c2be1e3755f56fabb8f00d1fe567108461b9fab139039219a1b7c17e382 |
Windows: 32 Bit (needs OpenSSL) | xidel-0.9.8-openssl.win32.zip | 873.6 kB | 1b9f3e78897727fe3ea2a359ec9678d0b2e593792a3c10c468bec60d7a873b59 |
Universal Linux: 32 Bit | xidel-0.9.8.linux32.tar.gz | 848.5 kB | dcc80b3a1dbf437c98d94c8dcd9b4af5f709174892bf926f36ea8dd5cb55aaec |
Universal Linux: 64 Bit | xidel-0.9.8.linux64.tar.gz | 1.3 MB | cf6d7391a73dbadf7c74e22206ea3f9f4f77f77d0e9d6e32d15ec400b1b843ef |
Debian: 32 Bit | xidel_0.9.8-1_i386.deb | 665.1 kB | 8329c02512da430ef1f40f77e2676539a146b258c7201375337e7de8f4e16b2c |
Debian: 64 Bit | xidel_0.9.8-1_amd64.deb | 991.9 kB | f6a6e29b77547d5ae38383440bd653b3eaf9eeb470def14cc48154a4f6925f69 |
Android ARM: | xidel-0.9.8.androidarm.tar.gz | 2.1 MB | 3d19cf5e9a5bf9314e251aa14e0ac990fdd290aa5bfad9e0e5c6956800365fb5 |
Source: | xidel-0.9.8.src.tar.gz | 1.9 MB | 72b5b1a2fc44a0a61831e268c45bc6a6c28e3533b5445151bfbdeaf1562af39c |
Mac 10.8 | externally prebuilt version and compile instructions | | |
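
The SHA-256 values in the table can be used to check a download before unpacking it. The following is a minimal sketch, not an official procedure: verify_sha256 is a hypothetical helper name, and it assumes that either sha256sum or shasum is installed.

```shell
# Sketch: compare a file's SHA-256 digest with an expected value.
# verify_sha256 is a hypothetical helper name, not part of Xidel.
verify_sha256() {
  _file=$1
  _expected=$2
  if command -v sha256sum >/dev/null 2>&1; then
    # Reading from stdin keeps the file name out of the output.
    _actual=$(sha256sum < "$_file" | cut -d ' ' -f 1)
  else
    # Assumption: systems without sha256sum provide shasum instead.
    _actual=$(shasum -a 256 < "$_file" | cut -d ' ' -f 1)
  fi
  # Succeeds (exit status 0) only if both digests are identical.
  [ "$_actual" = "$_expected" ]
}

# Usage, after downloading the archive into the current directory:
#   verify_sha256 xidel-0.9.8.linux64.tar.gz \
#     cf6d7391a73dbadf7c74e22206ea3f9f4f77f77d0e9d6e32d15ec400b1b843ef \
#     && echo "checksum OK"
```

The function only reports success or failure through its exit status, so it can be combined with && or used in an if statement.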
http://www.videlibri.de/cgi-bin/xidelcgi?data=<html><title>foobar</title></html>&extract=//title&raw=true
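
The URL above contains characters such as < and > that many clients expect to be percent-encoded in a query string. The following is a minimal sketch of building such a URL in a POSIX shell. The helper names urlencode and xidelcgi_url are hypothetical, the encoder handles ASCII input only, and it is an assumption that the service decodes percent-encoded parameters like any ordinary CGI program. The parameter names data, extract and raw are taken from the URL above.

```shell
# Percent-encode an ASCII string; unreserved characters pass through.
# urlencode is a hypothetical helper, not part of Xidel.
urlencode() {
  _s=$1
  _out=
  while [ -n "$_s" ]; do
    _rest=${_s#?}            # everything after the first character
    _c=${_s%"$_rest"}        # the first character itself
    _s=$_rest
    case $_c in
      [A-Za-z0-9._~-]) _out=$_out$_c ;;
      *) _out=$_out$(printf '%%%02X' "'$_c") ;;
    esac
  done
  printf '%s' "$_out"
}

# Build a request URL for the CGI service from a document and a query.
# xidelcgi_url is a hypothetical helper name.
xidelcgi_url() {
  printf 'http://www.videlibri.de/cgi-bin/xidelcgi?data=%s&extract=%s&raw=true\n' \
    "$(urlencode "$1")" "$(urlencode "$2")"
}

# Print the encoded form of the example URL.
xidelcgi_url '<html><title>foobar</title></html>' '//title'
```

The printed URL can then be passed to any HTTP client.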
../build.sh, which just calls FreePascal. If you want to call FreePascal directly yourself, you can use fpc xidel.pas, in which case you need to pass the paths to all directories of the source using the -Fu, -Fi options.

components/pascal/internettools.lpk and components/pascal/internettools_utf8.lpk in Lazarus, then open programs/internet/xidel/xidel.lpi and click on Run\Compile.

There is also a Greasemonkey 3 script to automatically generate templates by selecting the interesting values on a webpage. The script intercepts the selection and marks the elements as shown in the screenshots below:
The script was written for Greasemonkey 3. Beware that it will not work properly in Greasemonkey 4.
You can find the script in the Mercurial repository or on userscripts.org (mirrored, as userscripts is dead) with a detailed description. You need to change the name to "Webscraper / Xidelscript" (it is a multiscript that changes its behaviour depending on its name).