Disclaimer
My basic premise is that you want to make a throwaway script for exploring or monitoring a JSON REST API and you want to be able to pull the value of a JSON property to feed into the next request, to log or to make a decision about what to do next. In general, if you want to make a maintainable, robust and efficient system which needs to do structured text parsing, you should use a purpose built 3rd party library or framework like
jq.
Assumptions
For the purposes of this example, I am making the reasonable assumptions that you have access to a Bash shell, Curl, Python 2.6+ and an up to date version of GNU Grep (OSX users will need to
upgrade from the BSD version using Brew).
Extracting JSON Data
An example of an API that returns JSON in the response body is the Way Back When Machine REST API. I'll use a request to get the archive data for example.com in this example:
$ curl http://archive.org/wayback/available?url=example.com 2> /dev/null
Which responds with some archive metadata:
{"archived_snapshots":{"closest":{"available":true,"url":"http://web.archive.org/web/20150119140634/http://me@example.com/","timestamp":"20150119140634","status":"200"}}}
Let's say that it's important to me to have a script make decisions on what to do next depending on the value of "status" returned in the response. It would be hard to use grep to extract this data without formatting the JSON first. This can be achieved using the Python JSON formatter as follows:
$ curl http://archive.org/wayback/available?url=example.com 2> /dev/null | python -m json.tool
{
"archived_snapshots": {
"closest": {
"available": true,
"status": "200",
"timestamp": "20150119140634",
"url": "http://web.archive.org/web/20150119140634/http://me@example.com/"
}
}
}
It is then possible to pull out the status itself using a Grep Perl regular expression (-P) and printing out only the matching part of the line (-o). The following regular expression matches any character preceded by "status": " which is not a newline, a comma or a double quote:
(?<="status": ")[^\n,"]*:
Piping it together:
$ curl http://archive.org/wayback/available?url=example.com 2>/dev/null | python -m json.tool | grep -Po '(?<="status": ")[^\n,"]*'
Gives result:
200
If I wanted to extract the value of a numeric or boolean type property I'd have to adjust the expression a little bit by removing the double quote on the end of the initial matcher:
$ curl http://archive.org/wayback/available?url=example.com 2>/dev/null | python -m json.tool | grep -Po '(?<="available": )[^\n,"]*'
Resulting in:
true
To complete the example, the result can be stored directly into a variable for use later in the script:
$ export STATUS=$(curl http://archive.org/wayback/available?url=example.com 2> /dev/null | python -m json.tool | grep -Po '(?<="status": ")[^\n,"]*')
if [[ $STATUS == '200' ]]; then
echo 'OK'
fi