Forcing curl/wget to decompress gzip'd output
I was doing some troubleshooting (of this site, no less) and was using the ever-useful curl command to see what the webserver was sending.
However, at first, I got this output:
curl -s https://www.nodinrogers.com # the '-s' is for silent, so no progress meter is shown
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
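If you do want to peek at what came back, piping the output through a hex dump keeps the raw bytes off your terminal (a quick sketch; xxd ships with vim, and od -c works in a pinch):
curl -s https://www.nodinrogers.com | head -c 16 | xxd # show just the first 16 bytes as hex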
I'm expecting an HTML document (the site's index.html), so, thinking maybe something is squirrely with curl, I try wget instead:
wget -q https://www.nodinrogers.com # the '-q' is for quiet, so no output/progress is displayed
What kind of file did I download?
file index.html
index.html: gzip compressed data, original size modulo 2^32 109154
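Since the download is literally a gzip stream, the copy already on disk can be rescued without re-fetching anything; gunzip just wants a .gz suffix before it will touch the file (a quick sketch):
mv index.html index.html.gz
gunzip index.html.gz # writes the decompressed HTML back out as index.html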
Then it dawns on me that the webserver (AWS, in this case) is compressing the HTML, as it's configured to do, to save on bandwidth.
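You can confirm this from the response headers rather than by inspecting the file; a HEAD request shows the server declaring the encoding (a sketch; grep -i also covers HTTP/2's lowercase header names):
curl -sI -H 'Accept-Encoding: gzip' https://www.nodinrogers.com | grep -i content-encoding # expect a line like: content-encoding: gzip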
To decompress the data using curl or wget:
curl -so index.html --compressed https://www.nodinrogers.com
file index.html
index.html: HTML document, UTF-8 Unicode text, with very long lines, with CRLF, LF line terminators
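(For the record, --compressed does two things: it sends an Accept-Encoding request header listing the algorithms this curl build supports, gzip among them, and it transparently decompresses the response body before writing it out.)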
wget --compression=auto -q https://www.nodinrogers.com
file index.html
index.html: HTML document, UTF-8 Unicode text, with very long lines, with CRLF, LF line terminators
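As a rough sanity check on the bandwidth point above, and since this particular server sends gzip whether you ask for it or not, you can compare the size of the raw download against the decompressed one (a sketch; the filenames are mine):
curl -so compressed.bin https://www.nodinrogers.com
curl -so index.html --compressed https://www.nodinrogers.com
wc -c compressed.bin index.html # compressed vs. decompressed byte counts
The file output earlier already gave away the second number: 109154 bytes before compression.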