Some unix utils

SSL · April 26, 2013

Hi! Does anyone know a unix (command line) utility that will open a website, process its JS and save the HTML to a file? And another one that will escape a string using URL encoding? Neo?:D

Note: I have an assignment to script a web-based (Google Translate) text translator as a shell... script. I could use the Google Translate API, but it's paid in $20 increments.

stoppingby4now · April 27, 2013

It's not a standard UNIX tool, but if you have 'curl' available, you can use it to send a request to a web server. You just need to know the options that get passed to the API, and then you can send the source language, target language, and text to translate. After that, you only need to parse the returned output (take a look at awk).

There are a few shell scripts out there that use google translate that you can use as a basis (some don't work anymore, but they can lead you in the right direction).

SSL · April 28, 2013

1. After such a long time, I consider curl a standard unix tool ;)

2. Thanks for the info. The problem is that I can't use the API, so I'll just load the normal website, which leads to ...

3. Why didn't I google for them before? I've found one that uses lynx to dump the HTML source. I will examine it later.

Thanks

stoppingby4now · April 29, 2013

You don't have to use the API with curl; you can reference translate.google.com directly. I attempted to do this and came up with the following very minimal solution.

DEFAULT_SL='en'; ### Default source language: English
DEFAULT_TL='es'; ### Default target language: Spanish

source_lang=$DEFAULT_SL;
target_lang=$DEFAULT_TL;

while getopts "s:t:" ARGV; do
    case $ARGV in
        s) source_lang=$OPTARG;;
        t) target_lang=$OPTARG;;
        *) echo "Unrecognized option."; exit 127;;
    esac
done

shift $(( $OPTIND - 1 )) ### Remove the parsed switches


html=$( curl -s -i --user-agent "Mozilla/4.0" -d "sl=${source_lang}" -d "tl=${target_lang}" --data-urlencode "text=${1}" http://translate.google.com )
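### Join the response into one line for sed; quoting avoids glob expansion of the HTML
translation=$( printf '%s' "$html" | tr '\n' ' ' | sed -ne 's#^.*<span id=result_box[^>]*><span[^>]*>\(.*\)</span></span>.*$#\1#p' )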


echo "=> ${translation}"

Called as: translate [-s source_lang] [-t target_lang] "Text to translate"

The source and target languages need to be given in short form (e.g. en => English, es => Spanish, de => German, etc.).

So, to translate German to English:

translate -s de -t en "Guten Morgen"

It uses curl to send data as a POST request to translate.google.com using the -d switches, so it's as if you were accessing it via your browser. The user agent must also be set or else Google will deny the translation.
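As for the earlier question about escaping a string for a URL: curl's --data-urlencode option already handles that for the text parameter, so the script does not need a separate tool. For illustration only, here is a minimal bash sketch of the same kind of percent-encoding. The function name urlencode is hypothetical and not part of the script above, and it has only been reasoned through for ASCII input; non-ASCII text is meant to be encoded byte by byte under LC_ALL=C, but treat that as an assumption.

```shell
#!/usr/bin/env bash
# urlencode (hypothetical helper): percent-encode everything except the
# RFC 3986 unreserved characters A-Z a-z 0-9 . ~ _ -
urlencode() {
    local LC_ALL=C                 # make bash index the string byte by byte
    local input=$1 out='' ch i
    for (( i = 0; i < ${#input}; i++ )); do
        ch=${input:i:1}
        case $ch in
            [a-zA-Z0-9.~_-]) out+=$ch ;;          # safe character: copy as-is
            *) printf -v ch '%%%02X' "'$ch"       # anything else: %HH
               out+=$ch ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode "Guten Morgen"   # prints Guten%20Morgen
```

The result could then be appended to a query string by hand, although letting curl do the encoding is less error-prone.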

The html output is then parsed via sed to grab only the translated text.
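To see the sed step in isolation, the sketch below runs the same expression against a made-up HTML snippet. The sample markup and the function name extract_translation are assumptions for demonstration; the real page returned by Google may look different, and the expression would need adjusting if the markup changes.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the page curl returns (not real Google output).
html='<html><body><div><span id=result_box class="short_text"><span title="Guten Morgen">Good morning</span></span></div></body></html>'

# Same sed expression as in the script: -n suppresses normal output, and the
# trailing "p" prints the line only if the substitution matched.
extract_translation() {
    printf '%s\n' "$1" |
        sed -ne 's#^.*<span id=result_box[^>]*><span[^>]*>\(.*\)</span></span>.*$#\1#p'
}

extract_translation "$html"   # prints: Good morning
```

Because \(.*\) is greedy, a later "</span></span>" on the same line would be swallowed into the captured text, and input without a result_box span produces no output at all.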

SSL · May 15, 2013

Could you explain what the sed script is doing? I know it's trimming HTML tags from the curl output using a regex, but what are these hashes for?

stoppingby4now · May 18, 2013

The hashes act as delimiters, in place of the default forward slash (/). So you can do things like:

s#search#replace#
s@search@replace@
s;search;replace;

It's just habit that I tend to default to using #'s since I usually have to deal with forward slashes. If you use the default forward slash as a delimiter, then you have to escape them within the search and replace portions.

s/\/some\/path/\/new\/dir/
s#/some/path#/new/dir#

The second is easier to read.
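A quick way to confirm that both forms behave the same is to run them on a sample path. The path and variable names here are made up for the demonstration.

```shell
#!/usr/bin/env bash
path='/some/path/file.txt'   # sample input, purely illustrative

# Default delimiter: every slash in the pattern and replacement is escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/some\/path/\/new\/dir/')

# Hash delimiter: the slashes can be written literally.
with_hash=$(printf '%s\n' "$path" | sed 's#/some/path#/new/dir#')

echo "$with_slash"   # prints /new/dir/file.txt
echo "$with_hash"    # prints /new/dir/file.txt
```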
