SSL Posted April 26, 2013
Hi! Does anyone know a Unix command-line utility that will open a website, process its JS, and save the resulting HTML to a file? And another one that will escape a string to URL encoding? Neo? :D
Note: I have an assignment to script a text translator that uses the web (Google Translate) as a shell... script. I could use the Google Translate API, but it's paid in $20 increments.
stoppingby4now Posted April 27, 2013
It's not a standard UNIX tool, but if you have 'curl' available, you can use it to send a request to a web server. You just need to know the parameters that get passed to the API, and then you can send the source and target language and the text to translate. After that you just need to parse the returned output (take a look at awk). There are a few shell scripts out there that use Google Translate that you can use as a basis (some don't work anymore, but they can lead you in the right direction).
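On the URL-encoding part of the question, here is a minimal sketch of doing it in plain POSIX shell, in case no dedicated utility is at hand. The function name `urlencode` is made up for illustration, and the sketch assumes ASCII input only (multi-byte characters would need byte-wise handling). It leaves the unreserved characters (letters, digits, `.`, `_`, `~`, `-`) alone and percent-encodes everything else.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): percent-encode an ASCII string.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}            # everything after the first character
        c=${s%"$rest"}         # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;                    # unreserved: keep as-is
            *) out=$out$(printf '%%%02X' "'$c") ;;            # anything else: %XX
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

urlencode "Guten Morgen"
```

If curl is available anyway, its `--data-urlencode` option does the encoding for you, so a helper like this is only needed when the URL is built by hand.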
SSL (Author) Posted April 28, 2013
1. After such a long time, I consider curl a standard Unix tool ;)
2. Thanks for the info. The problem is that I can't use the API, so I'll just load the normal website, which leads to ...
3. Why didn't I google for them before? I've found one that uses lynx to dump the HTML source. I'll examine it later. Thanks
stoppingby4now Posted April 29, 2013
You don't have to use the API with curl; you can reference translate.google.com directly. I attempted to do this and came up with the following very minimal solution.

    DEFAULT_SL='en';   ### Default source language: English
    DEFAULT_TL='es';   ### Default target language: Spanish

    source_lang=$DEFAULT_SL;
    target_lang=$DEFAULT_TL;

    while getopts "s:t:" ARGV; do
        case $ARGV in
            s) source_lang=$OPTARG;;
            t) target_lang=$OPTARG;;
            *) echo "Unrecognized option."; exit 127;;
        esac
    done

    shift $(( $OPTIND - 1 ))   ### Remove the parsed switches

    ### POST the languages and the URL-encoded text, as a browser form would
    html=$( curl -s -i --user-agent "Mozilla/4.0" -d "sl=${source_lang}" -d "tl=${target_lang}" --data-urlencode "text=${1}" http://translate.google.com )

    ### $html is left unquoted, so word splitting joins the response into one line for sed
    translation=$( echo $html | sed -ne 's#^.*<span id=result_box[^>]*><span[^>]*>\(.*\)</span></span>.*$#\1#p' )

    echo "=> ${translation}"

Called as:

    translate [-s source_lang] [-t target_lang] "Text to translate"

The source and target languages need to be given in the short form (e.g. en => English, es => Spanish, de => German, etc.). So to translate German to English:

    translate -s de -t en "Guten Morgen"

It uses curl to send the data as a POST request to translate.google.com using the -d switches, so it's as if you were accessing it via your browser. The user agent must also be set, or else Google will deny the translation. The HTML output is then parsed via sed to grab only the translated text.
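To see the sed step in isolation without hitting the network, it can be run against a canned snippet. This is only a sketch: the function name `extract_translation` and the sample HTML are made up for illustration and are not real Google output. It only assumes the page wraps the result in a `result_box` span, as the script above does.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin, print the text inside the result_box span.
extract_translation() {
    # paste joins all input lines into one, so the ^.*...$ pattern sees a single line.
    paste -sd ' ' - |
        sed -ne 's#^.*<span id=result_box[^>]*><span[^>]*>\(.*\)</span></span>.*$#\1#p'
}

# Made-up sample response, for illustration only.
sample='<html><body>
<span id=result_box class="short_text"><span title="Guten Morgen">Good morning</span></span>
</body></html>'

printf '%s\n' "$sample" | extract_translation
```

Because of `-n` plus the `p` flag, sed prints a line only when the substitution succeeds, so input without a `result_box` span produces no output at all.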
SSL (Author) Posted May 15, 2013
Could you explain what the sed script is doing? I know it's trimming HTML tags from the curl output using a regex, but what are these hashes for?
stoppingby4now Posted May 18, 2013
The hashes are acting as delimiters, as opposed to using the default forward slash (/). So, you can do things like:

    s#search#replace#
    s@search@replace@
    s;search;replace;

It's just habit that I tend to default to using #'s, since I usually have to deal with forward slashes. If you use the default forward slash as a delimiter, then you have to escape any forward slashes within the search and replace portions.

    s/\/some\/path/\/new\/dir/
    s#/some/path#/new/dir#

The second is easier to read.
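A quick way to confirm that the two forms behave identically is to run both on the same input. This is a small sketch; the path and the variable names are made up for illustration.

```shell
#!/bin/sh
# Illustrative input only.
path='/some/path/file.txt'

# Default delimiter: every / inside the pattern and replacement must be escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/some\/path/\/new\/dir/')

# Alternate delimiter: the slashes can be written literally.
with_hash=$(printf '%s\n' "$path" | sed 's#/some/path#/new/dir#')

echo "$with_slash"
echo "$with_hash"
```

Both commands print /new/dir/file.txt. The same trade-off applies in reverse: if # is the delimiter, a literal # in the pattern would need escaping instead.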