
Posted

Hi! Does anyone know a Unix (command-line) utility that will open a website, process its JS, and save the resulting HTML to a file? And another one that will escape a string using URL encoding? Neo? :D

Note: I have an assignment to write a shell script that works as a text translator on top of a website (Google Translate). I could use the Google Translate API, but it's paid in $20 increments.

Posted

It's not a standard UNIX tool, but if you have curl available, you can use it to send a request to a web server. You just need to know which parameters the API expects; then you can send the source language, the target language, and the text to translate. After that, you only need to parse the returned output (take a look at awk).
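As a rough illustration of the awk suggestion, here is a sketch that pulls the text out of one element of an HTML response. The sample response, the result_box id, and the function name extract_with_awk are assumptions made up for this example, not a documented response format. It also assumes the element sits on a single line, since awk processes its input line by line.

```bash
#!/bin/bash
# Hypothetical response body; real markup depends on the site you query.
response='<html><body>
<span id=result_box><span>Good morning</span></span>
</body></html>'

# extract_with_awk (hypothetical helper): print the text inside the first
# inner <span> of the element whose id is result_box, per matching line.
extract_with_awk() {
    printf '%s\n' "$1" | awk '
        {
            start = index($0, "<span id=result_box")
            if (start == 0) next                                    # element not on this line
            rest = substr($0, start)
            sub(/^<span id=result_box[^>]*><span[^>]*>/, "", rest)  # drop the opening tags
            sub(/<\/span>.*$/, "", rest)                            # drop first closing tag onward
            print rest
        }'
}

extract_with_awk "$response"   # Good morning
```

The index/substr approach avoids needing gawk-only features such as match() with a capture array, so it should run under mawk and BSD awk as well.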

 

There are a few shell scripts out there that use Google Translate, which you can use as a starting point (some no longer work, but they can point you in the right direction).

Posted

1. After all this time, I consider curl a standard Unix tool ;)

2. Thanks for the info. The problem is that I can't use the API; I'll just load the normal website, which leads to ...

3. Why didn't I google for them before? I found one that uses lynx to dump the HTML source. I'll examine it later.

Thanks

Posted

You don't have to use the API with curl; you can query translate.google.com directly. I tried this and came up with the following very minimal solution.

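#!/bin/bash
DEFAULT_SL='en'   ### Default source language: English
DEFAULT_TL='es'   ### Default target language: Spanish

source_lang=$DEFAULT_SL
target_lang=$DEFAULT_TL

usage() {
    echo "Usage: translate [-s <source_lang>] [-t <target_lang>] \"Text to translate\"" >&2
    exit 2
}

while getopts "s:t:" ARGV; do
    case $ARGV in
        s) source_lang=$OPTARG ;;
        t) target_lang=$OPTARG ;;
        *) usage ;;   ### getopts has already printed its own error message
    esac
done

shift $(( OPTIND - 1 ))   ### Remove the parsed switches
[ -n "$1" ] || usage      ### The text to translate is required

### POST the form fields; --data-urlencode URL-encodes the text for us
html=$( curl -s --user-agent "Mozilla/4.0" -d "sl=${source_lang}" -d "tl=${target_lang}" --data-urlencode "text=${1}" http://translate.google.com )

### Join the page onto one line (sed works line by line), then keep only the
### contents of the result_box span. Quoting "$html" prevents word splitting
### and glob expansion of the page text.
translation=$( printf '%s' "$html" | tr '\n' ' ' | sed -ne 's#^.*<span id=result_box[^>]*><span[^>]*>\(.*\)</span></span>.*$#\1#p' )

echo "=> ${translation}"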
Called as: translate [-s <source_lang>] [-t <target_lang>] "Text to translate"

The source and target languages must be given as short language codes (e.g. en => English, es => Spanish, de => German).

So, to translate German to English:

translate -s de -t en "Guten Morgen"

 

It uses curl's -d switches to send the data as a POST request to translate.google.com, so it's as if you were accessing it via your browser; --data-urlencode additionally URL-encodes the text to translate. The user agent must also be set, or else Google will deny the translation.
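For the URL-encoding half of the original question: curl's --data-urlencode already handles it for the text field in the script. If you need the encoded string on its own, a pure-bash sketch might look like the following. The function name urlencode is hypothetical, and it assumes bash 4+ (for `${hex^^}`) plus the standard od utility. It keeps the unreserved characters A-Z, a-z, 0-9, `-`, `.`, `_` and `~` and percent-encodes every other byte, which should match the encoding libcurl documents for curl_easy_escape.

```bash
#!/bin/bash
# urlencode (hypothetical helper): percent-encode a string byte by byte.
# Each byte of a multi-byte UTF-8 character becomes its own %XX sequence.
urlencode() {
    local hex ch out=''
    # od prints every input byte as two lowercase hex digits
    for hex in $(printf '%s' "$1" | od -An -v -tx1); do
        # Unreserved bytes: 0-9, A-Z, a-z, then - . _ ~
        case $hex in
            3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
                printf -v ch "\\x$hex"   # turn the hex code back into the character
                out+=$ch ;;
            *)
                out+="%${hex^^}" ;;      # everything else: %XX in uppercase
        esac
    done
    printf '%s\n' "$out"
}

urlencode 'Guten Morgen'   # Guten%20Morgen
urlencode 'a&b=c/d'        # a%26b%3Dc%2Fd
```

Working on the bytes reported by od, rather than on bash's characters, keeps the result independent of the current locale.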

 

The HTML output is then parsed with sed to extract only the translated text.
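To see the extraction step in isolation, you can run the same sed expression against a made-up page. The sample markup and the function name extract_translation below are hypothetical: the markup is shaped to match the regex, not captured from translate.google.com. The lines are joined with tr first because the regex only matches within a single line.

```bash
#!/bin/bash
# Hypothetical sample page shaped like what the regex expects
# (not a real capture of translate.google.com).
sample='<html><body>
<div><span id=result_box class="short_text"><span title="Guten Morgen">Good morning</span></span></div>
</body></html>'

# Join all lines into one, then let sed keep only the captured group \(.*\).
# -n suppresses sed's default output; the p flag prints only when s### matched.
extract_translation() {
    printf '%s' "$1" | tr '\n' ' ' |
        sed -ne 's#^.*<span id=result_box[^>]*><span[^>]*>\(.*\)</span></span>.*$#\1#p'
}

extract_translation "$sample"   # Good morning
```

Note that `\(.*\)` is greedy, so it runs up to the last `</span></span>` on the joined line; on this sample there is only one, so just the translated text is captured.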

  • 3 weeks later...
Posted

Could you explain what the sed script is doing? I know it strips the HTML tags from the curl output using a regex, but what are these hashes for?

Posted

The hashes act as delimiters instead of the default forward slash (/). sed's s command accepts any character other than backslash or newline as the delimiter, so you can write things like:

s#search#replace#
s@search@replace@
s;search;replace;

It's just habit: I tend to default to # because I usually have to deal with forward slashes. If you use the default forward slash as the delimiter, you have to escape every slash within the search and replace portions:

s/\/some\/path/\/new\/dir/
s#/some/path#/new/dir#

The second is easier to read.
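A quick way to confirm that only the delimiter differs is to run the forms on the same input; the path below is just an example value.

```bash
#!/bin/bash
path='/some/path/file.txt'

# Default delimiter: every / in the pattern and replacement must be escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/some\/path/\/new\/dir/')

# Alternate delimiters: slashes need no escaping.
with_hash=$(printf '%s\n' "$path" | sed 's#/some/path#/new/dir#')
with_at=$(printf '%s\n' "$path" | sed 's@/some/path@/new/dir@')

echo "$with_slash"   # /new/dir/file.txt
echo "$with_hash"    # /new/dir/file.txt
echo "$with_at"      # /new/dir/file.txt
```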
