Some unix utils


SSL

Hi! Does anyone know a Unix command-line utility that will open a website, process its JS, and save the resulting HTML to a file? And another one that will URL-encode a string? Neo? :D

Note: I have an assignment to script a text translator that uses the web version of Google Translate as a shell... script. I could use the Google Translate API, but it's paid in $20 increments.

It's not a standard UNIX tool, but if you have 'curl' available, you can use it to send a request to a web server. You just need to know the options that get passed to the API; then you can send the source and target language and the text to translate. After that, you only need to parse the returned output (take a look at awk).

 

There are a few shell scripts out there that use google translate that you can use as a basis (some don't work anymore, but they can lead you in the right direction).
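
As a rough sketch of the curl-then-awk idea, something like the following could work. The form field names (`source`, `target`, `text`), the `class="result"` markup, and the function names are all made up for illustration; the real page will differ, so the awk pattern would need adjusting. The fetch function is only defined, not called, and the parser is run on a canned sample.

```shell
#!/usr/bin/env bash

# fetch_page URL SOURCE TARGET TEXT: POST hypothetical form fields with curl.
# Defined for illustration only; the field names are assumptions.
fetch_page() {
    curl -s -d "source=$2" -d "target=$3" --data-urlencode "text=$4" "$1"
}

# parse_result: split each line on < and >, and print the text that follows
# the first tag carrying the (hypothetical) class="result" attribute.
parse_result() {
    awk -F'[<>]' '/class="result"/ { print $3; exit }'
}

# Canned sample standing in for a server response.
printf '%s\n' '<html>' '<p class="result">Hola</p>' '</html>' | parse_result
```

Splitting on angle brackets is fragile (it breaks as soon as the tag of interest is not first on its line), but it is often enough for a one-off script.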

1. After such a long time, I consider curl a standard Unix tool ;)

2. Thanks for the info. The problem is that I can't use the API, so I'll just load the normal website, which leads to ...

3. Why didn't I google for them before? I've found one that uses lynx to dump the HTML source. I'll examine it later.

Thanks

You don't have to use the API with curl; you can reference translate.google.com directly. I attempted this and came up with the following very minimal solution.

#!/bin/bash
# translate: minimal command-line front end to translate.google.com

DEFAULT_SL='en'   # Default source language: English
DEFAULT_TL='es'   # Default target language: Spanish

source_lang=$DEFAULT_SL
target_lang=$DEFAULT_TL

while getopts "s:t:" opt; do
    case $opt in
        s) source_lang=$OPTARG ;;
        t) target_lang=$OPTARG ;;
        *) echo "Unrecognized option." >&2; exit 1 ;;
    esac
done

shift $(( OPTIND - 1 ))   # Remove the parsed switches

# POST the form fields; --data-urlencode takes care of URL-encoding the text.
html=$( curl -s -i --user-agent "Mozilla/4.0" -d "sl=${source_lang}" -d "tl=${target_lang}" --data-urlencode "text=${1}" http://translate.google.com )

# Flatten the response to one line so the sed pattern can match across line breaks.
translation=$( printf '%s' "$html" | tr '\r\n' '  ' | sed -ne 's#^.*<span id=result_box[^>]*><span[^>]*>\(.*\)</span></span>.*$#\1#p' )

echo "=> ${translation}"

Called as: translate [-s source] [-t target] "Text to translate"

 

The source and target languages need to be in the short form (e.g. en => English, es => Spanish, de => German).

 

So to translate German to English:

 

translate -s de -t en "Guten Morgen"

 

It uses curl to send data as a POST request to translate.google.com using the -d switches, so it's as if you were accessing it via your browser. The user agent must also be set or else Google will deny the translation.
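
On the original question about escaping a string to URL encoding: curl's --data-urlencode does that for the POSTed text, but if you need it standalone, here is a minimal bash sketch. The function name `urlencode` is just my choice, and it assumes `od` and `tr` are available. It percent-encodes every byte except the unreserved ones (letters, digits, and - . _ ~).

```shell
#!/usr/bin/env bash

# urlencode STRING: percent-encode STRING byte by byte.
urlencode() {
    local hex ch out=''
    # od prints each input byte as two hex digits; tr upper-cases them.
    for hex in $(printf '%s' "$1" | od -An -v -tx1 | tr 'a-f' 'A-F'); do
        case $hex in
            # Unreserved bytes: 0-9, A-Z, a-z, and - . _ ~
            3[0-9]|4[1-9A-F]|5[0-9A]|6[1-9A-F]|7[0-9A]|2D|2E|5F|7E)
                printf -v ch "\\x$hex"   # turn the hex pair back into its character
                out+=$ch ;;
            *)
                out+="%$hex" ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode 'Guten Morgen & hallo'   # prints Guten%20Morgen%20%26%20hallo
```

Working on the hex dump rather than on characters keeps it independent of the locale, so multibyte input is encoded one byte at a time.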

 

The html output is then parsed via sed to grab only the translated text.
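
To see what the sed expression does without hitting the network, you can feed it a canned snippet. The sample HTML below is invented for illustration and only mimics the `result_box` structure the pattern expects; it is not actual Google output, and `extract_translation` is just a name I picked.

```shell
#!/usr/bin/env bash

# extract_translation: read HTML on stdin, print the text inside the
# nested spans that follow <span id=result_box ...>.
extract_translation() {
    # Flatten newlines first: sed works line by line, and the pattern
    # has to see the whole document as a single line.
    { tr '\n' ' '; echo; } |
        sed -n 's#^.*<span id=result_box[^>]*><span[^>]*>\(.*\)</span></span>.*$#\1#p'
}

# Made-up sample response.
sample='<html><body>
<span id=result_box class="short_text"><span title="Guten Morgen">Good morning</span></span>
</body></html>'

printf '%s\n' "$sample" | extract_translation   # prints Good morning
```

If the page has no `result_box` span, the `-n` and `p` combination means nothing is printed at all, which is an easy condition to test for in a script.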

  • 3 weeks later...

The hashes are acting as delimiters, in place of the default forward slash (/). So you can do things like:

s#search#replace#
s@search@replace@
s;search;replace;

It's just habit that I tend to default to using #'s since I usually have to deal with forward slashes. If you use the default forward slash as a delimiter, then you have to escape them within the search and replace portions.

s/\/some\/path/\/new\/dir/
s#/some/path#/new/dir#

The second is easier to read.
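
A quick way to convince yourself the two forms are equivalent (the sample path is arbitrary):

```shell
#!/usr/bin/env bash

path='/some/path/file.txt'

# Default delimiter: every slash in the pattern and replacement is escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/some\/path/\/new\/dir/')

# Hash as delimiter: slashes can be written literally.
with_hash=$(printf '%s\n' "$path" | sed 's#/some/path#/new/dir#')

printf '%s\n%s\n' "$with_slash" "$with_hash"   # both lines read /new/dir/file.txt
```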
