SYNOPSIS

konwert FILTER [FILE].\|.\|.\| [-o DEST | -O]

DESCRIPTION

Konwert allows filtering multiple files through multiple filters. It filters the specified FILEs, or stdin if none are given.

Simple FILTER is the name of an executable file from the directory ~/.konwert/filters or the system-wide one, normally /usr/share/konwert/filters. Such program itself filters stdin to stdout.

The filtering rule can be more complex:

konwert FILTER1+FILTER2 means konwert FILTER1 | konwert FILTER2.

konwert FORMAT1-FORMAT2, unless such filter exists, tries to find a common FORMAT3, such that both filters FORMAT1-FORMAT3 and FORMAT3-FORMAT1 do exist.

konwert FILTER/ARG/.\|.\|.\| passes arguments to the filter. Arguments can also be specified here: FORMAT1/ARGS-FORMAT2. The meaning of arguments depends on the particular filter.

konwert '(COMMAND ARGS.\|.\|.\|)' executes this arbitrary shell command. This is useful with -o or -O options. The command cannot contain the string )+, which will terminate this filter's specification.

OPTIONS

-o DEST

output goes to this file/directory instead of stdout

-O

every input file is replaced with its translation

--help

display help and exit

--version

output version information and exit

Redirecting output to one of the source files with either -o or > instead of -O will corrupt it! Option -O creates a temporary file in /tmp and later copies it back onto the source.

CHARACTER ENCODING CONVERSIONS

You can convert text between any two charsets, for example konwert cp437-iso2.

Characters unavailable in the target charset will be substituted with approximations with available ones. The approximations need not be single characters.

The following character sets are currently supported:

ascii

7bit ASCII

utf8 = unicode

Unicode UTF-8

iso1 = isolatin1

ISO-8859-1 aka ISO Latin 1 (Western European)

iso2 = isolatin2

ISO-8859-2 aka ISO Latin 2 (Central European)

iso3 = isolatin3

ISO-8859-3 aka ISO Latin 3 (Esperanto)

iso4 = isolatin4

ISO-8859-4 aka ISO Latin 4 (Baltic)

iso5 = isolatincyr

ISO-8859-5 (Cyrillic)

iso6 = isolatinarabic

ISO-8859-6 (Arabic)

iso7 = isolatingreek

ISO-8859-7 (Greek)

iso8 = isolatinhebrew

ISO-8859-8 (Hebrew)

iso9 = isolatin5 = isolatintur

ISO-8859-9 aka ISO Latin 5 (Turkish)

iso10 = isolatin6 = isolatinnordic

ISO-8859-10 aka ISO Latin 6 (Nordic)

iso12 = isolatin7 = isolatinceltic

ISO-8859-12 aka ISO Latin 6 (Celtic) - Draft

iso13 = isolatin8 = isolatinbaltic

ISO-8859-13 aka ISO Latin 6 (Baltic) - Draft

iso14 = isolatin9 = isolatinsami

ISO-8859-14 aka ISO Latin 6 (S�mi) - Draft

iso15

ISO-8859-15 - Draft

koi8r

KOI8-R (Russian)

koi8u

KOI8-U (Ukrainian, Byelorussian)

koi8uni

KOI8-Uni (Cyrillic)

cp1250 = wince = winlatin2

Windows CP-1250 aka Win Latin 2 (Central European)

cp1251 = wincyr

Windows CP-1251 (Cyrillic)

cp1252 = winwest = winlatin1

Windows CP-1252 aka Win Latin 1 (Western European)

cp1253 = wingr

Windows CP-1253 (Greek)

cp1254 = wintur

Windows CP-1254 (Turkish)

cp1255 = winhebrew

Windows CP-1255 (Hebrew)

cp1256 = winarabic

Windows CP-1256 (Arabic)

cp1257 = winbaltic

Windows CP-1257 (Baltic)

cp1258 = winviet

Windows CP-1258 (Vietnamese)

cp437 = icmeng

DOS CP-437 (English)

cp737 = dosgreek

DOS CP-737 (Greek)

cp775 = dosbaltic

DOS CP-775 (Baltic)

cp850 = doswest = doslatin1

DOS CP-850 aka DOS Latin 1 (Western European)

cp852 = dosce = doslatin2

DOS CP-852 aka DOS Latin 2 (Central European)

cp855 = doscyr

DOS CP-855 (Cyrillic)

cp857 = dostur

DOS CP-857 (Turkish)

cp860 = dosportugal

DOS CP-860 (Portugal)

cp861 = dosiceland

DOS CP-861 (Icelandic)

cp862 = doshebrew

DOS CP-862 (Hebrew)

cp863 = doscanadfr

DOS CP-863 (Canadian French)

cp864 = dosarabic

DOS CP-864 (Arabic)

cp865 = dosnordic

DOS CP-865 (Nordic)

cp866 = dosrussian

DOS CP-866 (Russian)

cp869 = dosgreek2

DOS CP-869 (Greek2)

cp874 = dosthai

DOS CP-874 (Thai)

mac

Macintosh Roman (Western European)

macce

Macintosh Central European

maccyr

Macintosh Cyrillic

macgreek

Macintosh Greek

maciceland

Macintosh Icelandic

mactur

Macintosh Turkish

csk,

cyfromat,

dhn,

fidomazovia,

iea,

logic,

mazovia,

microvex

DOS charsets for Polish

amigapl,

fat,

xjp

Amiga charsets for Polish

kamenicky

DOS charset for Czech and Slovak

wingreek

WinGreek (Windows font-based encoding for ancient Greek)

babelpl

TeX [polish]{babel}: "a"c"e"l"n"o"s"z"r

ciachy

TeX \prefixing: /a/c/e/l/n/o/s/x/z

xmetodo

Esperanto: cx gx hx jx sx ux (vxw)

hmetodo

Esperanto: ch gh hh jh sh u

antauxcxap

Esperanto: ^c ^g ^h ^j ^s ^u (~u)

postcxap

Esperanto: c^ g^ h^ j^ s^ u^ (u~)

apostrofoj

Esperanto: c' g' h' j' s' u'

malapostrofoj

Esperanto: c` g` h` j` s` u`

viscii

VISCII (Vietnamese)

viqri

Vietnamese Quoted Readable Implicit

htmldec

SGML/HTML character references (decimal): Æ ě →

htmlhex

SGML/HTML character references (hexadecimal): Æ ě →

htmlent

SGML/HTML character entities (names): Æ &ecaron →

html

All three above (only as input format)

tex

TeX with some LaTeX or AMS-TeX extensions. There is no distinction between normal and math mode - you will probably have to insert some $'s manually.

mnemonic

RFC 1345 mnemonics preceded by &

mnemonic1

RFC 1345 mnemonics preceded by `

any/LANGUAGE (e.g. any/pl-iso2)

This special input format will detect the encoding automatically, basing on the frequencies of characters found in text. Every language is associated with a set of possible encodings used for it and average frequencies of its letters (excluding ASCII letters). The best fitting encoding is used for conversion. Currently supported languages are cs (Czech), de (German), el (Greek), eo (Esperanto), es (Spanish), fr (French), he (Hebrew), it (Italian), pl (Polish), pt (Portuguese), ru (Russian), and sv (Swedish).

varpl

Mixed Polish ISO-8859-2, CP-1250, and UTF-8. If you are reading Polish newsgroups I suggest putting it as a filter in your newsreader (for speed improvement it's better to call it directly, rather than through konwert).

vareo

Mixed various Esperanto encodings.

OPTIONS CONTROLLING THE ABOVE CONVERSIONS

/1 (e.g. konwert iso2-ascii/1)

Each unavailable character will be replaced only with a single approximate char, not string. This is useful with the filterm program or with preformatted text. This option is automatically turned on when a filter is used as output for filterm.

/html

Text is assumed to be HTML. The characters " & < > resulting from other characters' approximations will be properly escaped as " & < >. The <META http-equiv="content-type" content="text/html; charset=.\|.\|.\|"> header will be fixed if present.

/htmldec

Convert META as above. Unavailable characters will be encoded in &#Unicode;.

/htmlhex

Convert META as above. Unavailable characters will be encoded in hexadecimal &#xUnicode;.

/tex

Unavailable characters will be described in TeX. Characters # $ % & \nbsp;^ _ { | } ~ resulting from some characters' approximations will be properly escaped into \\# \\$ \\ \\& $\\backslash$ \\^{} \\_ \\{ $|$ \\} \\\\~{}.

/asciichar

Recognizes some ASCII representations of characters, e.g.\| (c) .\|.\|.\| 1/2 >=.

/rosyjski

Russian text will be replaced with its Polish phonetic transcription.

Some output filters can use the language information for choosing better approximations of unavailable letters, for example /de (German): ä \(-> ae instead of a.

OTHER FILTERS

any/LANGUAGE-test

Detects the encoding, but instead of text conversion only shows the encoding's name. The additional option /all shows all possible encodings, sorted from better to worse ones.

cr

lf

crlf

Force specific end-of-line marker convention. cr = Macintosh, lf = Unix and Amiga, crlf = Windows and DOS. The input convention is detected automatically.

expand

Expands tabs into spaces (uses the textutils program expand).

unexpand

Compresses spaces into tabs (uses the textutils program unexpand).

rmspacesateol

Removes spaces and tabs at end of line.

qp-8bit

8bit-qp

MIME Quoted Printable encoding: =A3=F3d=BC.

rtf-8bit

8bit-rtf

Rich Text Format: \\\\'a3\\\\'f3d\\\\'9f.

txt-htmlchar

Escapes " & < > into SGML/HTML entities " & < >. Useful for including a text file inside HTML <PRE> </PRE> tags.

htmlchar-txt

Reverse.

rot13

Guvf vf n qrzbafgengvba bs ebg13.

toupper

tolower

Self-explanatory. Currently ASCII only.

prn7pl

Converts polish chars to control sequences for EPSON-compatible printer. Using only 7-bit chars, backspacing printer's head and vertical positioning chars ,.'` it creates pseudo-polish gryphs. You can specify options: /nlq (default) which optimalizes output for better quality printers and /draft - useful for ex. for 9-nails printer.

FILES

/usr/share/konwert/filters/*

~/.konwert/filters/*

RELATED TO konwert…

BUGS

APPLE character in mac* charsets, and CH and ch characters in koi8cs are not preserved in conversion even when they are available. Also they don't respect the /1 option. Reason: they are not in Unicode.

COPYRIGHT

Konwert is a package for conversion between various character encodings.

Copyright (c) 1998 Marcin 'Qrczak' Kowalczyk

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

AUTHOR

 __("<   Marcin Kowalczyk * [email protected] http://qrczak.home.ml.org/
 \__/       GCS/M d- s+:-- a21 C+++>+++$ UL++>++++$ P+++ L++>++++$ E->++
  ^^                W++ N+++ o? K? w(---) O? M- V? PS-- PE++ Y? PGP->+ t
QRCZAK                  5? X- R tv-- b+>++ DI D- G+ e>++++ h! r--%>++ y-