New: Pine 4.56 is out!
The newest patch is always available in the Pine 4.56 directory.
The patch for patchlevel 7d-2 in this directory fixes the bounce command when send-charset is set, fixes the chopped subject, handles all header fields with assumed-charset, and adds support for overlong RFC 2047 encoded words.
Older patches for Pine 4.55
The well-tested patch for 4.55 is in the stable directory. A newer patch with send-charset, implemented by Jungshik Shin, is in the experimental directory. Even newer is the patch in the 4.56 directory.
If there is an RPM subdirectory in one of these directories, it contains RPM packages for SuSE 8.1 or newer (8.2) or an equivalent system (glibc-2.2 or higher).
Any contribution (suggestions, tips, patches, test cases) is welcome.
Yes, translation is done to the character-set configured in the pine config right now.
There could also be an issue with line breaking during printing, which could be circumvented by disabling line breaking for printing and letting the printer or print system do the line breaking.
Print should possibly be enhanced to convert to a separate character set for which the printer or print system is configured.
I have had time to test printing, and it works quite well! Character set conversion to UTF-8 happens. This is good in my case -- I am doing "attached print", and my emulator handles the UTF-8 codes in the print stream.
However, I can imagine that it would be a problem for other people who were writing directly to a printer. There are not many (any?) UTF-8 printers around. If the user had a printer that worked in Big5 (Chinese), and they usually got Big5 emails and printed them, then installing your patch would break things. I think the best solution is to create an additional configuration variable for "printer character set"; or maybe a flag associated with each printer definition.
(I do one other change for printing. In cmd_print, where it calls format_message, I add a flag FM_NOWRAP, so it doesn't break my lines based on the number of screen columns.)
If you are not printing to a printer attached to the terminal, it should be possible to use a character set filter as the personal print command in Setup/Printers. For instance, you can use Juliusz Chroboczek's cedilla, which has good WGL4 coverage, sufficient for Latin, Greek and Cyrillic text. Another good printer filter is uniprint, included in Gaspar Sinai's Yudit.
Personally selected print command: The text to be printed will be piped into the command given here. The command is in the 2nd column, the printer name is in the first column. Some examples are: "prt", "lpr", "lp", or "enscript". The command may be given with options, for example "enscript -2 -r" or "lpr -Plpacc170". The commands and options on your system may be different from these examples.
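To illustrate the "printer character set" idea discussed above, here is a minimal sketch of a conversion step that could sit in front of a print command. It assumes the print stream arrives in UTF-8 and that the printer expects some other charset; the function name convert_for_printer and the default "big5" are hypothetical and only serve the example. Characters the printer charset cannot represent are replaced with "?".

```python
def convert_for_printer(data: bytes, printer_charset: str = "big5") -> bytes:
    """Re-encode a UTF-8 print stream in the charset the printer understands.

    Hypothetical helper: the name and the default charset are assumptions
    made for this example, they are not part of Pine or of the patch.
    """
    # Invalid UTF-8 input bytes become U+FFFD instead of raising an error.
    text = data.decode("utf-8", errors="replace")
    # Characters missing from the printer charset are emitted as "?".
    return text.encode(printer_charset, errors="replace")


# Small demonstration with in-memory data.
sample = "\u4e2d\u6587".encode("utf-8")          # two Chinese characters
print(convert_for_printer(sample, "big5"))
```

Wrapped in a small script that reads standard input and writes standard output, such a function could be placed in a pipeline before "lpr" or "lp" in the personally selected print command.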
Yes, they are, thanks to Jungshik's work. Message headers properly encoded in compliance with RFC 2047 present little problem. However, there are a lot of web mail services and mail clients that send out untagged raw 8-bit characters in the message headers. Those characters are assumed to be in assumed-charset and are converted to character-set.
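The two header cases described above can be sketched with Python's standard email.header module. This is not the patch's C code, only an illustration of the logic: RFC 2047 encoded words carry their own charset, while untagged 8-bit bytes are interpreted in an assumed charset. The function name decode_header_value and its assumed_charset parameter are hypothetical.

```python
from email.header import decode_header


def decode_header_value(raw: bytes, assumed_charset: str = "iso-8859-1") -> str:
    """Decode one header value to text (illustrative sketch only)."""
    try:
        # Properly encoded headers are pure ASCII on the wire.
        ascii_text = raw.decode("ascii")
    except UnicodeDecodeError:
        # Untagged raw 8-bit bytes: fall back to the assumed charset.
        return raw.decode(assumed_charset, errors="replace")

    parts = []
    for chunk, charset in decode_header(ascii_text):
        if isinstance(chunk, bytes):
            try:
                parts.append(chunk.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # Charset label unknown to Python: use the assumed charset.
                parts.append(chunk.decode(assumed_charset, errors="replace"))
        else:
            # No encoded words were found; the value is already a string.
            parts.append(chunk)
    return "".join(parts)


print(decode_header_value(b"=?ISO-8859-1?Q?p=F6stal?="))
print(decode_header_value("Gr\u00fc\u00dfe".encode("iso-8859-1")))
```

The resulting text would then be encoded in the display charset (character-set) before being shown.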
When you use the Export command, the same conversion is applied as for the message display. Export saves only the message headers currently displayed; headers that are not in view are not exported.
Saving (a) message(s) to a folder with Save is an entirely different thing. This operation preserves all information present in the message.
For reading local text or binary files, e.g. for attaching them to an outgoing message, no translation is applied.
The same holds for saving attachments: they should not be converted and should be preserved over the whole message pipeline for read and write.
Nothing in this area should be translated by default because a user should store files in the same charset as her file system charset.
If necessary, charset conversion might be implemented for reading and writing local files, but since such files can normally be converted by other programs as well, there is normally no need for it in the mail reader.
It is easy to write a shell or perl script to convert all files under a given directory to UTF-8. If you need it, I can check where it is.
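As a rough idea of what such a script could look like, here is a minimal Python sketch (not the shell or perl script mentioned above). It assumes that every file under the directory is a text file in one known source charset; the function name convert_tree is hypothetical. Files that are already valid UTF-8, including plain ASCII, are left untouched.

```python
from pathlib import Path
from typing import List


def convert_tree(root: str, source_charset: str) -> List[str]:
    """Convert text files under root from source_charset to UTF-8 in place.

    Returns the paths that were rewritten. Illustrative sketch: point it
    only at text files, since binary files cannot be told apart here.
    """
    converted = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        try:
            data.decode("utf-8")
            continue  # already valid UTF-8, nothing to do
        except UnicodeDecodeError:
            pass
        try:
            text = data.decode(source_charset)
        except UnicodeDecodeError:
            continue  # not valid in the source charset either; skip
        path.write_bytes(text.encode("utf-8"))
        converted.append(str(path))
    return converted
```

A call such as convert_tree("mail-archive", "iso-8859-1") would rewrite only the files that contain non-UTF-8 bytes; it is wise to try it on a copy of the directory first.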
Yes, if you use pine in a UTF-8 terminal and set charset to UTF-8, it is assumed that the keystrokes also arrive in UTF-8, as they do, for example, from the xterm of XFree86.
Unfortunately, Pine's input and cursor movement do not yet support multibyte characters. When you type a UTF-8 character in Pico (Pine's default built-in editor) or in the message header lines, the cursor moves by two, three or four columns because Pine assumes that a single character takes a single byte to represent and that a single character represented in a single byte takes a single column to display. Neither assumption holds for UTF-8 or for multibyte charsets in general.
If the bytes of a multibyte character are not received in quick succession by the terminal, it may not display them correctly. Fortunately, you can refresh the screen by pressing Ctrl-L, as you do in vi.
The cursor positioning is a real problem, however. As soon as you enter the first multibyte character into a line, pine will assume that the cursor has to move 2, 3 or 4 column widths. Likewise, if you step back with left arrow or backspace, you have to do it for every byte of the multibyte character. So this is a real pain.
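The mismatch between bytes, characters and screen columns can be made concrete with a small Python sketch. The function name measure is hypothetical, and the column count is a simplified approximation based on the Unicode East Asian Width property (wide and fullwidth characters count as two columns, combining marks as zero); real terminals may differ in detail.

```python
import unicodedata


def measure(text: str):
    """Return (UTF-8 byte count, character count, approximate column count)."""
    n_bytes = len(text.encode("utf-8"))
    n_chars = len(text)
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2  # wide and fullwidth characters take two columns
        else:
            columns += 1
    return n_bytes, n_chars, columns


for sample in ("a", "\u00e9", "\u4e2d"):
    print(repr(sample), measure(sample))
```

A program that moves the cursor by the byte count, as described above, is off by one column for a two-byte Latin character and by one column in the other direction for a three-byte wide CJK character.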
So if you use multibyte characters, you are advised to configure alternate-editor, instead of using the default editor Pico, to edit outgoing messages within the Pine session. Unfortunately, Pine does not support passing the subject and recipients into the external text editor and taking them back when the external editor exits. Therefore, for message headers (for instance, Subject), you either have to use cut and paste or enter the US-ASCII characters first and then, from right to left, insert the multibyte characters.
Apart from this, UTF-8 characters go out transparently from the editor to the mail and are labeled as charset UTF-8 when you set the character-set setting in pine to "utf-8".
Jungshik's new patch makes it possible to convert outgoing mail on reply, send and forward using a new config option, send-charset, and he is planning to make this configurable at the time of message composition.
Bernhard Kaindl sent some info in this mail to pine-info.
Unfortunately pico and the internal editor of pine are not internationalized.
There are several editors for Unix that support UTF-8 rather well. For instance, Vim 6.x is an excellent text editor with solid UTF-8 support. Emacs also supports UTF-8 and has interfaces to CJK input methods. Mike Fabian has put up a page about UTF-8 Internationalisation in GNU Emacs and XEmacs in his document on CJK (Chinese, Japanese, Korean) Support in SuSE Linux.
Jungshik Shin developed the initial patch with the header conversion, charset and iconv aliases, and generic locale fixes. For the message body conversion, he used display-filters.
Bernhard Kaindl updated the patch from 4.44 to 4.53, 4.55 and 4.56, added the message body conversion by writing an internal filter with iconv, cleaned up the code that fixed a bug, and later debugged some other problems.
On top of the last patch for 4.55, Jungshik Shin made additional fixes for bugs in the message body translation and implemented the send-charset config option included in the iconv patch 7a and newer.
The send-charset config option is partly based on a patch from Eduardo Chappa, who made it possible to tag outgoing messages with the value of alt-character-set instead of X-UNKNOWN when the charset is not recognized by Pine.
Conceptually, send-charset is rather different from the alt-character-set patch because the former is not just for tagging outgoing messages but also for converting the message headers and the message body to send-charset from character-set (terminal / display charset).
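The difference between tagging and converting can be shown with a short Python sketch. It is not taken from either patch; the function names tag_only and convert_and_tag are hypothetical, and the fallback of sending the text unchanged when it cannot be represented in send-charset is an assumption made for this example.

```python
def tag_only(body: bytes, label: str):
    """Tagging: the bytes stay as typed, only the charset label is chosen."""
    return body, label


def convert_and_tag(body: bytes, character_set: str, send_charset: str):
    """Converting: re-encode from the display charset to send-charset.

    Assumed fallback for this sketch: if the text cannot be represented
    in send_charset, it is sent unchanged and labeled with character_set.
    """
    text = body.decode(character_set)
    try:
        return text.encode(send_charset), send_charset
    except UnicodeEncodeError:
        return body, character_set


typed = "Gr\u00fc\u00dfe".encode("utf-8")      # typed in a UTF-8 terminal
print(tag_only(typed, "utf-8"))
print(convert_and_tag(typed, "utf-8", "iso-8859-1"))
```

With tagging, a wrong label leaves the recipient with bytes that do not match the declared charset; with conversion, the bytes and the label always agree.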
Eduardo's patch is described in detail at his great web site, which contains up-to-date information about Pine's latest features and is the definitive source of Pine patches!
Many thanks to him for providing this excellent web site and patches!
Suggestions for further improvement are welcome. One of the next projects is to implement a control character filter that understands UTF-8, so that it can filter all control characters and all bytes that do not conform to the UTF-8 encoding without filtering bytes that look like control characters but are part of a UTF-8 encoding sequence.
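The idea behind such a filter can be sketched in a few lines of Python. This is only an illustration of the principle, not the planned implementation: the byte stream is decoded as UTF-8 first, and the control check is applied to whole characters, so a continuation byte in the range 0x80 to 0x9F is never mistaken for a C1 control character. The function name filter_controls and the choice of "?" as replacement are assumptions.

```python
def filter_controls(data: bytes, replacement: str = "?") -> bytes:
    """Replace control characters and invalid UTF-8 bytes, keep the rest."""
    # Bytes that are not valid UTF-8 decode to U+FFFD instead of raising.
    text = data.decode("utf-8", errors="replace")
    out = []
    for ch in text:
        cp = ord(ch)
        # C0 controls (except tab, newline, carriage return), DEL and C1.
        is_control = (cp < 0x20 and ch not in "\t\n\r") or 0x7F <= cp <= 0x9F
        # Note: a genuine U+FFFD in the input is replaced as well.
        if is_control or ch == "\ufffd":
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out).encode("utf-8")


# U+00C0 is encoded as C3 80; the 0x80 byte must survive the filter.
print(filter_controls("\u00c0".encode("utf-8")))
print(filter_controls(b"a\x1bb"))
```

A byte-wise filter would strip the 0x80 byte of the first example and corrupt the character, which is why pass-control-characters-as-is currently has to be enabled.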
There are still some rough edges and some missing features on which help is wanted:
Make pass-control-characters-as-is obsolete. Right now it has to be enabled to see many UTF-8 characters in header fields. This should be fixed by implementing a UTF-8 control character filter as described in the previous answer.
Make assumed-charset configurable per folder.
Make the override-charset flag configurable per folder.
Add send-charset and assumed-charset to the list of options that can be configured per role.
The definition of UTF-8 (UCS/Unicode Transformation Format 8) is found in Unicode and ISO 10646. A draft RFC on UTF-8 submitted to the IETF is also a good reference. One of numerous implementations is found in Mozilla.
A Quick Primer On Unicode and Software Internationalization Under Linux and UNIX is an excellent resource (with many screenshots) if you are looking for UTF-8 terminal emulators, editors, conversion and printing utilities, and fonts!
Last Updated 2003-06-08