CHARMAP(7)           Standards, Environments, and Macros          CHARMAP(7)
NAME
       charmap - character set description file
DESCRIPTION
       A character set description file or 
charmap defines characteristics
       for a coded character set. Other information about the coded
       character set may also be in the file. Coded character set character
       values are defined using symbolic character names followed by
       character encoding values.
       The character set description file provides:
           o      The capability to describe character set attributes (such
                  as collation order or character classes) independent of
                  character set encoding, and using only the characters in
                  the portable character set. This makes it possible to
                  create generic 
localedef(1) source files for all codesets
                  that share the portable character set.
           o      Standardized symbolic names for all characters in the
                  portable character set, making it possible to refer to any
                  such character regardless of encoding.
   Symbolic Names
       Each symbolic name  is included in the file and is mapped to a unique
       encoding value (except for those symbolic names that are shown with
       identical glyphs).  If the control characters commonly associated
       with the symbolic names in the following table are supported by the
       implementation, the symbolic names and their corresponding encoding
       values are included in the file. Some of the encodings associated
       with the symbolic names in this table may be the same as characters
       in the portable character set table.
       +----------------------------------------------+
       |<ACK>   <DC2>   <ENQ>   <FS>    <IS4>   <SOH> |
       |<BEL>   <DC3>   <EOT>   <GS>    <LF>    <STX> |
       |<BS>    <DC4>   <ESC>   <HT>    <NAK>   <SUB> |
       |<CAN>   <DEL>   <ETB>   <IS1>   <RS>    <SYN> |
       |<CR>    <DLE>   <ETX>   <IS2>   <SI>    <US>  |
       |<DC1>   <EM>    <FF>    <IS3>   <SO>    <VT>  |
       +----------------------------------------------+
   Declarations
       The following declarations can precede the character definitions.
       Each must consist of the symbol shown in the following list, starting
       in column 1, including the surrounding brackets, followed by one or
       more blank characters, followed by the value to be assigned to the
       symbol.       
<code_set_name>
                          The name of the coded character set for which the
                          character set description file is defined.       
<mb_cur_max>
                          The maximum number of bytes in a multi-byte
                          character. This defaults to 
1.       
<mb_cur_min>
                          An unsigned positive integer value that defines
                          the minimum number of bytes in a character for the
                          encoded character set.       
<escape_char>
                          The escape character used to indicate that the
                          characters following will be interpreted in a
                          special way, as defined later in this section.
                          This defaults to backslash ('
\'), which is the
                          character glyph used in all the following text and
                          examples, unless otherwise noted.       
<comment_char>
                          The character that when placed in column 1 of a
                          charmap line, is used to indicate that the line is
                          to be ignored. The default character is the number
                          sign (#).
   Format
       The character set mapping definitions will be all the lines
       immediately following an identifier line containing the string       
CHARMAP starting in column 1, and preceding a trailer line containing
       the string 
END CHARMAP starting in column 1. Empty lines and lines
       containing a 
<comment_char> in the first column will be ignored. Each
       non-comment line of the character set mapping definition, that is,
       between the 
CHARMAP and 
END CHARMAP lines of the file), must be in
       either of two forms:         
"%s %s %s\n",<
symbolic-name>,<
encoding>,<
comments>
       or         
"%s...%s %s %s\n",<
symbolic-name>,<
symbolic-name>, <
encoding>,\
                  <
comments>
       In the first format, the line in the character set mapping definition
       defines a single symbolic name and a corresponding encoding. A
       character following an escape character is interpreted as itself; for
       example, the sequence "<
\\\>>" represents the symbolic name "
\>"
       enclosed between angle brackets.
       In the second format, the line in the character set mapping
       definition defines a range of one or more symbolic names. In this
       form, the symbolic names must consist of zero or more non-numeric
       characters,  followed by an integer formed by one or more decimal
       digits. The characters preceding the integer must be identical in the
       two symbolic names, and the integer formed by the digits in the
       second symbolic name must be equal to or greater than the integer
       formed by the digits in the first name. This is interpreted as a
       series of symbolic names formed from the common part and each of the
       integers between the first and the second integer, inclusive. As an
       example, 
<j0101>...<j0104> is interpreted as the symbolic names       
<j0101>, 
<j0102>, 
<j0103>, and 
<j0104>, in that order.
       A character set mapping definition line must exist for all symbolic
       names and must define the coded character value that corresponds to
       the character glyph indicated in the table, or the coded character
       value that corresponds with the control character symbolic name. If
       the control characters commonly associated with the symbolic names
       are supported by the implementation, the symbolic name and the
       corresponding encoding value must be included in the file. Additional
       unique symbolic names may be included. A coded character value can be
       represented by more than one symbolic name.
       The encoding part is expressed as one (for single-byte character
       values) or more concatenated decimal, octal or hexadecimal constants
       in the following formats:         
"%cd%d",<
escape_char>,<
decimal byte value>         
"%cx%x",<
escape_char>,<
hexadecimal byte value>         
"%c%o",<
escape_char>,<
octal byte value>
   Decimal Constants
       Decimal constants must be represented by two or three decimal digits,
       preceded by the escape character and the lower-case letter 
d; for
       example, 
\d05, 
\d97, or 
\d143. Hexadecimal constants must be
       represented by two hexadecimal digits, preceded by the escape
       character and the lower-case letter 
x; for example, 
\x05, 
\x61, or       
\x8f. Octal constants must be represented by two or three octal
       digits, preceded by the escape character; for example, 
\05, 
\141, or       
\217. In a portable charmap file, each constant must represent an
       8-bit byte. Implementations supporting other byte sizes may allow
       constants to represent values larger than those that can be
       represented in 8-bit bytes, and to allow additional digits in
       constants. When constants are concatenated for multi-byte character
       values, they must be of the same type, and interpreted in byte order
       from first to last with the least significant byte of the multi-byte
       character specified by the last constant.
   Ranges of Symbolic Names
       In lines defining ranges of symbolic names, the encoded value is the
       value for the first symbolic name in the range (the symbolic name
       preceding the ellipsis). Subsequent symbolic names defined by the
       range will have encoding values in increasing order. Bytes are
       treated as unsigned octets and carry is propagated between the bytes
       as necessary to represent the range. However, because this causes a
       null byte in the second or subsequent bytes of a character, such a
       declaration should not be specified. For example, the line
         <j0101>...<j0104>     \d129\d254
       is interpreted as:
         <j0101>                \d129\d254
         <j0102>                \d129\d255
         <j0103>                \d130\d00
         <j0104>                \d130\d01
       The expanded declaration of the symbol 
<j0103> in the above example
       is an invalid specification, because it contains a null byte in the
       second byte of a character.
       The comment is optional.
   Width Specification
       The following declarations can follow the character set mapping
       definitions (after the "
END CHARMAP" statement). Each consists of the
       keyword shown in the following list, starting in column 1, followed
       by the value(s) to be associated to the keyword, as defined below.       
WIDTH                        A non-negative integer value defining the column
                        width for the printable character in the coded
                        character set mapping definitions. Coded character
                        set character values are defined using symbolic
                        character names followed by column width values.
                        Defining a character with more than one 
WIDTH                        produces undefined results. The 
END WIDTH keyword is
                        used to terminate the 
WIDTH definitions. Specifying
                        the width of a non-printable character in a 
WIDTH                        declaration produces undefined results.       
WIDTH_DEFAULT                        A non-negative integer value defining the default
                        column width for any printable character not listed
                        by one of the 
WIDTH keywords. If no 
WIDTH_DEFAULT                        keyword is included in the charmap, the default
                        character width is 
1.
       Example:
       After the "
END CHARMAP" statement, a syntax for a width definition
       would be:
         WIDTH
         <A>             1
         <B>             1
         <C>...<Z>       1
         ...
         <fool>...<foon> 2
         ...
         END WIDTH
       In this example, the numerical code point values represented by the
       symbols 
<A> and 
<B> are assigned a width of 
1. The code point values       
< C> to 
<Z> inclusive, that is, 
<C>, 
<D>, 
<E>, and so on, are also
       assigned a width of 
1. Using 
<A>...<Z> would have required fewer
       lines, but the alternative was shown to demonstrate flexibility. The
       keyword 
WIDTH_DEFAULT could have been added as appropriate.
SEE ALSO
       locale(1), 
localedef(1), 
nl_langinfo(3C), 
extensions(7), 
locale(7)                              December 1, 2003                    CHARMAP(7)