| Unicode Support for Mathematics : Unicode Support for Mathematics Murray Sargent III
Microsoft |
| Overview : Overview Unicode math characters
Semantics of math characters
Unicode and markup
Multiple ways of encoding math characters
Not yet standardized math characters
Inputting math symbols |
| Unicode Math Characters : Unicode Math Characters 340 math chars exist in ASCII, U+2200 – U+22FF, arrows, combining marks of Unicode 3.0
996 math alphanumeric characters are in Unicode 3.1’s Plane 1
591 new math symbols and operators are in Unicode 3.2’s BMP
One math variant selector
One new combining character (reverse solidus). |
| Basic Set of Alphanumeric Characters : Basic Set of Alphanumeric Characters Latin digits (0 - 9)
Upper- & lowercase Latin letters (a - z, A - Z)
Uppercase Greek letters Α - Ω plus the nabla ∇ and the variant of theta Θ given by U+03F4
Lowercase Greek letters α - ω plus the partial differential sign ∂ and glyph variants of ε, θ, κ, φ, ρ, and π
Only unaccented forms of letters are used |
| Math Alphanumeric Characters : Math Alphanumeric Characters Math needs various Latin and Greek alphabets like normal, bold, italic, script, Fraktur, and open-face
May appear to be font variations, but have distinct semantics
Without these distinctions, you get gibberish, violating Unicode rule: plain text must contain enough info to permit the text to be rendered legibly, and nothing more
Plain-text searches should distinguish between alphabets, e.g., search for script H shouldn’t match H, etc.
Reduces markup verbosity |
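A minimal sketch of the search point above, assuming Python's standard `unicodedata` module. The helper names `exact_find` and `folded_find` are illustrative, not from the slides. A code-point search keeps script H distinct from H, while NFKC compatibility folding erases the distinction.

```python
import unicodedata

SCRIPT_H = "\u210B"  # SCRIPT CAPITAL H


def exact_find(text: str, needle: str) -> int:
    """Code-point search: math alphabets stay distinct from plain letters."""
    return text.find(needle)


def folded_find(text: str, needle: str) -> int:
    """Search after NFKC folding, which maps math alphabets to plain letters."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFKC", s)
    return fold(text).find(fold(needle))


text = SCRIPT_H + " = T + V"
print(exact_find(text, "H"))       # plain H is not found in the raw text
print(exact_find(text, SCRIPT_H))  # script H is found
print(folded_find(text, "H"))      # folding makes plain H match script H
```

A search tool that wants the behavior the slide asks for would use the exact search and offer folding only as an explicit option.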
| Legibility Loss : Legibility Loss Without math alphabets, the Hamiltonian formula
ℋ = ∫ dτ [εE² + μH²]
becomes an integral equation
H = ∫ dτ [εE² + μH²] |
| Math Alphanumeric Chars (cont) : Math Alphanumeric Chars (cont) Plain a-z, A-Z, 0-9, α-ω, Α-Ω
Bold a-z, A-Z, 0-9, α-ω, Α-Ω
Italic a-z, A-Z, α-ω, Α-Ω
Bold italic a-z, A-Z, α-ω, Α-Ω
Script a-z, A-Z
Bold script a-z, A-Z
Fraktur a-z, A-Z
Bold Fraktur a-z, A-Z
Double struck a-z, A-Z, 0-9
Sans-serif a-z, A-Z, 0-9
Sans-serif bold a-z, A-Z, 0-9, α-ω, Α-Ω
Sans-serif italic a-z, A-Z
Sans-serif bold italic a-z, A-Z, α-ω, Α-Ω
Monospace a-z, A-Z, 0-9 |
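As a rough illustration of how one of these alphabets sits in Plane 1, the sketch below maps ASCII letters and digits to the bold math alphabet by adding an offset. The function name `to_math_bold` is hypothetical. Bold is used because its ranges are contiguous; other styles have reserved positions that need extra handling.

```python
import unicodedata

# Start code points of the bold ranges in Plane 1.
BOLD_UPPER = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER = 0x1D41A  # MATHEMATICAL BOLD SMALL A
BOLD_DIGIT = 0x1D7CE  # MATHEMATICAL BOLD DIGIT ZERO


def to_math_bold(text: str) -> str:
    """Map ASCII letters and digits to math bold; leave everything else alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(BOLD_DIGIT + ord(ch) - ord("0")))
        else:
            out.append(ch)
    return "".join(out)


for ch in to_math_bold("Az9"):
    print(f"U+{ord(ch):05X} {unicodedata.name(ch)}")
```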
| How to Display Math Alphabets? : How to Display Math Alphabets? Can use the Unicode surrogate-pair mechanisms available in the OS
Alternatively, bind to standard fonts and use the corresponding BMP characters
The second approach is probably faster, and displaying Unicode requires font binding in any event. But most traditional fonts are not suited to math alphabetic characters
A single math font may look more consistent |
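For reference, the surrogate-pair mechanism mentioned above is plain UTF-16 arithmetic. The sketch below shows how a Plane 1 math alphanumeric splits into a high and a low surrogate. The function name `to_surrogate_pair` is illustrative.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


# U+1D49C MATHEMATICAL SCRIPT CAPITAL A
high, low = to_surrogate_pair(0x1D49C)
print(f"{high:04X} {low:04X}")
```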
| Math Alphabetics via Glyph Variants : Math Alphabetics via Glyph Variants One approach to the math alphanumerics would be to use a set of math glyph variant selectors
Such a tag would follow a base character imparting a math style
Approach was dropped since it seemed likely to be abused
One math variant selector does exist to offer a different line slant for some composite symbols
Other variant selectors are being defined for nonmath purposes, e.g., Han variants |
| Multiple Character Encodings : Multiple Character Encodings As with nonmath characters, math symbols can often be encoded in multiple ways, composed and decomposed
E.g., ≠ can be U+003D, U+0338 or U+2260
Recommendation: use the fully composed symbol, e.g., U+2260 for ≠
For alphabetic characters, use combining-mark sequences to get consistent typography
Some representations use markup for the alphabetic cases. This allows multicharacter combining marks. |
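The ≠ example can be checked with standard normalization. This sketch assumes Python's `unicodedata` module, and the helper name `compose_symbols` is illustrative. NFC yields the fully composed symbol recommended above, and NFD recovers the two-character sequence.

```python
import unicodedata

DECOMPOSED = "=\u0338"  # EQUALS SIGN + COMBINING LONG SOLIDUS OVERLAY
COMPOSED = "\u2260"     # NOT EQUAL TO


def compose_symbols(text: str) -> str:
    """Return the NFC form, which uses precomposed symbols where they exist."""
    return unicodedata.normalize("NFC", text)


print(compose_symbols("a " + DECOMPOSED + " b") == "a " + COMPOSED + " b")
print(unicodedata.normalize("NFD", COMPOSED) == DECOMPOSED)
```

Note that NFC also composes accented letters, so a pipeline that follows the slide's advice for alphabetic characters (combining-mark sequences) would have to treat symbols and letters differently.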
| Compatibility Holes : Compatibility Holes Compatibility holes (reserved positions) exist in some Unicode sequences to avoid duplicate encodings (ugh!)
E.g., U+2071-U+2073 are holes for ¹²³, which are U+00B9, U+00B2, and U+00B3, respectively
Math alphanumerics have holes corresponding to Letterlike symbols.
Recommendation: you can use the hole codes internally, but must import and export the standard codes. |
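A sketch of the recommendation above, using math italic small h as the example. Its slot in the italic range (U+1D455) is reserved because the letter already exists as U+210E (Planck constant). The names `to_italic_internal` and `export_standard` are hypothetical, and the hole table holds only this one entry for brevity.

```python
import unicodedata
from typing import Iterable

ITALIC_LOWER = 0x1D44E  # MATHEMATICAL ITALIC SMALL A

# Reserved hole -> standard Letterlike Symbols code point (one entry shown).
HOLES = {0x1D455: 0x210E}  # italic small h -> PLANCK CONSTANT


def to_italic_internal(ch: str) -> int:
    """Uniform offset arithmetic; may return a hole code for internal use."""
    return ITALIC_LOWER + (ord(ch) - ord("a"))


def export_standard(code_points: Iterable[int]) -> str:
    """Replace hole codes with the standard characters before exporting."""
    return "".join(chr(HOLES.get(cp, cp)) for cp in code_points)


internal = [to_italic_internal(c) for c in "ah"]
exported = export_standard(internal)
print([unicodedata.name(c) for c in exported])
```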
| Nonstandard Characters : Nonstandard Characters People will always invent new math characters that aren’t yet standardized.
Use private use area for these with a higher-level marking that these are for math.
This approach can lead to collisions in the math community (unless a standard is maintained)
Cut/copy in plain text can have collisions with other uses of the private use area |
| Unicode and Markup : Unicode and Markup Unicode was never intended to represent all aspects of text
Language attribute: sort order, word breaks
Rich (fancy) text formatting: built-up fractions
Content tags: headings, abstract, author, figure
Glyph variants: Poetica font: 58 ampersands; Mantinia font: novel ligatures (TT, TE, etc.)
MathML adds XML tags for math constructs, but seems awfully wordy |
| Unicode Plain Text : Unicode Plain Text Can do a lot with plain text, e.g., BiDi
Grey zone: use of embedded codes
Unicode ascribes semantics to characters, e.g., paragraph mark, right-to-left mark
Lots of interesting punctuation characters in range U+2000 to U+204F
Extensive character semantics/properties tables, including mathematical, numerical |
| Unicode Character Semantics : Unicode Character Semantics Math characters have math property
Math characters are numeric, variable, or operator, but not a combination
Properties are useful in parsing math plain text
MathML doesn’t use these properties: every quantity is explicitly tagged
Properties can still be useful for inputting text for MathML (no one wants to type all those tags!)
Sometimes default properties need to be overruled
Would be useful to have more math properties |
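As a rough approximation of the math property, the sketch below uses the General_Category value `Sm` (Symbol, math) from Python's `unicodedata` module. This is an assumption for illustration: the Unicode Math property is a separate, broader property that `unicodedata` does not expose. The helper name `is_math_symbol` is hypothetical.

```python
import unicodedata


def is_math_symbol(ch: str) -> bool:
    """True if the character's General_Category is Sm (Symbol, math)."""
    return unicodedata.category(ch) == "Sm"


for ch in ["+", "=", "\u2211", "\u2044", "-", "/", "a"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {is_math_symbol(ch)}")
```

The ASCII hyphen-minus and solidus come out as punctuation here, which is one case where a math parser would need to overrule the default properties.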
| Plain Text Encoding : Plain Text Encoding In TeX, a fraction's numerator is what follows a { up to the keyword \over
The denominator is what follows the \over up to the matching }
The { } are not printed
Simple rules give unambiguous “plain text”, but results don’t look like math
How to make a plain text that looks like math? |
| Simple plain text encoding : Simple plain text encoding Simple operand is a span of alphanumeric characters
E.g., simple numerator or denominator is terminated by any operator
Operators include arithmetic operators, most whitespace characters, all U+22xx, an argument “break” operator (displayed as small raised dot), sub/superscript operators
Fraction operator is given by the Unicode fraction slash operator U+2044 |
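| Fractions : Fractions abc/d gives a built-up fraction with numerator abc and denominator d
More complicated operands use parentheses ( ), brackets [ ], or braces { }
Outermost parens aren’t displayed in built-up form
E.g., plain text (a + c)/d displays as a built-up fraction with numerator a + c and denominator d
Easier to read than TeX’s, e.g., {a + c \over d}
MathML: <mfrac><mrow><mi>a</mi><mo>+</mo><mi>c</mi></mrow><mi>d</mi></mfrac>
Neat feature: plain text looks like math |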
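The fraction rules above can be sketched as a tiny converter from the plain-text form to TeX's `\over` form. This is an illustration under simplifying assumptions: the input is a single fraction with balanced parentheses, and either `/` or the fraction slash U+2044 separates the operands. The function names are hypothetical.

```python
FRACTION_SLASH = "\u2044"


def strip_outer_parens(operand: str) -> str:
    """Drop one outermost ( ) pair if it encloses the whole operand."""
    if not (operand.startswith("(") and operand.endswith(")")):
        return operand
    depth = 0
    for i, ch in enumerate(operand):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i != len(operand) - 1:
                return operand  # first paren closes early, e.g. (a)(b)
    return operand[1:-1]


def fraction_to_tex(text: str) -> str:
    """Convert 'numerator/denominator' to TeX's {numerator \\over denominator}."""
    slash = text.find(FRACTION_SLASH)
    if slash < 0:
        slash = text.find("/")
    if slash < 0:
        return text  # no fraction operator present
    numerator = strip_outer_parens(text[:slash].strip())
    denominator = strip_outer_parens(text[slash + 1:].strip())
    return "{" + numerator + " \\over " + denominator + "}"


print(fraction_to_tex("(a + c)/d"))
print(fraction_to_tex("abc" + FRACTION_SLASH + "d"))
```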
| Subscripts and Superscripts : Subscripts and Superscripts Unicode has numeric subscripts and superscripts along with some operators (U+2070-U+208E)
Others need some kind of markup like …
With special subscript and superscript operators (not yet in Unicode), these scripts can be nested
Use parentheses as for fractions to overrule built-in precedence order |
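For the numeric scripts that Unicode does encode, a translation table is enough. The sketch below is illustrative, with hypothetical helper names. Note that superscript 1, 2, and 3 live in Latin-1 (U+00B9, U+00B2, U+00B3) rather than in the U+2070 block.

```python
DIGITS = "0123456789"
SUPERSCRIPTS = "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079"
SUBSCRIPTS = "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"

TO_SUPERSCRIPT = str.maketrans(DIGITS, SUPERSCRIPTS)
TO_SUBSCRIPT = str.maketrans(DIGITS, SUBSCRIPTS)


def superscript(digits: str) -> str:
    """Replace ASCII digits with Unicode superscript digits."""
    return digits.translate(TO_SUPERSCRIPT)


def subscript(digits: str) -> str:
    """Replace ASCII digits with Unicode subscript digits."""
    return digits.translate(TO_SUBSCRIPT)


print("E = mc" + superscript("2"))
print("x" + subscript("10"))
```

Anything beyond digits and a few operators (a superscript letter, a nested script) still needs markup or the proposed script operators.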
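| Presentation markup : Presentation markup <mrow>
<mi>E</mi>
<mo>=</mo>
<mrow>
<mi>m</mi>
<mo>&#x2062;</mo>
<msup><mi>c</mi><mn>2</mn></msup>
</mrow>
</mrow>
Presentation markup directs how the math should be rendered. |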
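| Content markup : Content markup <apply><eq/>
<ci>E</ci>
<apply><times/>
<ci>m</ci>
<apply><power/><ci>c</ci><cn>2</cn></apply>
</apply>
</apply>
Content markup describes the meaning of the expression, not the format. |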
| Unicode TeX Example : Unicode TeX Example |
| Symbol Entry : Symbol Entry GUI PCs can display a myriad glyphs, mathematics symbols, and international characters
Hard to input special symbols. Menu methods are slow. Hot keys are great but hard to learn
Reexamine and improve symbol-input and storage methods
With left/right Ctrl/Alt keys, PC keyboard gives direct access to 600 symbols. Maximum possible = 2¹⁰⁰ ≈ 10³⁰
Use on-screen, customizable, keyboards and symbol boxes
Drag & drop any symbol into apps or onto keyboards |
| Hex to Unicode Input Method : Hex to Unicode Input Method Type Unicode character hexadecimal code
Make corrections as need be
Type Alt+x to convert to character
Type Alt+x to convert back to hex (useful especially for “missing glyph” character)
Resolve ambiguities by selection
Input higher-plane chars using a 5- or 6-digit code
New MS Word standard |
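The core of the hex-to-character toggle can be sketched in a few lines. This is only an illustration of the conversion step, not Word's actual implementation. The function names are hypothetical.

```python
def hex_to_char(hex_code: str) -> str:
    """Convert a hexadecimal code such as '2260' or '1D49C' to its character."""
    code_point = int(hex_code, 16)  # raises ValueError on non-hex input
    if code_point > 0x10FFFF:
        raise ValueError("beyond the Unicode code space")
    return chr(code_point)


def char_to_hex(ch: str) -> str:
    """Convert a character back to its hex code (at least four digits)."""
    return format(ord(ch), "04X")


print(hex_to_char("2260"))
print(char_to_hex(hex_to_char("1D49C")))
```

The reverse direction is what makes the method useful for a "missing glyph" box: the code identifies the character even when no font can show it.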
| Built-Up Formula Heuristics : Built-Up Formula Heuristics Math characters identify themselves and neighbors as math
E.g., fraction slash (U+2044), ASCII operators, U+2200–U+22FF, and U+20D0–U+20FF identify neighbors as mathematical
Math characters include various English and Greek alphabets
When heuristics fail, user can select math mode: WYSIWYG instead of visible math on/off codes |
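A minimal sketch of such a heuristic, using the ranges listed above. The set of ASCII operators is an assumption made for this example: hyphen and slash are left out because they are common in ordinary prose. The function names are hypothetical.

```python
# Code point ranges that mark their neighbors as mathematical.
MATH_RANGES = [
    (0x2044, 0x2044),  # fraction slash
    (0x2200, 0x22FF),  # mathematical operators
    (0x20D0, 0x20FF),  # combining marks for symbols
]

# Assumed subset of ASCII operators (illustrative choice, not from the slides).
ASCII_OPERATORS = set("+=<>^")


def marks_math(ch: str) -> bool:
    """True if this character suggests its neighbors are part of a formula."""
    if ch in ASCII_OPERATORS:
        return True
    code_point = ord(ch)
    return any(lo <= code_point <= hi for lo, hi in MATH_RANGES)


def looks_like_math(text: str) -> bool:
    """True if any character in the text marks it as mathematical."""
    return any(marks_math(ch) for ch in text)


print(looks_like_math("a \u2264 b"))
print(looks_like_math("plain prose"))
```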
| Operator Precedence : Operator Precedence Everyone knows that multiply takes precedence over add, e.g., 3+5×3 = 18, not 24
C-language precedence is too intricate for most programmers to use extensively
TeX doesn’t use precedence; relies on { } to define operator scope
In general, ( ) can be used to clarify or overrule precedence
Precedence reduces clutter, so some precedence is desirable (else things look like LISP!)
But keep it simple enough to remember easily |
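To make the precedence point concrete, here is a small precedence-climbing evaluator with just two levels, where parentheses overrule the default. It is a sketch for illustration only: it handles non-negative numbers, `+ - × * /`, and parentheses, and does no error checking. The names `PRECEDENCE`, `tokenize`, and `evaluate` are hypothetical.

```python
import operator
import re

# Two precedence levels: multiplicative binds tighter than additive.
PRECEDENCE = {"+": 1, "-": 1, "×": 2, "*": 2, "/": 2}
APPLY = {
    "+": operator.add,
    "-": operator.sub,
    "×": operator.mul,
    "*": operator.mul,
    "/": operator.truediv,
}


def tokenize(text: str) -> list:
    """Split into numbers, operators, and parentheses (whitespace is dropped)."""
    return re.findall(r"\d+(?:\.\d+)?|[-+×*/()]", text)


def evaluate(text: str) -> float:
    tokens = tokenize(text)
    pos = 0

    def parse_primary() -> float:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr(1)
            pos += 1  # skip the closing ")"
            return value
        return float(token)

    def parse_expr(min_prec: int) -> float:
        nonlocal pos
        left = parse_primary()
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            # The right operand may only contain tighter-binding operators.
            right = parse_expr(PRECEDENCE[op] + 1)
            left = APPLY[op](left, right)
        return left

    return parse_expr(1)


print(evaluate("3+5×3"))
print(evaluate("(3+5)×3"))
```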
| Layout Operator Precedence : Layout Operator Precedence Subscript, superscript ↓ ↑
Integral, sum ∫ ∑ ∏
Functions √
Times, divide / * × · •
Other operators Space ". , = - + Tab
Right brackets )]}|
Left brackets ([{
End of paragraph FF EOP |
| Mathematics as a Programming Language : Mathematics as a Programming Language Fortran made great steps in getting computers to understand mathematics
Java and C# accept Unicode variable names
C++ has preprocessor and operator overloading, but needs extensions to be really powerful
Use Unicode characters including math alphanumerics
Use plain-text encoding of mathematical expressions
Can’t use all mathematical expressions as code, but can go much further than current languages go
When to multiply? In abstract, multiplication is infinitely fast and precise, but not on a computer |
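As a small illustration in a language that already accepts Unicode identifiers, the sketch below uses Python, which is an assumption of this example and not a language the slides discuss. Greek letters work as variable names. Python, however, normalizes identifiers with NFKC, so a math alphanumeric such as math italic H is folded to plain H instead of staying distinct. The values are made up for the example.

```python
import math

γ = 2.0    # illustrative value
I2 = 3.0   # illustrative value
γp = γ * math.sqrt(1 + I2)  # same form as gammap = gamma*sqrt(1 + I2)
print(γp)

# Python applies NFKC to identifiers, so MATHEMATICAL ITALIC CAPITAL H
# (U+1D43B) and plain H name the same variable.
namespace = {}
exec("\U0001D43B = 3", namespace)
print("H" in namespace)
```

A language designed along the lines suggested above would instead keep the math alphabets distinct, so that script H and italic H could be different variables.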
| Slide30 : void IHBMWM(void)
{
    gammap = gamma*sqrt(1 + I2);
    upsilon = cmplx(gamma+gamma1, Delta);
    alphainc = alpha0*(1-(gamma*gamma*I2/gammap)/(gammap + upsilon));
    if (!gamma1 && fabs(Delta*T1) < 0.01)
        alphacoh = -half*alpha0*I2*pow(gamma/gammap, 3);
    else
    {
        Gamma = 1/T1 + gamma1;
        I2sF = (I2/T1)/cmplx(Gamma, Delta);
        betap2 = upsilon*(upsilon + gamma*I2sF);
        beta = sqrt(betap2);
        alphacoh = 0.5*gamma*alpha0*(I2sF*(gamma + upsilon)
                   /(gammap*gammap - betap2))
                   *((1+gamma/beta)*(beta - upsilon)/(beta + upsilon)
                   - (1+gamma/gammap)*(gammap - upsilon)/
                   (gammap + upsilon));
    }
    alpha1 = alphainc + alphacoh;
}
|
| Conclusions : Conclusions Unicode provides great support for math in both marked up and plain text
Unicode character properties facilitate plain-text encoding of mathematics but aren’t used in MathML
Heuristics allow plain text to be built up
Need two more Unicode assignments: subscript and superscript operators
On-screen keyboards and symbol boxes aid formula entry
Unicode math characters could be useful for programming languages |