Re: [isabelle] Unicode (Was: Update on I3P)
On Fri, 10 Sep 2010, Joachim Breitner wrote:
I should add that this sequence of four characters is the utf8
representation of 𝒵 (\<Z>), re-encoded as utf8:
$ echo -n ð??µ|iconv -futf8 -tlatin1
$ echo -n 𝒵 | iconv -flatin1 -tutf8
Maybe i3p has problems with unicode character code points that do not
fit in one 16-bit number?
This is a deeper problem of the way Unicode works on the JVM, see also
Few Java applications get this specification of "surrogate code points"
right, because this situation occurs rarely. UTF-8 works generally better
because the multi-character sequences are the rule, not the exception.
It is unlikely that this will get fundamentally better on the JVM,
although some people have already started joking that Oracle will soon
identify "" and null for type String ...
The easyiest fix is to remove these non-16bit codes from the table of
Isabelle symbol interpretation.
This archive was generated by a fusion of
Pipermail (Mailman edition) and