Re: [isabelle] Supporting native Unicode input in Isabelle/jEdit



On Thu, 28 Feb 2013, Christoph LANGE wrote:

I wonder why Isabelle/jEdit does not support native Unicode input, but only uses Unicode for display?

Taking this question in isolation, it actually should work both for input and display. The prover sources are stored in a certain text format as Isabelle "symbols", but that is treated as "encoding" by jEdit, so you will edit the text in UTF-16 of the JVM. This is not fundamentally different from having external files in UTF-8, UTF-16, UTF-32 etc. all recoded to the UTF-16 of Java before editing it.

It is up to the JVM, jEdit, or other input methods to produce such unicode glyphs. Copy-and-paste should also work as long as you stay within the Unicode image of the prover sources.


Just yesterday, I used this myself two times:

* In one of the private mails to you, I made some Isar proofs that were copied from Isabelle/jEdit into the terminal where my mail client was running. This worked sometimes but not always: the combination of terminal + alpine had sometimes problems coping with the multibyte UTF-8 version of the text, so in the end I was not 100% sure if you actually received the proof in a way I had in mind.

* Posting a small Isar proof on stackoverflow, via the same copy-paste technique JVM -> Firefox, it worked without problems.


Nowadays, where Unicode is widely supported

There is a long story behind that, some of it is discussed in my MKM 2011 paper, see section 2.1 in http://www4.in.tum.de/~wenzelm/papers/isabelle-doc.pdf

In short, I do not subscribe to the "universality" of Unicode, there are just a bit too many problems with it. The approach in Isabelle/Scala and Isabelle/jEdit is to use it as "poor man's mathematical rendering" in the front-end -- so your funny keyboard should work with it after some convincing. The prover itself is free from Unicode worries, and the ongoing evolution of the Unicode standards in the past decades. Theory sources are stored persistenly in a unicode-free manner, but it tolerates native UTF-8 to some extent.

After the experience of struggling with the many faces of Unicode in the past 3 years, I would go as far to say that an LCF-style prover should *not* allow Unicode at its kernel. There are just too many dangers of getting things utterly wrong (e.g. code-points being swapped or normalized according to strict rules of the standard, which are sometimes implemented in this way or that way.)


	Makarius






This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.