[isabelle] Unicode strings in HOL
- To: isabelle-users at cl.cam.ac.uk
- Subject: [isabelle] Unicode strings in HOL
- From: Lars Hupel <hupel at in.tum.de>
- Date: Sun, 28 Jun 2015 16:31:49 +0200
- User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1
while playing some more with cartouches I was wondering why we don't
support Unicode strings in HOL.
For example, the following can't be written:
... because it fails with the message:
Bad character: "Ã"
In fact, a very "character" in HOL is defined to be in the range of 0
and 255, so only old-fashioned charsets like ISO-8859-* can be represented.
In "Unicode_Examples.thy" , you will find input and output syntax to
support this. I've implemented appropriate parse and print translations,
such that at least literals work. However, the encoding I use is rather
cumbersome . Conversions to and from the existing string type, code
setup and actual operations are missing.
Anyhow, this is on my "low-priority nice-to-have" list. Does anybody
actually need that? I whipped this up quickly during a commute, but if
somebody is seriously interested in communicating with the outside world
in a formal setting, this might be worth investigating.
 Many programming languages will store Unicode strings as a sequence
of codepoints (e.g. Haskell). A codepoint is a number designating a
specific Unicode character and tells nothing about the specific encoding
used. Now, Isabelle sources are encoded in UTF-8 and any proper
parse/print routine has to perform a conversion between UTF-8 surface
syntax and internal codepoint representation. I didn't do that. Instead,
I'm just storing chunks of UTF-8 bytes. The reason is that I couldn't
find appropriate encoding/decoding functions in Isabelle/ML.
This archive was generated by a fusion of
Pipermail (Mailman edition) and