# Re: [isabelle] Non-ASCII characters and unsupported text structures

• To: Wolfgang Jeltsch <wolfgang-it at jeltsch.info>, cl-isabelle-users at lists.cam.ac.uk
• Subject: Re: [isabelle] Non-ASCII characters and unsupported text structures
• From: Makarius <makarius at sketis.net>
• Date: Mon, 15 Jul 2019 15:34:35 +0200
• Authentication-results: mx2f26; spf=pass (sender IP is 62.216.204.247) smtp.mailfrom=makarius at sketis.net smtp.helo=[192.168.178.32]
• Autocrypt: addr=makarius at sketis.net; prefer-encrypt=mutual; keydata= mQINBFcrF+4BEADMcXMnu3XHg6bRsGe3tajAHqvm89+ecn/Y0WhjI2FplhkZs1LwM+ZA9eXh hiBrC/yX0FJ+qjzVIfm66CX4nzVG1f8qwaervMpvpA+gbhtQiXc0t+LDcqV+5cdtpKplPHSu oW+KzJKyCdkDB5fYMOzuaXQwYi12YAEQH2r6K7Q7Np+k82Xli1pWe+Tha/BH0pKJ5Q01aPep ASrNW9F+moX7C0fxWl65LiXGmF0UJep6fqKruhy8oNF4p6I2oZhktvaR/x6tkL2PkT3r+xUS 6g5i3BOjfwhoGY57nsioeK+8VFvdRH5DK4CbrTgDl7ddcrEeENrfpDiPLs/afVbe/T9oDXmJ OJAO4WMpfZNiFhx9SSVTHohw29Fyzn0N1UQGjPqAY1jg32DAxlnMQ0co/KabEFAcoQsW1/6U ZGiNxYVIyEKGrnSY4WuLuNC8CmU1RaYSdTk1y9tYdxufM9lH9ynzJwac6FdalOOxMR2G9JG8 L9/dk3ytlP6DVwkPBSCpJaTkTyMp1wSkF1oK/BDu5xKUQh0zvvLCuZ16hiKRBBSjpVExXRZC u+NC1Y4wqm1HOH7HBwgZ1Hv9S/EPmI9iwgcW0SpDJqPf2Cm7oFMZsZ5Dbs6/nOQoe4Zegy45 ymqDRlIekP7zj+vOoR80XAYfmAH5DElJHldcjmgLBMdpvvqGZwARAQABtB5NYWthcml1cyA8 bWFrYXJpdXNAc2tldGlzLm5ldD6JAj0EEwEIACcFAlcrF+4CGyMFCQlmAYAFCwkIBwIGFQgJ CgsCBBYCAwECHgECF4AACgkQ89KUEoG/TbiVrw/9FjEBgh2CB7Qof7Y4k0yc7j+x/A0Wmkc0 iwP5jaKJuxRv1JJT8CFqm392+/cdh3EkRUk/UWD+hpNndYJwxZltrEpKVqFAWoVOg3ZJ4cuI MYhlp4tk/T0KSl/gKT16dc6uJ7M/FzW0zv50vjFtAdianEDuqLXHKaGDUwWoOTDly0gdZ7aH /eNby6ONHUSJMdTNOErh2N+uESM4aZqUuuL/dTb6xiVzCpV5saT8EMakoazUd7QhoBaHvqfs BL7DEmvcTtA79GF3ufHrF/UndIcx8aMznZ2PGNjmy5seDCoKX0EYHdLam8vgo/TuU4dRw5Zn 6E/ouyNOliXT1Mn+SomeBSXTR5MXRq4TQ9MKVGiP0Yl+7GJQU0JFtDC1ZZEOyjIiwGWOhbUR pYujVm8C1iQ7NcEn2BqOAmRML6IR6+En4RLbgCNsBNXlmTPRJOaI+iV6DZzg3x9zcaDGhoYt jkBTEFpb0F3jU9yuaEU5401NV34fUxg8tqXs0R9CKinO7kQ8N+RDjyyY8k2KZiDYBJ6r+OK0 d7TaTj7F9tmpAu2pmQ8lxOKjDZIwlbGTsC4lxISmcPzBGTKXja5nakcWYx/lZ4vje0WZ12HN amnD1weakFixRYit0d+Kz7cuj684NSbhwC0oN2t9R06Nfq8UPEWRKEitCly0OtRgio8zVZ/L eAC5Ag0EVysX7gEQAKs5NVOvYkE/r8KLsJ8/L/9eulpJDOFilZ9fyuqii7t1UpHZLb58QghW JM+IB2GSGsB+pOi6Je7hmwxcVdXYbGlYZ4Btqqw48/XptfbNZ8alCk6AqoVFP4MbYxij/Qqv f/Yw6GR0p1RIC/W4GF/JgDDwDFEiMT6Pv6dpM8acdNFCERDZdoOJiC+XJRwy5lZ2g5FOJkT9 rVI9EnA7mBXLLjPOMUp2/eZxN6gKOZzI3ej9vixg3adWR2yfKPgacHP/ujnVfITOl0OyLE5f zIHq85dEV4zW6Mpx7+Um0tdkwlUVMaW2nQ1bcwejgVAuD/MLSF/lSs3N5D1ctw5QUemYh7/e 2dC12UJuFDFxNPzcltQTlkBCVWV1D0SjScDHdlF6HhzpZOlt52/rwTn5GHtY4nwAL4IJ+Yvl WX8YKmyILH4Ai8c/N2IVRERQ2qorWFlsQnqrXV+hXf8aUwjc/pq4K9rsWxvle3TpeZfoBefU /s1PEX0SepZFAqAXHlQ9DZPsdPDo9EFK695G5w4nf03EhE9TV1MKGUuc1XJ6f1ZLaxu0TwTA 6qYtKIyBcU0Yn61S2Kh7Dgb5LdLV8nfl71+n/xIt2IWH5UJ9YuwEgGEP0c6ImnAUZ+nRodFI 0RwtCWlRkSJWtQln1vcphrz8PjWZH6e/nWnceXR/Al5P0WexQgtvABEBAAGJAiUEGAEIAA8F AlcrF+4CGwwFCQlmAYAACgkQ89KUEoG/Tbh7VQ/9Fc4bdwJYc3jH/LiuXv6uMg50Cv6lg2NT bL9DClWGNiYzejfM2A4c5K+GRUXhyD7S9U203MOv3z7uTbtyQL8XVolNnQlRIkB00f8nJ2sw HMXx/hemjXBvtlneq+vrMORJexldXUMFq19ZZrvj0zZL+pUnGFqt+IWTEE5GpL7wu20Demaj jYyGyKcDZyJOWZcl4e45Yn3hl0EI2xVmVh7ZinVsb3+nqgcltFy4Jt+drezwV2EiLGJHfGsT jEQb3C9VpneU4Jo+hHtfqLK4Q8+WlIOzSfyvwbabxrhyqg7i11fu8yckNW3dCURPYigV07HK 4dN0zhj53M0Q3eTwegJRPJb8XoLDcSdbsaU2HIShlGDKmzS9KL4JzLikQ9dXROC4cae3jRKH aexFi4B55Ab6FxIfXj09wUCO6Nm0owDfIBDMgiywvi2Rb2etCjBgRbSj71S2nntd9ZitoYvE yKirLkWmJRbp3ln8cHi8Uc/jr1cDPVRWuLUN0uceMj5+AR+NZVakcNUHWJCinMMjacho0SyP QmocdU8pzzupreaVWruqaSzqcpWBPwrE5OxEtJ+OyIBjKmRJ5eptjh4rKgNaVnKjhqbvr+Yz pUAgPp38jjf4HJghUGIfWArKNelKJEJOYk94DAbmT67LgqEdZ0yaA2BCHmreN727WIzV9vkX NMc=
• Openpgp: preference=signencrypt
• References: <00e199d103b5c14ceaf9ec569cf56c920b924f36.camel@jeltsch.info> <C8B65A92-B0A8-497E-A30C-70EE992AC7D7@gmail.com> <4fc720f055e0dd75d9f04c1e42ffe89c79e8ca70.camel@jeltsch.info>
• User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2

On 04/07/2019 19:07, Wolfgang Jeltsch wrote:
>
> Hmm, when I enter a non-ASCII character into an Isabelle/jEdit buffer, I
> cannot be sure that it will make it into the file in UTF-8-encoded form
> when saving. I think it works in practice for symbols like “ and ”.
> However, if I enter the Greek letter γ, for example, it will be saved to
> the file as \<gamma>. If it appears in documentation text, outside of
> actual Isabelle code, this \<gamma> will end up verbatim in the
> generated LaTeX source code, causing an error.
>
> So apparently it would work to place a symbol like “ or ” into the
> documentation part of an Isabelle file and having it transferred to the
> generated LaTeX file and finally processed correctly by LaTeX using some
> sort of LaTeX UTF-8 support. However, this would work only because
> symbols like “ and ” don’t have an Isabelle encoding in the form \<…>.
> As soon as they’d get one, we would run into the same problem as with
> the γ above.

(I am rather late on this thread, and most things have been answered
already. Here are just a few side-remarks.)

A quick hypersearch (not grep) over \$ISABELLE_HOME/src/Doc for
"utf|unicode" as insensitive regexp reveals the following explanation in
the Isabelle/jEdit manual:

The appendix of @{cite "isabelle-isar-ref"} gives an overview of the
standard interpretation of finitely many symbols from the infinite
collection. Uninterpreted symbols are displayed literally, e.g.\
▩‹\<foobar>›''. Overlap of Unicode characters used in symbol
interpretation with informal ones (which might appear e.g.\ in comments)
needs to be avoided. Raw Unicode characters within prover source files
should be restricted to informal parts, e.g.\ to write text in non-latin

The correct conclusion has already been drawn on this thread: do not
lean out of the window too far. Non-latin characters for Eurepean
languages (excluding Greek) are fine. Odd quotation or punctuation marks
are better avoided: they are somehow fragile and non-portable, even
without Isabelle symbol interpretation getting in the way. (Recall that
"Unicode" is not as universal and stable as its name might suggest.)

>> There is a cite antiquotation. Try completing on print_ in
>> Isabelle/jEdit. One of those will give you a list of antiquotations.
>
> Amazing! I had looked for antiquotations in the Isabelle/Isar Reference
> Manual but somehow overlooked the part that talks about \<^cite>.
> Strangely enough, \<^cite> doesn’t appear in the auto-completion list
> (in Isabelle2018 at least).

I see "cite" in the index of isar-ref, and documentation on \<^latex> in
section "3.3.5 Document comments" -- both in current Isabelle2019 and
old Isabelle2018.

>>> Regarding LaTeX source code, I discovered that it is possible to
>>> embed it directly into the contents of text, section, and such.
>>> Is this intentional or is it just that it happens to work with the
>>> current Isabelle implementation?
>>
>> AFAIK this is what everyone has always done and will be doing for a
>> markup (? - I might be misusing that term) — \<^item> being one thing
>> that sticks in my mind, and the absence so far of a \label{..}
>> replacement.
>
> I really like this trend of having more and more markup for
> documentation text.

Yes, the document source is becoming more an more formal. (Everything is
somehow "source text" for different languages of Isabelle. There is no
"code", unless you want to call the input program text for LaTeX like that.)

At some point there will be also an HTML backend to the document
language, not just LaTeX. This means that hardcore LaTeX hacking might
no longer work uniformly, unless restricted to small snippets in
\<^latex>‹…› (which could be rendered via KaTeX in HTML/CSS/JS). This is
still a bit speculative, though: we are not there yet.

> I found \<latex> only by accident in the completion pop-up window when
> looking for something else. As I mentioned above, \<^cite> doesn’t
> appear in the completion list, and the list of antiquotations in the
> Isabelle/Isar Reference Manual is incomplete.

Generally not that \<^NAME> (control symbol) with a cartouche often
works as replacement for old @{NAME ...}, but only if there is a single
argument that is a cartouche.

The control symbol notation has turned out so convenient and popular,
that I am tempted to discontinue the old form at some point in the
future. This requires to change the syntax of various older
antiquotations, notably @{cite ...}, @{thm ...}, @{lemma ...}. Almost
everything else is already in proper shape: @{typ ...}, @{term ...},
@{const_name ...}.

Makarius



This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.