Re: [isabelle] Non-ASCII characters and unsupported text structures



On 04/07/2019 19:07, Wolfgang Jeltsch wrote:
>  
> Hmm, when I enter a non-ASCII character into an Isabelle/jEdit buffer, I
> cannot be sure that it will make it into the file in UTF-8-encoded form
> when saving. I think it works in practice for symbols like “ and ”.
> However, if I enter the Greek letter γ, for example, it will be saved to
> the file as `\<gamma>`. If it appears in documentation text, outside of
> actual Isabelle code, this `\<gamma>` will end up verbatim in the
> generated LaTeX source code, causing an error.
> 
> So apparently it would work to place a symbol like “ or ” into the
> documentation part of an Isabelle file and having it transferred to the
> generated LaTeX file and finally processed correctly by LaTeX using some
> sort of LaTeX UTF-8 support. However, this would work only because
> symbols like “ and ” don’t have an Isabelle encoding in the form `\<…>`.
> As soon as they’d get one, we would run into the same problem as with
> the γ above.

(I am rather late on this thread, and most things have been answered
already. Here are just a few side-remarks.)

A quick hypersearch (not grep) over $ISABELLE_HOME/src/Doc for
"utf|unicode" as insensitive regexp reveals the following explanation in
the Isabelle/jEdit manual:

  The appendix of @{cite "isabelle-isar-ref"} gives an overview of the
  standard interpretation of finitely many symbols from the infinite
  collection. Uninterpreted symbols are displayed literally, e.g.\
  ``▩‹\<foobar>›''. Overlap of Unicode characters used in symbol
  interpretation with informal ones (which might appear e.g.\ in comments)
  needs to be avoided. Raw Unicode characters within prover source files
  should be restricted to informal parts, e.g.\ to write text in non-latin
  alphabets in comments.


The correct conclusion has already been drawn on this thread: do not
lean out of the window too far. Non-latin characters for Eurepean
languages (excluding Greek) are fine. Odd quotation or punctuation marks
are better avoided: they are somehow fragile and non-portable, even
without Isabelle symbol interpretation getting in the way. (Recall that
"Unicode" is not as universal and stable as its name might suggest.)


>> There is a `cite` antiquotation. Try completing on `print_` in
>> Isabelle/jEdit. One of those will give you a list of antiquotations. 
> 
> Amazing! I had looked for antiquotations in the Isabelle/Isar Reference
> Manual but somehow overlooked the part that talks about `\<^cite>`.
> Strangely enough, `\<^cite>` doesn’t appear in the auto-completion list
> (in Isabelle2018 at least).

I see "cite" in the index of isar-ref, and documentation on \<^latex> in
section "3.3.5 Document comments" -- both in current Isabelle2019 and
old Isabelle2018.


>>> Regarding LaTeX source code, I discovered that it is possible to
>>> embed it directly into the contents of `text`, `section`, and such.
>>> Is this intentional or is it just that it happens to work with the
>>> current Isabelle implementation?
>>
>> AFAIK this is what everyone has always done and will be doing for a
>> while yet. Makarius appears to be steadily adding more semi-formal
>> markup (? - I might be misusing that term) — \<^item> being one thing
>> that sticks in my mind, and the absence so far of a \label{..}
>> replacement.
> 
> I really like this trend of having more and more markup for
> documentation text.

Yes, the document source is becoming more an more formal. (Everything is
somehow "source text" for different languages of Isabelle. There is no
"code", unless you want to call the input program text for LaTeX like that.)

At some point there will be also an HTML backend to the document
language, not just LaTeX. This means that hardcore LaTeX hacking might
no longer work uniformly, unless restricted to small snippets in
\<^latex>‹…› (which could be rendered via KaTeX in HTML/CSS/JS). This is
still a bit speculative, though: we are not there yet.


> I found `\<latex>` only by accident in the completion pop-up window when
> looking for something else. As I mentioned above, `\<^cite>` doesn’t
> appear in the completion list, and the list of antiquotations in the
> Isabelle/Isar Reference Manual is incomplete.

Generally not that \<^NAME> (control symbol) with a cartouche often
works as replacement for old @{NAME ...}, but only if there is a single
argument that is a cartouche.

The control symbol notation has turned out so convenient and popular,
that I am tempted to discontinue the old form at some point in the
future. This requires to change the syntax of various older
antiquotations, notably @{cite ...}, @{thm ...}, @{lemma ...}. Almost
everything else is already in proper shape: @{typ ...}, @{term ...},
@{const_name ...}.


	Makarius




This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.