# Re: [isabelle] Document processing with XeLaTeX and unicode symbols

```On Sat, 13 Oct 2012, Lars Hupel wrote:

```
```text{* In the proof of the main theorem from the lecture notes, the
concept of a »fixed copy« of a graph is fundamental. *}

On disk, this is translated to:

```
text{* In the proof of the main theorem from the lecture notes, the concept of a \<guillemotright>fixed copy\<guillemotleft> of a graph is fundamental. *}
```
This is how the the encoding UTF-8-Isabelle of Isabelle/jEdit works.
```
UTF-16 material in the text buffer is turned into plain Isabelle symbols according to the cumulative etc/symbols specifications found in the Isabelle installation.
```
```
The basic idea is to use "Unicode" as poor man's rendering for mathematical text, where non-ASCII symbols mainly occur in the formal part, or in the informal part in a way that is not in conflict with the formal symbol interpretation. Since guillemots have been found useful for formal notation at some point, they are part of this default table of interpreted Isabelle symbols. Thus it becomes difficult to use them in informal text, as you have observed.
```

```
```the TeX sources still contain the token '\<guillemotright>' and xelatex
complains about an undefined control sequence. (*)

Interestingly enough, such a translation doesn't happen for the symbols
'›' and '‹'.
```
```
```
This is because the single guillemots are not interpreted by default etc/symbols.
```

```
```Is there any way I could keep using '»' and '«' as quotation marks?
```
```
```
You could use some perl script over the generated .tex sources to replace remaining \<guillemotright> by some other text; the formal ones are already expanded to {\isasymguillemotright} at this point.
```
```
This works by having some document/IsaMakefile to do the main processing, and injecting the perl filter before the standard "isabelle latex" phases of your nested IsaMakefile (see also "isabelle document" in the Isabelle System manual).
```

```
(*) side note: xelatex is able to process Unicode tokens, so it would be fine if the .thy or .tex file simply contained '»' and '«'.
```
```
This sounds like you are still optimistic about Unicode in general. For me a sentence with "able to process" and "Unicode" is an Oxymoron, after several years of struggling with this constantly changing standard. There are so many things that can go wrong with Unicode.
```
```
In France, there are jokes like http://www.apprendre-en-ligne.net/bloginfo/index.php/2009/01/21/151-martine-ecrit-en-utf-8 which stem from everyday life. Just last week I got a printed bon at the supermarket with a simular unicode accident. So Unicode is something that does some job sometimes, but not to be counted on by default if there is another way.
```

Makarius```

• Follow-Ups:

This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.