Re: [isabelle] Embedded languages in Isabelle

Thank you very much for your pointers!
I already looked at Pure/Syntax/simple_syntax.ML, but Pure/Tools/rail.ML 
indeed looks quite helpful, especially with respect to the Prover IDE markup 
(I already tried to implement that in my preliminary approaches, but the 
example will prove very helpful, even though I might move to a different 
approach altogether for my current application at least for now, see below).

To say a bit more about my application: I'm constructing a shallow semantic 
embedding of a philosophical logic (Edward Zalta's "Theory of Abstract 
Objects"). That logic is quite different from HOL in multiple ways, which makes 
this an interesting research project. For example the language contains two 
kinds of atomic formulas, i.e. Fxy "x and y 'exemplify' F" and xyF "x and y 
'encode' F", neither of which can be implemented as mere function application 
in an embedding in HOL. The variable names have type implications, e.g. x and 
y have a different type than F and G, which again have a different type from 
e.g. \kappa, etc. - some of these types map directly to types in HOL, others I 
need to implement with guarding behind additional constants, etc.
My basic implementation maps all these concepts to constructs in the Isabelle 
term language in a working, but not in a user-friendly way - e.g. I need 
special syntax for both exemplification and encoding and at times explicit type 
annotations (resp. "type guard constants"), etc.
In other words I have a working implementation of the logic, but not of its 
syntax (i.e. I can map formulas of the logic to corresponding "not-so-nice-
looking" formulas in Isabelle's term language, but this mapping is "manual 

So what I'm currently trying to research is how close I can get to the actual 
syntax of the target language directly in Isabelle, so that a person who is 
used to the target logic has an easier time working with the embedding.

The target logic itself also defines new symbols in terms of its basic symbols, 
so the implementation should be extensible very much like with mixfix 
annotations to constants or "syntax" commands. That's why I'm trying to avoid 
duplicating an implementation of a priority grammar. (The definitions in the 
target logic themselves are not quite the same as Isar definitions, so I will 
probably introduce new outer syntax commands for them, but in the end they 
will translate to "plain", but more complex definitions satisfying the narrowly 
defined inferential role of definitions in the hyper-intensional free logic of 
the target system).

At the moment I'm trying to only *parse* the actual syntax of the language, 
not to print it, i.e. a formula given in the syntax of the target logic, will 
be translated to its representation in Isabelle's term logic and further 
reasoning will be done in Isabelle's term logic directly (I think this should 
alleviate the problem of transformed names like x_ and xa, but at a later 
stage it might also be interesting to go further and print in the target 
logic's syntax and continue reasoning in it as well, for which this indeed 
will be an issue).

Given that I basically need to support something like mixfix notation for 
extending the syntax and reproducing a suitable variant of the Isabelle term 
language manually from strings or cartouches in the end seems like an 
unnecessary duplication of work (especially since Pure/Syntax/syntax.ML, Pure/
Syntax/syntax_phases.ML, etc., are not exactly meant to be used like this), 
I'm now trying to mainly stay within the Isabelle term language itself. 
Interestingly, only now after looking through the syntax parsing ML code I 
realize that I can do much more than I thought within its boundaries.

Simplified that approach will look like

consts valid :: "p => bool"
consts emb_implies :: "p => p => p"
nonterminal q
syntax valid :: "q => bool" ("[\Turnstile _]")
             emb_implies "q => q => q" (infixl "->" 8)
             emb_not "q => q" ("\<not>_" [40] 70)
             "_atomic_formula" :: "id_position => q" ("_")

and then installing a parse_ast_translation for "_atomic_formula" which splits 
up e.g. "Fxy" to "(_exe F x y)" and "xyF" to "(_enc x y F)", which is then 
translated to the corresponding term in a parse_translation.

So far this is working out quite well and is alleviating the need to deal with 
markup myself for now, so I will see how far I will get with this approach.

Regarding your suggestion to introduce something like "\<app>", printed as a 
thin space: I actually tried this at one point and attempted to introduce 
"\<delim>" to be printed as zero-width-space, but (1) this still makes it 
harder to type formulas in the target logic and might actually be quite 
confusing and (2) I had some issues with jedit at least with zero-width spaces 
(I'm not sure about thin spaces anymore) - unfortunately I don't quite 
remember what these issues were right now.

In any case: I think I can work with this for now and thank you very much for 
your help so far and for clarifying the fact that the ML parsing 
infrastructure for Isabelle's term language is probably best not to be re-/
abused for purposes like this.

In general in the long run it would, of course, be nice to have a self-
contained "reentrant" and "public" version of the parser for Isabelle's term 
language in ML, in which every aspect could be customized, including the 
lexicon and tokenizing and maybe even the generation of "fresh" variable 
names, etc. - but I realize that this is way too much to ask especially for my 
very specific and probably rather exotic use case :-).

Best wishes,
Daniel Kirchner

Am Donnerstag, 3. Oktober 2019, 11:40:39 CEST schrieb Makarius:
> On 01/10/2019 15:16, Makarius wrote:
> > On 26/09/2019 15:58, Daniel Kirchner wrote:
> >> I'm currently working on implementing a custom parser for inner syntax
> >> packed in string constants for implementing an embedded object logic
> >> with a syntactic structure that cannot be parsed otherwise (e.g. single
> >> letter identifiers with no whitespace delimiters, etc., requiring a
> >> customized parsing process). The idea is to end up with a syntax like "
> >> ⊨ ''<formula of embedded logic>'' ". This is already working quite well,
> >> basically using "Syntax.tokenize -> Syntax.parse ->
> >> Syntax_Phases.parsetree_to_ast" with some custom adjustments between the
> >> steps, but unfortunately "Syntax_Phases.parsetree_to_ast" seems to be
> >> private to "Syntax_Phases" at the moment.
> > 
> > The standard way to embed sublanguages is via cartouches, e.g. see
> > Isabelle2019/HOL/ex/Cartouche_Examples.thy for some experiments and
> > examples.
> > 
> > This does not solve the problem of imitation the Isabelle term language.
> > At some point there might be more official ways to do that, e.g.
> > something like antiquotations with control symbol and cartouch within
> > terms.
> Here are further examples for embedded languages in Isabelle:
>   Isabelle2019/src/Pure/Tools/rail.ML
>   Isabelle2019/src/Pure/Syntax/simple_syntax.ML
> Both is based on Isabelle parser combinators. There is also proper
> support for formal positions and Prover IDE markup in the rail
> application -- without that an Isabelle language is not complete.
> (Note on browsing sources in Isabelle/Pure: Documentation panel, entry
> src/Pure/ROOT.ML + its explanation at the top of the file.)
> I don't think that your first idea to change the Isabelle term language
> will work out in general: certain assumptions about lexical syntax and
> identifiers are deeply rooted in the system. There are some explanations
> on formal names in section 1.2 of the "implementation" manual (see the
> Documentation panel in Isabelle/jEdit). In particular, the system may
> transform names within a term in certain situation, e.g. "x" could
> become "x_" or "xa" or "?xa" later on.
> You did not say anything about the application context nor the overall
> motivation. There might be more plain and basic approaches, e.g. an
> application notation \<app> where this newly allocated Isabelle symbol
> is rendered as a thin space.
> 	Makarius

Attachment: signature.asc
Description: This is a digitally signed message part.

This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.