Tuesday, November 22, 2011

Pondering about the Foundations

As I am thinking about how a possible Ωmega 2.0 could be implemented, I am trying to come up with some interesting concepts w.r.t. parsing and type checking. Since no single line is written yet, it is easy and cheap to ponder.

Some key elements are already taking shape in my mind. These are:
  • parsed entities carry their start and end coordinates in their types
  • there is no difference between patterns and expressions
  • each expression carries a typing thrist.
The second item has been elaborated by me in an earlier post. The third must wait a bit until I have something coherent (but the curious reader might visit my sketch of ideas).

I'd like to talk about item one here. The essence is that every grammar element obtains a type that describes where this element is located. For example a function declaration may occupy the text portion between start and end, where these are type-level triples (file, line, column). In order to consider parsed input consistent, the start coordinates must be aligned with the prior element's end coordinates, i.e. we have another thrist here.

Things get interesting when we model naming and scope. Name definition and referral must somehow match the types, after the match is done the semantics can be read off the user. To give an example:
Ref definer :: Exp (`foo, 42, 7) (`foo, 42, 11)
  where definer :: Exp (`foo, 41, 1) (`foo, 45, 37) = ...
Both the reference and the definition live in the same file "foo.prg". The former is on line 42 from column 7 to 11 (exclusive). The definition it refers to is in fact enclosing it (a recursive call, maybe?) extending from line 42 to 45. With this representation the name of the indentifier becomes secondary, all that counts is the evidence in definer which contributes to its type! We have created a globally nameless representation.

We can expand this world view to also include the level of the grammatical entity (value, type, kind, etc.) as a fourth coordinate. So a reference to a ›25‹ somewhere in the text at floor 0 would point to an integer literal, the same place at floor 1 to the type Int and at floor 2 it would be the kind *.

My hope is that all the proof obligations arising in such a scheme can be cleverly extracted from a parsec-like engine.