grammar
This page gives the full grammar of ghūl, derived from the compiler's parser.
The grammar is written in W3C EBNF — the notation used by the XML and XPath specifications:
| Notation | Meaning |
|---|---|
| A ::= ... | defines the symbol A |
| A B | A followed by B |
| A \| B | A or B |
| A? | zero or one A |
| A* | zero or more A |
| A+ | one or more A |
| ( ... ) | grouping |
| "is" | a literal terminal |
| [a-z] | a character in the given set |
| [^"] | any character not in the set |
| A - B | an A that is not also a B |
CamelCase symbols are grammar productions; Identifier, IntegerLiteral and the other symbols defined under lexical grammar are tokens produced by the tokenizer.
A few constructs are resolved by the parser using context that a context-free grammar cannot express (operator precedence, and a small number of genuinely context-sensitive forms). These are called out in prose where they arise, and the operator precedence table is given at the end.
lexical grammar
The tokenizer turns source text into a stream of tokens. Whitespace — spaces, tabs, carriage returns and newlines — separates tokens but is otherwise insignificant: ghūl is not indentation-sensitive. Whitespace and comments are discarded before parsing.
comments
LineComment ::= "//" [^#xA]*
BlockComment ::= "/*" ( [^*] | "*" [^/] )* "*/"
Block comments do not nest: the first */ ends the comment.
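The non-nesting rule can be sketched in Python. This is an illustration only, not the compiler's tokenizer: the names `COMMENT` and `strip_comments` are hypothetical, and the sketch assumes the input contains no string or character literals, which a real tokenizer would have to skip first.

```python
import re

# One alternation scanned left to right: a line comment runs to the end of
# the line, a block comment runs to the FIRST "*/" (non-greedy match), so a
# "/*" inside a block comment has no special meaning.
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def strip_comments(source: str) -> str:
    """Replace every comment with one space (assumes no string literals)."""
    return COMMENT.sub(" ", source)
```

Under this reading, stripping `a /* x /* y */ b */ c` leaves `b */ c` behind, because the comment ended at the first `*/`.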
identifiers
Identifier ::= PlainIdentifier | EscapedIdentifier
PlainIdentifier ::= Letter ( Letter | Digit )*
EscapedIdentifier ::= "`" ( Letter | Digit | "_" )+
| "`" OperatorChar+
Letter ::= [a-zA-Z_]
Digit ::= [0-9]
QualifiedIdentifier ::= Identifier ( "." Identifier )*
A PlainIdentifier may not be one of the reserved words. To use a reserved word, or an operator symbol, as an ordinary identifier, prefix it with a backtick: `field, `+.
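As a rough illustration of the escaping rule, the Python sketch below decides whether a name needs the backtick prefix. The helper `as_identifier` is hypothetical, the reserved word set is copied from the reserved words list on this page, and only the ASCII part of OperatorChar is modelled.

```python
import re

# Reserved words, copied from the reserved words list on this page.
RESERVED = frozenset(
    """
    assert break case cast catch class const continue default do elif else
    enum esac false fi field finally for if in innate is isa let mut
    namespace new null od private protected ptr public rec ref return self
    si static struct super then throw trait true try typeof union use when
    while yrt
    """.split()
)

# PlainIdentifier ::= Letter ( Letter | Digit )*
PLAIN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*\Z")
# ASCII OperatorChar set only; UnicodeSymbol is left out of this sketch.
OPERATOR = re.compile(r"[-!$%^&*+=|:@~#\\<>.?/]+\Z")


def as_identifier(name: str) -> str:
    """Spell `name` so that it tokenizes as an Identifier."""
    if OPERATOR.match(name):
        return "`" + name  # operator symbol: always needs the backtick
    if not PLAIN.match(name):
        raise ValueError(f"not an identifier or operator: {name!r}")
    # A reserved word needs the backtick; any other plain name does not.
    return "`" + name if name in RESERVED else name
```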
reserved words
The following words are keywords and cannot be used as plain identifiers:
assert break case cast catch class const
continue default do elif else enum esac
false fi field finally for if in
innate is isa let mut namespace new
null od private protected ptr public rec
ref return self si static struct super
then throw trait true try typeof union
use when while yrt
numeric literals
IntegerLiteral ::= DecimalInteger | HexInteger
DecimalInteger ::= Digit ( Digit | "_" )* IntegerSuffix?
HexInteger ::= ( "0x" | "0X" ) HexDigit ( HexDigit | "_" )* IntegerSuffix?
HexDigit ::= [0-9a-fA-F]
IntegerSuffix ::= ( "s" | "S" | "u" | "U" )? [bBcCsSiIlLwW]?
FloatLiteral ::= Digit ( Digit | "_" )* "." ( Digit | "_" )* Exponent? FloatSuffix?
Exponent ::= ( "e" | "E" ) "-"? ( Digit | "_" )+
FloatSuffix ::= "s" | "S" | "d" | "D"
Underscores within a number are for readability and are ignored. A float literal must contain a decimal point (.). A float suffix selects single (s/S) or double (d/D) precision, and an integer suffix selects the integer type and signedness.
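The numeric productions translate fairly directly into regular expressions. The Python sketch below is one such transliteration. The names `INTEGER`, `FLOAT` and `classify` are hypothetical, and the sketch only recognises the literal shapes, it does not compute values or resolve suffix meanings.

```python
import re

INTEGER = re.compile(
    r"""
    (?: 0[xX][0-9a-fA-F][0-9a-fA-F_]*   # HexInteger
      | [0-9][0-9_]*                    # DecimalInteger
    )
    [sSuU]?[bBcCsSiIlLwW]?              # IntegerSuffix
    \Z
    """,
    re.VERBOSE,
)

FLOAT = re.compile(
    r"""
    [0-9][0-9_]* \. [0-9_]*             # digits, decimal point, digits
    (?: [eE] -? [0-9_]+ )?              # Exponent
    [sSdD]?                             # FloatSuffix
    \Z
    """,
    re.VERBOSE,
)


def classify(text: str):
    """Return "float", "integer" or None for a candidate literal."""
    if FLOAT.match(text):
        return "float"
    if INTEGER.match(text):
        return "integer"
    return None
```

Note that `1e5` is rejected by both patterns: without a decimal point it is not a FloatLiteral, and `e` is not an integer suffix.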
character and string literals
CharLiteral ::= "'" ( EscapeSequence | [^'] ) "'"
StringLiteral ::= '"' StringElement* '"'
StringElement ::= EscapeSequence | [^"#xA\]
EscapeSequence ::= "\" ( "t" | "n" | "r" | "\" | OctalDigit+ | [^#xA] )
OctalDigit ::= [0-7]
A string literal may not span a newline. Two string literals separated only by whitespace are concatenated into a single literal.
Inside a string literal, { begins an interpolation and } ends it; a literal brace is written {{ or }}.
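The concatenation of adjacent string literals can be sketched in Python as follows. This is a simplified model under stated assumptions: `STRING` and `join_adjacent` are hypothetical names, the input is assumed to contain no comments, character literals or interpolations, and "separated only by whitespace" is read as requiring at least one whitespace character.

```python
import re

# StringLiteral per the grammar above. An escape is modelled as a backslash
# plus one non-newline character; further octal digits simply continue as
# ordinary characters, which matches the same text.
STRING = re.compile(r'"((?:\\[^\n]|[^"\n\\])*)"')


def join_adjacent(source: str) -> list[str]:
    """Return the raw bodies of the string literals in `source`, merging
    literals that are separated only by whitespace."""
    bodies = []
    previous_end = None
    for match in STRING.finditer(source):
        gap_is_whitespace = (
            previous_end is not None
            and source[previous_end:match.start()].isspace()
        )
        if gap_is_whitespace:
            bodies[-1] += match.group(1)  # continue the previous literal
        else:
            bodies.append(match.group(1))
        previous_end = match.end()
    return bodies
```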
interpolated strings
A string literal containing { ... } is tokenized as a sequence of fragments rather than a single StringLiteral. The parser assembles these as an interpolated string expression:
InterpolatedString ::= EnterString
Interpolation
( ContinueString Interpolation )*
ExitString
Interpolation ::= Expression ( "," Expression )? ( ":" FormatString )?
EnterString, ContinueString, ExitString and FormatString are the fragments of literal text surrounding and following each interpolated expression. The optional comma (,) introduces an alignment and the optional colon (:) introduces a format specifier.
operators
Operator ::= OperatorChar+
OperatorChar ::= [-!$%^&*+=|:@~#\<>.?/] | UnicodeSymbol
UnicodeSymbol is any character above U+007E that .NET classifies as a symbol (this admits operators such as ×, ÷, ∩, ∪, ∧, ∨, ≈, ≡).
Operators are tokenized greedily: the longest run of operator characters forms one operator. There is one exception: a . immediately after a leading ! or ? ends the operator, so x!.foo and x?.foo parse as a member access on an unwrap or has-value test, and not as a use of a two-character operator (!. or ?.) followed by foo.
A handful of operator spellings are recognised as dedicated tokens rather than general operators: =, :, ., ->, =>, ? and @.
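The greedy rule and its exception can be sketched in Python. The function `scan_operator` is hypothetical and covers only the ASCII OperatorChar set. It assumes comments have already been removed, and it reads "a leading ! or ?" as meaning the first character of the run.

```python
# ASCII OperatorChar set from the grammar above; UnicodeSymbol is omitted.
OPERATOR_CHARS = set("-!$%^&*+=|:@~#\\<>.?/")

# Spellings the page lists as dedicated tokens rather than general operators.
DEDICATED = {"=", ":", ".", "->", "=>", "?", "@"}


def scan_operator(source: str, start: int) -> str:
    """Return the operator token that begins at source[start]."""
    end = start
    # Greedy: take the longest run of operator characters.
    while end < len(source) and source[end] in OPERATOR_CHARS:
        end += 1
    run = source[start:end]
    # Exception: a "." directly after a leading "!" or "?" ends the operator.
    if len(run) >= 2 and run[0] in "!?" and run[1] == ".":
        return run[0]
    return run
```

For example, scanning `x!.foo` from index 1 yields `!` rather than `!.`, while scanning `a <=> b` from index 2 yields the whole run `<=>`.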
compilation unit
A source file is a sequence of definitions:
CompilationUnit ::= Definition*
Definition ::= Namespace
| Use
| Class
| Trait
| Struct
| Union
| Enum
| Member
| PragmaDefinition
A Member (function, property or indexer) appearing directly in a compilation unit or namespace is a global function, global variable or global indexer.
definitions
namespace and use
Namespace ::= "namespace" QualifiedIdentifier "is" Definition* "si"
Use ::= "use" QualifiedIdentifier ";"
| "use" Identifier "=" QualifiedIdentifier ";"
The second form of Use introduces an alias for a namespace or symbol.
class, trait and struct
Class ::= "class" Identifier TypeParameters? Ancestors? Modifiers
"is" Definition* "si"
Trait ::= "trait" Identifier TypeParameters? Ancestors? Modifiers
"is" Definition* "si"
Struct ::= "struct" Identifier TypeParameters? Ancestors? Modifiers
"is" Definition* "si"
TypeParameters ::= "[" TypeParameter ( "," TypeParameter )* "]"
TypeParameter ::= Identifier ( ":" TypeParameterConstraint )?
TypeParameterConstraint ::= "class" | "struct" | "option"
Ancestors ::= ":" TypeList
Ancestors lists the base class and/or implemented traits.
union
Union ::= "union" Identifier TypeParameters? Modifiers "is" Variant+ "si"
Variant ::= Identifier ( "(" VariableList ")" )? ";"
Each Variant optionally carries fields, written as a parenthesised list of name: Type variables.
enum
Enum ::= "enum" Identifier Modifiers "is"
EnumMember ( "," EnumMember )* "si"
EnumMember ::= Identifier ( "=" Expression )?
members: functions, properties and indexers
A Member is a function, a property or an indexer. They share a leading name and modifiers; the parser distinguishes them by what follows the name.
Member ::= Function | Property | Indexer
function
Function ::= FunctionName TypeParameters?
"(" VariableList? ")" ReturnType? Modifiers ( Body | ";" )
FunctionName ::= Identifier | Operator
ReturnType ::= "->" TypeExpression
Body ::= "is" StatementList "si"
| "=>" Expression
| "innate" QualifiedIdentifier
A function may be named by an Operator, which defines that operator. A function with no body (just ;) is abstract. A => or innate body is terminated by a ; whereas a block body (is … si) is not.
property
Property ::= Identifier ( ":" TypeExpression )? Modifiers
PropertyAccessors? ";"?
PropertyAccessors ::= PropertyGetter ( "," PropertySetter )?
| PropertySetter ( "," PropertyGetter )?
PropertyGetter ::= Body
PropertySetter ::= "=" Identifier Body
A property with no accessors and the field modifier declares a field. A PropertySetter names the value parameter after =. As with functions, a => or innate accessor body is terminated by ; and a block body is not.
indexer
Indexer ::= Identifier? "[" Variable "]" ( ":" TypeExpression )? Modifiers
PropertyAccessors? ";"?
modifiers
Modifiers ::= AccessModifier? StorageClass?
AccessModifier ::= "public" | "protected" | "private"
StorageClass ::= "static" | "const" | "field"
pragmas
PragmaDefinition ::= Pragma Definition
Pragma ::= "@" QualifiedIdentifier ( "(" ExpressionList? ")" )?
A Pragma annotates the definition (or statement) that follows it.
type expressions
TypeExpression ::= PrimaryType TypeSuffix*
PrimaryType ::= QualifiedIdentifier
| QualifiedIdentifier "[" TypeList "]" /* generic type */
| QualifiedIdentifier "[" "]" /* array type */
| Identifier ":" TypeExpression /* named tuple element */
| "(" TypeList ")" /* tuple, or grouping */
| "(" TypeList? ")" "->" TypeExpression /* function type */
TypeSuffix ::= "[]" /* array */
| "ref" /* by-reference */
| "ptr" /* pointer */
| "?" /* nullable */
| "->" TypeExpression /* function type */
| "." Identifier /* member type */
TypeList ::= TypeExpression ( "," TypeExpression )*
( T ) is just T in parentheses: parentheses group, e.g. to disambiguate (a -> b) -> c from a -> b -> c. A parenthesised list of two or more types is a tuple type. Empty parentheses are meaningful only as ( ) -> T, a function type taking no arguments. A name: Type element gives a tuple element a name.
variables
Variable ::= VariableLeft ( ":" TypeExpression )? "mut"? ( "=" Expression )?
VariableLeft ::= Identifier
| "(" VariableLeft ( "," VariableLeft )* ")"
VariableList ::= Variable ( "," Variable )*
The parenthesised form of VariableLeft destructures a tuple. A let local variable is immutable unless it is marked mut.
statements
A statement list is a sequence of statements. A ; separates statements; it is required after a statement whose syntax would otherwise run on into the next, and optional elsewhere.
StatementList ::= ( Statement ";"? )*
Statement ::= Let
| Return
| Throw
| Assert
| If
| Case
| Try
| Loop
| For
| Break
| Continue
| PragmaStatement
| Labelled
| Assignment
| ExpressionStatement
local variable definitions, return, throw, assert
Let ::= "let" "use"? VariableList ( "in" Expression )?
Return ::= "return" Expression?
Throw ::= "throw" Expression?
Assert ::= "assert" Expression ( "else" Expression )?
let use defines a local variable holding a disposable, whose dispose is called when the variable goes out of scope. The let … in … form is a let-in expression used as a statement.
if
If ::= "if" IfCondition "then" StatementList
( "elif" IfCondition "then" StatementList )*
( "else" StatementList )?
"fi"
IfCondition ::= Expression
| "let" Variable /* if-let local variable */
The if let form defines a local variable whose initializer must be present; a type ascription on it (if let c: T = e) tests that the value is a T.
case
Case ::= "case" Expression
( "when" ExpressionList ":" StatementList )*
( "default" StatementList )?
"esac"
try
Try ::= "try" StatementList
( "catch" Variable StatementList )*
( "finally" StatementList )?
"yrt"
loops
Loop ::= ( "while" Expression )? "do" StatementList "od"
For ::= "for" Variable "in" Expression "do" StatementList "od"
A do … od with no while is an unconditional loop.
break, continue and labels
Break ::= "break" Identifier?
Continue ::= "continue" Identifier?
Labelled ::= Identifier ":" Statement
A Labelled statement may be targeted by break or continue with the matching label.
assignment and expression statements
Assignment ::= Expression "=" Expression
ExpressionStatement ::= Expression
PragmaStatement ::= Pragma Statement
expressions
An expression is a sequence of operands joined by binary operators. The parser resolves operator nesting by precedence; the grammar below gives the flat structure.
Expression ::= UnaryExpression ( Operator UnaryExpression )*
|| is the yield infix used to produce a value from a generator step; it has the lowest precedence and does not chain.
unary expressions
UnaryExpression ::= Operator UnaryExpression /* prefix operator */
| PostfixExpression
postfix expressions
PostfixExpression ::= PrimaryExpression PostfixSuffix*
PostfixSuffix ::= "(" ExpressionList? ")" /* call */
| "[" ExpressionList "]" /* index expression, or generic application */
| "`[" TypeList "]" /* explicit generic application */
| "." Identifier /* member access */
| "?" /* has-value test */
| "!" /* unwrap */
| "ref" /* by-reference */
| "|" /* pipe */
A [ ... ] suffix is either an index expression (an access through an indexer) or a generic type application, depending on whether its contents resolve as expressions or as types; `[ ... ] forces the generic-application reading.
function literals
A primary expression — or a parenthesised argument list — followed by ->, =>, is or rec is a function literal:
FunctionLiteral ::= FunctionArguments ( "->" TypeExpression )? "rec"? Body
FunctionArguments ::= "(" VariableList? ")"
| Identifier
rec marks the literal as recursive, so it may refer to itself.
primary expressions
PrimaryExpression ::= Identifier
| Literal
| "(" ExpressionList? ")" /* tuple or grouping */
| "[" ExpressionList "]" ( ":" TypeExpression )? /* list literal */
| "cast" TypeExpression "(" Expression ")"
| "isa" TypeExpression "(" Expression ")"
| "typeof" TypeExpression
| "default" ( "[" TypeExpression "]" )?
| "self"
| "super"
| "rec"
| If /* if-expression */
| "let" "use"? VariableList "in" Expression /* let-in */
Literal ::= IntegerLiteral
| FloatLiteral
| StringLiteral
| CharLiteral
| InterpolatedString
| "true" | "false"
| "null"
ExpressionList ::= Expression ( "," Expression )*
A list literal [ a, b, ... ] builds a List; it requires at least one element (use LIST[T]() for an empty list).
Within an ExpressionList that forms call arguments or a tuple, an element of the form Identifier ":" TypeExpression? ( "=" Expression )? is an inline local variable definition rather than a plain identifier — this is the only place that form is accepted.
operator precedence
ghūl has no fixed list of binary operators: any operator token may be used infix. Precedence is assigned by a table of built-in operators plus a first-character heuristic for everything else, so the grammar's flat Expression ::= UnaryExpression ( Operator UnaryExpression )* is disambiguated by the following levels, tightest first:
| Precedence | Operators |
|---|---|
| (prefix unary, member access, call, index: tightest) | |
| user-8 | (user-defined) |
| multiplication | * × ✕ / % ÷ |
| user-7 | (user-defined) |
| addition | + - |
| user-6 | (user-defined) |
| bitwise | & \| ¦ ^ ∩ ∪ |
| user-5 | (user-defined, default) |
| shift | << >> |
| user-4 | (user-defined) |
| range | .. :: |
| user-3 | (user-defined) |
| relational | == != =~ !~ < > >= <= ≈ ≡ |
| user-2 | (user-defined) |
| boolean | /\ \/ ∧ ∨ |
| user-1 | (user-defined) |
| yield infix | \|\| |
All binary operators are left-associative. Prefix unary operators, member access, calls and indexing bind more tightly than any binary operator.
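To make the disambiguation concrete, the Python sketch below folds a flat operand and operator sequence into nested tuples using precedence climbing. It is an illustration under assumptions, not the compiler's algorithm: `BUILT_IN`, `STRENGTH` and `nest` are hypothetical names, and the user-defined levels and the yield infix are left out for brevity.

```python
# Built-in binary operators, copied from the table above, tightest first.
BUILT_IN = [
    ("multiplication", "* × ✕ / % ÷"),
    ("addition", "+ -"),
    ("bitwise", "& | ¦ ^ ∩ ∪"),
    ("shift", "<< >>"),
    ("range", ".. ::"),
    ("relational", "== != =~ !~ < > >= <= ≈ ≡"),
    ("boolean", "/\\ \\/ ∧ ∨"),
]

# Larger number = binds tighter.
STRENGTH = {
    op: len(BUILT_IN) - index
    for index, (_, ops) in enumerate(BUILT_IN)
    for op in ops.split()
}


def nest(operands, operators):
    """Fold operands[0] operators[0] operands[1] ... into nested tuples."""
    position = 0  # index of the next unread operator

    def parse(minimum):
        nonlocal position
        left = operands[position]
        while (position < len(operators)
               and STRENGTH[operators[position]] >= minimum):
            op = operators[position]
            position += 1
            # Left-associative: the right side may absorb only tighter operators.
            right = parse(STRENGTH[op] + 1)
            left = (op, left, right)
        return left

    return parse(0)
```

With these levels, `a + b * c` nests as `("+", "a", ("*", "b", "c"))`, and `a - b - c` nests to the left as `("-", ("-", "a", "b"), "c")`.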
A user-defined operator, meaning any operator not in the table above, is assigned a precedence from its first character, modelled on OCaml and F#: operators starting with * / % bind as multiplication, + - as addition, and so on; an operator with no recognised first character defaults to user-5.
The @precedence("op", "level") pragma overrides the precedence of a named operator. Both arguments must be string literals (a numeric level is not accepted), and level names a precedence level: user-1 … user-8, or one of the built-in level names boolean, relational, range, shift, bitwise, addition and multiplication.
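The lookup described above might be modelled in Python as follows. This is a partial sketch: `level_of` and `set_precedence` are hypothetical names, and `FIRST_CHARACTER` contains only the two families this page spells out. The remaining first-character rules are not stated here, so the sketch does not guess them, and its user-5 fallback may therefore differ from the compiler for operators that begin with other characters.

```python
BUILT_IN_LEVELS = (
    "boolean", "relational", "range", "shift",
    "bitwise", "addition", "multiplication",
)
VALID_LEVELS = {f"user-{n}" for n in range(1, 9)} | set(BUILT_IN_LEVELS)

# Only the first-character families named on this page (assumption: the
# compiler recognises more, which this sketch deliberately leaves out).
FIRST_CHARACTER = {
    "*": "multiplication", "/": "multiplication", "%": "multiplication",
    "+": "addition", "-": "addition",
}


def set_precedence(overrides: dict, op: str, level: str) -> None:
    """Model of @precedence("op", "level"): both arguments must be strings."""
    if not isinstance(op, str) or not isinstance(level, str):
        raise TypeError("both arguments must be strings")
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown precedence level: {level!r}")
    overrides[op] = level


def level_of(op: str, overrides: dict) -> str:
    """Level name for a user-defined operator (one not in the built-in table)."""
    if op in overrides:
        return overrides[op]  # an explicit override wins
    return FIRST_CHARACTER.get(op[0], "user-5")
```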