Intermediate Representation (IR)
Lowering
Section titled “Lowering”The process of transforming high-level source code to its equivalent intermediate representation is called lowering.
Lowering is takes place over multiple translation phases, each of which resulting in IR code at a different stage:
- High-level source code is translated to raw IR.
- Raw IR is translated to legal IR.
- Legal IR is translated to canonical IR.
A program in raw IR may contain incomplete deinitialization and flow-sensitive type errors. A program in legal or canonical IR shall not violate any of the static semantic rules of the Hylo programming language.
Modules
Section titled “Modules”IR code is organized into modules. A module consists of an application programming interface (API), an application binary interface (ABI), a collection of lowered type definitions, a collection of lowered functions, and a collection of resources.
An API is a collection of type, function, subscript, and property signatures which document how to interact with Hylo code present in the module at source or IR level.
An ABI is a collection of type and function signatures which document how to interact with compiled Hylo code present in the module.
A resource is an arbitrary file that is part of a module and not processed as a source file.
A module is represented using a target-independent binary format called the hylomodule format. Module files should use .hylomoduleextension and application/octet-stream media type.
Lowered functions
Section titled “Lowered functions”A lowered function is a set of IR instructions packaged as a single unit of computation. A lowered function has a name that must be unique across all linked modules.
A lowered function has a signature describing the type and passing conventions of its inputs and outputs.
A lowered function may have a body defined as a set of one or more basic blocks, one of which is designated as the function entry. A lowered function that does not define a body is called a stub and denotes the API of a function defined in another module or the ABI of a function linked from a binary.
Lowered function signatures
Section titled “Lowered function signatures”Basic blocks
Section titled “Basic blocks”A basic block is a non-empty sequence of instructions. The first and last instructions of a basic block are called its entry and exit points, respectively. Control flow shall always enter at entry points and exit at exit points. The last instruction of a basic block shall be a terminator instruction.
A basic block may accept arguments.
Given two basic blocks bb1 and bb2, bb1 is successor of bb2 (or, equivalently, bb2 is predecessor of bb1) if the exit point of bb2 may cause control flow to jump to bb1.
A basic block is reachable if it is an entry point or a successor of a reachable block. All basic blocks of a function shall be reachable in legal or canonical IR.
Given two basic blocks bb1 and bb2:
bb1dominatesbb2if control flow must go throughbb1before it can reachbb2or ifbb1isbb2;bb1strictly dominatesbb2if it dominatesbb2and it is notbb2;bb1immediately dominatesbb2if it strictly dominatesbb2and does not strictly dominate any other basic block that strictly dominatesbb2.
Operands
Section titled “Operands”Instructions in Val IR may accept operands embodying values to be operated upon. There exists 5 types of operands:
- Type references: A type reference denotes a lowered type definition.
- Basic block references: A basic block reference denotes a basic block.
- Constants: A constant is a value that is computed at compile time and immutable at runtime.
- Basic block arguments: A basic block argument denotes the value passed to a basic block when control flow entered to its entry point.
- Instruction results: An instruction result denotes the one of the values produced by the evaluation of an instruction.
Each basic block argument and instruction result is assigned to a unique local register in the function. The value of a register
An instruction i is said to be a user of an operand o if o is argument of i. Each occurence of an operand in one of its users is called a use.
Given two uses u1 and u2, u2 is sequenced after u1 (or, equivalently, u1 is sequenced before u2) if and only if:
- the user of
u2is sequenced after the user ofu1, or u1andu2have the same user andu1appears beforeu2.
Abstract machine
Section titled “Abstract machine”The semantics of Hylo IR is defined in terms of an abstract machine. A language implementation shall interpret or compile native programs that follow the same behavior.
The Val abstract machine consists of a collection of threads and a global memory space. A thread consists of a register map that associates local register names to objects or locations in the machine’s global memory, a control stack that contains instructions, and a call stack that contains register maps.
When a basic block is loaded into a thread, its instructions are pushed onto the stack in reverse order (i.e., the last instruction is pushed first).
Program execution
Section titled “Program execution”Program execution starts by adding a thread to the abstract machine, which gets designated as the main thread. The register map and call stack of the main thread are initialized empty. The control stack is initialized by loading the entry block of the main function of the program’s entry module.
Instructions
Section titled “Instructions”An instruction is a basic unit of computation in Val IR that modifies the state of the executing machine. An instruction may accept zero or more arguments and return zero or more values.
Terminator instructions
Section titled “Terminator instructions”An instruction is called a terminator instruction if it causes control flow to jump to another basic block or exit the current function.
The terminator instructions of Val IR are:
branchcond_branchcall_fallibleterminate