EraVM Binary Layout
Definitions
- A directive is a command issued to the assembler, which is not translated into
an executable bytecode instruction.
Directive names start with a period, for example, .cell. Directives are used to regulate the translation process.
- An instruction constitutes the smallest executable segment of bytecode. In EraVM, each instruction is exactly eight bytes long.
- A word is a 256-bit unsigned integer in a big-endian format.
Structure of assembly file
This section describes the structure of an EraVM assembly file, a text file
typically with the extension .zasm.
Data types
- U256 – word, a 256-bit unsigned integer number, big-endian.
- U16 – 16-bit unsigned integer number, big-endian.
Sections
The source code within an EraVM assembly is organized into distinct sections. The start of a section is denoted by one of the following directives:
- .rodata – constant, read-only data.
- .data – global mutable data.
- .text – executable code.
Additional sections may be implemented in the future.
The description of any section may be spread across the file:
.rodata
.cell 0
.text
<some instruction>
.rodata
.cell 1
In this example, multiple .rodata sections appear, but in the resulting binary
file they will be merged into a single contiguous region of memory.
The same principle applies to the other sections.
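The merging rule can be illustrated with a small sketch in Python. It is not the assembler's implementation: the function name merge_sections is hypothetical, and the sketch assumes that each section directive stands alone on its line and that fragments are concatenated in order of appearance.

```python
from typing import Dict, List

SECTION_DIRECTIVES = (".rodata", ".data", ".text")


def merge_sections(lines: List[str]) -> Dict[str, List[str]]:
    """Group source lines by section, concatenating repeated sections.

    Hypothetical helper: fragments of the same section are joined in the
    order in which they appear in the file.
    """
    merged: Dict[str, List[str]] = {name: [] for name in SECTION_DIRECTIVES}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line in SECTION_DIRECTIVES:
            current = line  # a directive switches the active section
        elif current is None:
            raise ValueError("content found before any section directive")
        else:
            merged[current].append(line)
    return merged


source = [".rodata", ".cell 0", ".text", "<some instruction>", ".rodata", ".cell 1"]
print(merge_sections(source)[".rodata"])
```

Applied to the example above, both .cell lines end up in one .rodata list, and the instruction stays in .text.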
Defining data
The .cell directive defines data:
.rodata
.cell -1
.cell 23090
.data
.cell 1213
- Note: using .cell in the .data section is deprecated and will not be supported in future versions of the assembly.
- The value of a cell is provided as a signed 256-bit decimal number.
- Negative numbers will be encoded as 256-bit 2’s complement, e.g. -1 is encoded as 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff.
- An optional + sign before positive numbers is allowed, e.g. .cell +123.
- Hexadecimal integer literals are not supported.
- Symbols (names of labels) are supported, for example:
.text
lab1:
add r0, r0, r0
lab2:
add r0, r0, r0
.rodata
my_cells:
.cell @lab1
.cell @lab2
.cell -1
Note the @ prefixing the label name.
Each .cell is 256 bits wide, even though an address such as @lab1 or @lab2 is just 16 bits wide.
Addresses are padded with zeroes to fit in the word.
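The encoding rules for numeric .cell values can be sketched in Python as follows. The function name encode_cell is hypothetical, and the accepted range (from -2^255 up to 2^256 - 1) is an assumption based on the grammar's "256-bit signed or unsigned constant"; the actual assembler may check ranges differently. Label operands such as @lab1 are not handled here.

```python
import re

WORD_BYTES = 32  # one word is 256 bits
DECIMAL_LITERAL = re.compile(r"[+-]?[0-9]+")


def encode_cell(literal: str) -> bytes:
    """Encode a decimal .cell literal as a 32-byte big-endian word."""
    text = literal.strip()
    if not DECIMAL_LITERAL.fullmatch(text):
        # Rejects hexadecimal literals such as 0x10, which are not supported.
        raise ValueError(f"not a decimal literal: {literal!r}")
    value = int(text, 10)
    # Assumed range: any signed or unsigned 256-bit value.
    if not -(1 << 255) <= value < (1 << 256):
        raise ValueError("value does not fit in 256 bits")
    # Reducing modulo 2^256 yields the two's complement of negative values.
    return (value % (1 << 256)).to_bytes(WORD_BYTES, "big")


print(encode_cell("-1").hex())
print(encode_cell("+123").hex())
```

The first call prints 64 hexadecimal f digits, matching the encoding of -1 given above.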
Overall structure
The structure of an assembly file is described as follows:
<file> ::= <section>*
<section> ::=
| ".rodata" EOL <const-element>*
| ".data" EOL <data-element>*
| ".text" EOL <code-element>*
<const-element> ::= <label> | <cell>
<label> ::= [a-zA-Z_.@][0-9a-zA-Z_.@]*
<data-element> ::= <label> | <cell>
<cell> ::=
".cell" <256-bit signed or unsigned constant>
<comment> ::= ";" .*
<labels> ::= (<label> ":")*
<code-element> ::= <labels> <instruction> <operand-list> <comment>? EOL
EOL stands for “end of line”. <instruction> and <operand-list> depend on the specific instruction. See the EraVM specification.
Execution model
This section describes some elements of the execution environment, the Era Virtual Machine. The full execution model is described in the EraVM specification.
Registers
EraVM has 16 general-purpose registers and several special registers:
- PC is a 16-bit program counter register; it holds the address of the next instruction to be executed.
- SP is a 16-bit stack pointer register. It points to the address following the top of the stack.
Memory
EraVM’s memory, which backs the execution of a program, is divided into pages. When a contract is launched, EraVM assigns several pages to it:
- Code page.
  - Immutable.
  - Contains words.
  - Used to store both instructions and the constants of type U256.
  - Each word may contain 4 instructions or one constant.
  - Instructions and constants are indistinguishable.
  - Code page is addressable in two ways:
    - When EraVM fetches an instruction from this page using PC, it addresses 8-byte chunks.
    - When EraVM fetches constants from this page, it addresses 32-byte (word-sized) chunks.
    For example, reading a constant at address 0 will yield a word composed of the binary-encoded instructions number 0, 1, 2 and 3; reading a constant at address 1 from this page will yield the binary encoding of instructions number 4, 5, 6 and 7, and so on.
- Heap page.
  - Contains bytes and is byte-addressable.
  - However, it is only possible to read words from the heap, not individual bytes.
- Data stack page.
  - Contains words.
  - Grows towards higher addresses, so every push-like instruction advances SP by at least one.
  - Reserving space on the stack is therefore incrementing the value of SP.
  - Each word has an additional tag. If the tag is set, the word contains a pointer to a heap page, either of this contract or belonging to a different contract.
  - Data on the stack page can be addressed by their absolute addresses, or relative to SP.
  - Global mutable variables are allocated on the stack.
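The two ways of addressing the code page can be made concrete with a short Python sketch. The helper names fetch_instruction and fetch_constant are hypothetical, and the code page is modeled as a plain byte string; the sketch only shows how the same bytes are seen through 8-byte and 32-byte addressing.

```python
INSTRUCTION_BYTES = 8
WORD_BYTES = 32
INSTRUCTIONS_PER_WORD = WORD_BYTES // INSTRUCTION_BYTES  # 4


def fetch_instruction(code_page: bytes, pc: int) -> bytes:
    """Return the 8-byte chunk at instruction address pc."""
    start = pc * INSTRUCTION_BYTES
    if pc < 0 or start + INSTRUCTION_BYTES > len(code_page):
        raise IndexError("instruction address out of range")
    return code_page[start:start + INSTRUCTION_BYTES]


def fetch_constant(code_page: bytes, address: int) -> int:
    """Return the big-endian word at word address `address`."""
    start = address * WORD_BYTES
    if address < 0 or start + WORD_BYTES > len(code_page):
        raise IndexError("constant address out of range")
    return int.from_bytes(code_page[start:start + WORD_BYTES], "big")


# Eight dummy "instructions": instruction i consists of the byte i repeated.
page = b"".join(bytes([i]) * INSTRUCTION_BYTES for i in range(8))

# The word at address 1 is made of instructions 4, 5, 6 and 7.
word_1 = b"".join(fetch_instruction(page, pc) for pc in range(4, 8))
print(fetch_constant(page, 1) == int.from_bytes(word_1, "big"))
```

In general, under this model the word at address k covers the instructions at addresses 4k through 4k + 3.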
Callstack
EraVM has a separate call stack, a utility data structure that holds information about call frames. There are two kinds of call frames in the EraVM, corresponding to near and far calls:
- Far call frame corresponds to a call to a different contract.
- Near call frame corresponds to a near call to the code inside the same contract. Near calls are a low-level mechanism that is used mostly in system contracts.
The call stack is distinct from the data stack page described in section Memory.
Binary layout
The binary file published on chain and passed to EraVM has no structure. It is an image loaded at the beginning of the code page (with offset 0).
The initial value of PC is zero, therefore the execution will start at the
first instruction on the code page.
Instructions or functions in the .text section are not reordered, so the first
instruction appearing in the assembly file will be executed first, regardless of
labels.
The length of the binary should be an odd number of words, that is, 32 × (2n + 1) bytes for some integer n ≥ 0.
The last word in the binary file is the metadata hash, when one is emitted; see section Metadata Hash.
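The length requirement can be checked mechanically. The following Python sketch uses a hypothetical helper, has_valid_length, and assumes only what is stated above: a word is 32 bytes and the binary must consist of an odd number of whole words.

```python
WORD_BYTES = 32


def has_valid_length(binary: bytes) -> bool:
    """Check that the binary is a whole, odd number of 32-byte words."""
    if len(binary) % WORD_BYTES != 0:
        return False  # not a whole number of words
    return (len(binary) // WORD_BYTES) % 2 == 1


print(has_valid_length(bytes(96)))  # three words
print(has_valid_length(bytes(64)))  # two words
```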
Symbols
There are three default predefined symbols:
- DEFAULT_UNWIND: default exception handler / stack unroller for the near call instruction call.
- DEFAULT_FAR_RETURN: default stack unroller for returns (see Landing Pads).
- DEFAULT_FAR_REVERT: default stack unroller for reverts (see Landing Pads).
If the user did not define one of these labels, the assembler will define it and emit a corresponding landing pad (see Landing Pads).
Linking and loading
This section details how the assembly file structure is flattened into a loadable image.
The binary file is divided into three regions:
- Initializer.
- Instructions.
- Constant pool.
The following subsections describe these regions.
Initializer region
Mutable global variables are allocated in the beginning of the stack page, not in code. The stack page supports absolute addressing, therefore the global variables can be accessed directly by their addresses.
If the assembly file defines global variables, the assembler will emit special initializer code at the beginning of the program; otherwise, the initializer region is skipped and the binary begins directly with the code region.
The first instruction of the initializer region is incsp <number of globals>.
It allocates one word on the data stack per global mutable variable.
For each global that is initialized with a non-zero value, the assembler does the following:
- Copies its initializer to .rodata, which will be loaded to the code page.
- Emits an instruction:
add code[INIT], r0, stack[IDX]
where:
- INIT is the address of the initializer in .rodata.
- IDX is the index of the global variable.
For example, the following program:
.text
some_label:
sub! r0, r0, r0
jump @some_label
.data
my_globals:
.cell 32
.rodata
.cell 0
will be translated as if it were written this way:
.text
init_globals:
incsp 1
add code[@global_init_0], r0, stack[0]
some_label:
sub! r0, r0, r0
jump @some_label
.rodata
.cell 0
global_init_0:
.cell 32
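The translation above can be mimicked by a short Python sketch. The function emit_initializer is hypothetical, and numbering the generated labels global_init_<index> by the index of the global is an assumption taken from the single example above; the real assembler may name or order the initializers differently.

```python
from typing import List, Tuple


def emit_initializer(global_values: List[int]) -> Tuple[List[str], List[str]]:
    """Return (initializer instructions, extra .rodata lines) for the globals."""
    if not global_values:
        return [], []  # no globals: the initializer region is skipped
    text = [f"incsp {len(global_values)}"]  # one stack word per global
    rodata: List[str] = []
    for index, value in enumerate(global_values):
        if value == 0:
            continue  # only non-zero initializers are copied
        label = f"global_init_{index}"  # assumed naming scheme
        rodata += [f"{label}:", f".cell {value}"]
        text.append(f"add code[@{label}], r0, stack[{index}]")
    return text, rodata


text_lines, rodata_lines = emit_initializer([32])
print("\n".join(text_lines + rodata_lines))
```

For a single global initialized with 32, the sketch produces the incsp and add instructions and the global_init_0 cell shown in the translated program.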
Code region
The .text section is emitted after the initializer region or, if there are no
globals, right at the start of the binary file. It is followed by the landing
pads and the padding, before the start of the constant pool region.
Landing Pads
After emitting the instructions provided in the .text section of the assembly
file, the assembler may emit the landing pads for near calls, returns and
reverts.
This happens for three predefined symbols: DEFAULT_UNWIND, DEFAULT_FAR_RETURN
and DEFAULT_FAR_REVERT.
For example, if the symbol DEFAULT_FAR_RETURN is not explicitly defined, it
will be defined automatically and the following landing pad will be appended to
the executable code:
;; landing pad for returns
DEFAULT_FAR_RETURN:
retl @DEFAULT_FAR_RETURN
If the contract executes an instruction retl @DEFAULT_FAR_RETURN, the control is
passed to the address DEFAULT_FAR_RETURN, which hosts the same instruction.
This starts a loop, popping all near call frames from the callstack. The last
retl will perform a far return from the contract.
This allows emitting retl @DEFAULT_FAR_RETURN to return from any place inside
the contract, no matter how many near calls are currently active.
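The unwinding loop can be pictured with a toy model in Python. This is a simplified sketch, not EraVM semantics: the call stack is modeled as a list of the strings "near" and "far" (top of the stack last), and the function run_return_landing_pad is hypothetical. Each iteration stands for one execution of the instruction hosted at DEFAULT_FAR_RETURN.

```python
from typing import List, Tuple

LANDING_PAD = "DEFAULT_FAR_RETURN"


def run_return_landing_pad(frames: List[str]) -> Tuple[List[str], List[str]]:
    """Simulate repeated returns to the landing pad.

    Returns the trace of actions and the frames left on the call stack.
    """
    remaining = list(frames)
    trace: List[str] = []
    while remaining:
        frame = remaining.pop()  # the current (topmost) frame
        if frame == "near":
            # Leaving a near call frame lands on the pad, which returns again.
            trace.append(f"near return, jump to @{LANDING_PAD}")
        elif frame == "far":
            # The last return leaves the contract.
            trace.append("far return")
            break
        else:
            raise ValueError(f"unknown frame kind: {frame!r}")
    return trace, remaining


trace, rest = run_return_landing_pad(["far", "near", "near"])
print(trace)
```

With two active near calls inside one far call, the model performs two near returns followed by one far return.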
If none of the predefined symbols DEFAULT_UNWIND, DEFAULT_FAR_RETURN,
DEFAULT_FAR_REVERT is defined explicitly, the following code will be emitted
after the .text section:
;; landing pad for near calls
DEFAULT_UNWIND:
ret.panic.to_label r0, @DEFAULT_UNWIND
;; landing pad for returns
DEFAULT_FAR_RETURN:
ret.ok.to_label r1, @DEFAULT_FAR_RETURN
;; landing pad for reverts
DEFAULT_FAR_REVERT:
ret.revert.to_label r1, @DEFAULT_FAR_REVERT
Code padding
The code section starts at 0, if we count the initializing code as its part.
Therefore, it is aligned on a 32-byte boundary.
If the total number of instructions, including the landing pads, is not divisible by
4, the assembler emits 1 to 3 INVALID instructions as padding.
This way, the instructions will fill a certain number of words completely, and
the following region (constant pool region) is aligned on a 32-byte boundary as
well.
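The number of padding instructions follows directly from the rule above. A minimal Python sketch, with the hypothetical helper name padding_instructions:

```python
INSTRUCTIONS_PER_WORD = 4  # 32-byte word / 8-byte instruction


def padding_instructions(instruction_count: int) -> int:
    """Number of INVALID instructions needed to fill the last word."""
    if instruction_count < 0:
        raise ValueError("instruction count cannot be negative")
    # Distance from instruction_count up to the next multiple of 4.
    return -instruction_count % INSTRUCTIONS_PER_WORD


for count in (4, 5, 6, 7):
    print(count, padding_instructions(count))
```

The result is 0 when the count is already a multiple of 4, and between 1 and 3 otherwise.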
Constant pool region
The constant pool region is aligned on a 32 byte boundary. It is placed immediately after the code region and contains:
- Constants defined in the .rodata section.
- Initializers for mutable globals.
- Padding: nothing or a zero-word to ensure that the total length of the binary file, including the following hash, equals an odd number of words.
- Optionally, metadata hash.
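Whether the zero-word is needed depends only on the parity of the other parts. The Python sketch below uses a hypothetical helper, needs_zero_word, and assumes that the code region (initializer, instructions, landing pads and instruction padding) and the constants are already counted in whole words, and that the metadata hash, when present, takes exactly one word.

```python
def needs_zero_word(code_words: int, constant_words: int, has_hash: bool) -> bool:
    """Decide whether one zero-word of padding makes the total word count odd."""
    total = code_words + constant_words + (1 if has_hash else 0)
    return total % 2 == 0  # an even total needs one more word


# One word of code, two constants and a hash: four words, so pad to five.
print(needs_zero_word(1, 2, True))
```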
Metadata Hash
An optional, implementation-defined hash of the contract metadata, which may include its source. Depending on the initial layer where the compilation starts (a Solidity contract, its Yul code, or assembly), the hash value may be different.
Currently, the hash is either computed as keccak256 or omitted completely.