Tapioca VM ByteCode

This note is not stable and may be changed at any time.

ByteCode Overview

  • Every bytecode instruction is 4 bytes (32 bits) wide.
  • Bytecode can be treated as a byte stream.
    • The notation below lists the fields in byte order.
  • The default format is OPC (1 byte), R1 (1 byte), R2 (1 byte), R3 (1 byte), in that order.
  • 256 registers are available (addressed by an 8-bit index).
  • Registers are 64 bits wide.
    • An implementation may optionally provide 32-bit registers.
    • A different register width can break binary compatibility.
  • A register can hold a signed integer, an unsigned integer, a float, or an object reference.
  • Registers are not typed.
    • The compiler or programmer should keep track of the type and select the matching operator.
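A minimal Python sketch of the default 4-byte layout, assuming each field is one unsigned byte written in stream order (OPC first) and that unused XX bytes are written as zero. `encode_instr` and `decode_instr` are hypothetical helper names, not an existing Tapioca API.

```python
import struct
from typing import Tuple

# Hypothetical helpers: one instruction = OPC, R1, R2, R3, one unsigned byte each.
INSTR = struct.Struct("BBBB")

def encode_instr(opc: int, r1: int = 0, r2: int = 0, r3: int = 0) -> bytes:
    # struct.error is raised if any field does not fit in a byte (0..255)
    return INSTR.pack(opc, r1, r2, r3)

def decode_instr(code: bytes, pc: int) -> Tuple[int, int, int, int]:
    # pc is a byte offset; every instruction is exactly 4 bytes wide
    return INSTR.unpack_from(code, pc)

# ADD.i R0 <- R1 + R2, followed by a NOP (unused bytes assumed to be zero)
program = encode_instr(0x10, 0, 1, 2) + encode_instr(0x00)
print(decode_instr(program, 0))  # (16, 0, 1, 2)
```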

Reserved Operators

Mnemonic  Encoding          Description
NOP       0x00 XX XX XX     No operation (for alignment)

Binary Operators

Mnemonic  Encoding          Description
ADD.i     0x10 R1 R2 R3     R1 <- R2 + R3
SUB.i     0x11 R1 R2 R3     R1 <- R2 - R3
MUL.i     0x12 R1 R2 R3     R1 <- R2 * R3
DIV.i     0x13 R1 R2 R3     R1 <- R2 / R3
ADD.u     0x14 R1 R2 R3     R1 <- R2 + R3
SUB.u     0x15 R1 R2 R3     R1 <- R2 - R3
MUL.u     0x16 R1 R2 R3     R1 <- R2 * R3
DIV.u     0x17 R1 R2 R3     R1 <- R2 / R3
ADD.f     0x18 R1 R2 R3     R1 <- R2 + R3
SUB.f     0x19 R1 R2 R3     R1 <- R2 - R3
MUL.f     0x1A R1 R2 R3     R1 <- R2 * R3
DIV.f     0x1B R1 R2 R3     R1 <- R2 / R3
ADD.r     0x1C R1 R2 R3     R1 <- R2.__add(R3)
SUB.r     0x1D R1 R2 R3     R1 <- R2.__sub(R3)
MUL.r     0x1E R1 R2 R3     R1 <- R2.__mul(R3)
DIV.r     0x1F R1 R2 R3     R1 <- R2.__div(R3)
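Since registers are untyped 64-bit cells, the operator suffix alone decides how the bits are read. The sketch below models a register as a Python int in [0, 2^64) and shows `.i`, `.u` and `.f` operators working on the same raw bits. Two behaviours are assumptions, because the table does not specify them: integer overflow wraps modulo 2^64, and DIV.i truncates toward zero as in C. All function names are hypothetical; the `.r` forms are left out because they dispatch to object methods.

```python
import struct

MASK64 = (1 << 64) - 1   # a register holds 64 raw bits

def as_i64(bits: int) -> int:
    # reinterpret raw register bits as a signed 64-bit integer
    return bits - (1 << 64) if bits & (1 << 63) else bits

def as_f64(bits: int) -> float:
    # reinterpret raw register bits as an IEEE 754 binary64 value
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def from_f64(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def add_i(r2: int, r3: int) -> int:
    return (as_i64(r2) + as_i64(r3)) & MASK64   # assumed: wraps modulo 2**64

def add_u(r2: int, r3: int) -> int:
    return (r2 + r3) & MASK64

def div_i(r2: int, r3: int) -> int:
    a, b = as_i64(r2), as_i64(r3)
    q = abs(a) // abs(b)                        # assumed: truncate toward zero
    return (-q if (a < 0) != (b < 0) else q) & MASK64

def div_u(r2: int, r3: int) -> int:
    return r2 // r3

def add_f(r2: int, r3: int) -> int:
    return from_f64(as_f64(r2) + as_f64(r3))

minus_six = (-6) & MASK64
print(hex(div_i(minus_six, 2)))   # 0xfffffffffffffffd  (-3 as i64)
print(hex(div_u(minus_six, 2)))   # 0x7ffffffffffffffd
```

Under wraparound, ADD.i and ADD.u produce identical bits; the signed/unsigned split only changes the result for operations such as DIV.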

Flow Control Operators

Mnemonic  Encoding          Description
JMP       0x20 DD DD DD     Jump to the address DD DD DD
JE        0x21 DD R1 R2     if ((u64)R1 == (u64)R2) goto DD
JG.i      0x22 DD R1 R2     if ((i64)R1 > (i64)R2) goto DD
JL.i      0x23 DD R1 R2     if ((i64)R1 < (i64)R2) goto DD
JG.u      0x24 DD R1 R2     if ((u64)R1 > (u64)R2) goto DD
JL.u      0x25 DD R1 R2     if ((u64)R1 < (u64)R2) goto DD
JE.f      0x26 DD R1 R2     if ((f64)R1 == (f64)R2) goto DD
JG.f      0x27 DD R1 R2     if ((f64)R1 > (f64)R2) goto DD
JL.f      0x28 DD R1 R2     if ((f64)R1 < (f64)R2) goto DD
JE.r      0x29 DD R1 R2     if (*R1.eq(R2)) goto DD
JG.r      0x2A DD R1 R2     if (*R1.gt(R2)) goto DD
JL.r      0x2B DD R1 R2     if (*R1.lt(R2)) goto DD
JT.r      0x2C DD R1 R2     if (*R1.type == *R2) goto DD
jmp_if_?  0x2D DD R1 CC     if (CC(R1)) goto DD
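The casts in the table are what distinguish the jump variants: the same two registers can satisfy JG.u but not JG.i, and raw 64-bit equality (JE) disagrees with IEEE 754 equality (JE.f) for NaN. The sketch below implements only the branch predicates, with hypothetical names and the same raw-bits register model; how the DD byte becomes a target address is not specified above and is left out.

```python
import math
import struct

def as_i64(bits: int) -> int:
    # reinterpret raw register bits as a signed 64-bit integer
    return bits - (1 << 64) if bits & (1 << 63) else bits

def as_f64(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def from_f64(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# Branch predicates: the jump is taken when the function returns True.
def je(r1: int, r2: int) -> bool:      # (u64)R1 == (u64)R2
    return r1 == r2

def jg_i(r1: int, r2: int) -> bool:    # (i64)R1 > (i64)R2
    return as_i64(r1) > as_i64(r2)

def jg_u(r1: int, r2: int) -> bool:    # (u64)R1 > (u64)R2
    return r1 > r2

def je_f(r1: int, r2: int) -> bool:    # (f64)R1 == (f64)R2
    return as_f64(r1) == as_f64(r2)

def jg_f(r1: int, r2: int) -> bool:    # (f64)R1 > (f64)R2
    return as_f64(r1) > as_f64(r2)

all_ones = (1 << 64) - 1                      # -1 as i64, 2**64 - 1 as u64
print(jg_i(all_ones, 1), jg_u(all_ones, 1))   # False True

nan = from_f64(math.nan)
print(je(nan, nan), je_f(nan, nan))           # True False
```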

CC condition list

Id    Condition       Description
0x00  type_nil        Not a valid reference
0x01  type_boolean
0x02  type_signed     The object can be loaded as a signed integer
0x03  type_i8
0x04  type_i16
0x05  type_i32
0x06  type_i64
0x07  type_unsigned   The object can be loaded as an unsigned integer
0x08  type_u8
0x09  type_u16
0x0A  type_u32
0x0B  type_u64
0x0C  type_float      The object can be loaded as a float
0x0D  type_f32
0x0E  type_f64
0x0F  type_array
0x10  type_map
0x11  type_name
0x12  type_function
0x13  type_closure
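A sketch of the condition ids as a Python `IntEnum`, plus a hypothetical `encode_jmp_if` that emits the `0x2D DD R1 CC` byte layout from the flow control table. The member names simply upper-case the table entries; the semantics of each test (for example what "can be loaded as a signed integer" means for a given object) are not modelled, and DD is passed through as a raw byte.

```python
from enum import IntEnum

class CC(IntEnum):
    # Condition ids for jmp_if_? (opcode 0x2D), copied from the table above
    TYPE_NIL = 0x00
    TYPE_BOOLEAN = 0x01
    TYPE_SIGNED = 0x02
    TYPE_I8 = 0x03
    TYPE_I16 = 0x04
    TYPE_I32 = 0x05
    TYPE_I64 = 0x06
    TYPE_UNSIGNED = 0x07
    TYPE_U8 = 0x08
    TYPE_U16 = 0x09
    TYPE_U32 = 0x0A
    TYPE_U64 = 0x0B
    TYPE_FLOAT = 0x0C
    TYPE_F32 = 0x0D
    TYPE_F64 = 0x0E
    TYPE_ARRAY = 0x0F
    TYPE_MAP = 0x10
    TYPE_NAME = 0x11
    TYPE_FUNCTION = 0x12
    TYPE_CLOSURE = 0x13

JMP_IF = 0x2D

def encode_jmp_if(dd: int, r1: int, cc: CC) -> bytes:
    # Field order follows the table: 0x2D DD R1 CC
    return bytes([JMP_IF, dd, r1, cc])

print(encode_jmp_if(0x04, 7, CC.TYPE_NIL).hex())  # 2d040700
```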

Load Operators

Mnemonic        Encoding          Description
load_imm_int    0x30 R1 DD DD     Sign-extend DDDD to int64 and load it into R1
load_imm_uint   0x31 R1 DD DD     Zero-extend DDDD to uint64 and load it into R1
load_imm_float  0x32 R1 DD DD     Convert binary16 DDDD to binary64 and load it into R1
load_const      0x33 R1 DD DD     Load const pool entry DDDD into R1
load_global     0x34 R1 DD DD     Load global const pool entry DDDD into R1
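A sketch of the three immediate loads as functions that return the raw 64-bit register contents. It assumes the two DD bytes form a big-endian 16-bit value (the first DD byte is the high byte) and that load_imm_uint zero-extends; the table states neither explicitly, and the helper names are hypothetical. Python's `struct` format `e` decodes IEEE 754 binary16.

```python
import struct

MASK64 = (1 << 64) - 1

def imm16(hi: int, lo: int) -> int:
    # assumption: the first DD byte is the high byte (big-endian)
    return (hi << 8) | lo

def load_imm_int(hi: int, lo: int) -> int:
    v = imm16(hi, lo)
    if v & 0x8000:            # sign-extend bit 15 into bits 16..63
        v -= 0x10000
    return v & MASK64         # raw 64-bit register contents

def load_imm_uint(hi: int, lo: int) -> int:
    return imm16(hi, lo)      # zero-extended: bits 16..63 stay 0

def load_imm_float(hi: int, lo: int) -> int:
    (half,) = struct.unpack(">e", bytes([hi, lo]))          # binary16 -> float
    return struct.unpack("<Q", struct.pack("<d", half))[0]  # binary64 bits

print(hex(load_imm_int(0xFF, 0xFE)))    # 0xfffffffffffffffe  (-2 as i64)
print(hex(load_imm_uint(0xFF, 0xFE)))   # 0xfffe
print(hex(load_imm_float(0x3C, 0x00)))  # 0x3ff0000000000000  (1.0)
```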

Function Call Operators

Mnemonic  Encoding          Description
INVOKE    0x30 R1 R2 NN     Call the function in R1 with arguments in R2 .. R2+NN: *R1(R2, R2+1, …, R2+NN)
RETURN    0x31 R1 XX XX     Return from the function

Binary Format

Types

  • Types
    • Boolean represents true or false
    • Nil represents nil
    • Signed represents a signed integer
    • Unsigned represents an unsigned integer
    • Float represents an IEEE 754 double-precision floating-point number, including NaN and Infinity
    • Map represents key-value pairs of objects
    • Array represents a sequence of objects
    • String represents a UTF-8 string
    • Extension represents a tuple of type information and a byte array where type information is an integer whose meaning is defined by applications or MessagePack specification

Formats

Type Identifier

ccc   int      uint      float
000   int8     uint8     -
001   int16    uint16    binary16
010   int32    uint32    binary32
011   int64    uint64    binary64
100   -        -         binary128
101   -        -         -
110   -        -         -
111   va_int   va_uint   mp

DataType        Signature   BON  TVM  Description
Reserved        0000 00xx   N    N
Reserved        0000 0100   N    N    Should never appear (treated as nil)
Null            0000 0101   Y    Y    Null object
Boolean         0000 011b   Y    Y    b = 0 for false
Float           0000 1ccc   Y    Y    ccc gives the bit width
Signed          0001 0ccc   Y    Y    ccc gives the bit width
Unsigned        0001 1ccc   Y    Y    ccc gives the bit width
Dict            001s ssss   Y    Y*   sssss is the length; 11111 means va_len
Array           010s ssss   Y    Y*   sssss is the length; 11111 means va_len
name            011s ssss   Y    Y*   sssss is the length; 11111 means va_len
DomainSpecific  1xxx xxxx   Y    Y*   MUST be followed by a va_len

Type Identifier

DataType      Sign(Bin)   Signature    TBON (1)  TBON (2)  TVM Type  Description
Reserved      0000 0000   0x00         N         Y         N         Splitter
Reserved      0000 0001   0x01         N         N         Y         External reference
Reserved      0000 0010   0x02         N         Y         N         Const pool
Reserved      0000 0011   0x03         N         Y         N         Reference to const pool
Reserved      0000 0100   0x04         Y         Y         Y         Object
Nil           0000 0101   0x05         Y         Y         Y         Nil object
Boolean       0000 011b   0x06 - 0x07  Y         Y         Y         b = 0 for false (or type signature)
Reserved      0000 1000   0x08         N         N         N         Reserved for float binary8
Float16       0000 1001   0x09         Y         Y         Y         IEEE 754 binary16
Float32       0000 1010   0x0A         Y         Y         Y         IEEE 754 binary32
Float64       0000 1011   0x0B         Y         Y         Y         IEEE 754 binary64
Float128      0000 1100   0x0C         Y         Y         Y         IEEE 754 binary128
Reserved      0000 1101   0x0D         N         N         N         Reserved for float binary256
Reserved      0000 1110   0x0E         N         N         N         Reserved for float binary512
Reserved      0000 1111   0x0F         N         N         N         Reserved for multi-precision float
Signed 8      0001 0000   0x10         Y         Y         Y         Signed 8-bit integer
Signed 16     0001 0001   0x11         Y         Y         Y         Signed 16-bit integer
Signed 32     0001 0010   0x12         Y         Y         Y         Signed 32-bit integer
Signed 64     0001 0011   0x13         Y         Y         Y         Signed 64-bit integer
Reserved                  0x14 - 0x16  N         N         N         Reserved for signed integer 128 - 512
Signed VA     0001 0111   0x17         Y         Y         Y         Signed variable-length integer
Unsigned 8    0001 1000   0x18         Y         Y         Y         Unsigned 8-bit integer
Unsigned 16   0001 1001   0x19         Y         Y         Y         Unsigned 16-bit integer
Unsigned 32   0001 1010   0x1A         Y         Y         Y         Unsigned 32-bit integer
Unsigned 64   0001 1011   0x1B         Y         Y         Y         Unsigned 64-bit integer
Reserved                  0x1C - 0x1E  N         N         N         Reserved for unsigned integer 128 - 512
Unsigned VA   0001 1111   0x1F         Y         Y         Y         Unsigned variable-length integer
Array 0-30    001s ssss   0x20 - 0x3E  N         Y         N         Array of 0 - 30 elements
Array VA      0011 1111   0x3F         N         Y         Y         Array with a variable-length element count
Map 0-30      010s ssss   0x40 - 0x5E  N         Y         N         Map of 0 - 30 elements
Map VA        0101 1111   0x5F         N         Y         Y         Map with a variable-length element count
Name 0-30     011s ssss   0x60 - 0x7E  N         Y         N         Name of 0 - 30 characters
Name VA       0111 1111   0x7F         N         Y         Y         Name with a variable-length character count
Reserved      1xxx xxxx   0x80 - 0xFF  N         Y         N         Reserved for domain-specific use

  1. Can be used as a TBON type signature for an array type.
  2. Can be stored in TBON.
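A sketch of a signature-byte classifier that follows the table immediately above (Array at 0x20, Map at 0x40, Name at 0x60). The ccc rule from the earlier table falls out directly: for floats and for signed and unsigned integers, the width in bits is `8 << ccc`. The function name and the kind labels are hypothetical.

```python
from typing import Optional, Tuple

def classify(sig: int) -> Tuple[str, Optional[int]]:
    """Map a signature byte to (kind, detail) using the table above.

    detail is the bit width for fixed-size numbers, the element count for
    the short array/map/name forms, the truth value for booleans, and None
    otherwise. The kind strings are hypothetical labels.
    """
    if sig == 0x04:
        return ("object", None)
    if sig == 0x05:
        return ("nil", None)
    if sig in (0x06, 0x07):
        return ("bool", sig & 0x01)            # 011b: b = 0 means false
    ccc = sig & 0x07                           # low three bits select the width
    if 0x09 <= sig <= 0x0C:
        return ("float", 8 << ccc)             # 001..100 -> 16..128 bits
    if 0x10 <= sig <= 0x13:
        return ("signed", 8 << ccc)            # 000..011 -> 8..64 bits
    if 0x18 <= sig <= 0x1B:
        return ("unsigned", 8 << ccc)
    if sig == 0x17:
        return ("signed_va", None)
    if sig == 0x1F:
        return ("unsigned_va", None)
    if 0x20 <= sig <= 0x7F:
        kind = {0x20: "array", 0x40: "map", 0x60: "name"}[sig & 0xE0]
        size = sig & 0x1F                      # 11111 marks the VA form
        return (kind + "_va", None) if size == 0x1F else (kind, size)
    if sig >= 0x80:
        return ("domain_specific", None)
    return ("reserved", None)
```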

Notation in diagrams

one byte:
+--------+
|        |
+--------+

a variable number of bytes:
+========+
|        |
+========+

variable number of objects stored in MessagePack format:
+~~~~~~~~~~~~~~~~~+
|                 |
+~~~~~~~~~~~~~~~~~+

Type Format

TBON MAGIC

+------------+------------+------------+------------+
| 0x54 ('T') | 0x42 ('B') | 0x4F ('O') | 0x4E ('N') |
+------------+------------+------------+------------+

nil format

Nil format stores nil in 1 byte.

nil:
+--------+
|  0x05  |
+--------+

bool format family

Bool format family stores false or true in 1 byte.

false:
+--------+
|  0x06  |
+--------+

true:
+--------+
|  0x07  |
+--------+

Bytecode Binary Format

Type Tags

Type Identifier

  1. Type class:
    • (P) Primitive type
    • (C) Composite type
    • (O) Container type
  2. Whether the type can be used as an Array element type signature:
    • (Y) Yes; these are mostly primitive types.
    • (O) Yes, but each element must carry its own type signature.

DataType      Sign(Bin)   Signature    (1)  (2)  TVM Type  Description
Invalid Obj   0000 0000   0x00         -    -    Y         Invalid object; should never appear
Object Root   0000 0001   0x01         O    -    N         Object root, similar to --- in YAML
Const Pool    0000 0010   0x02         C    -    N         Sets the const pool for the parser
Const Ref     0000 0011   0x03         C    -    N         Picks up a reference from the const pool
Object        0000 0100   0x04         O    O    Y         Object
Nil           0000 0101   0x05         P    Y    Y         Nil object
Boolean       0000 011b   0x06 - 0x07  P    Y    Y         b = 0 for false (or type signature)
Byte          0000 1000   0x08         -    Y    N         Bytes
Float16       0000 1001   0x09         P    Y    Y         IEEE 754 binary16
Float32       0000 1010   0x0A         P    Y    Y         IEEE 754 binary32
Float64       0000 1011   0x0B         P    Y    Y         IEEE 754 binary64
Float128      0000 1100   0x0C         P    Y    Y         IEEE 754 binary128
Reserved      0000 1101   0x0D         -    -    N         Reserved for float binary256
Reserved      0000 1110   0x0E         -    -    N         Reserved for float binary512
Reserved      0000 1111   0x0F         -    -    N         Reserved for multi-precision float
Signed 8      0001 0000   0x10         P    Y    Y         Signed 8-bit integer
Signed 16     0001 0001   0x11         P    Y    Y         Signed 16-bit integer
Signed 32     0001 0010   0x12         P    Y    Y         Signed 32-bit integer
Signed 64     0001 0011   0x13         P    Y    Y         Signed 64-bit integer
Reserved                  0x14 - 0x16  -    -    N         Reserved for signed integer 128 - 512
Big Int       0001 0111   0x17         C    -    Y         Reserved for big int
Unsigned 8    0001 1000   0x18         P    Y    Y         Unsigned 8-bit integer
Unsigned 16   0001 1001   0x19         P    Y    Y         Unsigned 16-bit integer
Unsigned 32   0001 1010   0x1A         P    Y    Y         Unsigned 32-bit integer
Unsigned 64   0001 1011   0x1B         P    Y    Y         Unsigned 64-bit integer
Reserved                  0x1C - 0x1E  -    -    N         Reserved for unsigned integer 128 - 512
Unsigned VA   0001 1111   0x1F         -    -    Y         Unsigned variable-length integer
Name 0-30     001s ssss   0x20 - 0x3E  C    -    N         Name of 0 - 30 characters
Name VA       0011 1111   0x3F         C    -    Y         Name with a variable-length character count
Array 0-30    011s ssss   0x60 - 0x7E  C    -    N         Array of 0 - 30 elements
Array VA      0111 1111   0x7F         C    -    Y         Array with a variable-length element count
Map 0-30      010s ssss   0x40 - 0x5E  C    -    N         Map of 0 - 30 elements
Map VA        0101 1111   0x5F         C    -    Y         Map with a variable-length element count
Extension     1xxx xxxx   0x80 - 0xFF  -    -    N         Reserved for domain-specific use

Object Root (Signature: 0x01)

Object Root is a special type signature that marks the start of a new object, similar to --- in YAML.

Object Root:
+--------+~~~~~~~~~~~~~~~~~+
|  0x01  | Object Content  |
+--------+~~~~~~~~~~~~~~~~~+

nil Object (Signature: 0x05)

Nil format stores nil in 1 byte.

Nil is not null; a null value should not be used in the system.

nil:
+--------+
|  0x05  |
+--------+

bool format family (Signature: 0x06, 0x07)

Bool format family stores false or true in 1 byte.

false:
+--------+
|  0x06  |
+--------+

true:
+--------+
|  0x07  |
+--------+

Float / integer / unsigned integer format family (Signature: 0x08 - 0x1F)

The float family stores a big-endian IEEE 754 value in 3, 5, 9, or 17 bytes: one signature byte followed by the 2-, 4-, 8-, or 16-byte value. Currently only binary16, binary32, binary64, and binary128 are supported.

For 8-bit floats such as FP8 E4M3 or FP8 E5M2, use an array of bytes instead.

float16 stores a big-endian IEEE 754 binary16 in 3 bytes:
+--------+--------+--------+
|  0x09  |XXXXXXXX|XXXXXXXX|
+--------+--------+--------+

float32 stores a big-endian IEEE 754 binary32 in 5 bytes:
+--------+--------+--------+--------+--------+
|  0x0A  |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+

float64 stores a big-endian IEEE 754 binary64 in 9 bytes:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|  0x0B  |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+

float128 stores a big-endian IEEE 754 binary128 in 17 bytes:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|  0x0C  |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+--------+--------+--------+

int 8 stores an 8-bit signed integer in 2 bytes:
+--------+--------+
|  0x10  |XXXXXXXX|
+--------+--------+

int 16 stores a 16-bit big-endian signed integer in 3 bytes:
+--------+--------+--------+
|  0x11  |XXXXXXXX|XXXXXXXX|
+--------+--------+--------+

int 32 stores a 32-bit big-endian signed integer in 5 bytes:
+--------+--------+--------+--------+--------+
|  0x12  |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+

int 64 stores a 64-bit big-endian signed integer in 9 bytes:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|  0x13  |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+

uint 8 stores an 8-bit unsigned integer in 2 bytes:
+--------+--------+
|  0x18  |XXXXXXXX|
+--------+--------+

uint 16 stores a 16-bit big-endian unsigned integer in 3 bytes:
+--------+--------+--------+
|  0x19  |XXXXXXXX|XXXXXXXX|
+--------+--------+--------+

uint 32 stores a 32-bit big-endian unsigned integer in 5 bytes:
+--------+--------+--------+--------+--------+
|  0x1A  |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+

uint 64 stores a 64-bit big-endian unsigned integer in 9 bytes:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|  0x1B  |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
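A sketch of encoding and decoding nil, bool and the fixed-width numeric formats above, using Python's big-endian `struct` formats (`>e`, `>f`, `>d`, `>b` through `>Q`). float128 (0x0C) is left out because `struct` has no binary128 format. `encode_scalar`, `decode_scalar` and the kind strings are hypothetical names.

```python
import struct

# Signature byte and big-endian struct format for each fixed-size scalar.
# float128 (0x0C) is omitted: the struct module has no binary128 format.
SCALARS = {
    "f16": (0x09, ">e"), "f32": (0x0A, ">f"), "f64": (0x0B, ">d"),
    "i8": (0x10, ">b"), "i16": (0x11, ">h"), "i32": (0x12, ">i"), "i64": (0x13, ">q"),
    "u8": (0x18, ">B"), "u16": (0x19, ">H"), "u32": (0x1A, ">I"), "u64": (0x1B, ">Q"),
}

def encode_scalar(kind: str, value=None) -> bytes:
    if kind == "nil":
        return b"\x05"
    if kind == "bool":
        return b"\x07" if value else b"\x06"
    sig, fmt = SCALARS[kind]
    return bytes([sig]) + struct.pack(fmt, value)  # struct.error if out of range

def decode_scalar(data: bytes):
    sig = data[0]
    if sig == 0x05:
        return None
    if sig in (0x06, 0x07):
        return sig == 0x07
    for code, fmt in SCALARS.values():
        if code == sig:
            return struct.unpack_from(fmt, data, 1)[0]
    raise ValueError(f"not a fixed-size scalar signature: {sig:#04x}")

print(encode_scalar("i16", -2).hex())   # 11fffe
print(encode_scalar("f64", 1.0).hex())  # 0b3ff0000000000000
```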

Array family (Signature: 0x20 - 0x3F)

FixArray stores a sequence of up to 30 objects:

+---------+-----------+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
|001S SSSS| Signature | S of Objects in Signature type |
+---------+-----------+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+

Array (0x3F) stores a sequence of objects of any length, with the element count in a variable-length Length field:

+--------+-----------+================+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
|  0x3F  | Signature | Length (VA128) | Length objects in Signature type |
+--------+-----------+================+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
  • Valid element signatures are those marked in column (2) of the Type Identifier table.
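A sketch of choosing between the FixArray and VA forms for an array of uint8 values. Two points are assumptions not stated above: the `Length (VA128)` field is written here as unsigned LEB128, and elements are stored as bare payloads because the element signature appears once. Lengths 0 to 30 use the fixed form since 0x3F (S = 11111) marks the VA form. The function names are hypothetical.

```python
from typing import List

def uleb128(n: int) -> bytes:
    # assumption: "VA128" read as unsigned LEB128 (7 payload bits per byte,
    # high bit set on every byte except the last)
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

U8_SIG = 0x18   # Unsigned 8 element signature

def encode_u8_array(values: List[int]) -> bytes:
    # assumption: elements are bare payloads; the signature appears only once
    payload = bytes(values)                     # ValueError if a value > 255
    if len(values) <= 30:                       # FixArray: 001S SSSS, S <= 30
        return bytes([0x20 | len(values), U8_SIG]) + payload
    return bytes([0x3F, U8_SIG]) + uleb128(len(values)) + payload

print(encode_u8_array([1, 2, 3]).hex())   # 2318010203
```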

Object File Format

SIGNATURE: `TBON`
ConstPool: []
ObjectRoot: {
  Symbols: {
    int_symbol: 114
    float_symbol: 3.14
    string_symbol: "Hello, World!"
    array_symbol: [1, 2, 3, 4, 5]
    closure_symbol: {
      // Treated as Map
      bytecode: []
      constpool: []
      locals: []
    }
    init_closure: {
      // Treated as Map
      bytecode: []
      constpool: []
      locals: []
    }
  }
  Declarations: {
    // For public types and interfaces
  }
}