SSA Instructions

Foreword

The following are notes on the SSA IR (static single assignment, intermediate representation) instructions in LuaJIT 2.0. See lj_ir.h, lj_asm.c, lj_opt_fold.c, and dump.lua for details.

Background notes:

Status

Warning: this list is incomplete, outdated, and likely has inaccuracies. This document reflects version 2.0 GIT HEAD as of July 7th, 2012. Most recent commit: 6a67fa8a.

Most of the instructions listed here need to be further researched/explained.

Overview

Basic example:

$ ./luajit -jdump=bitmsr
LuaJIT 2.0.0-beta10 -- Copyright (C) 2005-2012 Mike Pall. http://luajit.org/
JIT: ON CMOV SSE2 SSE3 AMD fold cse dce fwd dse narrow loop abc sink fuse
> local x = 1.2 for i=1,1e3 do x = x * -3 end
---- TRACE 1 start stdin:1
0006  MULVN    0   0   1  ; -3
0007  FORL     1 => 0006
---- TRACE 1 IR
....              SNAP   #0   [ ---- ]
0001 rbp      int SLOAD  #2    CI
0002 xmm7  >  num SLOAD  #1    T
0003 xmm7   + num MUL    0002  -3  
0004 rbp    + int ADD    0001  +1  
....              SNAP   #1   [ ---- 0003 ]
0005       >  int LE     0004  +1000
....              SNAP   #2   [ ---- 0003 0004 ---- ---- 0004 ]
0006 ------------ LOOP ------------
0007 xmm7   + num MUL    0003  -3  
0008 rbp    + int ADD    0004  +1  
....              SNAP   #3   [ ---- 0007 ]
0009       >  int LE     0008  +1000
0010 rbp      int PHI    0004  0008
0011 xmm7     num PHI    0003  0007
---- TRACE 1 mcode 81
394cffa3  mov dword [0x4183f4a0], 0x1
394cffae  movsd xmm0, [0x4184f698]
394cffb7  cvtsd2si ebp, [rdx+0x8]
394cffbc  cmp dword [rdx+0x4], 0xfffeffff
394cffc3  jnb 0x394c0010	->0
394cffc9  movsd xmm7, [rdx]
394cffcd  mulsd xmm7, xmm0
394cffd1  add ebp, +0x01
394cffd4  cmp ebp, 0x3e8
394cffda  jg 0x394c0014	->1
->LOOP:
394cffe0  mulsd xmm7, xmm0
394cffe4  add ebp, +0x01
394cffe7  cmp ebp, 0x3e8
394cffed  jle 0x394cffe0	->LOOP
394cffef  jmp 0x394c001c	->3
---- TRACE 1 stop -> loop

The above prints the bytecode of the trace, the IR generated from that bytecode with snapshots, and the machine code generated from the IR.

The columns of the IR are as follows:

1st column: IR instruction number (implicit SSA ref)
2nd column: physical CPU register or physical CPU stack slot that
  value is written to when converted to machine code.
  '[%x+]' (rather than register name) indicates hexadecimal offset
  from stack pointer.
  (This column is only present if the 'r' flags is included in -jdump, which
  augments the IR with register/stack slots.  It is not part of the IR itself.)
3nd column: Instruction flags:
  ">" (IRT_GUARD = 0x80 instruction flag) are locations of
    guards (leading to possible side exits from the trace).
  "+" (IRT_ISPHI = 0x40 instruction flag) indicates
    instruction is left or right PHI operand. (i.e referred
    to in some PHI instruction).
4rd column: IR type (see IR Types below)
5th column: IR opcode (see opcode reference)
6th/7th column: IR operands (SSA refs or literals)
  '#' prefixes refer to slot numbers, used in SLOADS.
     #0 is the base frame (modified only in tail calls).
     #1 is the first slot in the first frame (register 0 in
     the bytecode)
  '[+-]' prefixes indicate positive or negative numeric literals.
  '[0x%d+]' and NULL are memory addresses.
  '"..."' are strings.
  '@' prefixes indicate slots (what is this?).
  Other possible values: "bias" (number 2^52+2^51 ?), "userdata:%p",
     "userdata:%p" (table)--when do these occur?.

See also SSA dump format comments: http://lua-users.org/lists/lua-l/2008-06/msg00225.html (older version). See formatk in dump.lua.

Each snaphot (SNAP) lists the modified stack slots and their values. The i-th value in the snapshot list represents the index of the IR that writes a value in slot number #i. '---' indicates that the slot is not written. Frames are separated by '|'. For further comments on snapshots, see http://lua-users.org/lists/lua-l/2009-11/msg00089.html.

IR types (see irtype_text in dump.lua or IRTDEF in lj_ir.h:

"nil" 0
"fal" 1
"tru" 2
"lud" 3
"str" 4
"p32" 5
"thr" 6
"pro" 7
"fun" 8
"p64" 9
"cdt" 10
"tab" 11
"udt" 12
"flt" 13
"num" 14
"i8 " 15
"u8 " 16
"i16" 17
"u16" 18
"int" 19
"u32" 20
"i64" 21
"u64" 22

"Mode bits" (used in below opcode definitions, the second column): Commutative (C), {Normal/Ref (N), Alloc (A), Load (L), Store (S)}, Non-weak guard (W).

Guarded assertions

The trace is exited if these checks fail.

LT, N , ref, ref

less than. Exits unless op1 < op2.

GE, N , ref, ref

greater than or equal. Exits unless op1 >= op2.

LE, N , ref, ref

Exits unless op1 <= op2.

less than or equal

GT, N , ref, ref

Exits unless op1 > op2.

greater than

The next four instructions are unordered comparisons.This concerns floating point comparisons where one value may be NaN. NaN is not ordered. For background see [http://docs.hp.com/en/B3906-90006/ch02s05.html#a9v1zf219haasz]. When ints rather than floats are used, this appears to do unsigned comparison.

ULT, N , ref, ref

unordered less than

UGE, N , ref, ref

unordered greater than or equal

ULE, N , ref, ref

unordered less than or equal

UGT, N , ref, ref

unordered greater than

EQ, C , ref, ref

equal

NE, C , ref, ref

not equal

ABC, N , ref, ref

ABCelim: Array Bounds Check Elimination?

RETF, S , ref, ref

Return to lower frame.

Compiled for Lua functions.

Guard that it goes to the right spot.

Miscellaneous Ops

NOP, N , __, __

no operation (NOP).

BASE, N , lit, lit

(?)

PVAL, N , lit, ___

GCSTEP, S , __, __

Causes an explicit garbage collection in JIT'd code if above threshold and avoids further implicit ones.

Exits the trace if the garbage collector is already in an atomic or finalized state.

HIOP, S , ref, ref

Support for 64 bit operations in 32 bit mode.

Unused on x64 or without FFI.

LOOP, S , __, __

Middle part of a loop.

LOOP is a guard, so the snapshot number is up to date.

LOOP marks the transition from the variant to the invariant part.

Implies GCSTEP instruction.

USE, S , ref, ___

PHI, S , ref, ref

The SSA phi thing. Should be explained further...

RENAME, S , ref, lit

Emitted by the register allocation algorithm.

Used when a renaming and moving registers.

Constants

KPRI, N , __, __

Constant primitive.

Primitives are some form of a TValue. Possibly nil/false/true.

KINT, N , cst, ___

KGC, N , cst, ___

KPTR, N , cst, ___

Const pointer to possibly non-const data.

KKPTR, N , cst, ___

Const pointer to definitely const data. Notes from commit: "Only content known by the VM to be const qualifies. Content tagged as const by users (e.g. const char *) doesn't."

commit

KNULL, N , cst, ___

KNUM, N , cst, ___

KINT64, N , cst, ___

KSLOT, N , ref, lit

Related to alias analysis for array and hash access using key-based disambiguation and array and hash load forwarding.

Bit ops

BNOT, N , ref, ___

Bitwise not. See bit.bnot.

Arithmetic ops

ADD, C , ref, ref

x + y

op1: x, op2: y

SUB, N , ref, ref

x - y

MUL, C , ref, ref

x * y

DIV, N , ref, ref

x / y

POW, N , ref, ref

Raise to power (with integer exponent).

NEG, N , ref, ref

Negation (-x).

ABS, N , ref, ref

Absolute value. see math.abs.

ATAN2, N , ref, ref

atan2. see math.atan2.

LDEXP, N , ref, ref

see math.ldexp.

MIN, C , ref, ref

see math.min.

MAX, C , ref, ref

see math.max.

FPMATH, N , ref, lit

Floating point math operation.

op2: (values from irfpm in vmdef.lua and IRFPMDEF in lj_ir.h)
"floor" - FLOOR
"ceil"  - CEIL
"trunc" - TRUNC
"sqrt"  - SQRT
"exp"   - EXP
"exp2"  - EXP2
"log"   - LOG
"log2"  - LOG2
"log10" - LOG10
"sin"   - SIN
"cos"   - COS
"tan"   - TAN
"other" - OTHER

Overflow-checking arithmetic ops

The following instructions utilize integer arithmetic and are a part of the dual-number mode and narrowing of numbers to integers optimization.

ADDOV, C , ref, ref

Integer addition.

SUBOV, N , ref, ref

Integer subtraction.

MULOV, CW, ref, ref

Integer multiplication.

Memory ops

A = array, H = hash, U = upvalue, F = field, S = stack.

Memory references.

AREF, R , ref, ref

Array reference

HREFK, R , ref, ref

Hash reference (with constant key?)

op1: index to IR for table hash part (tab.node)?
op2: key?

HREF, L , ref, ref

Hash reference.

op1: index to IR for table
op2: index to IR for key

NEWREF, S , ref, ref

This is something related to creating a new table key (lj_tab_newkey).

op1: index to IR for table
op2: index to IR for key

UREFO, LW, ref, lit

Upvalue reference, open?

op1 is the index to the IR for the function value.
op2 - bits 0..7 are some type of hash value for the upvalue (see rec_upvalue). Bits above that represent the upvalue index (0-based integer) in the function.

UREFC, LW, ref, lit

upvalue reference, closed?

op1 and op2 are the same as in UREFO.

FREF, R , ref, lit

Field reference

op1:
op2: field identifier - see op2 values in FLOAD.

STRREF, N , ref, ref

String reference

Loads and Stores

ALOAD, L , ref, ___

Array load

HLOAD, L , ref, ___

Hash table load

ULOAD, L , ref, ___

Upvalue load

op1: index of IR for upvalue reference (e.g. UREFC/UREFO).

FLOAD, L , ref, lit

Field load. This accesses the field identified by (lit) in a C struct located at the address referred to in (ref). The fields are at known constant offsets from the structure base address.

op2: (values from irfield in vmdef.lua and IRFLDEF in lj_ir.h)
"str.len" (0) STR_LEN - string length (GCstr.len)
"func.env" (1) FUNC_ENV - function environment (GCfunc.l.env)
"tab.meta" (2) TAB_META - table metatable (GCtab.metatable)
"tab.array" (3) TAB_ARRAY - table array part (GCtab.array)
"tab.node" (4) TAB_NODE - table hash part (GCtab.node)
"tab.asize" (5) TAB_ASIZE - table array part size (GCtab.asize)
"tab.hmask" (6) TAB_HMASK - table "Hash part mask (size of hash part - 1)" (GCtab.hmask)
"tab.nomm" (7) TAB_NOMM - table "Negative cache for fast metamethods" (GCtab.nomm)
"udata.meta" (8) UDATA_META - userdata metatable (GCudata.metatable)
"udata.udtype" (9) UDATA_UDTYPE "Userdata type" (GCudata.udtype)
"udata.file" (10) UDATA_FILE
"cdata.typeid" (11) CDATA_TYPEID (GCcdata.typeid) - FFI C type ID (unique to every C type)
"cdata.ptr" (12) CDATA_PTR

XLOAD, L , ref, lit

Load from pointer? Note: can occur with FFI cdata.

op1: index of IR for pointer to load from
op2: bitwise OR of

"R" 1 (IRXLOAD_READONLY,  Load from read-only data.)
"V" 2 (IRXLOAD_VOLATILE,  Load from volatile data.)
"U" 4 (IRXLOAD_UNALIGNED, Unaligned load.)

SLOAD, L , lit, lit

Stack load

op1: stack slot number to load from.
op2 values:

"P"  1 (IRSLOAD_PARENT, Coalesce with parent trace)
"F"  2 (IRSLOAD_FRAME, Load hiword of frame)
"T"  4 (IRSLOAD_TYPECHECK, Needs type check)
"C"  8 (IRSLOAD_CONVERT, Number to integer conversion)
"R" 16 (IRSLOAD_READONLY, Read-only, omit slot store)
"I" 32 (IRSLOAD_INHERIT, Inherited by exits/side traces)

VLOAD, L , ref, ___

Vararg load

ASTORE, S , ref, ref

Array store

HSTORE, S , ref, ref

Hash table store

op1: index of IR for reference to value in hash table.
op2: key

USTORE, S , ref, ref

Upvalue store

FSTORE, S , ref, ref

Field store

XSTORE, S , ref, ref

Store to pointer? Note: can occur with FFI cdata.

op1: index of IR for pointer to store to
op2: index of of IR for value to store

Allocations.

SNEW, N , ref, ref

Create new string. (lj_str_new)

op1: const char *str; op2: size_t len.

XSNEW, A , ref, ref

TNEW, AW, lit, lit

Create new table.

TDUP, AW, ref, ___

Duplicate a table. (lj_tab_dup)

op1: table to copy

CNEW, AW, ref, ref

Allocate new FFI cdata.

CNEWI, NW, ref, ref

Initialize new FFI cdata.

Write barriers

TBAR, S , ref, ___

(?)

OBAR, S , ref, ref

Specialized barrier for closed upvalue?

XBAR, S , __, __

Writer barrier for XLOAD/XSTORE.

Type conversions

CONV, N , ref, lit

Various int and float number conversions.

op1: lref
op2: type conversion.

Bits 0..4 (IRCONV_SRCMASK) are type converted from.
Bits 5..9 (IRCONV_DSTMASK) are type converted to.  See "IR Types" above.
Bit 10 (0x400, IRCONV_TRUNC) is "trunc" (Truncate number to integer).
Bit 11 (0x800, IRCONV_SEXT) is "sext" (Sign-extend integer to integer).
Bits 14..15 are 2 "index"  or 3 "check". (?? - dump.lua and IRCONV_* inconsistent?)

TOBIT, N , ref, ref

(see bit.tobit)

op1: dest
op1: source

TOSTR, N , ref, ___

Convert to string.

STRTO, N , ref, ___

Convert string to number.

Calls

CALLN, N , ref, lit

Call Normal/Ref (N)?

CALLL, L , ref, lit

Call Load (L)?

CALLS, S , ref, lit

Call Store (S)?

CALLXS, S , ref, ref

CARG, N , ref, ref

(?) something related to function call arguments

The LuaJIT Wiki