The following are notes on the SSA IR (static single assignment, intermediate representation) instructions in LuaJIT 2.0. See lj_ir.h, lj_asm.c, lj_opt_fold.c, and dump.lua for details.
Background notes:
Warning: this list is incomplete, outdated, and likely has inaccuracies. This document reflects version 2.0 GIT HEAD as of July 7th, 2012. Most recent commit: 6a67fa8a.
Most of the instructions listed here need to be further researched/explained.
Basic example:
$ ./luajit -jdump=bitmsr
LuaJIT 2.0.0-beta10 -- Copyright (C) 2005-2012 Mike Pall. http://luajit.org/
JIT: ON CMOV SSE2 SSE3 AMD fold cse dce fwd dse narrow loop abc sink fuse
> local x = 1.2 for i=1,1e3 do x = x * -3 end
---- TRACE 1 start stdin:1
0006 MULVN 0 0 1 ; -3
0007 FORL 1 => 0006
---- TRACE 1 IR
.... SNAP #0 [ ---- ]
0001 rbp int SLOAD #2 CI
0002 xmm7 > num SLOAD #1 T
0003 xmm7 + num MUL 0002 -3
0004 rbp + int ADD 0001 +1
.... SNAP #1 [ ---- 0003 ]
0005 > int LE 0004 +1000
.... SNAP #2 [ ---- 0003 0004 ---- ---- 0004 ]
0006 ------------ LOOP ------------
0007 xmm7 + num MUL 0003 -3
0008 rbp + int ADD 0004 +1
.... SNAP #3 [ ---- 0007 ]
0009 > int LE 0008 +1000
0010 rbp int PHI 0004 0008
0011 xmm7 num PHI 0003 0007
---- TRACE 1 mcode 81
394cffa3 mov dword [0x4183f4a0], 0x1
394cffae movsd xmm0, [0x4184f698]
394cffb7 cvtsd2si ebp, [rdx+0x8]
394cffbc cmp dword [rdx+0x4], 0xfffeffff
394cffc3 jnb 0x394c0010 ->0
394cffc9 movsd xmm7, [rdx]
394cffcd mulsd xmm7, xmm0
394cffd1 add ebp, +0x01
394cffd4 cmp ebp, 0x3e8
394cffda jg 0x394c0014 ->1
->LOOP:
394cffe0 mulsd xmm7, xmm0
394cffe4 add ebp, +0x01
394cffe7 cmp ebp, 0x3e8
394cffed jle 0x394cffe0 ->LOOP
394cffef jmp 0x394c001c ->3
---- TRACE 1 stop -> loop
The above prints the bytecode of the trace, the IR generated from that bytecode with snapshots, and the machine code generated from the IR.
The columns of the IR are as follows:
1st column: IR instruction number (implicit SSA ref)
2nd column: physical CPU register or physical CPU stack slot that
value is written to when converted to machine code.
'[%x+]' (rather than register name) indicates hexadecimal offset
from stack pointer.
(This column is only present if the 'r' flag is included in -jdump, which
augments the IR with register/stack slots. It is not part of the IR itself.)
3rd column: Instruction flags:
">" (IRT_GUARD = 0x80 instruction flag) are locations of
guards (leading to possible side exits from the trace).
"+" (IRT_ISPHI = 0x40 instruction flag) indicates
instruction is left or right PHI operand (i.e. referred
to in some PHI instruction).
4th column: IR type (see IR Types below)
5th column: IR opcode (see opcode reference)
6th/7th column: IR operands (SSA refs or literals)
'#' prefixes refer to slot numbers, used in SLOADs.
#0 is the base frame (modified only in tail calls).
#1 is the first slot in the first frame (register 0 in
the bytecode)
'[+-]' prefixes indicate positive or negative numeric literals.
'[0x%d+]' and NULL are memory addresses.
'"..."' are strings.
'@' prefixes indicate slots (what is this?).
Other possible values: "bias" (number 2^52+2^51 ?), "userdata:%p",
"userdata:%p" (table)--when do these occur?.
See also SSA dump format comments: http://lua-users.org/lists/lua-l/2008-06/msg00225.html (older version).
See formatk in dump.lua.
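The column layout described above can be sketched as a small parser. This is only an illustration written against the dump lines shown on this page, not code from dump.lua. The function name parse_ir_line and the returned dictionary keys are made up for this example, and the parser assumes the register column never collides with a type name.

```python
# Type names as listed under "IR types" on this page (padding stripped).
IR_TYPES = {
    "nil", "fal", "tru", "lud", "str", "p32", "thr", "pro", "fun", "p64",
    "cdt", "tab", "udt", "flt", "num", "i8", "u8", "i16", "u16", "int",
    "u32", "i64", "u64",
}


def parse_ir_line(line):
    """Split one IR dump line into its columns (hypothetical helper).

    Returns None for lines that are not IR instructions, such as
    SNAP lines or the LOOP marker.
    """
    tokens = line.split()
    if len(tokens) < 3 or not tokens[0].isdigit():
        return None
    ref, rest = int(tokens[0]), tokens[1:]
    reg, guard, phi = None, False, False
    for pos, tok in enumerate(rest):
        if tok in IR_TYPES:
            break  # found the type column
        if set(tok) <= {">", "+"}:
            guard = guard or ">" in tok
            phi = phi or "+" in tok
        else:
            reg = tok  # register or stack slot (only with the 'r' flag)
    else:
        return None  # no type column, e.g. the LOOP marker line
    if pos + 1 >= len(rest):
        return None  # type column without an opcode
    return {
        "ref": ref,
        "reg": reg,
        "guard": guard,
        "phi": phi,
        "type": rest[pos],
        "op": rest[pos + 1],
        "operands": rest[pos + 2:],
    }


print(parse_ir_line("0003 xmm7 + num MUL 0002 -3"))
print(parse_ir_line("0005 > int LE 0004 +1000"))
```

String operands containing spaces would be split into several tokens by this sketch.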
Each snapshot (SNAP) lists the modified stack slots and their values. The i-th value in the snapshot list represents the index of the IR that writes a value in slot number #i. '----' indicates that the slot is not written. Frames are separated by '|'. For further comments on snapshots, see http://lua-users.org/lists/lua-l/2009-11/msg00089.html.
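As a rough illustration of the snapshot format, the following sketch maps slot numbers to the entries of a SNAP line. It handles only the simple forms that appear in the dump on this page (IR refs and unwritten slots). The name parse_snapshot is made up for this example, and treating '|' as a separator that does not occupy a slot is an assumption.

```python
def parse_snapshot(line):
    """Return (snapshot number, {slot number: entry}) for one SNAP line.

    Hypothetical helper: slots are counted from #0, and entries made up
    only of dashes (slot not written) are left out of the result.
    """
    head, _, body = line.partition("[")
    number = int(head.split("#")[1])
    # Assumption: '|' only separates frames and does not occupy a slot.
    entries = body.rsplit("]", 1)[0].replace("|", " ").split()
    slots = {}
    for slot, entry in enumerate(entries):
        if set(entry) != {"-"}:
            slots[slot] = entry
    return number, slots


print(parse_snapshot(".... SNAP #2 [ ---- 0003 0004 ---- ---- 0004 ]"))
```

For SNAP #2 of the basic example this gives slots #1, #2 and #5, i.e. x is restored from IR 0003 and the loop counters from IR 0004.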
IR types (see irtype_text in dump.lua or IRTDEF in lj_ir.h):
"nil" 0
"fal" 1
"tru" 2
"lud" 3
"str" 4
"p32" 5
"thr" 6
"pro" 7
"fun" 8
"p64" 9
"cdt" 10
"tab" 11
"udt" 12
"flt" 13
"num" 14
"i8 " 15
"u8 " 16
"i16" 17
"u16" 18
"int" 19
"u32" 20
"i64" 21
"u64" 22
"Mode bits" (used in below opcode definitions, the second column): Commutative (C), {Normal/Ref (N), Alloc (A), Load (L), Store (S)}, Non-weak guard (W).
The trace is exited if these checks fail.
less than. Exits unless op1 < op2.
greater than or equal. Exits unless op1 >= op2.
less than or equal. Exits unless op1 <= op2.
greater than. Exits unless op1 > op2.
The next four instructions are unordered comparisons. This concerns floating point comparisons where one value may be NaN. NaN is not ordered. For background see http://docs.hp.com/en/B3906-90006/ch02s05.html#a9v1zf219haasz. When ints rather than floats are used, this appears to do unsigned comparison.
unordered less than
unordered greater than or equal
unordered less than or equal
unordered greater than
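To illustrate the difference between ordered and unordered comparisons, here is a small sketch in plain Python floats. It models "unordered less than" as "not greater than or equal", which is the usual meaning of an unordered predicate. The function names are made up, and the 32-bit unsigned variant only mirrors the "appears to do unsigned comparison" remark above, so treat it as an assumption.

```python
import math


def lt(a, b):
    """Ordered less than: false if either operand is NaN."""
    return a < b


def ult(a, b):
    """Unordered less than: true if a < b or either operand is NaN."""
    return not (a >= b)


def ult_int32(a, b):
    """Assumed integer behaviour: compare as unsigned 32-bit values."""
    return (a & 0xFFFFFFFF) < (b & 0xFFFFFFFF)


nan = math.nan
print(lt(nan, 1.0), ult(nan, 1.0))            # the two predicates disagree on NaN
print(lt(1.0, 2.0), ult(1.0, 2.0))            # and agree on ordered values
print(ult_int32(-1, 1000), ult_int32(5, 1000))
```

With the unsigned interpretation a single comparison against an upper bound also rejects negative values, since they become very large unsigned numbers.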
equal
not equal
ABCelim: Array Bounds Check Elimination?
Return to lower frame.
Compiled for Lua functions.
Guard that it goes to the right spot.
no operation (NOP).
(?)
Causes an explicit garbage collection in JIT'd code if above threshold and avoids further implicit ones.
Exits the trace if the garbage collector is already in an atomic or finalized state.
Support for 64 bit operations in 32 bit mode.
Unused on x64 or without FFI.
Middle part of a loop.
LOOP is a guard, so the snapshot number is up to date.
LOOP marks the transition from the variant to the invariant part.
Implies GCSTEP instruction.
The SSA phi thing. Should be explained further...
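One way to read the PHI instructions is to hand-translate TRACE 1 from the basic example. In the sketch below the left PHI operand (0003, 0004) is the value computed before LOOP and the right operand (0007, 0008) is the value computed inside the loop and fed back on the next iteration. The function names are made up, the exit numbers are taken from the "->1" and "->3" jumps in the machine code, and the sketch assumes the trace is entered with i <= limit.

```python
def run_trace(x, i, limit=1000):
    """Hand-written emulation of TRACE 1; returns (exit number, x)."""
    # Part before LOOP (the first iteration).
    v0003 = x * -3                 # 0003 num MUL 0002 -3
    v0004 = i + 1                  # 0004 int ADD 0001 +1
    if not (v0004 <= limit):       # 0005 > int LE 0004 +1000
        return 1, v0003            # SNAP #1 restores slot #1 from 0003
    # 0006 LOOP: the PHIs start out with their left operands.
    phi_x, phi_i = v0003, v0004    # PHI 0003 0007 / PHI 0004 0008
    while True:
        v0007 = phi_x * -3         # 0007 num MUL 0003 -3
        v0008 = phi_i + 1          # 0008 int ADD 0004 +1
        if not (v0008 <= limit):   # 0009 > int LE 0008 +1000
            return 3, v0007        # SNAP #3 restores slot #1 from 0007
        # Jump back to LOOP: the right operands become the PHI values.
        phi_x, phi_i = v0007, v0008


def run_interpreted(x, i, limit=1000):
    """Plain loop for comparison: body runs while i <= limit."""
    while i <= limit:
        x = x * -3
        i += 1
    return x


print(run_trace(1.2, 1, limit=5))
print(run_interpreted(1.2, 1, limit=5))
```

Both functions perform the same multiplications in the same order, so the results match exactly.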
Emitted by the register allocation algorithm.
Used when renaming and moving registers.
Constant primitive.
Primitives are some form of a TValue. Possibly nil/false/true.
Const pointer to possibly non-const data.
Const pointer to definitely const data. Notes from commit: "Only content known by the VM to be const qualifies. Content tagged as const by users (e.g. const char *) doesn't."
Related to alias analysis for array and hash access using key-based disambiguation and array and hash load forwarding.
Bitwise not. See bit.bnot.
See bit.bswap.
See bit.band.
See bit.bor.
See bit.bxor.
See bit.lshift.
See bit.rshift.
See bit.arshift.
See bit.rol.
See bit.ror.
x + y
op1: x, op2: y
x - y
x * y
x / y
Raise to power (with integer exponent).
Negation (-x).
Absolute value. See math.abs.
atan2. See math.atan2.
See math.ldexp.
See math.min.
See math.max.
Floating point math operation.
op2: (values from irfpm in vmdef.lua and IRFPMDEF in lj_ir.h)
"floor" - FLOOR
"ceil" - CEIL
"trunc" - TRUNC
"sqrt" - SQRT
"exp" - EXP
"exp2" - EXP2
"log" - LOG
"log2" - LOG2
"log10" - LOG10
"sin" - SIN
"cos" - COS
"tan" - TAN
"other" - OTHER
The following instructions utilize integer arithmetic and are a part of the dual-number mode and narrowing of numbers to integers optimization.
Integer addition.
Integer subtraction.
Integer multiplication.
A = array, H = hash, U = upvalue, F = field, S = stack.
Memory references.
Array reference
Hash reference (with constant key?)
Hash reference.
This is something related to creating a new table key (lj_tab_newkey).
Upvalue reference, open?
Upvalue reference, closed?
op1 and op2 are the same as in UREFO.
Field reference; see FLOAD.
String reference
Array load
Hash table load
Upvalue load
op1: index of IR for upvalue reference (e.g. UREFC/UREFO).
Field load. This accesses the field identified by (lit) in a C struct located at the address referred to in (ref). The fields are at known constant offsets from the structure base address.
op2: (values from irfield in vmdef.lua and IRFLDEF in lj_ir.h)
"str.len" (0) STR_LEN - string length (GCstr.len)
"func.env" (1) FUNC_ENV - function environment (GCfunc.l.env)
"tab.meta" (2) TAB_META - table metatable (GCtab.metatable)
"tab.array" (3) TAB_ARRAY - table array part (GCtab.array)
"tab.node" (4) TAB_NODE - table hash part (GCtab.node)
"tab.asize" (5) TAB_ASIZE - table array part size (GCtab.asize)
"tab.hmask" (6) TAB_HMASK - table "Hash part mask (size of hash part - 1)" (GCtab.hmask)
"tab.nomm" (7) TAB_NOMM - table "Negative cache for fast metamethods" (GCtab.nomm)
"udata.meta" (8) UDATA_META - userdata metatable (GCudata.metatable)
"udata.udtype" (9) UDATA_UDTYPE "Userdata type" (GCudata.udtype)
"udata.file" (10) UDATA_FILE
"cdata.typeid" (11) CDATA_TYPEID (GCcdata.typeid) - FFI C type ID (unique to every C type)
"cdata.ptr" (12) CDATA_PTR
Load from pointer? Note: can occur with FFI cdata.
"R" 1 (IRXLOAD_READONLY, Load from read-only data.)
"V" 2 (IRXLOAD_VOLATILE, Load from volatile data.)
"U" 4 (IRXLOAD_UNALIGNED, Unaligned load.)
Stack load
"P" 1 (IRSLOAD_PARENT, Coalesce with parent trace)
"F" 2 (IRSLOAD_FRAME, Load hiword of frame)
"T" 4 (IRSLOAD_TYPECHECK, Needs type check)
"C" 8 (IRSLOAD_CONVERT, Number to integer conversion)
"R" 16 (IRSLOAD_READONLY, Read-only, omit slot store)
"I" 32 (IRSLOAD_INHERIT, Inherited by exits/side traces)
Vararg load
Array store
Hash table store
Upvalue store
Field store
Store to pointer? Note: can occur with FFI cdata.
Create new string. (lj_str_new)
Create new table.
Duplicate a table. (lj_tab_dup)
Allocate new FFI cdata.
Initialize new FFI cdata.
(?)
Specialized barrier for closed upvalue?
Write barrier for XLOAD/XSTORE.
Various int and float number conversions.
Bits 0..4 (IRCONV_SRCMASK) hold the type converted from.
Bits 5..9 (IRCONV_DSTMASK) hold the type converted to. See "IR Types" above.
Bit 10 (0x400, IRCONV_TRUNC) is "trunc" (Truncate number to integer).
Bit 11 (0x800, IRCONV_SEXT) is "sext" (Sign-extend integer to integer).
Bits 14..15 are 2 "index" or 3 "check". (?? - dump.lua and IRCONV_* inconsistent?)
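The bit layout of the conversion operand can be sketched as follows. Only the source type, destination type, trunc and sext fields are decoded; the "index"/"check" bits are left out because their position is unclear, as noted above. decode_conv is a made-up name and the returned dictionary is just for illustration.

```python
# Index in this list == IR type number (see "IR types" above).
IR_TYPE_NAMES = [
    "nil", "fal", "tru", "lud", "str", "p32", "thr", "pro", "fun", "p64",
    "cdt", "tab", "udt", "flt", "num", "i8", "u8", "i16", "u16", "int",
    "u32", "i64", "u64",
]

IRCONV_SRCMASK = 0x1F        # bits 0..4
IRCONV_DSTMASK = 0x1F << 5   # bits 5..9
IRCONV_TRUNC = 0x400         # bit 10
IRCONV_SEXT = 0x800          # bit 11


def decode_conv(op2):
    """Decode source/destination type and trunc/sext from a CONV operand."""
    src = op2 & IRCONV_SRCMASK
    dst = (op2 & IRCONV_DSTMASK) >> 5
    return {
        "src": IR_TYPE_NAMES[src],
        "dst": IR_TYPE_NAMES[dst],
        "trunc": bool(op2 & IRCONV_TRUNC),
        "sext": bool(op2 & IRCONV_SEXT),
    }


# num (14) converted to int (19), with truncation.
op2 = 14 | (19 << 5) | IRCONV_TRUNC
print(decode_conv(op2))
```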
(see bit.tobit)
Convert to string.
Convert string to number.
Call Normal/Ref (N)?
Call Load (L)?
Call Store (S)?
(?) something related to function call arguments