Lua

The LuaJIT Wiki

not logged in | [Login]

Bytecode Optimizations

A single-pass parser transforms Lua source code into LuaJIT Bytecode. The bytecode operates on dynamically-typed values and has semantics very close to the source code. This is the reason why relatively few optimizations can be performed on the bytecode itself -- most of them wouldn't be valid transforms in a dynamic language.

Similarly, virtual register allocation for the bytecode closely follows scoping of local variables in the Lua source code. No attempt is made to minimize or overlap the lifetimes of local variables (virtual registers are cheap and plenty).

Constant Folding

Constant folding is performed for all arithmetic operators (+ - * / % ^ and unary minus) and the logical not operator. If the input consists solely of constants, the output is turned into a constant, too. E.g. this is the bytecode for return 1.5 * 3:

0001    KNUM     0   0      ; 4.5
0002    RET1     0   2

The multiplication has been eliminated and only its result is returned.

Optimizing Composite Conditionals

Composite conditional expressions like x < 10 and x or 10 are turned into a series of branches, instead of evaluating the intermediate expressions into booleans. The logical operators and and or never appear in the bytecode. not only appears when the result is used outside of a logical result context. E.g. this is the bytecode for return a < b and x or y:

0001    GGET     0   0      ; "a"
0002    GGET     1   1      ; "b"
0003    ISGE     0   1
0004    JMP      0 => 0008
0005    GGET     0   2      ; "x"
0006    IST          0
0007    JMP      1 => 0009
0008 => GGET     0   3      ; "y"
0009 => RET1     0   2

Elimination of Conditionals

Conditionals can be eliminated if their input is a constant. This is only done for 'truthness' tests. E.g. this is the bytecode for return "a" and 99:

0001    KSHORT   0  99
0002    RET1     0   2

The check for the truthness of "a" has been eliminated.

A more common idiom is cond and x or y, which is useful as a ternary conditional. But the behavior is not the same, since x is evaluated and must not be false or nil (or the idiom would fail). However x is very often a constant. E.g. this is the bytecode for return a and 10 or 20:

0001    GGET     0   0      ; "a"
0002    ISF          0
0003    JMP      1 => 0006
0004    KSHORT   0  10
0005    JMP      1 => 0007
0006 => KSHORT   0  20
0007 => RET1     0   2

The test for the truthness of 10 has been eliminated.

Elimination of Unneeded Results

Logical operators compute an actual result, e.g. print("a" and 99) prints 99. But this result is not needed in a logical result context, such as an if statement. E.g. this is the bytecode for if "a" and 99 then end:

0001    RET0     0   1

The conditional as well as its result has been eliminated.

Unneeded results of calls are eliminated, too: e.g. the CALL* instructions are patched to not return any results.

Jump Folding

Unconditional jumps that lead to other jumps are folded into a jump to the final target. Similarly, closing upvalues at the end of a scope is often followed by a jump. The jump is then folded into the UCLO, which takes a branch target.

Template Tables

Lua can be used as data description language, which often involves creating tables that are initialized with many constants. Instead of allocating an empty table and generating lots of explicit stores, a template table is generated by the parser. The template table can be duplicated very fast with the TDUP instruction (duplication is necessary, because the resulting table may be modified).

Mixed constant and variable initializers are allowed, too. The template table is only created, if at least one constant initializer (constant key and constant value) is present in the table constructor expression. E.g. this is the bytecode for return { foo=1, bar=2, 1,2,x,4 }:

0001    TDUP     0   0
0002    GGET     1   1      ; "x"
0003    TSETB    1   0   3
0004    RET1     0   2

All of the constant initializers are merged into the template table. Only the store operation t[3] = x remains in the bytecode.

Instruction and Operand Specialization

Many bytecode instructions have specialized forms for commonly used operand types or number of operands. This reduces dispatch costs, since it eliminates (often unpredictable) branches inside of their implementation:

  • There are specialized loads of constants for each constant type (and range). E.g. KSTR loads string constants. Or KSHORT loads 16 bit signed integer constants, whereas KNUM loads all other numeric constants.
  • Comparison and test operators are specialized to the direction of their tests. E.g. there's both IST and ISF, which jump either on truthness or falseness. This saves a dispatch compared to using a single instruction plus an operand for the same purpose.
  • Comparisons for equality are specialized to the operand type, if one of the operands is a constant. E.g. ISEQN is an equality comparison with a numeric constant.
  • Arithmetic instructions are specialized for the common cases of having a numeric constant on the left-hand side or the right-hand side. E.g. ADDVN is an addition with a numeric constant on the right.
  • Setting an upvalue is specialized on the operand type. E.g. USETN is used to store a number constant.
  • Indexing operations are specialized into getters and setters and on the index operand type and range. E.g. TGETB is for loads with constant integer indices 0 to 255 or TSETS is for stores with a constant string key.
  • Calls are specialized into regular calls CALL and tailcalls CALLT.
  • Returns with 0 or 1 result use RET0 or RET1 instead of the generic RET.
  • Calls, returns and table initializers which consume multiple results are specialized. E.g. CALLM is a call that passes on multiple results as arguments.
  • Every type of loop uses their own specialized bytecode instructions. E.g. the *FOR* instructions are used for numeric 'for' loops.
  • Loops using pairs() use the specialized ISNEXT and ITERN instructions.
  • Function headers are specialized for fixed-arg vs. vararg functions and for Lua functions vs. C functions vs. fast functions. E.g. FUNCV is the function header for a vararg Lua function.

For more details see the description of the Bytecode instructions.

SSA IR Optimizations

TODO

TRACE: Region Selection

NLF: Natural Loop First

Hotspot Detection

Penalties and Blacklisting

Function Inlining

Tailcall Elimination

Recursive Unrolling

Control-Flow Optimizations

RECORD: Trace Recording

On-The-Fly SSA Transform

Upvalue Alias Analysis

Data-Flow Analysis

Sparse Snapshotting

Control-Flow Specialization

Hash Key Specialization

Fast Function Inlining

Bytecode Patching

FOLD: Fold Engine

Constant Folding

Algebraic Simplifications

Strength Reduction

Reassociation

CSE: Common Subexpression Elimination

ABC: Array Bounds Check Elimination

Scalar Evolution Analysis

NARROW: Narrowing

Narrowing of Integer Conversions

Narrowing of Arithmetic Operators

Predictive Narrowing of Induction Variables

Elimination of Overflow Checks

MEM: Memory Access Optimizations

AA: Alias Analysis

ESC: Escape Analysis

Index Reassociation

FWD: Load Forwarding, Store Forwarding

DSE: Dead-Store Elimination

Read-Modify-Write Simplifications

LOOP: Loop Optimizations

DCE: Dead-Code Elimination

Loop-Invariant Hoisting via Synthetic Unrolling

PHI Elimination

Instable Loop Unrolling

SPLIT: Instruction Splitting

FFI64: 64 bit Instruction Splitting

SOFTFP: FP Value Splitting and Soft-Float Calls

SINK: Allocation Sinking and Store Sinking

Allocation sinking and store sinking try to eliminate temporary allocations in the fast paths of the code, if possible.

Details can be found here: Allocation Sinking Optimization

ASM: Assembler Backend

Reverse Code Generation

DCE: On-The-Fly Dead-Code Elimination

Linear-Scan Register Allocation

Register Hinting

Register Renaming

Register Coalescing

PHI Register Shuffling

Rematerialization

Constant Synthesizing

Spill and Restore Scheduling

Loop Inversion

Loop Alignment

GC Step Elimination

FUSE: Operand Fusion

Instruction Specialization

Instruction Combining

Machine Code Patching