Open Sponsorships

This is a list of ports, features and other work items that are planned for LuaJIT and are still in need of a sponsor.

If you want to see any of these features in a future version of LuaJIT and are interested in sponsoring them, please read: http://luajit.org/sponsors.html -- Thank you!

Note: The main LuaJIT author (Mike Pall) is working on unrelated projects and cannot accept new sponsorships at this time. But other community members may be open to sponsorship offers -- please ask on the LuaJIT mailing list for any takers.

Please note that some of the items in this list are dependent on others, which would have to be implemented first.

Ports to More CPU Architectures

LuaJIT 2.0 has already been ported to the major CPU architectures and some of their variants. It currently supports x86, x64, ARM (32-bit ARM instruction set only), PPC (PPC32 only), PPC/e500 (no JIT) and MIPS (FPU required).

The following ports are missing and would be a) feasible and b) interesting to have:

ARM Thumb2 Port

The current ARM port of LuaJIT only supports the 32-bit ARM instruction set. The interpreter core is written with the 32-bit ARM instruction set in mind, and the JIT backend can only emit 32-bit ARM machine code.

Some newer ARM devices, such as the Cortex-M family, are popular in low-power and embedded environments, but only support the (mixed 16/32-bit) Thumb2 instruction set.

MIPS Soft-Float and Dual-Number Port

Many embedded devices use MIPS CPUs without a hardware FPU. A soft-float and dual-number port would allow them to use LuaJIT, e.g. for routers running OpenWRT.

SH4 Port

The SH4 CPU family is popular in embedded applications and a port of LuaJIT would be welcomed by the embedded developer community.

Other Ports

There are a couple more CPU architectures or architectural variants left that are not yet supported by LuaJIT. Whether a port makes sense or not depends on several factors:

  1. The CPU/ABI must have a memory model that's compatible with LuaJIT.
  2. The architecture must have a reasonable market share.
  3. The architecture must not be dying anytime soon.

The last item is a bit subjective, of course. :-)

New Garbage Collector for LuaJIT 3.0

The garbage collector used by LuaJIT 2.x is essentially the same as the Lua 5.1 GC. The current garbage collector is relatively slow compared to implementations for other language runtimes. It's not competitive with top-of-the-line GCs, especially for large workloads.

The main feature planned for LuaJIT 3.0 is a redesign of the garbage collector from scratch: the new GC will be arena-based, quad-color incremental, generational, non-copying, high-speed and cache-optimized.

Metatable/__index Specialization

Accesses to metatables and __index tables with constant keys are already specialized by the JIT compiler to use optimized hash lookups (HREFK). This is based on the assumption that individual objects don't change their metatable (once assigned) and that neither the metatable nor the __index table are modified. This turns out to be true in practice, but those assumptions still need to be checked at runtime, which can become costly for OO-heavy programming.
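
As a concrete illustration, here is a minimal sketch (made up for this page, not taken from any benchmark) of the kind of class-style code this targets: every p:length() call looks up the constant key "length" through Point's __index table, and the generated code currently still has to guard against the metatable or the __index table being modified.

    local Point = {}
    Point.__index = Point

    function Point.new(x, y)
      return setmetatable({x = x, y = y}, Point)
    end

    function Point:length()
      return math.sqrt(self.x * self.x + self.y * self.y)
    end

    local sum = 0
    for i = 1, 1000000 do
      local p = Point.new(i, i)
      sum = sum + p:length()  -- constant-key lookup of "length" via __index
    end
    print(sum)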

Further specialization can be obtained by strictly relying on these assumptions and omitting the related checks in the generated code. If any of these assumptions is broken (e.g. a metatable is written to), the previously generated code must be invalidated or flushed.

Different mechanisms for detecting broken assumptions and for invalidating the generated code should be evaluated.

This optimization works at the lowest implementation level for metatables in the VM. It should equally benefit any code that uses metatables, not just the typical frameworks that implement a class-based system on top of them.

Value-Range Propagation (VRP)

Value-range propagation is an optimization for the JIT compiler: by propagating the possible ranges for a value, subsequent code may be optimized or conditionals may be eliminated. Constant propagation (already implemented) can be seen as a special case of this optimization.

E.g. if a number is known to be in the range 0 <= x < 256 (say it originates from string.byte), then a later mask operation bit.band(x, 255) is redundant. Similarly, a subsequent test for x < 0 can be eliminated.
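
A small sketch of exactly this pattern (purely illustrative):

    local bit = require("bit")

    local function f(s, i)
      local x = string.byte(s, i)   -- known range: 0 <= x < 256
      x = bit.band(x, 255)          -- redundant once the range is known
      if x < 0 then return 0 end    -- never taken; the branch could be eliminated
      return x
    end

    print(f("LuaJIT", 1))  --> 76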

Note that even though few programmers would explicitly write such a series of operations, this can easily happen after inlining of functions combined with constant propagation.

Hyperblock Scheduling

Producing good code for unbiased branches is a key problem for trace compilers. This is the main cause of "trace explosion" and bad performance with certain types of branchy code.
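
For illustration, a minimal made-up sketch of such an unbiased, data-dependent branch: the condition is true about half the time, so neither path through the loop body dominates and side traces pile up.

    math.randomseed(42)
    local data = {}
    for i = 1, 1000000 do data[i] = math.random(0, 1) end

    local a, b = 0, 0
    for i = 1, #data do
      if data[i] == 1 then   -- taken about half the time: an unbiased branch
        a = a + i
      else
        b = b + i
      end
    end
    print(a, b)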

Hyperblock scheduling promises to solve this nicely at the price of a major redesign of the compiler: selected traces are woven together into a single hyper-trace. This would also pave the way for emitting predicated instructions, which benefits some CPUs (e.g. ARM) and is a prerequisite for efficient vectorization.

FFI C Pre-Processor

The integrated C parser of the FFI library currently doesn't support #define or other C pre-processor features. To support the full range of C semantics, an integrated C pre-processor is needed.
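
For example (the read_buf declaration below is made up for illustration), a header fragment that uses #define cannot be passed to ffi.cdef() as-is; the constant has to be translated by hand first, e.g. into a plain Lua value or an enum:

    local ffi = require("ffi")

    -- The original header fragment might look like this:
    --
    --   #define BUFSIZE 4096
    --   int read_buf(char *buf, int len);
    --
    -- ffi.cdef() has no pre-processor and rejects the #define line, so it has
    -- to be rewritten by hand before the declarations can be registered:
    local BUFSIZE = 4096
    ffi.cdef[[
    int read_buf(char *buf, int len);
    ]]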

This would provide a nice solution to the C re-declaration problem for FFI modules, too.

User-Definable Intrinsics for the FFI

This is a low-level equivalent to GCC inline assembler: given a C function declaration and a machine code template, an intrinsic function (builtin) can be constructed and later called. This allows generating and executing arbitrary instructions supported by the target CPU. The JIT compiler inlines the intrinsic into the generated machine code for maximum performance.

Developers usually shouldn't need to write machine code templates themselves. Common libraries of intrinsics for different purposes should be provided or contributed by experts.

Vector/SIMD Data Type Support for the FFI

Currently, vector data types may be defined with the FFI, but you really can't do much with them. The goal of this project is to add full support for vector data types to the JIT compiler and the CPU-specific backends (if the target CPU has a vector extension).
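
For example (a minimal sketch, assuming the GCC-style vector_size attribute accepted by the FFI parser), a vector type can be declared and allocated, but the compiler has no real support for operating on it:

    local ffi = require("ffi")

    ffi.cdef[[
    typedef float float4 __attribute__((vector_size(16)));
    ]]

    print(ffi.sizeof("float4"))  --> 16
    local v = ffi.new("float4")  -- allocation works; vector arithmetic and
                                 -- SIMD code generation do not (yet)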

A new "ffi.vec" module declares standard vector types and attaches the machine-specific SIMD intrinsics as (meta)methods.

Prerequisites for this project are allocation sinking, user-definable intrinsics and the new garbage collector.

More about the last two features can be read here: http://lua-users.org/lists/lua-l/2012-02/msg00207.html