This is a list of ports, features and other work items that are planned for LuaJIT and are still in need of a sponsor.
If you want to see any of these features in a future version of LuaJIT and are interested in sponsoring them, please read: http://luajit.org/sponsors.html -- Thank you!
Please note that some of the items in this list are dependent on others, which would have to be implemented first.
LuaJIT 2.0 has already been ported to the major CPU architectures and some of their variants. It currently supports x86, x64, ARM (ARM 32 bit instruction set only), PPC (PPC32 only), PPC/e500 (no JIT) and MIPS (FPU required).
The following ports are missing and would be a) feasible and b) interesting to have:
The current ARM port of LuaJIT only supports the 32 bit ARM instruction set. The interpreter core is written with the 32 bit ARM instruction set in mind and the JIT backend can only emit 32 bit ARM machine code.
Some newer ARM devices, such as the Cortex-M family, are popular in low-power and embedded environments, but only support the Thumb2 instruction set (a mix of 16 bit and 32 bit encodings).
ARM has introduced a new 64 bit instruction set (AArch64) which will be used for upcoming ARMv8 CPUs. At the machine code level it is completely different from the current ARM 32 bit instruction set. To support ARMv8, both the interpreter and the JIT compiler backend would need to be re-implemented.
Many embedded devices use MIPS CPUs without a hardware FPU. A soft-float and dual-number port would allow them to use LuaJIT, e.g. for routers running OpenWRT.
The SH4 CPU family is popular in embedded applications and a port of LuaJIT would be welcomed by the embedded developer community.
There are a couple more CPU architectures or architectural variants that are not yet supported by LuaJIT. Whether a port makes sense depends on several factors:
The last item is a bit subjective, of course. :-)
The garbage collector used by LuaJIT 2.x is essentially the same as the Lua 5.1 GC. The current garbage collector is relatively slow compared to implementations for other language runtimes. It's not competitive with top-of-the-line GCs, especially for large workloads.
The main feature planned for LuaJIT 3.0 is a complete redesign of the garbage collector from scratch: the new garbage collector will be an arena-based, quad-color incremental, generational, non-copying, high-speed, cache-optimized garbage collector.
Accesses to metatables and __index tables with constant keys are already specialized by the JIT compiler to use optimized hash lookups (HREFK). This is based on the assumption that individual objects don't change their metatable (once assigned) and that neither the metatable nor the __index table is modified. This turns out to be true in practice, but those assumptions still need to be checked at runtime, which can become costly for OO-heavy programming.
Further specialization can be obtained by strictly relying on these assumptions and omitting the related checks in the generated code. In case any of the assumptions are broken (e.g. a metatable is written to), the previously generated code must be invalidated or flushed.
Different mechanisms for detecting broken assumptions and for invalidating the generated code should be evaluated.
This optimization works at the lowest implementation level for metatables in the VM. It should equally benefit any code that uses metatables, not just the typical frameworks that implement a class-based system on top of them.
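The kind of code affected can be sketched as follows (an illustrative Lua example; the class name and fields are made up):

```lua
-- A typical class built on metatables and __index -- the pattern the
-- proposed optimization targets.
local Point = {}
Point.__index = Point

function Point.new(x, y)
  return setmetatable({x = x, y = y}, Point)
end

function Point:length()
  return math.sqrt(self.x * self.x + self.y * self.y)
end

-- In a hot loop, every p:length() call conceptually re-checks that the
-- metatable of p and the __index table are unchanged. Omitting these
-- guards, and invalidating the generated code if Point is ever modified,
-- is the specialization described above.
local p = Point.new(3, 4)
local sum = 0
for i = 1, 100 do
  sum = sum + p:length()
end
print(sum)  -- 500
```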
Value-range propagation is an optimization for the JIT compiler: by propagating the possible ranges for a value, subsequent code may be optimized or conditionals may be eliminated. Constant propagation (already implemented) can be seen as a special case of this optimization.
E.g. if a number is known to be in the range 0 <= x < 256 (say it originates from string.byte), then a later mask operation bit.band(x, 255) is redundant. Similarly, a subsequent test for x < 0 can be eliminated.
Note that even though few programmers would explicitly write such a series of operations, this can easily happen after inlining of functions combined with constant propagation.
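Such a sequence might look like this after inlining (an illustrative sketch; the function name is invented):

```lua
-- string.byte always returns a value in [0, 255], so with value-range
-- propagation both the mask and the comparison below could be removed.
local bit = require("bit")  -- LuaJIT's bit library

local function first_byte(s)
  local x = string.byte(s, 1)   -- known range: 0 <= x < 256
  x = bit.band(x, 255)          -- redundant: x already fits in 8 bits
  if x < 0 then return 0 end    -- dead branch: x is never negative
  return x
end

print(first_byte("A"))  -- 65
```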
Producing good code for unbiased branches is a key problem for trace compilers. This is the main cause for "trace explosion" and bad performance with certain types of branchy code.
Hyperblock scheduling promises to solve this nicely at the price of a major redesign of the compiler: selected traces are woven together into a single hyper-trace. This would also pave the way for emitting predicated instructions, which benefits some CPUs (e.g. ARM) and is a prerequisite for efficient vectorization.
The integrated C parser of the FFI library currently doesn't support #define or other C pre-processor features. To support the full range of C semantics, an integrated C pre-processor is needed.
This would provide a nice solution to the C re-declaration problem for FFI modules, too.
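To illustrate the status quo (a sketch; the struct is invented): ffi.cdef accepts plain C declarations, but without a pre-processor, headers using macros or #include must be expanded by hand before being passed in.

```lua
local ffi = require("ffi")

ffi.cdef[[
  typedef struct { int x, y; } point_t;
  /* A header using #include, or a function-like macro such as
     MAX(a,b), would have to be manually expanded before being
     passed to ffi.cdef. */
]]

local p = ffi.new("point_t", 1, 2)
print(p.x + p.y)  -- 3
```

Note also that cdef'ing the same declarations twice from two independent modules raises an error today, which is the re-declaration problem mentioned above.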
Full C++ support for the FFI is not feasible, due to the sheer complexity of the task: one would need to write more or less a complete C++ compiler.
However, a limited number of C++ features can certainly be supported. Of course, one could argue that anything short of full support doesn't make sense. But you'll never know, unless you try ...
It would be an interesting task to evaluate what subset of C++ can be supported with reasonable effort or which C++ libraries can be successfully bound via the FFI. Basically: how far can C++ support go, how much effort would be needed and does it really pay off in practice?
Such a project should be split into the evaluation phase and an implementation phase, which implements the C++ subset, based on the prior evaluation.
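For comparison, the usual workaround today is to wrap the C++ API in a flat extern "C" layer and bind that with the FFI. A sketch (the library and all function names below are hypothetical):

```lua
local ffi = require("ffi")

-- C++ side (compiled into a hypothetical libshape.so):
--   extern "C" {
--     Shape *shape_new(double w, double h);
--     double shape_area(Shape *s);
--     void   shape_free(Shape *s);
--   }
ffi.cdef[[
  typedef struct Shape Shape;
  Shape *shape_new(double w, double h);
  double shape_area(Shape *s);
  void shape_free(Shape *s);
]]

-- Usage would then be (not run here, since the library is hypothetical):
-- local lib = ffi.load("shape")
-- local s = ffi.gc(lib.shape_new(2, 3), lib.shape_free)
-- print(lib.shape_area(s))
```

Direct C++ support would remove the need for such hand-written wrapper layers, which is where the evaluation of a feasible C++ subset would start.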
This is a low-level equivalent to GCC inline assembler: given a C function declaration and a machine code template, an intrinsic function (builtin) can be constructed and later called. This allows generating and executing arbitrary instructions supported by the target CPU. The JIT compiler inlines the intrinsic into the generated machine code for maximum performance.
Developers usually shouldn't need to write machine code templates themselves. Common libraries of intrinsics for different purposes should be provided or contributed by experts.
Currently, vector data types may be defined with the FFI, but you really can't do much with them. The goal of this project is to add full support for vector data types to the JIT compiler and the CPU-specific backends (if the target CPU has a vector extension).
A new "ffi.vec" module declares standard vector types and attaches the machine-specific SIMD intrinsics as (meta)methods.
Prerequisites for this project are allocation sinking, user-definable intrinsics and the new garbage collector.
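Purely as a hypothetical sketch (none of this API exists yet; every name below is invented for illustration), usage of such a module might look roughly like:

```lua
-- HYPOTHETICAL: illustrates the intent of ffi.vec, not a real API.
local vec = require("ffi.vec")

local a = vec.float4(1, 2, 3, 4)
local b = vec.float4(5, 6, 7, 8)
local c = a + b            -- __add mapped to a SIMD add intrinsic
print(c[0], c[1], c[2], c[3])
```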
More about the last two features can be read here: http://lua-users.org/lists/lua-l/2012-02/msg00207.html