This is a list of ports, features and other work items that are planned for LuaJIT and are still in need of a sponsor.
If you want to see any of these features in a future version of LuaJIT and are interested in sponsoring them, please read: http://luajit.org/sponsors.html -- Thank you!
Note: The main LuaJIT author (Mike Pall) is working on unrelated projects and cannot accept new sponsorships at this time. But other community members may be open to sponsorship offers -- please ask on the LuaJIT mailing list for any takers.
Please note that some of the items in this list are dependent on others, which would have to be implemented first.
LuaJIT 2.0 has already been ported to the major CPU architectures and some of their variants. It currently supports x86, x64, ARM (ARM 32 bit instruction set only), PPC (PPC32 only), PPC/e500 (no JIT) and MIPS (FPU required).
The following ports are missing and would be a) feasible and b) interesting to have:
The current ARM port of LuaJIT only supports the 32 bit ARM instruction set. The interpreter core is written with the 32 bit ARM instruction set in mind and the JIT backend can only emit 32 bit ARM machine code.
Some newer ARM devices, such as the Cortex-M family, are popular in low-power and embedded environments, but only support the (16 bit) Thumb2 instruction set.
Many embedded devices use MIPS CPUs without a hardware FPU. A soft-float and dual-number port would allow them to use LuaJIT, e.g. for routers running OpenWRT.
The SH4 CPU family is popular in embedded applications and a port of LuaJIT would be welcomed by the embedded developer community.
There are a couple more CPU architectures or architectural variants left that are not yet supported by LuaJIT. Whether a port makes sense or not depends on several factors:
The last item is a bit subjective, of course. :-)
The garbage collector used by LuaJIT 2.x is essentially the same as the Lua 5.1 GC. It's relatively slow compared to the collectors of other language runtimes and not competitive with top-of-the-line GCs, especially for large workloads.
The main feature planned for LuaJIT 3.0 is a complete redesign of the garbage collector from scratch: the new garbage collector will be an arena-based, quad-color incremental, generational, non-copying, high-speed, cache-optimized garbage collector.
Accesses to metatables and __index tables with constant keys are already specialized by the JIT compiler to use optimized hash lookups (HREFK). This is based on the assumption that individual objects don't change their metatable (once assigned) and that neither the metatable nor the __index table are modified. This turns out to be true in practice, but those assumptions still need to be checked at runtime, which can become costly for OO-heavy programming.
Further specialization can be obtained by strictly relying on these assumptions and omitting the related checks in the generated code. In case any of the assumptions are broken (e.g. a metatable is written to), the previously generated code must be invalidated or flushed.
Different mechanisms for detecting broken assumptions and for invalidating the generated code should be evaluated.
This optimization works at the lowest implementation level for metatables in the VM. It should equally benefit any code that uses metatables, not just the typical frameworks that implement a class-based system on top of them.
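To illustrate what kind of code is affected, here is a minimal class-style sketch (the names are made up for illustration); every call of p:length() goes through a metatable and __index lookup with a constant key, which currently has to be guarded at runtime:

    -- A typical metatable-based class; 'Point' is an illustrative name.
    local Point = {}
    Point.__index = Point

    function Point.new(x, y)
      return setmetatable({ x = x, y = y }, Point)
    end

    function Point:length()
      return math.sqrt(self.x * self.x + self.y * self.y)
    end

    local sum = 0
    for i = 1, 1e6 do
      local p = Point.new(i, i)
      sum = sum + p:length()  -- metatable + __index lookup with constant keys
    end
    print(sum)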
Value-range propagation is an optimization for the JIT compiler: by propagating the possible ranges for a value, subsequent code may be optimized or conditionals may be eliminated. Constant propagation (already implemented) can be seen as a special case of this optimization.
E.g. if a number is known to be in the range 0 <= x < 256 (say it originates from string.byte), then a later mask operation bit.band(x, 255) is redundant. Similarly, a subsequent test for x < 0 can be eliminated.
Note that even though few programmers would explicitly write such a series of operations, this can easily happen after inlining of functions combined with constant propagation.
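Written out explicitly, the pattern looks like the following sketch; in practice it would usually only appear after inlining and constant propagation:

    local bit = require("bit")

    local function lowbyte(s, i)
      local x = string.byte(s, i)   -- known range: 0 <= x < 256
      local y = bit.band(x, 255)    -- redundant under value-range propagation
      if y < 0 then return 0 end    -- can never be true; the branch is dead
      return y
    end

    print(lowbyte("hello", 1))  --> 104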
Producing good code for unbiased branches is a key problem for trace compilers. This is the main cause for "trace explosion" and bad performance with certain types of branchy code.
Hyperblock scheduling promises to solve this nicely at the price of a major redesign of the compiler: selected traces are woven together into a single hyper-trace. This would also pave the way for emitting predicated instructions, which benefits some CPUs (e.g. ARM) and is a prerequisite for efficient vectorization.
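A sketch of the kind of branchy code meant here; the branch below is data-dependent and roughly unbiased, so a trace compiler either records side traces for both outcomes or keeps falling back to the interpreter, whereas hyperblock scheduling could merge both arms into one hyper-trace:

    -- The if/else is taken about 50/50 for random input, so neither
    -- arm is a clearly "hot" path for the trace recorder.
    local function checksum(t)
      local acc = 0
      for i = 1, #t do
        if t[i] % 2 == 0 then
          acc = acc + t[i]
        else
          acc = acc - t[i]
        end
      end
      return acc
    end

    local t = {}
    for i = 1, 1000 do t[i] = math.random(1, 100) end
    print(checksum(t))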
The integrated C parser of the FFI library currently doesn't support #define or other C pre-processor features. To support the full range of C semantics, an integrated C pre-processor is needed.
This would provide a nice solution to the C re-declaration problem for FFI modules, too.
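As a concrete illustration of the current limitation (a sketch; the enum trick is just a common manual workaround, not part of any proposed API):

    local ffi = require("ffi")

    -- Accepted by the current C parser: plain declarations only.
    ffi.cdef[[
      enum { MY_FLAG = 1 };  /* manual stand-in for a #define */
      int open(const char *pathname, int flags);
    ]]

    -- Not accepted today; an integrated C pre-processor would allow e.g.:
    -- ffi.cdef[[
    --   #define MY_FLAG 1
    --   #include <fcntl.h>
    -- ]]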
This is a low-level equivalent to GCC inline assembler: given a C function declaration and a machine code template, an intrinsic function (builtin) can be constructed and later called. This allows generating and executing arbitrary instructions supported by the target CPU. The JIT compiler inlines the intrinsic into the generated machine code for maximum performance.
Developers usually shouldn't need to write machine code templates themselves. Common libraries of intrinsics for different purposes should be provided or contributed by experts.
Currently, vector data types may be defined with the FFI, but you really can't do much with them. The goal of this project is to add full support for vector data types to the JIT compiler and the CPU-specific backends (if the target CPU has a vector extension).
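For reference, here is roughly what is possible today -- a minimal sketch, assuming the GCC-style vector_size attribute, which the FFI accepts in declarations:

    local ffi = require("ffi")

    -- Declaring a 4 x float vector type works today ...
    ffi.cdef[[
      typedef float float4 __attribute__((vector_size(16)));
    ]]

    local v = ffi.new("float4")
    print(ffi.sizeof(v))  --> 16

    -- ... but no arithmetic operations or SIMD intrinsics are attached
    -- to it, which is what this project would add.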
A new "ffi.vec"
module declares standard vector types and attaches the
machine-specific SIMD intrinsics as (meta)methods.
Prerequisites for this project are allocation sinking, user-definable intrinsics and the new garbage collector.
More about the last two features can be read here: http://lua-users.org/lists/lua-l/2012-02/msg00207.html