Virtual machine series: Lightweight embeddable languages

Posted by: Subreption LLC 1 year, 2 months ago (0 comments)

We will soon publish a series of articles about the open source embeddable virtual machines available today. Among others, we will cover Lua, Forth, Pawn (originally Small), Squirrel, specialty implementations of Python such as PyMite, Io, NekoVM, Falcon and Parrot. Depending on our availability, it may take some time until all of these languages are covered, and the detail and depth of each evaluation will vary.

The rationale behind these articles is the lack of summarized, objective information about the implementations of embedded languages: the differences between them at the implementation level, their limitations and features, et cetera. If you are looking for such information to decide which VM or language suits you best, these articles might give you a reasonably solid base for your evaluation process.

Choosing a virtual machine must involve a thorough analysis of the requirements for and expectations of the language: performance, extensibility, size, features, etc. We will be focusing on the following questions:

  • How much does the compiled code weigh?
    • Can it be embedded easily?
    • Is the license restrictive for commercial development?
  • Is the VM separate or independent from lexers, parsers and compilers?
    • Can the VM be used easily with a different or new compiler backend?
    • Is it possible to adapt an existing compiler to produce bytecode for that VM without unreasonable hindrances?
  • What kind of data types are supported?
    • Can it handle multi-byte strings?
    • Can it handle 64-bit data types?
    • Does it impose size limits on any data type?
    • If it implements bytecode or support for a foreign or widely used language, does it support the full set or specification?
  • What kind of memory management is used?
    • Is the VM deterministic? (Does the same bytecode always take the same time to run?)
    • Does it use a garbage collector?
      • What kind of GC?
      • Does it support emergency collection?
      • Does it perform well?
      • Can it be changed for a different GC or adapted to use different memory management?
    • Is it truly multi-platform?
    • Does it handle leaks or provide information about users and callers?
    • Does it perform well under stress? Is it intended to be used in a single thread?
    • If it uses a JIT, does it depend on memory being writable and executable at the same time (or on changing permissions to achieve runtime code execution, as is common to all JIT engines)?
      • In properly hardened environments, this might be forbidden.
      • Memory might not be allowed to become executable after being writable (as in PaX MPROTECT-style protection).
  • Is the bytecode a solid standard or reasonably stable across different versions?
    • Does it rely on compression to provide compact chunks?
    • Is there any duplication of variables or data?
    • Can the VM handle opcode updates and version information properly?
    • Is it cross-platform or cross-architecture?
      • Is it possible to compile bytecode on a host for execution on another machine with an entirely different architecture (endianness, default integer or floating point size...)?
    • Does the VM implement any kind of verification or sandboxing? Can jump instructions be used to subvert the execution flow, for example, to execute arbitrary bytecode carefully laid out in the process heap or stack?
    • Is there any symbol translation, efficient hashing or other approach to access type and object information at runtime?
    • Is it dynamically or statically typed? Strongly or weakly typed?
    • Does the compiler need access to all the referenced symbols, or is weak/lazy binding (or linking) allowed?
  • Is it thread-safe? How does the VM account for multi-threading, synchronization, atomic operations and other areas of potential conflict? Is it designed to be executed in a single instance?
    • What threading API or approach is used?
    • Is it portable, or are threads supported only on selected platforms?
  • Is it well maintained? Would the code make a reasonable developer cry his eyes out if he had to extend or maintain the code base? In other words, does it plain and simply 'suck'? Did the developer put significant effort into documentation?
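
To make the verification question above more concrete, here is a minimal sketch of the kind of check we have in mind. Everything in it is an assumption made up for illustration: the four-opcode instruction set, the encoding (absolute 16-bit little-endian jump targets) and the function names do not belong to any of the VMs listed. The idea is simply that a jump must land on the start of an instruction inside the code buffer, never in the middle of an operand or outside the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy instruction set, invented for this sketch only. */
enum { OP_HALT = 0, OP_NOP = 1, OP_PUSH = 2, OP_JMP = 3 };

/* Encoded length of an instruction in bytes, or 0 for an unknown opcode. */
static size_t insn_length(uint8_t op)
{
    switch (op) {
    case OP_HALT:
    case OP_NOP:  return 1;
    case OP_PUSH: return 2; /* opcode + 8-bit immediate */
    case OP_JMP:  return 3; /* opcode + 16-bit little-endian absolute target */
    default:      return 0;
    }
}

/* Returns 1 if the code decodes cleanly and every jump lands on an
 * instruction boundary inside the buffer, 0 otherwise. Empty programs
 * are rejected (an arbitrary choice for this sketch). */
int verify_bytecode(const uint8_t *code, size_t len)
{
    unsigned char *starts;
    size_t pc, n;
    int ok = 1;

    if (len == 0)
        return 0;
    starts = calloc(len, 1);
    if (starts == NULL)
        return 0;

    /* Pass 1: decode linearly, rejecting unknown opcodes and truncated
     * instructions, and remember where each instruction starts. */
    for (pc = 0; pc < len; pc += n) {
        n = insn_length(code[pc]);
        if (n == 0 || n > len - pc) {
            ok = 0;
            break;
        }
        starts[pc] = 1;
    }

    /* Pass 2: every jump target must be in range and must be the start
     * of an instruction, not a byte in the middle of an operand. */
    for (pc = 0; ok && pc < len; pc += insn_length(code[pc])) {
        if (code[pc] == OP_JMP) {
            size_t target = (size_t)code[pc + 1] | ((size_t)code[pc + 2] << 8);
            if (target >= len || !starts[target])
                ok = 0;
        }
    }

    free(starts);
    return ok;
}
```

With this toy encoding, the program PUSH 7; JMP 5; HALT passes, while a jump to offset 1 (the immediate byte of the PUSH) is rejected. A real verifier would also have to deal with relative and computed jumps, stack depth and operand types, which this sketch ignores.
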

Furthermore, the VM and compiler implementations reviewed will all be under one megabyte in size (with special attention to those under 300 kilobytes) and written in a low-footprint language (ANSI C with inline assembly preferred).
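
The cross-architecture question in the list above usually comes down to whether the bytecode format fixes its own byte order and integer widths instead of dumping host structures to disk. The following is a minimal sketch of that approach. The header layout (a 32-bit version followed by a 32-bit code size, both little-endian) and all names are hypothetical, chosen only for illustration; the sketch relies on the C99 fixed-width types from stdint.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk header: two 32-bit fields in a fixed on-disk layout. */
enum { HEADER_SIZE = 8 };

struct chunk_header {
    uint32_t version;
    uint32_t code_size; /* number of bytecode bytes following the header */
};

/* Store a 32-bit value as little-endian, whatever the host byte order. */
void put_u32_le(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Load a little-endian 32-bit value, whatever the host byte order. */
uint32_t get_u32_le(const uint8_t *in)
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* Serialize the header field by field; out must hold HEADER_SIZE bytes. */
void write_header(uint8_t *out, const struct chunk_header *h)
{
    put_u32_le(out, h->version);
    put_u32_le(out + 4, h->code_size);
}

/* Returns 0 on success, -1 if the buffer is shorter than a header or the
 * header claims more code than the buffer actually holds. */
int read_header(const uint8_t *in, size_t len, struct chunk_header *h)
{
    if (len < HEADER_SIZE)
        return -1;
    h->version = get_u32_le(in);
    h->code_size = get_u32_le(in + 4);
    if (h->code_size > len - HEADER_SIZE)
        return -1;
    return 0;
}
```

Because every field goes through shifts and masks rather than a memcpy of the structure, a chunk written on a big-endian host reads back identically on a little-endian one, and structure padding never reaches the file. Floating point constants would need a similar explicit encoding, which is left out here.
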

Hopefully these articles will shed some light on an important component in the development of real-time systems, video games, embedded scripting and high-performance prototyping. Meanwhile, you are welcome to contact us if you have any comments, benchmarks or existing work on any embedded language virtual machine available today as an open source project. Subscribe now to be notified whenever a new article in this series is published.