Contrary to the hype surrounding Java and MONO, bytecode compilation is hardly a
new thing. It dates back to the days of BCPL and Pascal, perhaps further.
The general idea is that you
take code written in some high-level language, and
rather than compiling it into "native" code for a
specific hardware architecture, you compile it into a
sort of "virtual assembly language," the instruction
set for some sort of generic processor. This has quite
a number of merits:
-
The code becomes
somewhat more "opaque," which is good for those
who want to distribute proprietary software
written in scripting languages like Perl or Python;
-
Code is parsed by the
"bytecompiling"
process and is transformed into some form that
may be read in quickly without a need for
complex parsing;
-
By removing whitespace
and the like, there is sometimes a saving of
space as compared to the source code form (this
is typical with ELisp code).
More importantly, there
is almost always a huge
savings of space as compared to compiling to
machine code.
For example, the
calendrica code compiles in various
forms to the following sizes:
Table 1. Compiling calendrica.lisp

| File | Form | Size (bytes) |
| --- | --- | --- |
| calendrica.lisp | Source Code | 170347 |
| calendrica.x86f | CMUCL Machine Code | 472649 |
| calendrica.lbytef | CMUCL Bytecompiled | 87660 |
| calendrica.fas | CLISP Bytecode | 190873 |
| calendrica.lbytef.gz | CMUCL Bytecompiled, compressed | 34941 |
| calendrica.fas.gz | CLISP Bytecode, compressed | 30290 |
The critical comparison
here is that the bytecoded forms are a whole
lot smaller than the roughly 472K of
calendrica.x86f: the CMUCL bytecode is under a
fifth of that size, and the CLISP bytecode about
two fifths.
It is far more
difficult to measure this, but bytecode is also
likely to be stored more compactly in memory
than machine code. This is one of the reasons
CMUCL combines native compilation with a
bytecode compiler: code that is executed a lot
benefits from compilation to native code,
whilst bytecode-compiling those parts of a
system that are seldom executed yields substantial
memory savings. The compactness
here comes from the fact that the "machine
language" is designed not for the
computer hardware, but rather for the application.
-
Hand in hand with the
diminished size come convenience of
implementation and improved
computational efficiency.
All three walk in
together as joint merits of designing a
"computational engine" specifically for the
application.
-
Consider that
if the application is intended to
process strings, it makes sense to have
strings as basic data types.
Parrot has "string" operations
length,
concat,
repeat, and
tostring,
which work with strings
far more
conveniently than operators you
would get with "real machine
language." That convenience can
make it easier to write compact,
efficient code.
Expanding this
to "real machine code" would increase
the size of the code
considerably.
-
A simulated
"virtual machine" can be manipulated in
ways that would be prohibitively
complex to do on "bare hardware." For
instance, in the
Parrot system, it is easy enough to
save sets of registers by pushing a few
pointers onto a stack. On "bare
hardware," the equivalent behaviour
requires pushing a whole set of registers
into memory locations. This has the
unexpected result that bytecoding can,
here and there, actually be
faster than
coding to bare
hardware.
Bytecode
machines have traditionally been
stack-oriented machines, where objects
would be drawn in and out of memory
onto a stack where they would then be
processed.
The
Parrot virtual machine is a little
different, having a register
architecture with four sets of 32
registers, one set for each of four data types:
integers,
floats,
strings, and
Parrot Magic
Cookies. Its designers figure that this
will lead to less stack
thrashing.
-
It is
convenient to create operations that do
extremely complex
processing.
Such operations
will provide a compact representation
for something that is complex, which
reduces the size of a program; they
also substantially improve performance
by allowing a lot of work to be done
within the optimized code of the
"virtual machine simulator."
The classic, if arguably
mistaken, example of this is the
CRC
operation on the old VAX
architecture. Calculating CRC
checksums and evaluating
polynomials are wonderful
examples of "extremely complex
processing." Rather a lot of
microcode silicon was likely
consumed on these operations, and
few compilers made use of them.
At least not the
C compiler! As a result,
code implemented in C, such as popular
bytecoded language interpreters, is
unlikely to use these operations!
In an
application where you expect to
calculate a lot of polynomials, a
POLY
operator will certainly be of great
value, as would, very likely, a whole
set of matrix math
operators.
CLISP is known for having unusually
good performance when processing
BIGNUMs
(quasi-infinite precision integers).
Other
Common Lisp implementations tend to
beat the pants off it when working with
small integers, when they can render
code into native 32-bit arithmetic
operations, as you might find with
crypto applications. But once you
cross the line to the
BIGNUM,
all the implementations wind up
invoking function calls, and
behave little differently from a
bytecode interpreter. CLISP has an
unusually good BIGNUM library, and
so works better than many others
in this area of
strength.
As for the
CRC
function on the VAX being a "mistake,"
it's a mistake when it consumes silicon
on the CPU that would have better been
used for something else, and then
remains unused when your favorite
compilers don't use it. The same is not
true for rarely-used bytecode
instructions. If there are 160000 gates
on a CPU that aren't being used, that
feels wasteful. If there is 16K of code
in the bytecode interpreter that never
gets used, and perhaps never even gets
paged into memory, the waste is nowhere
near as painful.
In the hardware
world, RISC may have become "king," in
that it allows silicon to be devoted to
having more registers and to improving
the ability to execute code in
parallel. In a bytecode interpreter,
CISC is virtually always a
win.
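The trade-off described in the last few points can be sketched with a toy interpreter. What follows is a minimal Python sketch, not the design of Parrot, CLISP, or any real bytecode machine: the tuple encoding and the opcode names set, add, mul, and poly are invented for illustration, and repeat and length merely borrow the names of the Parrot string operations mentioned earlier. It evaluates the same polynomial twice, once with primitive arithmetic instructions and once with a single "complex" poly instruction, and counts how many instructions the interpreter loop has to dispatch.

```python
def run(program, nregs=8):
    """Execute a list of (opcode, operand, ...) tuples on a toy register machine.

    Returns (registers, dispatched), where dispatched is the number of
    instructions the interpreter loop had to decode.
    """
    regs = [None] * nregs
    dispatched = 0
    for op, *args in program:
        dispatched += 1
        if op == "set":            # set dst, constant
            dst, value = args
            regs[dst] = value
        elif op == "add":          # add dst, a, b
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "mul":          # mul dst, a, b
            dst, a, b = args
            regs[dst] = regs[a] * regs[b]
        elif op == "repeat":       # repeat dst, src, n  (string repeated n times)
            dst, src, n = args
            regs[dst] = regs[src] * regs[n]
        elif op == "length":       # length dst, src
            dst, src = args
            regs[dst] = len(regs[src])
        elif op == "poly":         # poly dst, x, coeffs  (Horner's rule, in host code)
            dst, x, coeffs = args
            acc = 0
            for c in regs[coeffs]:
                acc = acc * regs[x] + c
            regs[dst] = acc
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs, dispatched


# 2*x**2 + 3*x + 4 at x = 5, using only primitive instructions.
PRIMITIVE = [
    ("set", 0, 5),       # r0 = x
    ("set", 1, 2),       # r1 = accumulator, starts at the leading coefficient
    ("set", 2, 3),       # r2 = 3
    ("set", 3, 4),       # r3 = 4
    ("mul", 1, 1, 0),    # r1 = 2 * 5
    ("add", 1, 1, 2),    # r1 = 10 + 3
    ("mul", 1, 1, 0),    # r1 = 13 * 5
    ("add", 1, 1, 3),    # r1 = 65 + 4
]

# The same computation with one "complex" instruction.
COMPLEX = [
    ("set", 0, 5),            # r0 = x
    ("set", 2, [2, 3, 4]),    # r2 = coefficients, highest degree first
    ("poly", 1, 0, 2),        # r1 = polynomial evaluated at r0
]

# Strings as a basic data type of the machine.
STRINGS = [
    ("set", 0, "ab"),
    ("set", 1, 3),
    ("repeat", 2, 0, 1),      # r2 = "ababab"
    ("length", 3, 2),         # r3 = 6
]

if __name__ == "__main__":
    for name, program in [("primitive", PRIMITIVE), ("complex", COMPLEX)]:
        regs, dispatched = run(program)
        print(name, "result:", regs[1], "instructions dispatched:", dispatched)
```

Both programs leave 69 in register 1, but the primitive version dispatches eight instructions and the poly version three. In a software interpreter every dispatch costs decoding work, so folding the loop into one opcode makes the program shorter and moves the work into the interpreter's own compiled code, which is the sense in which CISC tends to win there.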
There have been some rather
hysterical reports and theories about the
relationship between MONO, GNOME
and Microsoft. Many are quite wild, with rather
incoherent theories as to why someone would have
thought it sensible to implement MONO.
Contrary to some of the
wild theories floating around on
Slashdot, the reasoning has little to do with
"using Microsoft code," or Microsoft Passport
authentication, or anything else of the
sort.
The real reasoning has to
do with language. Microsoft is
implementing all sorts of things as "part of .NET;"
the parts MONO is
looking at are:
-
A dynamic
language
The big-name
Ximian project is the
email-and-stuff application
Evolution.
The code for it is
written in C,
and apparently whopping huge portions of it
consist of
memory management code, which, in C,
must be done quite manually.
Using a more
dynamic language offering
garbage collection removes the need
to write hordes of
malloc()
and free() calls, which
would allow an application like
Evolution
to be both smaller and more easily
and quickly written.
Java offers
garbage collection, and so
resembles an
answer in this regard. So also would
languages such as Lisp,
Smalltalk,
Eiffel, and
Modula-3.
-
A bytecoded
(perhaps JIT-compiled) platform to provide
some independence of platform.
This also would
disconnect application code somewhat from
the deep details of the many C-based
libraries of GNOME. Apparently the
not-always-organized growth of libraries in
GNOME has led to it becoming somewhat
difficult to make concurrent use of many of
the services offered.
Again, Java offers
a "JVM." A number of other languages offer
language-specific bytecoding schemes that
somewhat parallel this.
-
Language- and
platform-independence
One of the
important characteristics of the
GNOME project is that it intends to be
relatively agnostic about what languages
are used (in contrast with the somewhat
C++-partisan KDE
and
Objective-C-partisan
GNUstep).
The various
"bytecode execution machines" that are
presently available are generally
not terribly friendly
to the use of multiple languages. JVM is
for Java, for instance.
There is something of a
"never-accomplished Holy Grail" to this;
witness
UNCOL.
In effect, MONO represents something rather
like the "Java platform," except that it is
specifically intended to be language
neutral.
Here are some links to
interviews and commentary from sundry GNOME folk
about what they're about: