Red

Red/System v0.2.2 released

This release is mainly a bugfix release that solves several old issues. It is also now correctly synchronized with all the current bindings done for Red/System. The main changes are:

  • Internal compiler refactoring of: expressions compilation, type casting and ANY/ALL support.
  • Greatly improved runtime error reporting: it now reports both the source line number and the source file name where the error occurred. This works in debug mode only (-g command-line option).
  • Aliased struct names can now be tested separately in typed (RTTI) functions.
  • Callback function attribute removed. It is no longer needed, and any function can now be used as a callback. In addition, a new cdecl attribute is now accepted to switch to the C calling convention when passing a function as an argument to an imported C function.
  • 21 issue reports closed.
  • More than 2000 new unit tests were added (mostly generated using scripts written by Peter WA Wood), bringing the total to 8613 tests.

0.5.3: Faster compilation and extended vector! support

The main point of this minor release is to speed up compilation time by introducing a new way for the compiler to store Red values required for constructing the environment during the runtime library startup.

Introducing Redbin

Red already provides two text-oriented serialization formats, following Rebol’s basic principles. Here are the serialization formats now available in Red, with some pros and cons:

  • MOLD format
  1. provides a default readable text format, very close to the source code version
  2. cannot properly encode many values
  • MOLD/ALL format
  1. can encode series offsets
  2. can natively encode some values whose literal forms would otherwise rely on words (none, true/false, objects, …)
  3. human-readable, but not always nice-looking
  • Redbin format
  1. can encode any value accurately
  2. supports word bindings
  3. can encode contexts efficiently
  4. supports cycles in blocks
  5. can encode name/value pairs in any context
  6. extremely fast loading time
  7. very small storage space used when compressed
  8. non human-readable

So far, the existing environment source code (mostly block values) was converted to pure Red/System construction code. That was simple and straightforward to implement, but it generated thousands of extra lines of code, slowing down the native compilation process. The right solution was to introduce a new binary serialization format for Red values called Redbin (strongly inspired by Carl’s REBin proposal).

Redbin’s specification focuses on optimizing the loading time of encoded values by making their stored representation very close to their memory representation, bypassing the parsing and validation stages. Moreover, the Redbin payload is compressed using the Crush algorithm (which Qtxie ported to Red/System). Crush features one of the fastest decompressors around while achieving a general compression ratio very close to the deflate algorithm (but its compression speed is about an order of magnitude slower). This perfectly fits the needs of our Redbin use-case.
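To make the idea of a stored representation close to the memory representation more concrete, here is a minimal Python sketch. It is not the Redbin format: the record layout (a count followed by fixed-width 32-bit integers) is a hypothetical stand-in, and zlib replaces Crush, which is not available in Python’s standard library. The point is that decoding amounts to a decompression plus a direct unpack, with no tokenizing or validation of text.

```python
import struct
import zlib

# Hypothetical record layout for illustration only (NOT the Redbin spec):
#   4-byte little-endian element count, then one 4-byte signed int per value.
# zlib stands in for Crush, which is not in Python's standard library.

def encode_int_block(values):
    """Serialize 32-bit integers in a layout close to their in-memory form."""
    raw = struct.pack(f"<I{len(values)}i", len(values), *values)
    return zlib.compress(raw)

def decode_int_block(payload):
    """Decompress, then read every value with one unpack: no text parsing."""
    raw = zlib.decompress(payload)
    (count,) = struct.unpack_from("<I", raw, 0)
    return list(struct.unpack_from(f"<{count}i", raw, 4))

block = list(range(-500, 500))
payload = encode_int_block(block)
assert decode_int_block(payload) == block
print(len(payload), "compressed bytes for", len(block) * 4, "raw bytes of values")
```

A real format also has to tag each value with its datatype and handle words, contexts and cycles; this sketch only shows why a memory-like layout loads quickly.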

So the gains compared to versions prior to 0.5.3 are:

  • compilation time of an empty Red program is ~40% faster!
  • the generated executable of an empty Red program is about 100KB smaller (only 278KB on Windows now).
  • faster startup time, as the Redbin decoding process is much faster than the previous Red-stack-oriented construction approach.

Those benefits also extend to user code: your static series will be saved in Redbin format as well.

Redbin format is currently emitted by the compiler and decoded by the Red runtime, but there is no encoder in the runtime yet that would allow user code to emit Redbin format. We will provide that support in a future version; it is not a high priority for now. A “compact” version of the encoding format will also be added, so that Redbin can also be a good choice for remote data exchange.

Compilation from Rebol console

For those using the Red toolchain from the Rebol2 console, a new rc function is introduced to avoid reloading the toolchain on each run. A typical session looks like this:

>> do %red.r
>> rc "-c tests\demo.red"

-=== Red Compiler 0.5.3 ===-

Compiling /C/Dev/Red/tests/demo.red ...
...compilation time : 416 ms

Compiling to native code...
Script: "Red/System PE/COFF format emitter" (none)
...compilation time : 12022 ms
...linking time     : 646 ms
...output file size : 284160 bytes
...output file      : C:\Dev\Red\demo.exe

>> call/output "demo.exe" s: make string! 10'000
== 0
>> print s

  RedRed              d
  d     d             e
  e     e             R
  R     R   edR    dR d
  d     d  d   R  R  Re
  edRedR   e   d  d   R
  R   e    RedR   e   d
  d    e   d      R   e
  e    R   e   d  d  dR
  R     R   edR    dR d

Collation tables

Since 0.5.2, Red provides collation tables for more accurate case folding support. Those tables can now be accessed by users using these paths:

system/locale/collation/upper-to-lower
system/locale/collation/lower-to-upper

Each of these tables is a vector of char! values which can be freely modified and extended by users in order to cope with specific local rules for case folding. For example, in French, the uppercase of the letter é can be E or É. French speakers are divided about which one should be used, and in some cases it can simply be a typographical constraint. By default, Red will uppercase é as É, but this can easily be changed if required. Here is how:

uppercase "éléphant"
== "ÉLÉPHANT"

table: system/locale/collation/lower-to-upper
foreach [lower upper] "àAéEèEêEôOûUùUîIçC" [table/:lower: upper]

uppercase "éléphant"
== "ELEPHANT"
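For readers more familiar with other languages, the same idea can be modeled in Python. This sketch assumes a table indexed by code point and limited to the Latin-1 range (256 entries); the real Red table and its size may differ. It mirrors the Red example above: the default mapping uppercases é as É, and overwriting entries pairwise changes the result.

```python
# Toy model of a lower-to-upper collation table, assuming a vector-like
# table indexed by code point and covering only the Latin-1 range.
TABLE_SIZE = 256

def default_lower_to_upper():
    table = []
    for cp in range(TABLE_SIZE):
        up = chr(cp).upper()
        # Keep one-to-one mappings only ('ß' would otherwise become "SS").
        table.append(up if len(up) == 1 else chr(cp))
    return table

def uppercase(text, table):
    # Characters outside the table fall back to Python's own str.upper().
    return "".join(
        table[ord(c)] if ord(c) < len(table) else c.upper() for c in text
    )

table = default_lower_to_upper()
print(uppercase("éléphant", table))   # ÉLÉPHANT

# Same override as the Red example: walk the string two characters at a time.
pairs = "àAéEèEêEôOûUùUîIçC"
for lower, upper in zip(pairs[::2], pairs[1::2]):
    table[ord(lower)] = upper
print(uppercase("éléphant", table))   # ELEPHANT
```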

Extended Vector! datatype

The vector! datatype now supports more actions and can store more datatypes with different bit-sizes. For integer! and char! values, you can store them as 8, 16 or 32-bit values. For float!, it is 32 or 64 bits. Several syntactic forms are accepted for creating a vector:

make vector! <slots>
make vector! [<values>]
make vector! [<type> <bit-size> [<values>]]
make vector! [<type> <bit-size> <slots> [<values>]]

<slots>    : number of slots to preallocate (32-bit slots by default)
<values>   : sequence of values of same datatype
<type>     : name of accepted datatype: integer! | char! | float!
<bit-size> : 8 | 16 | 32 for integer! and char!, 32 | 64 for float!

The type of the vector elements can be inferred from the provided values, so it can be omitted (unless you need to force a bit-size different from the values’ default one). If a value with a bit-size greater than that of the vector elements is inserted in the vector, it will be truncated to the vector’s bit-size.
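The truncation rule can be sketched as follows. This Python class is a toy model, not Red’s implementation: it only covers integer! elements, treats them as unsigned, and truncates by keeping the low bits (a bitwise AND with a mask), which is an assumption based on how the text describes truncation for vector math.

```python
class IntVector:
    """Toy model of an integer! vector! with a fixed element bit-size.

    Assumptions: elements are unsigned, and truncation keeps only the
    low `bit_size` bits of each inserted value.
    """

    ALLOWED_BITS = (8, 16, 32)

    def __init__(self, bit_size=32, values=()):
        if bit_size not in self.ALLOWED_BITS:
            raise ValueError("integer! vectors accept 8, 16 or 32 bits")
        self.bit_size = bit_size
        self.mask = (1 << bit_size) - 1
        self.items = []
        for value in values:
            self.append(value)

    def append(self, value):
        # A value wider than the element bit-size keeps only its low bits.
        self.items.append(value & self.mask)

v = IntVector(8, [1, 2, 3])
v.append(300)          # 300 does not fit in 8 bits
print(v.items)         # [1, 2, 3, 44]
```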

For example, creating a vector that contains 1000 32-bit integer values:

make vector! 1000

Or, if you want to specify the bit-size of the vector elements:

make vector! [char! 16 1000]
make vector! [float! 64 1000]

You can also initialize a vector from a block as below:

make vector! [1.1 2.2 3.3 4.4]

Again, you can also specify the bit-size of the vector elements:

make vector! [integer! 8 [1 2 3 4]]

For integer! and char! vectors, you can use all math and bitwise operators now.

x: make vector! [1 2 3 4]
y: make vector! [2 3 4 5]
x + y
== make vector! [3 5 7 9]

In case of different bit-sizes, the resulting vector will use the highest bit-size. If a math operation produces a result that does not fit the bit-size, the result is currently truncated to the bit-size (using an AND operation). The ability to read and change the bit-size of a vector will be added in future releases.
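The bit-size promotion and AND-based truncation described above can be modeled in a few lines of Python. This is an illustrative sketch under assumptions: elements are unsigned integers, both operands have the same length, and the helper names (`make_vector`, `vector_add`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Vec:
    bits: int      # element bit-size: 8, 16 or 32
    items: list    # unsigned integer elements

def make_vector(bits, values):
    mask = (1 << bits) - 1
    return Vec(bits, [v & mask for v in values])

def vector_add(x, y):
    if len(x.items) != len(y.items):
        raise ValueError("this sketch only adds vectors of equal length")
    bits = max(x.bits, y.bits)       # result takes the highest bit-size
    mask = (1 << bits) - 1           # overflowing results are ANDed down
    return Vec(bits, [(a + b) & mask for a, b in zip(x.items, y.items)])

print(vector_add(make_vector(32, [1, 2, 3, 4]), make_vector(32, [2, 3, 4, 5])).items)
# [3, 5, 7, 9]
print(vector_add(make_vector(8, [200, 100]), make_vector(8, [100, 50])).items)
# [44, 150]
print(vector_add(make_vector(8, [200, 100]), make_vector(16, [100, 100])))
# Vec(bits=16, items=[300, 200])
```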

The following actions are added to vector! datatype: clear, copy, poke, remove, reverse, take, sort, find, select, add, subtract, multiply, divide, remainder, and, or, xor.

The vector! implementation is not yet final: some of its actions will get optimized for better performance and, in the future, will rely on SIMD for even faster operations. Multidimensional support will be implemented as a new matrix! datatype in the near future, inheriting from vector!, so that the additional code required will be kept minimal.

Bugfixing

This was a short-term release, but we managed to fix a few bugs anyway.

What’s next

Another minor release will follow with many runtime library additions and new toolchain improvements. See the planned features for 0.5.4 on our Trello board.

The 0.6.0 release will also most probably be split in two milestones, one for GUI and another for Android support.

In the meantime, enjoy this new release! :-)

0.4.0: Red goes binary!

What’s that?!

As we are getting closer to the end of the alpha period, we are now moving to a more convenient way to use and distribute the Red toolchain. So far, you needed to download a Rebol interpreter and the sources from Github separately, and run them using somewhat verbose command-lines. This is fine for developing Red with contributors who are interested in the inner workings of the toolchain, but for end users, the goal has always been to provide a simpler and much more convenient way, as Rebol taught us in the past.

So, from now on, you can get Red as a single binary (< 1 MB) from the new Download page. Yes, all of the Red toolchain and runtime is packed in that small binary, even the REPL is built-in!

The Red repo landing page has been reworked to show Red usage in binary form. All previous command-line options are still present, and a new one (-c) has been introduced. Here is a short overview of the main options:

Launch the REPL:

$ red

Run a Red script directly (interpreted):

$ red <script>

Compile a script as executable with default name in working path:

$ red -c <script>

Compile a script as shared library with default name in working path:

$ red -dlib <script>

Compile a script as executable with alternative name:

$ red -o <new> <script>

Compile a script as executable with default name in other folder:

$ red -o <path/> <script>

Compile a script as executable with new name in other folder:

$ red -o <path/new> <script>

Cross-compile to another platform:

$ red -t Windows <script>

Display a description of all possible options:

$ red -h

Notice that the -c option is implied when -o or -t is used. This is designed to keep command-lines as simple and short as possible.
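The option rules above, and in particular the implied -c, can be summarized as a small argument parser. This is a hypothetical Python sketch of the described behavior, not the Red toolchain’s actual code; it skips -h and simply treats -dlib as selecting a shared-library build.

```python
def parse_args(argv):
    """Toy parser for the options listed above (not Red's actual code)."""
    opts = {"compile": False, "dlib": False, "output": None,
            "target": None, "script": None}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-c":
            opts["compile"] = True
        elif arg == "-dlib":
            opts["dlib"] = True
        elif arg in ("-o", "-t"):
            if i + 1 >= len(argv):
                raise ValueError(f"{arg} expects a value")
            opts["output" if arg == "-o" else "target"] = argv[i + 1]
            i += 1
        elif arg.startswith("-"):
            raise ValueError(f"unknown option {arg}")
        else:
            opts["script"] = arg
        i += 1
    # -o and -t imply -c, so they never need to be combined with it.
    if opts["output"] is not None or opts["target"] is not None:
        opts["compile"] = True
    return opts

def mode(opts):
    if opts["script"] is None:
        return "repl"
    if opts["dlib"]:
        return "shared-library"
    if opts["compile"]:
        return "executable"
    return "interpret"

print(mode(parse_args([])))                              # repl
print(mode(parse_args(["demo.red"])))                    # interpret
print(mode(parse_args(["-t", "Windows", "demo.red"])))   # executable
```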

Moreover, the Red binary will be able to compile standalone Red/System programs directly: no special option is needed, as they will be recognized automatically.

Thanks very much to Tamás Herman for helping set up the build farm and providing the Mac OS X machine, and thanks to the HackJam hackerspace group from Hong Kong for the hosting!

Other changes

  • In addition to that new binary form, 17 issues have been fixed since the 0.3.3 release about a month ago (not counting regression tickets).
  • The work on object support is half-done: objects are working fine with advanced semantics in the interpreter (see the object branch), and the focus now will be to support them at the Red compiler level.

What’s next?

As we are moving closer to the beta state, version numbers will increase faster. For example, once objects are done, the release will be 0.5.0, while 0.6.0 will bring full I/O support. Smaller versions are still possible between these major versions, which means that the release cycle should accelerate, with at least one release each month from now on. So, what should you expect in the next ones?

0.4.x

  • Simple I/O support (just read, write and exists? on files)
  • PARSE support
  • Pre-compiled runtime (much faster compilation times)

0.5.0

  • Full object support

0.5.x

  • VID-like cross-platform dialect binding to native widgets.
  • Mezzanine functions additions
  • Redbin (accurate Red values serialization in binary format)
  • Full errors management
  • Red-level exceptions handling

Enjoy!

Red/System compiler overview

Source Navigation

As requested by several users, I am giving a little more insight into the Red/System compiler’s inner workings, along with a map for navigating the source code.

Current Red/System source tree:

red-system/
    %compiler.r        ; Main compiler code, loads everything else
    %emitter.r         ; Target code emitter abstract layer
    %linker.r          ; Format files loader
    %rsc.r             ; Compiler's front-end for standalone usage

formats/               ; Contains all supported executable formats
    %PE.r              ; Windows PE/COFF file format emitter
    %ELF.r             ; UNIX ELF file format emitter

library/               ; Third-party libraries

runtime/               ; Contains all runtime definitions
    %common.reds       ; Cross-platform definitions
    %win32.r           ; Windows-specific bindings
    %linux.r           ; Linux-specific bindings

targets/               ; Contains target execution unit code emitters
    %target-class.r    ; Base utility class for emitters
    %IA32.r            ; Intel IA32 code emitter

tests/                 ; Unit tests

Once the compiler code is loaded in memory, the object hierarchy looks like this:

system/words/          ; global REBOL context

system-dialect/        ; main object
    loader/            ; preprocessor object
        process        ; preprocessor entry point function
    compiler/          ; compiler object
        compile        ; compiler entry point function

emitter/               ; code emitter object
    compiler/          ; short-cut reference to compiler object
    target/            ; reference to the target code emitter object
        compiler/      ; short-cut reference to compiler object

linker/                ; executable file emitter
    PE/                ; Windows PE/COFF format emitter object
    ELF/               ; UNIX ELF format emitter object

Note: the linker file formats are currently statically loaded; this will probably be changed to a dynamic loading model.

Compilation Phases

Compilation is a process that goes through several phases to transform textual source code into an executable file. Here is an overview of the process:

1) Source loading

This is a preparatory phase that converts the text source code to its in-memory representation (close to an AST). This is achieved in 3 steps:

  1. source is preprocessed in its text form to make the syntax REBOL-compatible
  2. source is converted to a tree of nested blocks using the REBOL’s LOAD function
  3. source is postprocessed to interpret some compiler directives (like #include and #define)
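The three loading steps can be illustrated with a toy Python pipeline. Everything here is an assumption for illustration: the text-level rewrite in step 1 only strips `;` comments (the real preprocessing rules are not described here), step 2 uses a naive tokenizer in place of REBOL’s LOAD, and step 3 only handles #define (not #include).

```python
import re

def preprocess(text):
    """Step 1 (placeholder rewrite): drop ';' line comments.
    Naive: a ';' inside a string literal would also be cut."""
    return "\n".join(line.split(";", 1)[0] for line in text.splitlines())

def load(text):
    """Step 2: build a tree of nested lists from '[' and ']' delimiters."""
    stack = [[]]
    for token in re.findall(r"\[|\]|[^\s\[\]]+", text):
        if token == "[":
            stack.append([])
        elif token == "]":
            if len(stack) == 1:
                raise SyntaxError("unexpected ]")
            block = stack.pop()
            stack[-1].append(block)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise SyntaxError("missing ]")
    return stack[0]

def postprocess(tree, defines=None):
    """Step 3: record '#define name value' and substitute name afterwards."""
    defines = {} if defines is None else defines
    out, i = [], 0
    while i < len(tree):
        item = tree[i]
        if item == "#define":
            if i + 2 >= len(tree):
                raise SyntaxError("#define expects a name and a value")
            defines[tree[i + 1]] = tree[i + 2]
            i += 3
            continue
        if isinstance(item, list):
            out.append(postprocess(item, defines))
        else:
            out.append(defines.get(item, item))
        i += 1
    return out

source = """
#define SIZE 10   ; buffer size
buf: allocate SIZE
if x > SIZE [print SIZE]
"""
print(postprocess(load(preprocess(source))))
# ['buf:', 'allocate', '10', 'if', 'x', '>', '10', ['print', '10']]
```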

2) Compilation

The compiler walks through the source tree using a recursive descent parser. It attempts to match keywords and values with its internal rules and emits:

  • the corresponding machine code in the code buffer
  • the global variables and complex data (c-string! and struct! literals) in the data buffer

The internal entry point function for the compilation is compiler/comp-dialect. All the compiler/comp-* functions are used to recursively analyze the source code and each one matches a specific semantic rule (or set of rules) from the Red/System language specification.
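The recursive descent approach, with one comp-* routine per rule, can be sketched in Python as below. This toy compiler is hypothetical and far simpler than compiler/comp-dialect: it accepts integers, string literals and prefix calls written as nested lists, emits a few real IA-32 encodings (MOV EAX, imm32 as B8; PUSH EAX as 50; CALL rel32 as E8), stores string literals in a separate data buffer, and leaves zeroed placeholders for addresses that a linker would patch. It performs no stack cleanup after calls.

```python
import struct

class ToyCompiler:
    """Toy recursive-descent compiler: each comp_* method matches one rule."""

    def __init__(self):
        self.code = bytearray()   # machine code buffer
        self.data = bytearray()   # global data buffer (string literals)
        self.refs = []            # (code offset, what to patch) for the linker

    def comp_dialect(self, statements):
        for stmt in statements:
            self.comp_expression(stmt)

    def comp_expression(self, expr):
        if isinstance(expr, int):
            self.comp_integer(expr)
        elif isinstance(expr, str):
            self.comp_string(expr)
        elif isinstance(expr, list):
            self.comp_call(expr)
        else:
            raise SyntaxError(f"unexpected value: {expr!r}")

    def comp_integer(self, value):
        # MOV EAX, imm32: opcode B8 then a little-endian 32-bit value
        self.code += b"\xB8" + struct.pack("<i", value)

    def comp_string(self, text):
        offset = len(self.data)
        self.data += text.encode() + b"\x00"          # NUL-terminated literal
        self.refs.append((len(self.code) + 1, ("data", offset)))
        self.code += b"\xB8" + b"\x00" * 4             # MOV EAX, <address placeholder>

    def comp_call(self, expr):
        name, *args = expr
        for arg in reversed(args):                     # push args right to left
            self.comp_expression(arg)
            self.code += b"\x50"                       # PUSH EAX
        self.refs.append((len(self.code) + 1, ("func", name)))
        self.code += b"\xE8" + b"\x00" * 4             # CALL rel32 placeholder

c = ToyCompiler()
c.comp_dialect([["print", "hi"], 42])
print(c.code.hex(), c.data, c.refs)
```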

The production of native code is direct: there is no intermediate representation, and machine code is generated as soon as a language statement or expression is matched. This is the simplest way to do it, but code cannot be efficiently optimised without a proper Intermediate Representation (IR). When Red/System is rewritten in Red, a simple IR will be introduced to enable the full range of possible code optimisations.

As you know, a Red/System program entry point is at the beginning of the source code. During the compilation, the source code in the global context is compiled first and all functions are collected and compiled after all global code. So the generated code is always organised the same way:

  • global code (including proper program finalization)
  • function 1
  • function 2
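The ordering rule can be sketched as follows: global statements are emitted as they are met, function bodies are collected and emitted afterwards, and each function’s entry offset is recorded in a symbol table. This Python model is illustrative only; the input shape is hypothetical and the one-byte finalizer (RET, C3) is a placeholder for the real program finalization code.

```python
def emit_program(program, finalizer=b"\xC3"):
    """Lay out code as: global code, finalization, then each function.

    program: items ("global", code_bytes) or ("func", name, code_bytes)
    Returns the code buffer and a symbol table of function entry offsets.
    """
    code = bytearray()
    deferred = []
    for item in program:
        if item[0] == "func":
            deferred.append(item)      # collected now, emitted after globals
        else:
            code += item[1]            # global code is emitted in source order
    code += finalizer                  # placeholder program finalization
    symbols = {}
    for _, name, body in deferred:
        symbols[name] = len(code)      # entry offset of this function
        code += body
    return code, symbols

program = [
    ("global", b"\x90\x90"),
    ("func", "f1", b"\xAA" * 3),
    ("global", b"\x90"),
    ("func", "f2", b"\xBB" * 2),
]
code, symbols = emit_program(program)
print(code.hex(), symbols)   # 909090c3aaaaaabbbb {'f1': 4, 'f2': 7}
```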

The results of the compilation process are:

  • a global symbol table
  • a machine code buffer
  • a global data buffer
  • a list of functions from external libraries (usually these are OS API mappings)

The compiler is able to process one or several source files this way before entering the linking phase.

3) Linking

The goal of the linking process is to create an executable file that can be loaded by the target platform. To that end, the executable file needs to conform to the target’s ABI and executable file format, like PE for Windows or ELF for Linux.

During linking, the global symbol table is used to “patch” the code and data buffers (see linker/resolve-symbol-refs), inserting the final memory addresses of the referenced resources (variables, functions, global data). The different parts to assemble are grouped into so-called “sections”, which can themselves be grouped into “segments” (as, e.g., in ELF).
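Symbol patching can be illustrated with a short Python sketch. The function is modeled loosely on the role of linker/resolve-symbol-refs, but its name, data layout and the section base addresses are assumptions for the example. Absolute references receive the final address; relative references (such as a CALL rel32 operand) receive the distance from the end of the 4-byte field to the target.

```python
import struct

def resolve_symbol_refs(code, symbols, refs, bases):
    """Patch 32-bit placeholders in `code` (a bytearray) in place.

    symbols: name -> (section, offset within that section)
    refs:    list of (offset of the 4-byte field in code, name, kind)
             where kind is "abs" (absolute address) or "rel" (rel32)
    bases:   section name -> load address chosen by the linker
    """
    for at, name, kind in refs:
        section, sym_offset = symbols[name]
        target = bases[section] + sym_offset
        if kind == "abs":
            struct.pack_into("<I", code, at, target)
        elif kind == "rel":
            # rel32 is counted from the address right after the 4-byte field
            next_address = bases["code"] + at + 4
            struct.pack_into("<i", code, at, target - next_address)
        else:
            raise ValueError(f"unknown reference kind: {kind}")
    return code

# MOV EAX, <msg> ; CALL <start> ; RET  (address fields zeroed)
code = bytearray(b"\xB8" + b"\x00" * 4 + b"\xE8" + b"\x00" * 4 + b"\xC3")
symbols = {"msg": ("data", 0), "start": ("code", 0)}
refs = [(1, "msg", "abs"), (6, "start", "rel")]
bases = {"code": 0x401000, "data": 0x402000}   # example load addresses
resolve_symbol_refs(code, symbols, refs, bases)
print(code.hex())   # b800204000e8f6ffffffc3
```

Keeping relative and absolute fixups distinct matters because only absolute ones depend on where the data section is finally placed.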

Finally, some headers describing the file and its sections/segments are inserted to complete the file. The file is then written to disk, marking the end of the whole process.

Static linking of external libraries (*.lib, *.a, …) will be added at some point in the future (when the need for such a feature arises).

I hope this short description gives you a better picture of how the Red/System compiler works, even if it is probably obvious to the most experienced readers. Feel free to ask for more in the comments or, better, on the Google Groups mailing-list.
