Red

News

0.5.3: Faster compilation and extended vector! support

The main point of this minor release is to speed up compilation by introducing a new way for the compiler to store the Red values needed to construct the environment when the runtime library starts up.

Introducing Redbin

Following Rebol's basic principles, Red already provides two text-oriented serialization formats. Here are the serialization formats now available in Red, with their pros and cons:

  • MOLD format
  1. provides a default readable text format, very close to the source code version
  2. cannot properly encode many values
  • MOLD/ALL format
  1. can encode series offsets
  2. can natively encode some values whose literal forms rely on words (none, true/false, objects, …)
  3. human-readable, but not always nice-looking
  • Redbin format
  1. can encode any value accurately
  2. supports word bindings
  3. can encode contexts efficiently
  4. supports cycles in blocks
  5. can encode name/value pairs from any context
  6. extremely fast loading time
  7. very small storage space when compressed
  8. not human-readable

Until now, the environment source code (mostly block values) was converted to pure Red/System construction code. That approach was simple and straightforward to implement, but it generated thousands of extra lines of code, slowing down native compilation. The right solution was to introduce a new binary serialization format for Red values, called Redbin (heavily inspired by Carl’s REBin proposal).

Redbin’s specification focuses on optimizing the loading time of encoded values by making their stored representation very close to their memory representation, which bypasses the parsing and validation stages. Moreover, the Redbin payload is compressed using the Crush algorithm (which Qtxie ported to Red/System). Crush has one of the fastest decompressors around and a general compression ratio very close to that of deflate, although its compression is about an order of magnitude slower. This fits our Redbin use case perfectly.
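To illustrate why a memory-like layout loads fast, here is a minimal Python sketch. It is not the Redbin format: the record layout (a count header followed by fixed 8-byte cells holding a type tag and a 32-bit payload) and the `TYPE_INTEGER` tag are hypothetical, and zlib's deflate stands in for Crush, which the Python standard library does not provide.

```python
import struct
import zlib

# Hypothetical layout, NOT the real Redbin spec: each value is a fixed
# 8-byte cell made of a 4-byte type tag and a signed 32-bit payload.
TYPE_INTEGER = 1                 # assumed tag value, for illustration only
CELL = struct.Struct("<Ii")      # little-endian: unsigned tag, signed int32

def encode_block(values):
    """Serialize a list of ints as a count header plus fixed cells, then compress."""
    raw = struct.pack("<I", len(values))
    raw += b"".join(CELL.pack(TYPE_INTEGER, v) for v in values)
    # zlib (deflate) is a stand-in for Crush here.
    return zlib.compress(raw)

def decode_block(payload):
    """Decompress, then read each cell at a computed offset: no tokenizing or parsing."""
    raw = zlib.decompress(payload)
    (count,) = struct.unpack_from("<I", raw, 0)
    values = []
    for i in range(count):
        tag, value = CELL.unpack_from(raw, 4 + i * CELL.size)
        if tag != TYPE_INTEGER:
            raise ValueError("unexpected type tag %d" % tag)
        values.append(value)
    return values

print(decode_block(encode_block([1, -2, 300])))  # [1, -2, 300]
```

Decoding only reads integers at known offsets, so nearly all the remaining work is decompression. That is why a fast decompressor matters more than a fast compressor for this use case.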

Compared to pre-0.5.3 versions, the gains are:

  • compiling an empty Red program is ~40% faster!
  • the executable generated for an empty Red program is about 100KB smaller (only 278KB on Windows now).
  • faster startup time, as the Redbin decoding process is much faster than the previous Red-stack-oriented construction approach.

Those benefits also extend to user code: your static series will be saved in Redbin format as well.

The Redbin format is currently emitted by the compiler and decoded by the Red runtime, but the runtime has no encoder yet that would let user code emit Redbin. We will provide that support in a future version; it is not a high priority for now. A “compact” version of the encoding format will also be added, so that Redbin can also be a good choice for remote data exchange.

Compilation from Rebol console

For those using the Red toolchain from the Rebol2 console, a new rc function has been introduced to avoid reloading the toolchain on each run. A typical session looks like this:

>> do %red.r
>> rc "-c tests\demo.red"

-=== Red Compiler 0.5.3 ===-

Compiling /C/Dev/Red/tests/demo.red ...
...compilation time : 416 ms

Compiling to native code...
Script: "Red/System PE/COFF format emitter" (none)
...compilation time : 12022 ms
...linking time     : 646 ms
...output file size : 284160 bytes
...output file      : C:\Dev\Red\demo.exe

>> call/output "demo.exe" s: make string! 10'000
== 0
>> print s

  RedRed              d
  d     d             e
  e     e             R
  R     R   edR    dR d
  d     d  d   R  R  Re
  edRedR   e   d  d   R
  R   e    RedR   e   d
  d    e   d      R   e
  e    R   e   d  d  dR
  R     R   edR    dR d

Collation tables

Since 0.5.2, Red has provided collation tables for more accurate case folding. Users can now access those tables through these paths:

system/locale/collation/upper-to-lower
system/locale/collation/lower-to-upper

Each of these tables is a vector of char! values that users can freely modify and extend to cope with specific local case-folding rules. For example, in French, the uppercase of the letter é can be E or É. French speakers are divided over which one should be used, and in some cases it is simply a typographical constraint. By default, Red uppercases é as É, but this can easily be changed if required. Here is how:

uppercase "éléphant"
== "ÉLÉPHANT"

table: system/locale/collation/lower-to-upper
foreach [lower upper] "àAéEèEêEôOûUùUîIçC" [table/:lower: upper]

uppercase "éléphant"
== "ELEPHANT"
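The override above works because the table is a plain one-to-one character mapping. The Python sketch below models that idea with a dictionary; the names `default_table`, `uppercase_with` and `apply_overrides` are hypothetical, and the dictionary only stands in for Red's vector! of char! values.

```python
def default_table():
    """Build a one-to-one lowercase -> uppercase table for a few letters."""
    letters = "abcdefghijklmnopqrstuvwxyz" + "àéèêôûùîç"
    return {c: c.upper() for c in letters}

def uppercase_with(text, table):
    """Map each character through the table; unknown characters are kept as-is."""
    return "".join(table.get(c, c) for c in text)

def apply_overrides(table, pairs):
    """`pairs` alternates lowercase/uppercase characters, like the Red foreach loop."""
    for lower, upper in zip(pairs[0::2], pairs[1::2]):
        table[lower] = upper
    return table

table = default_table()
print(uppercase_with("éléphant", table))   # ÉLÉPHANT
apply_overrides(table, "àAéEèEêEôOûUùUîIçC")
print(uppercase_with("éléphant", table))   # ELEPHANT
```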

Extended Vector! datatype

The vector! datatype now supports more actions and can store more datatypes with different bit-sizes: integer! and char! values can be stored as 8, 16 or 32-bit values, and float! values as 32 or 64-bit values. Several syntactic forms are accepted for creating a vector:

make vector! <slots>
make vector! [<values>]
make vector! [<type> <bit-size> [<values>]]
make vector! [<type> <bit-size> <slots> [<values>]]

<slots>    : number of slots to preallocate (32-bit slots by default)
<values>   : sequence of values of same datatype
<type>     : name of accepted datatype: integer! | char! | float!
<bit-size> : 8 | 16 | 32 for integer! and char!, 32 | 64 for float!

The type of the vector elements can be inferred from the provided values, so it can be omitted (unless you need to force a bit-size different from the values' default one). If a value whose bit-size is greater than that of the vector elements is inserted into the vector, it is truncated to the vector's bit-size.

For example, creating a vector that contains 1000 32-bit integer values:

make vector! 1000

Or if you want to specify the bit-size of the vector element:

make vector! [char! 16 1000]
make vector! [float! 64 1000]

You can also initialize a vector from a block as below:

make vector! [1.1 2.2 3.3 4.4]

Again you can also specify the bit-size of the vector element:

make vector! [integer! 8 [1 2 3 4]]

You can now use all math and bitwise operators on integer! and char! vectors:

x: make vector! [1 2 3 4]
y: make vector! [2 3 4 5]
x + y
== make vector! [3 5 7 9]

When bit-sizes differ, the resulting vector uses the highest bit-size. If a math operation produces a result that does not fit the bit-size, the result is currently truncated to that bit-size (using an AND operation). The ability to read and change the bit-size of a vector will be added in future releases.
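The bit-size and truncation rules above can be modelled in a few lines. This Python sketch is not Red's implementation: vectors are plain lists paired with a bit-size, `truncate` and `vector_add` are hypothetical helpers, and treating elements as unsigned and requiring equal lengths are assumptions of the sketch.

```python
def truncate(value, bits):
    """Keep the low `bits` bits, as an AND with (2**bits - 1) does."""
    return value & ((1 << bits) - 1)

def vector_add(x, x_bits, y, y_bits):
    """Element-wise sum of two integer vectors, modelled as Python lists.

    The result uses the larger of the two bit-sizes and each sum is
    truncated to it. Elements are treated as unsigned, and both vectors
    must have the same length (assumptions of this sketch).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    bits = max(x_bits, y_bits)
    return [truncate(a + b, bits) for a, b in zip(x, y)], bits

print(vector_add([1, 2, 3, 4], 32, [2, 3, 4, 5], 32))  # ([3, 5, 7, 9], 32)
print(vector_add([200, 100], 8, [100, 1], 8))          # ([44, 101], 8): 300 AND 255 = 44
print(vector_add([250], 8, [10], 16))                  # ([260], 16): fits in 16 bits
```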

The following actions have been added to the vector! datatype: clear, copy, poke, remove, reverse, take, sort, find, select, add, subtract, multiply, divide, remainder, and, or, xor.

The vector! implementation is not final yet: some of its actions will be optimized for better performance and, in the future, will rely on SIMD for even faster operations. Multidimensional support will be implemented in the near future as a new matrix! datatype inheriting from vector!, which keeps the additional code required to a minimum.

Bugfixing

This was a short-term release, but we managed to fix a few bugs anyway.

What’s next

Another minor release will follow with many runtime library additions and new toolchain improvements. See the planned features for 0.5.4 on our Trello board.

The 0.6.0 release will also most probably be split into two milestones, one for GUI and another for Android support.

In the meantime, enjoy this new release! :-)

0.5.2: Case folding and hash! support

This is a minor release, mainly motivated by the need to fix some annoying issues and regressions we encountered in the last release:

  • the help function was displaying an error when used with no arguments, preventing newcomers from seeing the general help information
  • the console pre-compilation issue with timezones was back.

Some significant new features managed to sneak into this release too, along with some bugfixes.

Case folding

Red now provides uppercase and lowercase natives and, more generally, better support for Unicode-aware case folding. The Red runtime library now contains a general one-to-one mapping table for case folding that should cover most user needs.

red>> uppercase "hello"
== "HELLO"
red>> uppercase/part "hello" 1
== "Hello"
red>> uppercase "français"
== "FRANÇAIS"
red>> uppercase "éléphant"
== "ÉLÉPHANT"
red>> lowercase "CameL"
== "camel"

This also applies to words, so case insensitivity is now Unicode-aware in Red:

red>> É: 123
== 123
red>> é
== 123
red>> "éléphant" = "ÉLÉPHANT"
== true
red>> "éléphant" == "ÉLÉPHANT"
== false

For special cases, we will expose the collation table we use internally in a future release, so that anyone can provide a customized version that better fits special local rules or usages. For example, some lowercase characters (such as “ß”) actually map to two or more uppercase code points (“SS” in this case). As the built-in table only holds one-to-one mappings, by default you will get:

red>> lowercase "ß"
== "ß"
red>> uppercase "ß"
== "ß"
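Why is "ß" left unchanged? A one-to-one table has no room for the two-character result "SS". The Python sketch below, an illustration rather than Red's implementation, builds such a table from Python's own full case mapping and drops every entry that would expand to more than one character; `one_to_one_upper_table` and `fold` are hypothetical names.

```python
def one_to_one_upper_table(chars):
    """Keep only mappings whose uppercase form is a single character."""
    table = {}
    for c in chars:
        upper = c.upper()
        if len(upper) == 1:
            table[c] = upper
    return table

def fold(text, table):
    """Apply the table character by character; unmapped characters pass through."""
    return "".join(table.get(c, c) for c in text)

table = one_to_one_upper_table("abcdefghijklmnopqrstuvwxyzß")
print("straße".upper())       # STRASSE: Python applies the full one-to-many mapping
print(fold("straße", table))  # STRAßE: ß has no single-character uppercase, so it is kept
```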

You can read more about our plans for full Unicode support on the wiki.

Hash datatype

The new hash! datatype works exactly the same way as in Rebol2. It provides a block-like interface but with fast lookups for most values (block series can be stored in a hash! too, but they are not hashed, so lookups on them are not faster). It is a very flexible container for any kind of hashed table (not only associative arrays) while keeping the handy navigational abilities of blocks. The underlying hashing function is a custom implementation of the MurmurHash3 algorithm. Some usage examples:

red>> list: make hash! [a 123 "hello" b c 789]
== make hash! [a 123 "hello" b c 789]
red>> list/c
== 789
red>> find list 'b
== make hash! [b c 789]
red>> dict: make hash! [a 123 b 456 c 789]
== make hash! [a 123 b 456 c 789]
red>> select dict 'c
== 789
red>> dict: make hash! [2 123 4 456 6 2 8 789]
== make hash! [2 123 4 456 6 2 8 789]
red>> select/skip dict 2 2
== 123
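A hash! can be pictured as an ordinary block plus an index from values to their positions. The Python sketch below models that idea; it is not Red's implementation. `HashBlock` is a hypothetical name, Python's dict hashing stands in for Red's MurmurHash3-based table, Red words are modelled as Python strings, and nested lists play the role of unhashed blocks.

```python
class HashBlock:
    """Toy model of hash!: an ordered list of values plus a value -> positions index."""

    def __init__(self, values):
        self.values = list(values)
        self.index = {}
        for pos, value in enumerate(self.values):
            if not isinstance(value, list):  # blocks are stored but not hashed
                self.index.setdefault(value, []).append(pos)

    def _positions(self, value):
        if isinstance(value, list):
            # Unhashed values fall back to a linear scan, so no faster access.
            return [pos for pos, v in enumerate(self.values) if v == value]
        return self.index.get(value, [])

    def find(self, value):
        """Return the tail starting at the first match, or None (like FIND)."""
        positions = self._positions(value)
        return self.values[positions[0]:] if positions else None

    def select(self, value, skip=1):
        """Return the value after the first match that starts a record of
        `skip` values (like SELECT, or SELECT/SKIP when skip > 1)."""
        for pos in self._positions(value):
            if pos % skip == 0 and pos + 1 < len(self.values):
                return self.values[pos + 1]
        return None

records = HashBlock([2, 123, 4, 456, 6, 2, 8, 789])
print(records.select(2, skip=2))  # 123
```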

A map! datatype (a strictly associative array) should also be provided in the next release, though we are still investigating some of its features and use cases before deciding to release it officially.

There is also good news about our Mac build server: a new one was kindly provided by Will (thanks a lot for that).

Our next release should mainly feature Redbin format support in the Red compiler, providing much faster compilation times and smaller generated binaries.

Enjoy! :-)

0.4.2: Unicode console and FreeBSD support

This long-awaited release is now available. As I have been travelling a lot in recent months, it was delayed much longer than I wanted. Still, we managed to get a really large amount of work done, as shown by the 500+ commits since the previous release and the 75 fixes over 210 new tickets opened. As usual, we strive to keep the number of open tickets (especially bug reports) as low as possible: 97.5% of the 794 tickets opened so far are closed! We really do care about bug reports.

New runtime lexer

The first runtime lexer (wrapped by the load function) was implemented a year ago as a quick hack when the console was added to Red. It was coded in Red/System and supported ASCII input only. It was not meant to stay more than a few weeks but, as often in the software world, the lifespan of temporary code far exceeds its author's wishes. The addition of the Parse dialect in the previous release made it possible to rewrite the Red runtime lexer using the Unicode-aware parse function. This turned out to be a great design choice, and it opens even more interesting options for the future, like I/O streaming support (once parse supports it) or dynamically extending the lexical rules (when loading custom datatypes, for example).

Improved console

The new runtime lexer is now powering the Red console, so we finally have proper Unicode input support!


A help system has also been provided, including the following functions: help, what and source. Try them from the console!

The line editing features have been extracted from the console code into a separate source file that can now be included in your Red programs when you need user input support. Two new functions are provided for that, working as in Rebol: input and ask.

Moreover, a new branch was started to provide cross-platform line editing capabilities without the burden of external dependencies, which have proved problematic and limiting. The new vt100 version should work fine, but it is unfinished. Contributors with deep terminal coding experience are welcome to help improve the current code. We are aiming at a cross-platform console engine that could be used both in CLI and GUI apps.

Additional functions

  • New actions: absolute, remainder, complement, power, odd?, even?, and, or, xor, reverse
  • New natives: complement?, positive?, negative?, min, max, shift, to-hex
  • New operators: <<, >>, >>>
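The three new shift operators differ in how they treat the sign bit. The Python sketch below models them on 32-bit integers (Red's integer! is 32-bit), assuming `<<` is a left shift, `>>` an arithmetic right shift that copies the sign bit, and `>>>` a logical right shift that fills with zeros. The function names are hypothetical, and shift counts are assumed to be between 0 and 31.

```python
MASK32 = 0xFFFFFFFF

def to_int32(x):
    """Wrap a Python int to a signed 32-bit value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def shift_left(x, n):
    """Model of <<: bits shifted past bit 31 are lost."""
    return to_int32(x << n)

def shift_right_arithmetic(x, n):
    """Model of >>: Python's >> on negative ints already copies the sign bit in."""
    return to_int32(x) >> n

def shift_right_logical(x, n):
    """Model of >>>: reinterpret as unsigned first so zeros are shifted in."""
    return to_int32((x & MASK32) >> n)

print(shift_right_arithmetic(-8, 1))  # -4
print(shift_right_logical(-8, 1))     # 2147483644
print(shift_left(1, 31))              # -2147483648
```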

A new option was added to the system function: system/interpreted?, which returns true if the code is run by the interpreter (remember that do invokes the interpreter until we get JIT-compilation abilities).

Parse and load have been extended to accept a /part refinement.

Infix operators can now be created from existing functions or routines.

A first call function implementation has been contributed by Bruno Anselme with support for both Red and Red/System.

FreeBSD support

Yes, we got it now! :-) All thanks to Richard Nyberg who kindly provided us with the low-level patches required to make Red work on FreeBSD.

Red/System changes

The Red/System lexer used to be simply the load native from Rebol, which was a great way to speed up development at the beginning, but it also limited the syntax to what Rebol2 accepts. Now the Red/System lexer uses the same code as the Red lexer (the compiler version, not the runtime one), freeing the Red/System syntax from those limitations and making it truly a dialect of Red!

Support for literal arrays has also been added, to make it easier to initialize arrays of values (until we get a first-class array! datatype).

Read/write access to CPU registers has been added. It will be extended in the future to support special registers (like the status flags register).

Red/System used to support at most 16 local variables per function, which also limited the number of local words that could be used in a Red function. This limit has now been raised much higher: at least 512 local variables are now allowed.

Work in progress…

Object support is already present in this release, but it is not official yet, as it is supported only by the interpreter and not by the compiler. Expect quick progress on this front.

Android GUI support is also under heavy development in the android branch. To implement a proper GUI API, we have started implementing a VID-like dialect, with Android as the first back-end. Windows support should follow shortly, then Linux (most probably using GTK+) and finally MacOSX (once we implement the Objective-C bridge).

Gear second!

I am not made of rubber, but I can go gear second too! ;-) You may not have noticed, but the project has been growing rapidly in recent months. It is moving faster and on a larger scale as more contributors join. We are also getting better organized. Here are the GitHub stats for this month alone:

[Image: GitHub stats for this month]

The most important power-up we got was the addition of Xie Qingtian (aka qtxie) to the Red core team. Xie is an amazingly skilled young programmer from China who has been contributing to Red for more than a year now. What is new is that he now works full time on the Red project, thanks to the kind sponsorship of CSDN and its CEO, Jiang Tao. Xie implemented all the new functions listed above, and in a short amount of time! So consider that from now on, Red will be advancing twice as fast thanks to him! ;-)

To better organize work on Red, we now use Trello extensively as our main task manager. The Red tasks board contains three main lists:

  • “Work in progress”: for features we are working on.
  • “Road to 1.0”: lists the required features for 1.0 version.
  • “Milestones”: helps us organize upcoming releases.

Last but not least, the number of visitors to this site and to the GitHub repo has at least doubled since the new year, thanks to an article about Red on CSDN, our Google Summer of Code campaign, and the successful StackOverflow ad campaign run by HostileFork (which finished earlier this month). That campaign brought us more than 10k new visitors who clicked the ad, making it the most clicked ad on SO since the new year! The ad is still visible here.

Big thanks to all the people who have contributed to this (big) release. Enjoy it! :-)

Plan for Unicode support

Red is growing up fast, even though it was born just two weeks ago! It is time we implement basic string support so we can write our first real hello-world. ;-)

Red strings will natively support Unicode. To achieve that in an efficient and cross-platform way, we need a good plan. Here are the native Unicode formats used by the APIs of our main target platforms:

Windows       : UTF-16
Linux         : UTF-8
MacOSX/Cocoa  : UTF-16
MacOSX/Darwin : UTF-8
Java          : UTF-16
.Net          : UTF-16
Javascript    : UTF-8
Syllable      : UTF-8

All these formats are variable-width encodings, so any indexed access has to pay the cost of walking through the string.
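To make that cost concrete, here is a Python sketch of a hypothetical helper that finds the n-th codepoint of a UTF-8 byte string, assuming well-formed input. Because a codepoint takes 1 to 4 bytes, the helper has to inspect the lead byte of every earlier codepoint, so the lookup is O(n) rather than O(1).

```python
def utf8_codepoint_at(data, index):
    """Return the index-th codepoint of the UTF-8 bytes `data` (assumed well-formed)."""
    count = 0
    pos = 0
    while pos < len(data):
        lead = data[pos]
        # The lead byte tells how many bytes this codepoint occupies.
        if lead < 0x80:
            length = 1          # 0xxxxxxx
        elif lead >> 5 == 0b110:
            length = 2          # 110xxxxx
        elif lead >> 4 == 0b1110:
            length = 3          # 1110xxxx
        else:
            length = 4          # 11110xxx
        if count == index:
            return data[pos:pos + length].decode("utf-8")
        count += 1
        pos += length           # every earlier codepoint must be stepped over
    raise IndexError("codepoint index out of range")

text = "héllo€😀".encode("utf-8")
print(utf8_codepoint_at(text, 5))  # €
```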

Fortunately, there are also fixed-width Unicode encodings that can be used to give us back constant time for indexed accesses. So, in order to make it the most space-efficient, Red strings will internally support only these encoding formats:

Latin-1 (1 byte/codepoint)
UCS-2   (2 bytes/codepoint)
UCS-4   (4 bytes/codepoint)

This is not a new idea: Python 3.3, at least, does it the same way.

Additionally, UTF-8 and UTF-16 codecs will be supported, in order to deal with I/O accesses on host platforms.

By default, Red will use UTF-8 to exchange strings with the outside world, except when accessing a UTF-16 API is necessary. Input and output strings will be converted on the fly between one of the internal representations and UTF-8/UTF-16. When reading an input string, Red will select the most space-efficient internal format based on the highest codepoint in the string. Users should also be able to force a string into a given internal format, when possible.
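The selection rule can be sketched in Python, in the spirit of what Python 3.3 does (PEP 393). This is an illustration, not Red's code: the helper names are hypothetical and the little-endian byte layout is an assumption.

```python
def pick_internal_format(text):
    """Return (name, bytes per codepoint) of the narrowest fixed-width format."""
    highest = max((ord(c) for c in text), default=0)
    if highest <= 0xFF:
        return "Latin-1", 1
    if highest <= 0xFFFF:
        return "UCS-2", 2
    return "UCS-4", 4

def encode_fixed_width(text):
    """Store every codepoint in the same number of bytes (little-endian assumed)."""
    name, width = pick_internal_format(text)
    data = b"".join(ord(c).to_bytes(width, "little") for c in text)
    return name, width, data

def char_at(data, width, index):
    """Constant-time indexed access: the byte offset is simply index * width."""
    start = index * width
    return chr(int.from_bytes(data[start:start + width], "little"))

name, width, data = encode_fixed_width("a€b")
print(name, width, len(data))   # UCS-2 2 6
print(char_at(data, width, 1))  # €
```

One character above a threshold widens the whole string, which is the price paid for constant-time indexing.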

So far, this is the plan for adding Unicode to Red. A prototype implementation will be done quickly, so we can fine-tune it if required.

Comments and suggestions are welcome.
