Red

News

Native reactive spreadsheet in 17 LOC

After the release of our reactive framework a few days ago, we though it would be a good idea to implement the, often mentioned, spreadsheet model as a small demo, just to see how much it would take to do it in Red, with its current feature-set. Well, despite not having a grid component, it turns out that 17 LOC (of packed, but still readable code; down to 14 LOC and 1053 bytes if minified) is enough to make a spreadsheet-like demo with native widgets and realtime updating of dependent cells as-you-type! ;-)

Red [] L: charset "ABCDEFGHI" D: union N: charset "123456789" charset "0" 
repeat y 9 [repeat x 9 [col: either x = 1 [#" "][#"A" + (x - 2)]
  append p: [] set ref: (to word! rejoin [col y - 1]) make face! [size: 90x24
    type:    pick [text field] header': (y = 1) or (x = 1)
    offset:  -20x10 + as-pair ((x - 1) * size/x + 2) ((y - 1) * size/y + 1)
    text:    form case [y = 1 [col] x = 1 [y - 1] 'else [copy ""]]
    para:    make para! [align: pick [center right] header']
    extra:   object [name: form ref formula: old: none]
    actors:  context [on-create: on-unfocus: function [f e][f/color: none
      if rel: f/extra/old [react/unlink rel 'all]
      if #"=" = first f/extra/formula: copy text: copy f/text [parse remove text
          [any [p: L N not ["/" skip not N] insert p " " insert "/data "
          | L skip | p: some D opt [dot some D] insert p " " insert " " | skip]]
        f/text: rejoin [f/extra/name "/data: any [math/safe [" text {] "#UND"]}]
       if f/data [any [react f/extra/old: f/data do f/data]]]]
      on-focus: func [f e][f/text: any [f/extra/formula f/text] f/color: yello]
]]]] view make face! [type: 'window text: "PicoSheet" size: 840x250 pane: p]

You can copy/paste the above code into the Red console for Windows, using the latest toolchain build (950 KB), or, better, using this prebuilt console version (247 KB, requires Windows 7+). Yeah, we still count in KB. ;-)

Features:

  • 100% native widgets using our built-in GUI engine (no third-party libraries, Windows only for now, OSX and GTK are coming).
  • Support for arbitrary Excel-style formulas (=A1+2*C3).
  • Support for arbitrary Red code in formulas.
  • Realtime updating of dependent cells as you type.
  • While editing a formula, dependent cells will display #UND (for “undefined”).
  • If a formula is syntactically incorrect, #UND is displayed in the cell.
  • Code is packed to reduce the LOC number, but limited to 82 characters per line (could fit on 77 if indentation is removed).
  • It takes about 6 LOC to build the spreadsheet and 3 LOC to compile the formulas to Red expressions.
  • One expression per line (expressions can have nested expressions), Red’s header not counting as an expression for the purpose of this demo, nor the last expression at line 16 for setting the yellow color on focus, it is just there to make the animated captures easier to follow.
  • Not using our VID dialect for GUI, such version is left as an exercise to the reader. ;-)

Here is a capture of how it works:

Image

If you want to play with the same dataset, use this script.

This other session shows how to leverage the rich datatypes of Red, to play with, in the spreadsheet. It also shows that you can access the face objects properties from within the cells and modify them directly:

Image

If you want to play with the same dataset, use this script.

Those captures were done on Windows, which is currently the most advanced GUI backend we have, our OSX and GTK backends are still a work in progress.

This demo was inspired by a similar one written in Tcl/tk which weights 30 LOC only, but takes advantage of a built-in grid component, and a C-like expressions parsing/evaluating library called expr. Though, it is still impressive to see what Tcl/tk can achieve. But the real king there, is the JS 220 bytes demo, even if it is more a testimony to the DOM capabilities (with a 100MB+ runtime behind) than JS expressivness. Nevertheless, Red’s demo is the smallest one we could find in the native GUI category. Even the executable footprint is minimal. Once compiled (just insert Needs: View in the header in such case), it weights 655 KB, which can be further compressed down to just 221 KB, and as usual, zero dependency.

The above source code is very packed to fit in as less lines as possible, though it is still readable, as it is really hard to obfuscate Red code, even when you want to (mandatory spaces between tokens prevent from reaching C-like extremes). Therefore, you will hardly win a codegolf competition where each byte counts…unless you leverage Red’s DSL abilities and write one optimized towards such goal.

How does it work’

It relies on our Red/View GUI engine, the reactive framework, the Parse DSL and the core Red language, which is, for those hearing about it for the first time, a Rebol language descendent, with one of the highest expressiveness among programming languages.

For the ones interested in the details of the above code, you can find a more readable version here and what follows is a detailed explanation. This is actually much simpler than it looks, here we go:

Line 1

L: charset "ABCDEFGHI" D: union N: charset "123456789" charset "0"

Skipping the Red [] header, it starts by defining a few bitsets, which will be used for the parsing operations. We create the D charset by combining N and “0”, which save space.

Line 2

repeat y 9 [repeat x 9 [col: either x = 1 [#" "][#"A" + (x - 2)]

A double loop is used to produce all the widgets needed. col is set to a space character if the column is a header, or to a letter starting from A to G. It will be used to create the cell names and the first row labels.

Line 3

append p: [] set ref: (to word! rejoin [col y - 1]) make face! [size: 90x24

Here we start building the faces which will be accumulated in p block. p: [] is a static allocation that conveniently avoids using a separate line to define p. The set ref: (to word! rejoin [col y - 1]) part is transparent, and let the face produced by make face! be appended to the p list. That transparent expression creates the cell name (in form of a capital letter denoting the column, and a number for the row), which is converted to a word, that gets set to the newly created face. Those words are necessary for supporting the spreadsheet formulas. Last, the opening block for the face definition leaves an option to append a nested expression, size definition being the shortest of all the other property definitions, is a good fit for that.

Line 4

type:    pick [text field] header': (y = 1) or (x = 1)

The face type can be a text for the first row/column and a field otherwise. The header’ word will be useful further in the code, to indicate if the cell is a just label or a field. If you wonder why the use of or instead of the idiomatic any, it is to avoid an expensive conversion to logic!, as required by pick in such use-case.

Line 5

offset:  -20x10 + as-pair ((x - 1) * size/x + 2) ((y - 1) * size/y + 1)

The face position is calculated using the x and y values to set up a grid, which is sligtly moved to the left for (subjective) minor look improvement.

Line 6

text:    form case [y = 1 [col] x = 1 [y - 1] 'else [copy ""]]

The face content is set to col which contains column’s label, or row number, or otherwise an empty string for input cells.

Line 7

para:    make para! [align: pick [center right] header']

The face para object is just used there to center the header labels while keeping the cell content right-aligned.

Line 8

extra:   object [name: form ref formula: old: none]

The extra field is populated with an object which holds the state of the cell, namely:

  • name: name of the cell, in string format for easier usage in the formulas compiler.
  • formula: keeps a reference to the last entered formula, in text format, as typed by the user.
  • old: keeps a reference of the last reaction set by the cell’s formula (or none).

Line 9

actors:  context [on-create: on-unfocus: function [f e][f/color: none

The cell definition is almost done, just remain the event handlers, which we start defining from this line. on-create is called when the cell is created, ensuring that the preset content will be properly processed (in case of a formula) before showing it for the first time. on-unfocus is the main way to trigger the user’s input processing. on-enter was not used, as the tabbing support is not working currently, so pressing Enter key will keep the focus on the same cell, causing unwanted side-effects which would take several lines to workaround. Once proper tabbing will be there, we could add it too. Last, as the function’s body block is opening, we can squeeze in a short expression, which just resets the background color of the cell to its default.

Line 10

if rel: f/extra/old [react/unlink rel 'all]

We start with the hot stuff now. If a previous formula did produce a reaction, we first destroy it.

Line 11

if #"=" = first f/extra/formula: copy text: copy f/text [parse remove text

If a formula is detected, we copy first the content in text, which will be used for the transformation to a Red expression. As series are owned by deep reactors (a face! object is one), the copy will ensure that no object events are produced during the transformation steps. A second copy creates another instance of the input string to be referenced by extra/formula. In case it is not a formula (all that is done before the test succeeds, it will have no effect on the cell content (just wasting some memory, but that’s not what we optimize for, in this exercise). Last, we start the transformation of the input text if it’s a formula, using a Parse rule, applied to text with the leading equal sign removed.

Line 12

[any [p: L N not ["/" skip not N] insert p " " insert "/data "

The rule starts with a loop, the goal is to spot all the cell names and insert a space before it and /data just after it (“A1” becomes “ A1/data “). The not [“/” skip not N] rule is there to avoid transforming cell names followed by a face property (e.g. A1/color). It works by ensuring that the second character after the slash is not a number, allowing to still transform inputs like A1/B2 (A1 divided by B2).

Line 13

| L skip | p: some D opt [dot some D] insert p " " insert " " | skip]]

If the input is not a cell name, we search for numbers (some D) including number with decimals (opt [dot some D]), so we can insert a space before and after (e.g “1+2” become “ 1 + 2 “), in order to enforce Red’s syntactic rules (as we will LOAD that string later). The | L skip part is there to avoid injecting spaces to numbers with leading signs (“-123” would not be touched). The final skip rule just skips every other character we are not interested in.

Line 14

f/text: rejoin [f/extra/name "/data: any [math/safe [" text {] "#UND"]}]

The transformation is almost done, the last step is decorating properly the text to generate the Red expression we are aiming for. First we enclose the resulting expression from last step in a math/safe […] block. The math function just ensures that math precedence rules are enforced, while /safe option evaluates the code using attempt internally, so any error will be returned as a none value (and in such case, the “#UND” string is used). The result of that evaluation is set the the current cell. So for an input formula like: “=A1+B1” in C1 cell, we get as result of the transformation process:

"C1/data: any [math/safe [ A1/data + B1/data ] "#UND"]", which is a LOADable expression in string format. But LOAD is not used in the demo code' Well, it is, thanks to a new feature in 0.6.1 release: by default the /text property of a field is synchronized in realtime with its /data property, using a LOAD call. If it fails, /data is set to none value. Conversely, setting /data will change /text value at once using a FORM call. Well, that's what the resulting expression is meant to leverage. ;-)

Line 15

if f/data [any [react f/extra/old: f/data do f/data]]]]

Now take a deep breath as we reach the crux of the matter. The previous line set f/text, which, at once created a LOADed version of that string, referred by f/data. If the LOADing failed, f/data would be set to none and then we just exit the event handler. Otherwise, we have something we can use as the input to REACT for trying to set up a new reactive relation for that cell. That’s where the “/data” injection for the cell names in previous steps, becomes useful. Those path! values are statically analyzed by REACT to determine the reactive sources. Though, if no reactive source has been found in the expression (e.g. “=1+2” which would give [C1/data: any [math/safe [ 1 + 2 ]]] in f/data), REACT returns none and we then can simply evaluate the expression, which would assign the result to the current cell /data (hence to /text, making it visible to the user). If REACT succeeded, we have set a new reactive relation for that cell, and by default, the reaction is run once on creation, ensuring that our cell gets the correct visual value (by indirectly setting /data, as usual now). Moreover, we save in extra/old a reference to the expression we used for creating the reactive relation, as we’ll need to destroy if the user inputs a new formula. If you’re still following, at this point, congrats, you can consider yourself a master of both View and the reactive framework. ;-)

Line 16

on-focus: func [f e][f/text: any [f/extra/formula f/text] f/color: yello]

The second event handler is used to restore the saved formula (if any) in the cell, when the user gives it the focus. We also then set the background color to yellow , which is…well, like yellow color, but a bit less yellow…hence the truncated name for an otherwise anonymous color. (Carl, if you’re reading this, I hope you appreciate my tap-dancing around your, sometimes, creative naming schemes. ;-))

Line 17

]]]] view make face! [type: 'window text: "PicoSheet" size: 840x250 pane: p]

This last line is just creating a window, assigning the list of previously created labels and fields to the /pane property (face’s children), then displaying it while entering an event loop using view call. That’s all folks!

Last thoughts

We hope this demo and explanations were both entertaining and informative. Spreadsheet applications are not your common app, they are special. They are a unique combination of incredibly useful and powerful features, while at the same time being usable by pretty much anyone with basic computer skills. Many consider them as the culmination of our industrial software world, Microsoft’s CEO itself declared a few days ago, that Excel was the best product his company ever made.

As Red allows you to create such software in a convenient and straightforward way, using native technologies, we hope this will inspire some of you to invest more time learning Red and to create some amazing software with it!

Beyond the simple fun provided by this demo, it also shows the potential of Red in the native GUI apps domain (we’re just at 0.6.1, we have many more features planned and platforms to support). In the big struggle between native vs web solutions, you can expect Red to become, someday, an option to account for.

In the meantime… have fun with Red, as much as we do! ;-)

0.6.1: Reactive Programming

Despite being a minor release, 0.6.1 still weighs a heavy 588 commits with a big number of fixes and many additions, among which the new reactive framework is the most notable.

Last release introduced, for the first time in the Rebol world, a reactive programming framework, as a part of the GUI engine. While working on improving it, we realized that it could actually be easily generalized beyond GUIs, with just minor changes to its design and implementation.

What is reactive programming’

Let me make a short disclaimer first, this is not yet-another-FRP framework relying on event streams. The reactive model we use is known as object-oriented reactive programming (using a “push” model), which is both simple to understand and close to spreadsheet’s model (i.e. Excel formulas). That model has often been praised for its simplicity and efficiency. You can now use it directly in Red.

So, in practice, what is it’ It is a way to link one or more object fields to other fields or global words, by specifying relationships in a block of code (can be a single expression or a complex multi-step computation). Each time a source field value changes, the target value is immediatly updated, you don’t have to call a function for that, it’s pretty much define-and-forget. ;-) Here’s a simple example in Red:

red>> a: make reactor! [x: 1 y: 2 total: is [x + y]]
== make object! [
    x: 1
    y: 2
    total: 3
]
red>> a/x: 5
== 5
red>> a/total
== 7
red>> a/y: 10
== 10
red>> a/total
== 15

In that example, the is infix function is used to create a reactive relation between the total field and the x and y ones using a simple formula. Once set, total is refreshed automatically and asynchronously each time the other fields are changed, regardless how, where or when they are changed! It’s the same concept as spreadsheet cells and formulas, just applied to object fields.

This reactive programming style belongs to the dataflow programming paradigm. It doesn’t enable you to write code, that you wouldn’t otherwise be able to write in an imperative style. Though, it helps reduce the size and complexity of your code, by abstracting away the “how” and helping you focusing more on the “what” (not dissimilar to FP). The gains of such approach become significant when you chain together many relations, creating graphs of, more or less complex dependencies. GUI programming is where it shines the most, as nodes are visible objects, and reactions produce visible effects.

Here is a comparative example with a reactive GUI vs the non-reactive version:

Let’s make a simple native GUI app using VID, Red’s graphic DSL (we call it a dialect). It will just provide 3 sliders, which control the R, G, B components of the box’s background color.

Image

The reactive version:

to-int: function [value [percent!]][to integer! 255 * value]

view [
    below
    r: slider
    g: slider
    b: slider
    base react [
        face/color: as-color to-int r/data to-int g/data to-int b/data
    ]
]

The non-reactive version:

to-int: function [value [percent!]][to integer! 255 * value]

view [
    below
    slider on-change [box/color/1: to-int face/data]
    slider on-change [box/color/2: to-int face/data]
    slider on-change [box/color/3: to-int face/data]
    box: base black
]

What can we say about the non-reactive version’

  1. Size is pretty much the same, though the non-reactive version has more expressions and code looks denser.
  2. The updating code is spread over 3 event handlers.
  3. The face word in each handler refers to the widget, so we can remove the slider names (very minor gain though).
  4. The box face needs a name (box), so it can be referred to, from the event handlers.
  5. The box face default color is grey, so it needs a black keyword to force it to the right default color (as the sliders are all at position 0 on start). The reactive version sets the right color on start, no need to care about it.

Even in this simple example, we can see that the complexity, and the cognitive load are higher in the non-reactive version. The more relationships can be modeled using reactions in a GUI app, the higher the gains from using the reactive approach.

Red reactive framework

Red’s reactive framework is just ~250 LOC long, and written purely in Red (no Red/System). It relies on object events (equivalent to observables in OO languages) and the ownership system (which will be properly documented once completed in one or two releases time). Rebol does not offer any past experience in such domain to guide us, so it should still be considered experimental, and we need to put it to the test in the wild, to study the pros/cons in real-world applications. We are quite excited to see what Red users will create with it.

Full documentation for the reactive framework is available here. It also explains the important difference between static and dynamic relations.

In a nutshell, the reactive API provides 4 functions (quite big API by our standards):

  • react to create or remove reactions.
  • is infix function for creating reactions which result will be assigned.
  • react’ to check if an object’s field is a reactive source.
  • clear-reactions to remove all existing reactions.

Moreover, react is directly supported as a keyword from VID dialect. That’s all you need! ;-)

Here is a simple demo linking together a couple dozen balls, following each other. Source code is available here.

Image

Let’s now have a look at other features brought by this release.

Time! datatype

A time! datatype is now included in Red, supporting already a broad range of features, like:

  • Path accessors: /hour, /minute, /second.
  • Math operators, including mixing with other scalar types.
  • All comparison operators.
  • Actions: negate, remainder, random, pick.
red>> t: now/time
== 12:41:52
red>> t + 0:20:00
== 13:01:52
red>> t/second
== 52.0
red>> t/hour: t/hour - 5
== 7
red>> t
== 7:41:52

GUI changes

Two main additions to our View engine have enabled the writing, or porting, of some nice graphic demos (thanks to Gregg Irwin for the awesome demos!). Here are a few examples:

Image
Bubble demo
Image
Gradient Lab
Image
Particles demo

View engine changes

  • New time event in View, triggered by timers.
  • New rate facet in face! objects for setting timers.
  • move action allows to move faces between panes in a non-destructive way.
  • Adds support for event/window property.
  • data syncing with text for field and text faces.
  • Add default option for fields (e.g. options: [default 0]).
  • at, skip, pick, poke, copy on image! now accept pair! index argument.
  • Added /argb refinement for image! datatype.
  • Added request-font dialog.
  • Improved size-text native.
  • GUI console faces are now excluded from the View debug logs.

Draw dialect changes

  • fill-pen has been extended to support color gradients.
  • pen accepts off as argument now, to make the subsequent pen-related operations invisible.
  • Allows box to accept edges in reverse order.
  • Radius of circle now accepts a float! value.
  • Added key-color support for to image command.

VID dialect changes

  • Added rate keyword for setting timers.
  • do command now support self to refer to container face (window or panel).
  • Added focus option to faces for presetting focus.
  • Added select option support to preselect an item in a list (using an integer index).
  • Added default option support for field and text faces’ default data facet value.
  • Added support for get-words to pass an handler function as an actor.
  • Adding glass and transparent color definitions.

The red/code repository has also been filled with more demos using the new features, like color gradients and timers.

Other changes

New actions: change, move

New natives: remove-each, new-line, new-line’, set-env, get-env, list-env, context’, enbase, now/time, browse

New functions: repend, overlap’, offset’, any-list’, number’, react, is, react’, clear-reactions, dump-reactions, make-dir, request-font, time’

Parse improvements

  • Added change command.
  • remove also accepts, now, a position argument.
  • Support for parsing binary! series.
  • Several bugs fixed.

Syntax for change command:

  • CHANGE rule value
  • CHANGE ONLY rule value
  • CHANGE rule (expression)
  • CHANGE ONLY rule (expression)
  • CHANGE pos value
  • CHANGE ONLY pos value
  • CHANGE pos (expression)
  • CHANGE ONLY pos (expression)

Example using rule syntax:

a: "12abc34"
alpha: charset [#"a" - #"z"]
parse a [some [to alpha change [some alpha] dot]]
a = "12.34"

Example using pos syntax:

a: "12abc34"
alpha: charset [#"a" - #"z"]
parse a [some [to alpha b: some alpha change b dot]]
a = "12.34"

Console improvements

  • Filenames completion using TAB key.
  • Font and color settings from new menu bar.
  • Ctrl-K will erase to end of line (CLI console).
  • Ctrl-D will remove character or exit like Ctrl-C if empty line (CLI console).
  • Optimized speed of pasted code in console.

Other improvements

  • Allows bitsets to be used as search pattern for find on any-string! series.
  • /next refinement support for do and load.
  • /seek and /part refinements added to read.
  • Added any-list! typeset.
  • Added /with refinement to pad function.
  • Improved split function (though not final version).
  • Added LF <=> CRLF conversions support to UTF-16 codec.
  • input can now read from stdin when run from a child process.
  • Added /same refinement into find and select actions.
  • Added binary! support for data and HMAC key to checksum.
  • Reduced emitted code for setting struct members to float literals on IA-32.
  • Allows owned property to be used by modify on objects.
  • Compiler now accepts creating global op! values from object’s functions.

A big number of tickets have also been processed, 110 bug fixes have been provided since 0.6.0 release. We have about 10% of open tickets which is more than usual, though not surprising after the last huge release, but only 22 are actual bugs. Thanks for all the contributors who reported the issues and helped fix them, Red owes you a lot!

What’s next’

On the road to Android support, we need to be able to properly wrap a Red app in a shared library, which is the main focus for the next release. Moreover, being able to build the Red runtime library only once, will greatly reduce compilation time (the runtime library is currently rebuilt on each compilation). As the work on this new feature is already quite advanced, we expect next release to occur during July, even if we always favor quality over arbitrary deadlines. ;-)

In the meantime, enjoy the new release!

0.4.1: Introducing Parse

One of the greatest feature of the Rebol language has always been its parsing engine, simply called Parse. It is an amazing piece of design from Carl Sassenrath, that spared all Rebol users for the last 15 years, from the pain of having to use the famously unmaintainable regexps. Now, Parse is also available to Red users, in an enhanced version!

So, in short, what is Parse’ It is an embedded DSL (we call them “dialects” in the Rebol world) for parsing input series using grammar rules. The Parse dialect is an enhanced member of the TDPL family. Parse’s common usages are for checking, validating, extracting, modifying input data or even implementing embedded and external DSLs.

The parse function call syntax is straightforward:

parse <input> <rules>  

<input>: any series value (string, file, block, path, ...)
<rules>: a block! value with valid Parse dialect content

Here are a few examples, even if you don’t know Red and Parse dialect, you can still “get” most of them, unlike regexps. You can copy/paste them directly into the Red console.

Some simple examples of string or block input validation using grammar rules:

parse "a plane" [["a" | "the"] space "plane"]
parse "the car" [["a" | "the"] space ["plane" | "car"]]

parse "123" ["1" "2" ["4" | "3"]]
parse "abbccc" ["a" 2 "b" 3 "c"]
parse "aaabbb" [copy letters some "a" (n: length' letters) n "b"]

parse [a] ['b | 'a | 'c]
parse [hello nice world] [3 word!]
parse [a a a b b b] [copy words some 'a (n: length' words) n 'b]

How to parse an IPv4 address accurately:

four:     charset "01234"
half:     charset "012345"
non-zero: charset "123456789"
digit:    union non-zero charset "0"

byte: [
      "25" half
    | "2" four digit
    | "1" digit digit
    | non-zero digit
    | digit
]
ipv4: [byte dot byte dot byte dot byte]

parse "192.168.10.1" ipv4
parse "127.0.0.1"    ipv4
parse "99.1234"      ipv4
parse "10.12.260.1"  ipv4

data: {
    ID: 121.34
    Version: 1.2.3-5.6
    Your IP address is: 85.94.114.88.
    NOTE: Your IP Address could be different tomorrow.
}
parse data [some [copy value ipv4 | skip]]
probe value                      ; will ouput: "85.94.114.88"

A crude, but practical email address validator:

digit:   charset "0123456789"
letters: charset [#"a" - #"z" #"A" - #"Z"]
special: charset "-"
chars:   union union letters special digit
word:    [some chars]
host:    [word]
domain:  [word some [dot word]]
email:   [host "@" domain]

parse "john@doe.com" email
parse "n00b@lost.island.org" email
parse "h4x0r-l33t@domain.net" email

Validating math expressions in string form (from Rebol/Core manual):

expr:    [term ["+" | "-"] expr | term]
term:    [factor ["*" | "/"] term | factor]
factor:  [primary "**" factor | primary]
primary: [some digit | "(" expr ")"]
digit:   charset "0123456789"

parse "1+2*(3-2)/4" expr        ; will return true
parse "1-(3/)+2" expr           ; will return false

Creating a simple parser for a subset of HTML:

html: {
    <html>
        <head><title>Test</title></head>
        <body><div><u>Hello</u> <b>World</b></div></body>
    </html>
}

ws: charset reduce [space tab cr lf]

parse html tags: [
    collect [any [
        ws
        | "</" thru ">" break
        | "<" copy name to ">" skip keep (load name) opt tags
        | keep to "<"
    ]]
]

; will produce the following tree of blocks as output of parse:
 [
     html [
         head [
             title ["Test"]
         ]
         body [
             div [
                 u ["Hello"]
                 b ["World"]
             ]
         ]
     ]
 ]

The Parse dialect

Parse’s core principles are:

  • Advance input series by matching grammar rules until top-level rule failure (returning false) or input exhaustion (returning true). (*)
  • Ordered choices (e.g. in [“a” | “ab”] rule, the second one will never succeed).
  • Rules composability (unlimited).
  • Limited backtracking: only input and rules positions are backtracked, other changes remain.
  • Two modes: string-parsing (for example: external DSL) or block-parsing (for example: embedded DSL).

(*) If collect keyword is used in any rule in its simplest form, a block will be returned by parse no matter if the root rule succeeded or not.

The Parse rules can be made from:

  • keyword : a dialect reserved word (see the tables below).
  • word : word will be evaluated and its value used as a rule.
  • word: : set the word to the current input position.
  • :word : resume input at the position referenced by the word.
  • integer value : specify an iterated rule with a fixed number or a range of iterations.
  • value : match the input to a value
  • | : backtrack and match next alternate rule
  • [rules] : a block of sub-rules
  • (expression) : escape the Parse dialect, evaluate a Red expression and return to the Parse dialect.

The following keywords are currently available in Red’s Parse implementation. They can be composed together freely.

Matching

ahead rule look-ahead rule, match the rule, but do not advance input.
endreturn success if current input position is at end.
nonealways return success (catch-all rule).
not ruleinvert the result of the sub-rule.
opt rulelook-ahead rule, optionally match the rule.
quote valuematch next value literally (for dialect escaping needs).
skipadvance the input by one element (a character or a value).
thru ruleadvance input until rule matches, input is set past the match.
to ruleadvance input until rule matches, input is set at head of the match.

Control flow

breakbreak out of a matching loop, returning success.
if (expr)evaluate the Red expression, if false or none, fail and backtrack.
into ruleswitch input to matched series (string or block) and parse it with rule.
failforce current rule to fail and backtrack.
thenregardless of failure or success of what follows, skip the next alternate rule.
rejectbreak out of a matching loop, returning failure.

Iteration

any rulerepeat rule zero or more times until failure or if input does not advance.
some rulerepeat rule one or more times until failure or if input does not advance.
while rulerepeat rule zero or more times until failure regardless of input advancing.

Extraction

collect [rule]return a block of collected values from the matched rule.
collect set word [rule]collect values from the matched rule, in a block and set the word to it.
collect into word [rule]collect values from the matched rule and insert them in the block referred by word.
copy word ruleset the word to a copy of the matched input.
keep ruleappend a copy of the matched input to the collecting block.
keep (expr)append the last value from the Red expression to the collecting block.
set word ruleset the word to the first value of the matched input.

Modification

insert only valueinsert[/only] a value at the current input position and advance input after the value.
remove ruleremove the matched input.

The core principles mention two modes for parsing. This is necessary in Red (as in Rebol) because of the two basic series datatypes we have: string! and block!. The string! datatype is currently an array of Unicode codepoints (Red will support an array of characters in a future version) while the block! datatype is an array of arbitrary Red values (including other blocks).

In practice, this results in some minor differences in Parse dialect usage. For example, it is possible to define arbitrary sets of characters using the new bitset! datatype, which are useful only for string! parsing in order to match with a large number of characters in one time. Here is an example using only bitsets matching and iterators:

letter: charset [#"a" - #"z"]
digit:  charset "0123456789"

parse "hello 123 world" [5 letter space 3 digit space some letter]

The Bitset! datatype

A bitset value is an array of bits that is used to store boolean values. In the context of Parse, bitsets are very useful to represent arbitrary sets of characters across the whole Unicode range, that can be matched against an input character, in a single operation. As bitset! is introduced in this 0.4.1 release, it is worth having an overview of the features supported. Basically, it is on par with Rebol3 bitset! implementation.

In order to create a bitset, you need to provide one or several characters as base specification. They can be provided in different forms: codepoint integer values, char! values, string! values, a range or a group of previous elements. The creation of a new bitset is done using the following syntax:

make bitset! <spec>

<spec>: char!, integer!, string! or block!

For example:

; create an empty bitset with places at least for 50 bits
make bitset! 50

; create a bitset with bit 65 set
make bitset! #"A"

; create a bitset with bits 104 and 105 set
make bitset! "hi"

; create and set bits using different values representations
make bitset! [120 "hello" #"A"]

; create a bitset using ranges of values
make bitset! [#"0" - #"9" #"a" - #"z"]

Ranges are defined using two values (char! or integer! allowed) separate by a dash word.

Bitsets are auto-sized to fit the specification value(s) provided. The size is rounded to the upper byte bound.

A shortcut charset function is also provided for practical usage, so you can write:

charset "ABCDEF"
charset [120 "hello" #"A"]
charset [#"0" - #"9" #"a" - #"z"]

For reading and writing single bits, the path notation is the simplest way to go:

bs: charset [#"a" - #"z"]
bs/97     ; will return true
bs/40     ; will return false
bs/97: false
bs/97     ; will return false

(Note: bitset indexing is zero-based.)

Additionally, the following actions are supported by bitset! datatype:
pick, poke, find, insert, append, copy, remove, clear, length', mold, form

See the Rebol3 bitset documentation for more info about usage of these actions.

In order to cope with the wide range of Unicode characters, bits outside of the bitsets are treated as virtual bits, so they can be tested and set without errors, the bitset size will auto-expand according to the needs. But that is still not enough to deal with big ranges, like for example a bitset for all Unicode characters except digits. For such cases, it is possible to define a complemented bitset that represents the complement range of the specified bits. This makes possible to have large bitsets while using only a tiny memory portion.

Complemented bitsets are created the same way as normal bitsets, but they need to start with the not word and always use a block! for their specification:

; all characters except digits
charset [not "0123456789"]

; all characters but hexadecimal symbols
charset [not "ABCDEF" #"0" - #"9"]

; all characters except whitespaces
charset reduce ['not space tab cr lf]

Set operations are also possible, but only union is currently implemented in Red (it is the most used anyway for bitsets). With union, you can merge two bitsets together to form a new one, which is very useful in practice:

digit: charset "0123456789"
lower: charset [#"a" - #"z"]
upper: charset [#"A" - #"Z"]

letters:  union lower upper
hexa:     union upper digit
alphanum: union letters digit

Parse implementation

Parse dialect has been implemented as a FSM which differs from the Rebol3 implementation that relies on recursive function calls. The FSM approach makes possible several interesting features, like the ability to stop the parsing and resume it later, or even serialize the parsing state, send it remotely and reload it to continue the parsing. Such features could now be implemented with minimal efforts.

Red Parse implementation is about 1300 lines of Red/System code, with a significant part of it spent on optimized iteration loops for common cases. About 770 unit tests have been hand-written to cover the basic Parse features.

The current Parse runs as an interpreter, which is fast enough for most use-cases you will have. For cases where maximum performance is required, work has started on a Parse static compiler to soon provide the fastest possible speed to Parse-intensive Red apps. The generated code is pure Red/System code and should be about a magnitude faster on average than the interpreted version. When Red will be self-hosted, a Parse JIT compiler will be provided to deal with the cases that the static compiler cannot process.

As Red gets more features, Parse will continue to be improved to take advantage of them. Among other future improvements, binary! parsing will be added as soon as binary! datatype is available, and stream parsing will be possible when port! datatype will be there.

The Red Parse also exposes a public event-oriented API in form of an optional callback function that can be passed to parse using the /trace refinement.

parse/trace <input> <rules> <callback>

<callback> specification:

func [
    event   [word!]   ; Trace events
    match'  [logic!]  ; Result of last matching operation
    rule    [block!]  ; Current rule at current position
    input   [series!] ; Input series at next position to match
    stack   [block!]  ; Internal parse rules stack
    return: [logic!]  ; TRUE: continue parsing, FALSE: exit
]

Events list:
- push    : once a rule or block has been pushed on stack
- pop     : before a rule or block is popped from stack
- fetch   : before a new rule is fetched
- match   : after a value matching occurs
- iterate : after a new iteration pass begins (ANY, SOME, ...)
- paren   : after a paren expression was evaluated
- end     : after end of input has been reached

This API will be extended in the future to get more fine-grained events. This API could be used to provide Parse with tracing, stats, debugging, … Let’s see what Red users will come up with! ;-)

A default callback has been implemented for tracing purposes. It can be accessed using the handy parse-trace function wrapper:

parse-trace <input> <rules>

You can try it with simple parsing rules to see the resulting output.

What about DSL support’

Parse is a powerful tool for implementing DSL parsers (both embedded and externals), thanks to its ability to inline Red expressions directly into the rules, allowing to easily link the DSL syntax with its corresponding semantics. To illustrate that, here is a simple interpreter for a famous obfuscated language, written using Parse:

bf: function [prog [string!]][
    size: 30000
    cells: make string! size
    append/dup cells null size

    parse prog [
        some [
              ">" (cells: next cells)
            | "<" (cells: back cells)
            | "+" (cells/1: cells/1 + 1)
            | "-" (cells/1: cells/1 - 1)
            | "." (prin cells/1)
            | "," (cells/1: first input "")
            | "[" [if (cells/1 = null) thru "]" | none]
            | "]" [
               pos: if (cells/1 <> null)
               (pos: find/reverse pos #"[") :pos
               | none
              ]
            | skip
        ]
    ]
]

; This code will print a Hello World! message
bf {
    ++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.
    >++.<<+++++++++++++++.>.+++.------.--------.>+.>.
}

; This one will print a famous quote
bf {
    ++++++++[>+>++>+++>++++>+++++>++++++>+++++++>++++++++>
    +++++++++>++++++++++>+++++++++++>++++++++++++>++++++++
    +++++>++++++++++++++>+++++++++++++++>++++++++++++++++<
    <<<<<<<<<<<<<<<-]>>>>>>>>>>>----.++++<<<<<<<<<<<>>>>>>>
    >>>>>>.<<<<<<<<<<<<<>>>>>>>>>>>>>---.+++<<<<<<<<<<<<<>>
    >>>>>>>>>>>>++.--<<<<<<<<<<<<<<>>>>>>>>>>>>>---.+++<<<<
    <<<<<<<<<>>>>.<<<<>>>>>>>>>>>>>+.-<<<<<<<<<<<<<>>>>>>>>
    >>>>>>+++.---<<<<<<<<<<<<<<>>>>.<<<<>>>>>>>>>>>>>>--.++
    <<<<<<<<<<<<<<>>>>>>>>>>>>>>-.+<<<<<<<<<<<<<<>>>>.<<<<>
    >>>>>>>>>>>>>+++.---<<<<<<<<<<<<<<>>>>>>>>>>>>>>.<<<<<<
    <<<<<<<<>>>>>>>>>>>>>>-.+<<<<<<<<<<<<<<>>>>>>>>>>>>>>-.
    +<<<<<<<<<<<<<<>>>>>>>>>>>>>>--.++<<<<<<<<<<<<<<>>>>+.-
    <<<<.
}</-></->

Note: this implementation is limited to one-level of “[…]” nesting to keep it as short as possible. A complete, but a bit more longer and complex implementation using Parse only, is availaible here.

So, such approach works incredibly well for small DSLs. For more sophisticated ones, Parse still works fine, but it might be less helpful as the DSL semantics get more complex. Implementing an interpreter or compiler for a more advanced DSL is not a trivial task for most users. Red will address that in the future by providing a meta-DSL wrapper on top of Parse, exposing a higher-level API to build more sophisticated DSL by abstracting away core parts of an interpreter or compiler. A DSL for building other DSLs is not a crazy idea, it already exists in a very powerful form as the Rascal language for example. What Red will provide, will be just one step in that direction, but nothing as sophisticated (and complex) as Rascal.

Other changes in this release

Just to mention other changes in this release, now that we got rid of the 800 pound gorilla in the room. This release brings a significant number of bug fixes, both for Red and Red/System. Also, thanks to qtxie, the ELF file emitter is now on par with the other ones when producing shared libraries.

Thanks to all the people involved in helping for this big release, including design fixes and improvements, testing, bug reporting and test writing.

Enjoy! :-)

Posts:

Tags: