Preface

"Programming a problem oriented language", written by Charles Moore, inventor of the Forth programming language, was started during the 1970s but never published until the 2010s. And that's a shame because it's the best programming book I have ever read. The author assumes you are a competent programmer and that he doesn't need to explain everything. It's the first book I read where you can follow the thoughts of the language designer, as he explains step by step the choices he made.

There is vision, and there is purpose. Forth's philosophy is about the freedom to create, experiment, make software our own and innovate. Even though, it's always been used in the most rigorous environments like the space industry and robotics, I find it a compeling tool for my game design work.

That being said, I also studied the evolution of other programming languages like Lisp, C, Pascal, Smalltalk, etc. And my conclusion is that there is nothing that comes close the power and expressiveness you get from using your own Forth. But there is a catch, I want you to understand the principles and reasoning behind my choices, and then do your own implementation based on your tastes.

The logiqub is a software toy and a learning tool. Because modern games are a bit lacking compared to what we grew up with (design-wise), I ended building my laboratory from scratch. You would think that games would get better as time goes on, but they often do not. Companies need to make money, and this means players like me who rank gameplay higher than story and graphics are often disappointed by the lack of meaningful challenge.

A serious difficulty on the journey to make your own games is to transform gameplay ideas into testable software. Games often need custom-made engines, and if you are not a hacker, you're going to face an incredible challenge making them. That's what the logiqub project is about, the hacking tool of a game designer.

It's a personal project, my writing style is a bit controversial. Don't take me too seriously when I criticize something. Also, my goal is to talk about programming as the beautiful craft it is and all its aspects. This means a significant part of my teaching includes personal development. Understanding your own psychology, is part of the journey to become the best hacker you can be.

Section 1 : Psychology

Cardinal virtues

Love and passion. Love yourself, love the art, love your fellow hackers and your users. When you don't love anything anymore, you no longer feel alive. Live the one life you have with passion.
Skill and laziness. Develop a habit of doing things as efficiently and painlessly as possible. Cultivate your skill to the utmost, but do not expect others to put in as much effort as you do. It's easy to abuse logic at the expense of empathy.
Faith and perseverance. Believe that good things will happen along the way, things can sort themselves out, luck can be on your side. If you think positively, you will strive against adversity, conquering your fears. Also don't be over-analytic, looking for defects in everything, will destroy your efforts.
Courage and mojo, the thing that gets you going, your switch. Knowing yourself make it easier to start the engine in the morning, get in the zone, and recover from disturbances. Music works nicely, so does a clean desk/keyboard, and good a posture.

Lastly, try to meditate each day, to find ways to improve. "Did I succeed at using every principle today ? If not, what can I do better ?"

Mindsets

Programming is about feeling empowered to make the rules of the systems we create. It's about being a god. Nothing compares to that. Material systems age, wear, break (like bodies). Logical systems are made of ideas, and ideas are eternal (like books). That's why I prefer dealing with software rather than hardware. Also software is cheap, and mostly requires imagination.

Programming is also simple. You break down a problem into subproblems, find solutions to those subproblems and assemble the parts. The real difficulty is dealing with the context in which you practice programming. Noise and interruptions, miscommunication and misunderstanding, energy level and mood. All factors that greatly affect your ability to concentrate. Intense daily concentration for several months can also lead to burnout, and usually, there is not much warning before you feel worn out.

A good way I found to keep some balance is switching between different mindsets. Acknowledge that there is no silver bullet, different situations call for different attitudes, and you will be well equiped to avoid common traps like perfectionism, wishful thinking or analysis paralysis. This idea is inspired from the six thinking hats by Edward de Bono, and the five elements of Japanese philosophy (godai).

3 elements theory

Sky [blue/white] decision

The mind is a collection of facts (clouds) that generate ideas (lightning). But your mind can also become chaotic (storm). Medidate, write down and organize your toughts, teach other people. All this to make sure you are not wasting time confusing yourself, but being productive. Bear in mind though, while thinking is important, only scheming doesn't get anything done.

To improve your ability to think, play games and do puzzles. Although programming, especially in Forth is already a fun game in itself, some management and planning skills can be learned through games such as chess or tactical role-playing games. It's the subject of one of my books, not sure it's worth your time though. Otherwise read academic studies to broaden your understanding of what everyone is doing and has been doing.

Flame [red/yellow] emotion

Why do you wake up in the morning ? What are your dreams ? What are you getting excited about completing ? You must try to feel your soul burn with passion for the things you are about to accomplish today. Make sure emotions like frustration and boredom do not go unchecked for too long.

Even if you're not a manager, it is useful to be able to ignite the same passion in others around you. This is something that can only be done by connecting with others through your feelings. Using logic with people doesn't work. You can make them feel dumb, and once they have a personal agenda against you, it's game over.

Earth [green/black] action

Execution mode. You get your hands dirty... Well, it depends how often you clean your keyboard too. Anyways, not much to say about this mode, except there is a lot of optimization you can do to make the act of writing code significantly easier and faster.

Learn to type with 10 fingers, use a mechanical keyboard, maybe switch to dvorak, learn an editor like vim or emacs, sit straight. Also, turn off the phone and email, and switch on your mojo. Take breaks every 30 minutes, at the very least every 2 hours. Your goal in this mode is to get in the zone, fully immersed on the task at hand with the goal to get it over with fast.

Theory of learning

What I mean by theory of learning is learning to learn efficiently. We learn by doing. It's a bit weird with programming because we can only express our thoughts to the machine by writing. And we can not write anything unless we have an idea of what we want to get done. It's unlike learning soccer, where we can just kick the ball and see what happens.

Kicking the ball or rolling the dice in programming rarely results in a useful program. Albeit you can discover things accidently, the amount of planning required to write good programs make trial and error a bad method overall. It would be like building a house without a blueprint.

What you want to do instead, is learn to build rooms, houses, then mansions and castles. Start small and improve your fundamental understanding of the computer. Make a lot of experiments, trying different algorithms. It's much better to learn a subject (even mathematical and abstract) with an experiment. The advantage of an experiment is that we can interact with it, modify parameters and validate hypotheses about a particular concept or object.

From those bricks of knowledge we gather through experience, we can form new ideas. That's because the more bricks we have at our disposal, the more ways we have of combining them. With enough desire and motivation, we can then transform those emotions into actions, leading to more experiments, more knowledge and even more ideas...

The ideal learning curve is "easy to learn, hard to master". The goal being to stay engaged during the learning process. I think there are two sides to learning the logiqub. The easy part is using mine, the hard part is making yours. But then again it depends. Using it is harder for me, because it gets kinda boring quick.

The creative part of my brain that likes to recombine ideas all the time doesn't get involved enough to make simple usage all that interesting. Making a nice portable and efficient instruction set is more of a challenge, hence it is easier despite being technically harder. Reason being, I feel engaged. What I am trying to say is pay attention to your soul. Being a fast learner is also about understanding how your soul works to climb mountains smoothly.

Emotional struggles

Fear

The biggest fear is probably how people will react to your art. They might think or say cruel things, especially on the internet where they are shielded by anonimity. Or you might feel anxious because you don't have the skill to tackle a problem. In any case, fear can paralyze anyone. The best cure is to stop thinking and take action. Most of the time, things aren't that bad, and you'll find you are much stronger than you thought. Just lower your expectations and you may find good surprises along the way.

Uncertainty

If you are not sure about an important decision to make, try things out. It's the best way to get feedback and resolve ambiguities. Furthermore, in programming there isn't much room for uncertainty. It's all 0s and 1s, and pretty reliable. You just gotta figure it out, by breaking down the problem into smaller manageable parts.

Doubt

If something is possible, there is no way you can not get it done. It might take some time or be annoying and boring, or you might be too tired. Sometimes a little bit of courage and faith goes a long way to help you get there, so give it a try. If you don't believe in yourself, who will ?

"Whether you think you can or you think you cannot, both are true." — Henry Ford

Instability

Some people have the ability to accept things as they are and move on. That's a good trait to possess, because ultimately a human being is a reactive system. Accept your emotions as they are, don't fight them. Try to understand why you feel that way, whether it's correct, and what you can objectively do about it. Also, don't be so overeager to progress that you neglect sleeping and eating properly. Exercising your body can help in regulating your mood.

"Just be ordinary and nothing special. Eat your food, move your bowels, pass water, and when you're tired, go and lie down. The ignorant will laugh at me, but the wise will understand." — Bruce Lee

Theory of teaching

Okay, the truth is nobody knows how to teach programming. Why ? If you ask me, I would say it is because the whole is greater than the sum of its parts.I want to teach programming the way Dan Heisman or Josh Waitzkin would teach chess.

Dan Heisman teaches you chess by making you aware of what is happening on the board and off the board. He tells you what are the areas that must be worked on to grow as a chess player, and the pitfalls to avoid. He is using concrete examples, and tells you how to think about that position to find the best candidate moves. He also warns you about the necessity to differentiate between critical moves, where you really need to think (pawn structure change, capture) and casual moves (automatic recapture, moving out of check). Because he is such an amazing teacher, any beginner will become an intermediate player in a short time.

In "The Art of Learning", Josh Waitzkin having gone through the process of mastering both chess and taichi (pushing hands) tries to explain a core idea that can not be logically explained. "Learn form to leave form" means that through repeated practice, our brain reconfigures itself to carve a concept into our subconscious. Once that is done, we no longer need to think about how to do something, we just intend to do it and the mind/body reacts accordingly. Walking, driving, or playing a musical instrument can become automatic.

"The consciousness of self is the greatest hindrance to the proper execution of all physical action." — Bruce Lee

Since we are concerned with a mental process, a chess analogy might work best. A core idea in chess is the fork, a double attack. I would even say it's the most fundamental tactic. When a knight jumps out of nowhere to threaten your rook and queen (or king), you remember it and you want to do it too. Then you will use your queen for the same purpose, abusing her long range and flexibility. You will also realise even paws can threaten two pieces at once, and you will be careful when you leave two pieces one square apart in the middle of the board.

Eventually, without having been exposed to this idea before, you will use your king to attack two pieces at once in the endgame. Because you have seen so many double attacks before, at some point you will have internalized just how good they are, and subconsciously try to prevent your opponent opportunities to use them, while setting up your own traps.

The same thing happens in programming. After you have seen a countless number of for loops, while loops, iterators, generators, increment or decrement counters, in different languages, possibly nested, used with different data structures like arrays, linked lists, dictionaries, the idea of iterating over elements of a collection will exist outside of a particular programming language or paradigm.

That is the essence of teaching someone how to become a hacker. A hacker is someone who, through a process of deconstruction and reconstruction, is able to achieve feats that leave non-hackers in awe. However, anyone who has had the persistence to master any activity knows the hidden truth.

That's why explaining things for free is bad. I think that when you have to think really hard about a problem, you remember the solution better. Besides the brain is extremely good at inference and pattern matching. But there is a trick, one can only learn something that is ready to be understood. So it's possible to stay stuck on a plateau for a long time, especially when said knowledge is not internalized by regular practice.

"If you always put limit on everything you do, physical or anything else. It will spread into your work and into your life. There are no limits. There are only plateaus, and you must not stay there, you must go beyond them." — Bruce Lee

My role as a teacher is to challenge you step by step, so you can build a strong fundation, upon which anything is possible. Now, let's get real. I cannot be next to you to answer all your questions. Real hackers do their own homework and read the docs. While I am doing my best to make this system easy to learn, I will make mistakes and can not be exhaustive or up to date on everything, especially as I keep evolving the system.

So I expect you to take advantage of the fact that the system is fully interactive and modifiable to mix things up and learn on your own. After all, nobody really likes to be lead from point A to point B all the time. Google is your friend, use the "ncr" trick to bypass the regional setting. There is also stackoverflow and reddit, c2.com, lambda-the-ultimate, hacker news and obviously wikipedia. Leave questions at logiqub@gmail.com, if I can answer I will.

A great programmer

Being a great programmer has nothing to do with building complicated frameworks. Although well-built software exhibits some complexity inherent to the number of abstractions used to build it, choosing the correct abstractions is mostly impossible on the first try.

It comes down to having a complete understanding of the problem you are trying to solve. And weirdly enough, you do not understand your problem well enough until you make real software. This means you need to decompose, reassemble and improve a number of times before the software can reach elegance.

Being a great programmer is about the depth of your knowledge, and how much patience and willpower you have to get things done. You will know in your heart you have become one of the great when you acquire superpowers like:

conceive and test software in your head without writing it... and after you write it, it just works
you are getting things done 10 times faster or more than other people
you can reverse-engineer software without attaching a debugger by being attentive and creative
you do difficult stuff for fun like writing compilers and emulators

Extreme minimalism

The reasons why you'd want to work with a minimal system :

control over all elements of design and implementation
higher performance in memory footprint and execution speed
faster development cycles (debugging)

"It's not the daily increase but daily decrease. Hack away the unnessential." — Bruce Lee

Unfortunately, in the modern computing landscape, the operating system is your enemy. Why ? Because it gets in the way too much. An operating system should provide memory protection for process execution and a minimum set of services. Now thanks to "worse is better", our open source, general purpose, operating systems are not so bad. Strictly speaking though, nobody has the general problem that operating systems are supposed to solve. Each system only need a few specific drivers for the hardware that is actually in use. The rest is basically junk that gets in the way.

Ideally, hardware should be built to be simple to program, and I would try hard to build a bare metal logiqub. But when I think about BIOS, real mode, protected mode, GDT, UEFI, VGA, OpenGL, etc... I give up. Maybe I can get something done with a Raspberry Pi in the future. Hardware evolves too fast and becomes needlessly complicated. Not only that, but it's often the case that we do not have access to the specifications necessary to write the drivers. Overall, it would be too much of a burden for little benefit.

So even though the optimal solution is to be self-hosting, I have to build on top of commonly used operating systems, reuse ideas from other cross platform languages like Python, and rely on the Simple Media Layer library. That's the practical solution.

Section 2 : Theory

The tree

You can think of the logiqub as a tree. The assembly primitives (virtual instructions) are the roots. The interpreter is the trunk that ties everything together and supports execution. You define new symbols to form the branches and leaves as application components.

"Seek to understand the root. It is futile to argue as to which single leaf, which design of branch, or which attractive flower you like; when you understand the root, you understand all its blossoming." — Bruce Lee

General purpose programming languages are insufficient to teach programming. Pretending to shield you from your own mistakes, advertising the compiler as always generating better code, they prevent your growth by not allowing you to understand the root. The root is the collection of fundamental programming concepts at the lowest level. Once you know the truth, you start to wonder how people can make careers doing seminars about one technology or another. It's all fake, and because people don't know better they believe it (I guess that's why we have religions).

Sure, once you understand how it all works, you can go ahead and use the best tool for the job. But until you know how operating systems, databases, compilers and hardware works... how do you evaluate available tools so you know when and how to create your own ? The only way you're going to get it is by first getting a grasp of assembly. The cool thing is once you do, you also understand how the root works. You will treasure tools that don't get in the way, and start making yours.

The saddest thing is decently skilled programmers creating new languages that serve no purpose whatsoever. Maybe they do it for fun or they genuinely try to solve a hard problem to be famous. I am trying not to judge, but I gotta say, if you develop a modern language, do not try to compete with the giants on their strong points. For example, INRIA and french industries funding Pharo as a general purpose language makes little sense to me, especially when you understand they are comparing themselves to Java (that battle was lost a long time ago).

The root

To understand the root, you must understand what makes the computer tick (I heard it's a crystal). The core concept is surprisingly difficult to pin down because nobody talks about it anymore. A programming language is a nicer way to command the machine than typing 0s and 1s. Ultimately a compiler or interpreter will do the actual translation from source code to machine code. To create a computing machine you need 2 things : to represent information and be able transform it. In simple terms, data and code.

Cells

Within the logiqub, data and code both fits into memory cells, the basic storage unit. Data cells are essentially numbers, code cells are subroutines. For the remainder of the book, I will call subroutines routines in the context of the logiqub. There is a third kind of cell, the pointer (a memory address), used to refer to another cell.

number, unit of data
routine, unit of code
pointer, unit of reference

Every programming concept can be understood in terms of those basic elements. In the C language for example, array indexing and pointer arithmetic are the same thing. In Lisp, a lambda expression is just a function pointer. Most languages implement closures as a combination of a code pointer and a data pointer. And so on...

The last element is the symbol, label or identifier. By assigning a name to a cell we can give it a meaning. That's what happens in a spreadsheet program like Excel. All cells have an address (column row), but can also be renamed. When you write a program, you prefer to use memorable names. Names are stored in cells as well.

Compiled languages like C/Pascal will remove symbol information because they are not necessary to program execution. Dynamic languages however (JavaScript, Python, Lua), keep symbols in a dictionary, making the language interactive and reflective (this is not accurate, but still a good approximation). And that, was the starting point of my journey to create the logiqub. I knew it made no sense whatsoever for a dynamic language to be slower than a compiled language by a factor of 10-100x or more. And I was right.

Data

There are many data structures, but two stand out as fundamental :

array
linked list

The array is a contiguous region of cells with a known size. And you can imagine the linked list as a string of cell pairs. Each cell pair is composed of one cell holding data and the other pointing to the next element. A special element (often empty/null) will indicate the end of the list. Once you have mastered these two data structures, more complex ones like records, trees and graphs can be manipulated in your mind easily.

Code

The computer will follow the instructions given in the program. Like musical scores we like to have all kind of fancy notations to describe their order of execution. The common control structures supported in programming languages are :

sequence (expressions delimited by ";" or newline)
selection (if, case statements)
iteration (for, while statements)
recursion (call of the same function)

In reality, you only need a goto instruction to do all the above. Why a single goto is sufficient ? It's because machines use a program counter to keep track of where we are in the program. Modifying the program counter allows us to jump to any position. Modern languages frown upon the usage of goto, because readability is improved by the usage of standard control structures.

In summary, the processor automatically executes instructions sequentially. To choose between branches we need a conditional (forward) jump. To repeat a section of code, we jump back to a previous point. What you must realise is that all the selection, iteration and recursion structures do, is inserting basic jump/goto instructions in the code source's translation to machine code.

Flow

Every program runs on a stack, and every programming language implementation uses them. A stack is a first-in, last-out (FILO) data structure. The two basic operations supported by a stack are :

push, put one element on top of the stack
pop, remove the top element from the stack

Maybe you heard of the runtime error, stack overflow ? It happens when the program misbehaves and uses up all the stack space given by the operating system. Too much pushing, not enough popping. The logiqub uses the standard two stacks of Forth. A data stack ds, and a return stack rs. Postscript uses the names operand stack and execution stack.

But what are they used for exactly ?

the data/parameter stack holds the value of data symbols (nouns) until they can be used by code symbols (verbs)
the code/return stack holds return addresses (program counters), to continue exection of a block of code after entering a symbol's definition

For example, let's say I want to scroll the screen, I would write:

SCREEN SCROLL

In C-style languages, that would look like :

screen.scroll();

But it's the same idea. Screen is a noun, scroll is a verb. Forth's syntax is point free, each word is executed from left to right, top to bottom. Simple. Upon executing the SCREEN symbol, the data stack would hold its value, until the SCROLL symbol consumes it to perform the scrolling action.

Fitness

The duality of data and code, static versus dynamic, makes me think of the yin and yang symbols of Tai Chi philosophy. This feeling is unique to Forth. When you try to write an efficient program, you are constantly evolving it until you achieve an optimal form, where the bone and muscle structure achieve maximum impact using minimum energy. Other programming languages have so many levels of abstractions, they feel extremely fat in comparison.

Section 3 : Syntax

I hope I did a decent job so far, at explaining how programming is supposed to work. Now, let's dive into concrete... err no, that will hurt. Let's describe how to choose a syntax to support our language needs.

The best thing about Forth is that it's an untyped language. You don't have to worry about the compiler telling you what you can and can not write. Obviously, you can build a type system on top of it if you need it (except you mostly don't). Also, it is unrealistic to presume a programming language can provide every data structure a program might need, so I don't do it. It's better to build what I need, when I need it.

Speaking of which, I removed a lot of useless words used in standard Forth. Things like variable, value, create, does, etc. Too complicated. Remember the root, our primitives are : numbers, routines and pointers. But before that, let's discuss symbols.

Symbols

Having access to symbols, doesn't look like much, but it's extremely important. In compiled languages like C, after normal compilation you lose power over the entities that you defined. Then it's a complicated mess to recompile with debugging info, hook a debugger and execute the program with traps on the processor.

It's bad style to prematurely remove symbols from your program until it is properly debugged. That's what C does, and it's a real pain. An older language like Lisp/Smalltalk didn't strip symbols but was slow by design, with dynamic types and late binding on everything. That loss of performance then forced complicated solutions like garbage collectors and optimizing compilers...

Spending time on developping/maintaining a garbage collector, a complicated compiler/debugger doesn't solve the actual problem that you have. A sane way to build a programming language is to design it for adequate performance and easy access to symbols (debugging). That's the Forth style, and that's why I use it.

Definition

Traditionally, Forth definitions have this form :

: squared ( x x -- y )
  dup * ;

The first problem with this form is that the colon ":" does two things at once. First, creating a new entry in the dictionary, second switching mode to begin compilation. The second problem is putting the colon first forces the colon to be an "immediate parsing word", disrupting the normal flow of of Forth's execution. So the correct way to write a definition is :

squared : dup * ;

Now, since we need to differentiate a word from a string, we must also use double quotes. Finally, on the principle that each element should do only one thing, I removed the compilation switch from the colon. That's how I came to using the following form :

"squared" : [ dup * ^ ] ;

You'll notice the caret "^". It's used to pass control back to the caller. In C/Python/Lua, that would be "return". The semi-colon ";" is optional, it updates the definition with metadata for optimization (short defintions are inlined). The bracket pair "[ ]" encloses the compiled block.

Scope

We must solve three problems to implement symbols, declaration, definition and search. The simplest implementation is a single global scope, and it's fine for small programs. However as an application grows, it increases the possibility that you end up with naming conflicts. Since it's a complex issue, let's see how other languages tackle this problem.

In C, to distinguish identical symbols, you can only use compilation units to "encapsulate" them (static symbols are private). Otherwise you prefix them with a library name ("SDL", "Py", "lua"). Pascal/Delphi/C#/Java have namespaces, where the names belong to the space they were defined in, modified in visibility by attributes like public, private, protected, readonly, sealed, etc. The full syntax to invoke a method can look like this :

package.namespace.object.method();

In some cases, you want to refer to a symbol before it has been completely described. It often happens with twin functions that call each other. If A must call B, and B must call A, how do you refer to B that you plan to describe after A ? The C language distinguishes declaration from definition. You use forward declarations to tell the compiler what is the function's prototype (type and number of arguments, return value type). In traditional Forth there is the world "defer" to create just a header without its body. That's another reason to not implement the classical "colon" compiler. If declaration happens separately from definition, I do not need an extra deferring word.

Okay so far, I only talked about declaration and definition. During search, or what is also called symbol resolution, the compiler/interpreter will search its dictionary for matches. Forth has a special stack for this purpose named the search order stack. The elements of this stack are namespace identifiers. To manipulate this stack there are specific words. Postscript, a stack language, much like Forth (some say it was directly inspired by it), uses dictionaries as namespaces. For the logiqub I like to use the term "lexicon" instead namespace. Example logiqub syntax :

"math" : lexicon ;

also math {
  "square" : [ dup * ^ ] ;
} sans

> 5 math square
25

"also math" means remember the current lexicon and put math at the top
"{ }" all symbols defined between the braces belong to math
"sans" means drop math, restoring the previous lexicon
"5 square" would result in a error, because square does not exist in the default lexicon named "core" (contains built-in symbols)

Numbers

Numbers are normal symbols, almost. For example :

> 123 456 +
579

A symbol that is not defined is automatically converted to a number and if that is not possible, an error message is displayed. Numbers are often said to be self-evaluating. Now if I wanted to, I could define :

"4" : [ 8 ^ ] ;

and have "4 1 +" evaluate to "9". That's why I said numbers are almost normal symbols. They are automatically converted, but you can redefine them if you want. Internally most programming languages define at least the first 10 digits, and maybe more up to a hundred numbers. It makes the system slightly faster for those common cases where we don't want to waste time searching for their definition, only to not find them and trigger a conversion routine down the road.

Routines

Routines are the basic units of code blocks. A code block is surrounded by brackets like this :

"a" : [ b c ^ ] ;

When you execute the symbol "a", the routines "b c" are executed until the caret. The caret is equivalent to the "return" statement in C-languages. Its effect is to give back control to the caller.

"d" : [ a a ^ ] ;

When you execute "d", the following happens :

enter d
enter a
execute b, c
leave a
enter a
execute b, c
leave a
leave d

Pointers

To change the value of a cell, we need to know its location. That's the purpose of the "pointer" symbol. It will insert a load instruction followed by an address further away from the current position.

"x" : pointer 0 , ;

To modify the value of x, we use the fetch "?" and store "!" operators :

"x ?" gets the value at address x
"n x !" sets the value n at address x

Data

Constant

Really straightforward :

"one" : [ 1 ^ ] ;
"two" : [ 2 ^ ] ;
"three" : 3 constant ;
"four" : 4 constant ;

In case you want the interpreter to calculate a value for you, there is another notation :

"six" : 2 3 * load, exit, ;

Variable

Use a pointer, Karl.

Array

To declare a five cells array named "a" :

"a" : pointer 5 cells allot ;

"a" : creates a symbol header
"pointer" puts the address of the next available cell on the stack
"5" puts the number 5 on the stack
"cells" multiplies 5 by 4 (bytes per cell), giving 20 bytes
"allot" advances the body pointer (by 20 bytes), reserving space for the array
";" updates the size of the body in the symbol's header

If you like to have a counter of the number of elements, you can allocate one more cell :

"a" : pointer 6 cells allot ;

Keeping the counter in the first cell allows the following operations :

"a ?" gets the count
"n a !" sets the count to the value n
"a i @" gets the element at index i
"a i @ ?" gets the value of element at index i
"x a i @ !" sets the value of element at index i to x

The cell offset operator "@" : [ 4 * + ^ ] ; Note that in the above example, the valid range for i is 1 to n.

Linked list

To declare the first element, "head", of a linked list :

"head" : pointer 0 , 0 , ;

"head" : creates a symbol header
"pointer" puts the address of the next available cell on the stack
"0" puts a zero on the stack
"," pops the stack, inserts the zero in the body, and advances the body pointer by one cell
";" updates the size of the body in the symbol's header

The linked list is a dynamic structure, we need a few symbols to update it :

"tail" : pointer head , ;
"append" : [ tail ? here !  here tail !  2 cells allot ^ ] ;
"prev" : [ ? ^ ] ;
"get" : [ 1 @ ? ^ ] ;
"set" : [ 1 @ ! ^ ] ;

"tail" gives the position of the last element of the list
"append" allocates space and updates the tail variable
"prev" will give the address of the previous element
"get" puts the value of the element on the stack
"set" pops the stack and updates the value of the element

The first element can either refer to itself, or hold a sentinel value like the null pointer (0). If you choose a null pointer, be careful to always test for it. Executing "prev" on the head cell will crash otherwise. Example usage :

append append
77 tail set
22 tail prev set
tail get .
tail prev get .

Strings

Welcome to the bonus level. Joke aside, we are constantly using character strings enclosed by double quotes to define symbols. I thought just this time I will tell you what really happens. When you type :

> "Hello world!"
134619813 12

The first number is the address of the first character of the string (a pointer). The second number is the length of the string (a number). You can see I did not mislead you when I said we only use numbers, pointers and routines. The double quote routine is special, though. In standard Forth, I would write :

> " Hello world!"

... with an extra space, emphasis on the double quote being its own word. Its action is to immediately parse forward until the next double quote character and leave stuff on the (data) stack.

Because I am a nice guy and I want the language to look kewl, I introduced a few special characters : double quote, apostrophe and tilde. These do not require the extra space to perform their duty.

Comments

Comments are funny. I started with the standard two that you find in most languages :

"--" single line comment
"(% %)" multiple lines comment

But then, after allowing multiline comments to nest, I realized how good my implementation was. Normally it's not possible to use the branching symbols when the code is not compiled. However, if you treat a section of code that must be skipped contionally like a comment, then everything falls into place. Hence the symbols :

"(0" and "(1" take an argument on the stack
")(" means else
"%)" means endif

With this I have the equivalent of #if / #endif used by the C preprocessor to provide conditional compilation, except in our case it's conditional interpretation.

Code

Unless you are dealing with multithreaded programs, code execution is stricly linear and predictable. I will not address the insanity going on with barriers, semaphores and locks. There are so many things you can do in code when you are no longer limited by the C language semantics, it's makes me want to cry.

It is also funny how it is said that when newer programming languages mature, they reinvent Lisp because they need more advanced features. A programming language, crippled by design, to be used by crippled programmers... can never come close to the power of assembly. Don't be deceived, Forth is powerful because it is an assembler.

A program is like a song's lyrics that you follow word after word, occasionally using references to repeat sections, move ahead or back. So, there are only two cases to understand. Either we execute the next instruction or we jump somewhere else. That's all, you can close the book now, and go back to real business. Just kidding, I really want to put emphasis on this point. To jump or not to jump, that is the question.

Thinking back on it, you should probably read "goto considered harmful" by Edgar Djikstra (our spiritual father). Because people are insane, they were actually having an argument about all the many different ways you can jump around in code, as if it mattered a whole lot. Goto is just fine, in my honest opinion.

Irresponsible use of it by untrained programmers can be problematic, so I do understand the reasoning. But you know what I think, responsability and freedom goes hand in hand. Train your people to understand what they are doing, instead of showing them mirages and cripple their skill growth. No system should require multiple layers of lies, to appear simple.

Sequence

Nothing special here. Use routines, Karl.

Branching

Also known as selection, choosing an execution path based on a condition. Alongside general purpose registers, the cpu has a special register with bits set depending on the prior instruction. Zero, carry, signed flags, etc. In a stack-based language like Forth, you have to decide if you want to use concrete flags on the stack, or keep them hidden in cpu registers. When you make the flag explicit though, there is another decision to make. Should branching symbols consume the flag ? In one case you might have to "dup" before, while in the other case, you would have to "drop" after. It affects program readibility big time. Personally, I don't like noise.

if ... then ... else

[0 ... ][ ... ]]
[1 ... ][ ... ]]
[+ ... ][ ... ]]
[- ... ][ ... ]]

Because the "words" if/then/else are noisy (and I don't want to speak to the processor anyway), I use symbols looking like blocks. Also, I don't create an explicit flag. The only reason I would need a flag on the stack is because I am working with a complex boolean expression, where subsequent operations would override the result. In this case I create the flag, then drop. Remember that arithmetic operations do update cpu flags, so using a comparison operator to update the flag register is not always necessary. Example:

n 3 - drop [0 ... ]
  versus
n 3 == drop drop [0 ... ]

Looping

I implemented two counted loops, both decrementing. The difference is the termination condition, equal to zero or less than zero. Sometimes you need to count from n-1 to 0, sometimes from n to 1.

n for ... i ... next

n-1 [[ ... i ... -] 
n   [[ ... i ... 0]

There is no "do loop". Also if you need to count up, you're on your own.

An example for loop displaying ascii codes :

"ascii" :
  [ 31 11 [[
    7 [[ 1+ dup emit space . 9 emit -]
  .eol -]  drop ^ ] ;

See, I can't count up so I leave the starting number (31) on the stack and increment it each time. Duplication to print it both as a character and a number. And then some formatting with tabs (9) and newlines (.eol). 12 rows, 8 columns. At the end, I drop the number left on the stack, simple.

Recursion

Basic syntax :

"a" : [ ... a ^ ] ;

Standard Forth has a "recurse" word to do this. They do this because they think it is nice to be able to refer to an older defintion, even though we're redefining the symbol. There was even some insanity with a smudge bit, to temporarily hide the newer definition. It's too complicated.

In the logiqub, there is no "while begin again until repeat". Combine recursion, branches and exit. An example while loop waiting for escape key (27) :

"escape" :
  [ keyhit 0= drop
    [1 key 27 - drop [0 ^ ]] ]]
  10 sleep escape ^ ] ;

Lambda

In Lisp, a lambda expression is an anonymous function. Forth has a similar concept with execution tokens. In C, that would be a function pointer.

> "add" : [ + ^ ] ;
> 123 456 'add push
579

The tick (apostrophe) is a special syntax like the double quote. It gives you an execution token for the symbol after it. This execution token is the address of the code body. What happens is pushing this address on the return stack with trigger the execution of the code.

> here [ 123 456 + ^ ] push
579

What the heck just happened ? Up to this point we always used a definition with a named symbol to execute a block of code. Do you remember that "tick" gives you an address of code body ? The "here" symbol does the same thing in this case. That's how you execute code without declaring a symbol.

If you don't want to waste the body space used to compile a bunch of temporary code blocks (lambdas), use a marker.

"marker" :
...
'marker here - allot

Allocating a negative number will roll back the body pointer to a previous position. Be careful though, subsequent allocation might contain random bytes instead of zeros.

Iterator

An iterator is a cool way to abstract the details of an iteration process.

"iter" : [ 1- ^ ] ;
"loop" : [ [[ dup . ?] ^ ] ;

> 4 'iter loop
4 3 2 1 0

I can do whatever I want inside the loop without having to know how the iteration process happens. The iterator can for example be the index of an array or a pointer to an element of a linked list. As long as the zero flag is not set, we repeat.

Closure

You can think of a closure like a routine with a copy of its parameters. Such a construct is useful to postpone a computation, until you need the results. The Python language has a package named functools for this purpose (partials are callable objects). The logiqbu does the same thing with a pointer to data and an execution token :

"closure" : -- ( xt a - xt' )
  [ here push load, call, exit, pop ^ ] ;

Execution of the "closure" symbol will leave on the stack a new execution token, to later perform the action defined by the code and data given as parameters.

Generator

A generator is another abstraction that can use the previous closure definition.

"code" : -- ( a - n )
  [ 1 over +! ? ^ ] ;
"data" : pointer 0 , ;
"gen" : -- ( - n )
  'code data closure call, exit, ;

Every time you call "gen", the internal counter will be increased and a copy of its value left on the stack. You can reset the counter by using the "data" symbol. It's possible to skip the creation of headers to obtain an unnamed closure.

here [ 1 over +! ? ^ ]
here 0 ,
closure

Coroutine

Coroutines are functions that yield to each other. Because we have access to the execution stack in Forth, and thanks to the fact parameters are held elsewhere, it's a simple task to implement. Example :

"a" : [ 1+ ^^ 2* ^ ] ;
"b" : [ a dup . ^^ . ^ ] ;

> 4 b
5 10

Higher-order function

An example of map :

"data" : pointer 1 , 2 , 3 , 4 , 5 , nil , ;
"code" : [ dup ? dup * . ^ ] ;
"iter" : [ 1@ dup ? 0= drop ^ ] ;
"map" : -- ( xt a xt - )
  [ [[ over push ^^ ?] drop drop ^ ] ;

> 'code data 'iter map .eol
1 4 9 16 25

Exception

TODO: implement catch throw

Objects

In object-oriented programming you define classes and create instances of those classes. I like games, so let's look at a basic rpg example.

Class

"hero" : lexicon ;
also hero {
  "attr" : pointer 100 , 5 , 5 , 5 , 5 , ;
  "vit" : [ ^ ] ;
  "atk" : [ 1 @ ^ ] ;
  "def" : [ 2 @ ^ ] ;
  "spd" : [ 3 @ ^ ] ;
  "lck" : [ 4 @ ^ ] ;
  "new" : [ here 5 cells allot ^ ] ;
  "init" : [ attr swap 5 cells ?! ^ ] ;
} sans

Instance

also hero
"karl" : new init
77 karl vit !
karl vit ? .
sans

Very straightforward and simple, using basic symbols encapsulated in a lexicon. As you see, you don't need the complications you will find in the Java/C++/C# world to achieve this. And symbols are accessible to start testing/debugging interactively.

Operating system interface

Console input/output

Foreign function interface

If you remember, logiqub routines are code addresses. External functions in dynamic libraries are also code addresses identified by a name. That's why it's extremely straightforward to import C functions in the logiqub. Routines and functions share the data stack.

load the library with dlopen
find the symbol with dlsym
rename it as you see fit
invoke the function with cdecl
close the library with dlclose

Dynamic memory allocation

malloc mfree

mcode mdata

mpage+ mpage- mpage@

File system

fopen fsize (fseek) fread fwrite (fdelete) (ferror) fclose include

Clock

Section 4 : Control language

Overview

Every single program can be represented with this universal scheme: Input, Process, Output. This principle applies at every level. logiqub symbols and primitives, assembly instructions, C functions take parameters and give values. When you think about it, the whole virtual machine (like other shell programs) transforms a byte stream into another one.

The interesting part is to determine what is the common set of operations that you must implement, to be able to create a variety of other programs. The good thing about Forth is that its minimal set is quite... minimal. Minimal but powerful and efficient. Powerful because you can create anything you might need from the subset. Efficient since there is little overhead on a properly implemented Forth system.

Potentially we will want to execute our programs on different machines, so portability is important in our case. With a one man army, and given that I want to create cross-platform applications, having to rewrite the control language every time is not affordable. Forth already solved this problem long ago with a virtual machine.

The virtual (stack) machine is simulating a computer inside a real one. It's useful because you have complete freedom to describe your programming language and its features regardless of the underlying hardware (possibly even operating system) you are running on. If you remember the tree analogy, the control language makes up the roots and trunk. The roots are assembly primitives, the trunk is the interpreter.

Input

Most programs need to take parameters in the form of configuration files, command line arguments or interactive shells. More input methods are possible like mice, touch screens or voice recognition. We are only interested in the simple boring methods, but do not despair it's actually not so easy to get it right.

The logiqub takes as input ASCII characters in the range 0-127 (utf-7). Printable characters are in the range 32-126. In the range 0-31, we find standard control characters like backspace (8), horizontal tab (9), carriage return (13), etc. Character 127 (delete) is ignored.

Keyboard input

On Linux, because the operating system thinks it's a good idea to mess with programs they have a canonical line editing mode. It's so useful, most GNU software disable it and enter raw mode anyways. If you don't do that, you have to deal with the fact that you won't receive characters as expected. Ctrl-C and Ctrl-Z will get intercepted for example, giving you no choice if you wanted to use ASCII codes 3 and 26 for something else.

Even more annoying is the fact that Linus thought Ctrl-Backspace is the correct way to produce the backspace code. Standard behavior is to replace it with a delete code. It's not the duty of the operating system to interpret input. Who cares that the VT-100 had a Delete key above Enter ? Every program has different needs. When you consider terminal emulators (xterm), editors (emacs, vim) and shells you are in for a lot of butt hurt when each need to be re-onfigurated. Rant over ? No. Linux also changes CR to LF which is not produced by the Enter key. You will be even more annoyed when you find that other operating systems can behave differently as well.

So raw input mode is the only sane option. Fortunately, Windows doesn't mess with input characters too much. However, on Linux I compile a special library to provide not only raw mode but a key press detection function. Useless editing mode in kernel and useful function out... But it's not over yet. Since I changed mode, I also need to register a clean-up function to reset the mode back to normal. Otherwise, with echo disabled I won't see what I am typing at the shell. At this point, you'll agree with me, it's an art to make simple things that complicated.

List of control characters

Ctrl-C    3   exit program
Ctrl-D    4   end of input
Bksp      8   backspace
Ctrl-I    9   set cursor to the beginning of line
Ctrl-J   10   previous command in history
Ctrl-K   11   set cursor to the end of line
Ctrl-L   12   next command in history
Enter    13   execute command
Ctrl-O   15   auto-completion options
Ctrl-U   21   clear line

The size of the input buffer is 256 characters. The history remembers the last 8 commands. It's necessary to make sure you can not erase characters when the buffer is empty. Same thing with the upper limit, reject characters above the limit. Set your editor to replace tabs with spaces.

TODO: implementation of these control codes and auto-completion.

Pipe input

The logiqub doesn't know or care that the input stream is a pipe (special file). This means when you use the shell or a script to interactively control the logiqub, you are limited to 256 characters per command line.

File input

The logiqub accepts file input by parsing command line arguments as file names. Characters are interpreted one by one, so a file may contain control characters. The major difference with console input is that a file is potentially much larger than a line of text, so I execute a system call to get the size of the file, followed by a dynamic memory allocation to create a temporary buffer.

There is an include directive to facilitate program organization. How it works is really simple. You first create a stack of opened files as a dynamic linked list (obviously you also need save the cursor position). Close the file as soon as its content are loaded in memory. Evaluate the file and then release memory while popping the file input buffer stack until you're done.

TODO: limit memory allocation for file input, or at least check that malloc doesn't fail.

Process

Charles Moore warns against leaving hooks in a program. There are valid reasons why he does so. However, in the context of writing a Forth interpreter, it's bad design to not have hooks for the various components. You could say Andrew Tanenbaum is right in believing that microkernels are just better. Now, he still lost half of the argument by not understanding the value of having software usable right now. Still, an operating system should be flexible enough to permit the exchange of one component for another. That's also what I think of a compiler/interpreter.

Let's say I decide to introduce a new syntax in the language because it's simpler for a particular project. Should I redesign the whole interpreter from scratch ? Maybe I can just reuse the existing parsing and compiling functionality ? It's the same idea as "reader macros" in Lisp. The logiqub is designed with exchangeable readers, parsers, compilers, executors, converters and even an "unknown" handler. There is almost no cost to doing so, since Forth compiles so fast anyways (single pass). I provide an example of a S-expression compiler, you should look it up.

Parsing

Forth is different from other programming languages in the fact that he is trully uniform. Everything is a "word". Other programming languages have keywords and syntax rules that makes parsing difficult.

The current input buffer (tib or fib) is read one symbol after another. Symbols are separated by any non-printable character. That's anything below ASCII 32 or greater than 126.

Symbols are then searched in the dictionary and executed if found. If not found, the system tries to convert it to a number based on the current input converter. If the symbol can not be converted, the problematic symbol is shown as "symbol ?" and the stacks are cleared. The interpreter then awaits for further input.

When the symbol is found however, it is evaluated. Evaluation depends on the current evaluator. One such evaluator is the compiler, the other is the executor.

Search

Dictionary structure

A dictionary is a collection of entries, each entry is made of a name and a definition. There are many ways to implement the dictionary, but the linked list is the simplest. So that's what I use. Optimizations like hash tables and skip lists are not worth it.

The traditional Forth dictionary is mixing word headers and bodies. I don't do that, because then, I can not make a definition fall-through another like this :

"16*" : [ 2* ]
"8*"  : [ 2* ]
"4*"  : [ 2* 2* ^ ] ;

This technique barely saves space so it doesn't matter that much, but it does have some charm. Also, having separate headers make it easy to strip or relocate them if necessary. I keep some medata as well.

prev [ 2 bytes], offset to previous entry (never greater than 64K)
meta [ 2 bytes], type (primitive or thread) and size
link [ 4 bytes], pointer to definition (code or data)
name [12 bytes], length (1 byte) and 11 significant characters

For a total size of 20 bytes per entry. Now there are tricks you can use to compress names since only 96 characters need encoding (7 bits). Let's say we make everything case insensitive, then we're down to 70. Say we remove a few rarely used characters we can go down to 64 easily, or 6 bits. Every 3 bytes, we get a character for free with the compression. That's useful, but not worth the trouble of encoding/decoding in our case.

Execution

An infinite loop checks if there are characters available in the standard input stream stdin. As soon as a character is ready, we put it in the terminal input buffer (if it's printable). Then, when we receive a carriage return (13), we search each symbol (separated by space/new line) and transfer control to the current handler if the symbol was found in the dictionary. Otherwise, we use a number conversion routine, which upon failure triggers the panic routine, printing the unknown symbol and resetting the stacks.

There are two built-in execution handlers. The default executor will perform the semantics of the symbol. The tracer will additionally print the symbol and its resulting effect on the stack. The tracer only works with interpreted symbols. For compiled symbols, use the debugger (compilation handler). Also note that only a compilation handler can deal with branches, because jumping addresses must be calculated.

Threaded code

I can not cover all the different techniques available to generate code. The following is a rough idea of the trade-offs you have to think about depending on your goals. For the logiqub I go with direct threading because it's one of the fastest interpretation technique that doesn't result in hard to read code.

Token (bytecode)
  + portable 
  - slowest
Indirect
  + tight
  - slow
Direct
  + fast
  - large
Subroutine (native)
  + fastest
  - unreadable

One can argue that subroutine threading can be faster and more compact, but then I think if we're generating native code at runtime, we have to deal with execution prevention. Not only that but, depending on the level of optimization it's going to be harder to decompile the code. I love rabbit holes but this is not a route I want to take for an education system.

Execution token

> '+
134546570

Because "+" is a primitive and not a composed routine (thread), I can not push its token on the return stack, and have the interpreter continue execution without crashing. The return stack only handles high level routines. That's why I must first create such a routine, then pushing its address will work.

> "add" : [ + ^ ] ;
> 123 456 'add push
579

In fact, I could make it work with both primitives and composed routines if I choosed a different implementation technique. The current code will automatically dispatch to the next symbol and cannot return to the main interpreter. A central NEXT dispatch however allows changing the dispatch technique (jmp/ret) at runtime, for a performance cost. Internally, I have both a "_prim" symbol and a "_push" symbol, only the later is in the dictionary. The limitation of having to wrap primitives is of no consequence because short definitions are inlined and the enter/leave overhead will disappear.

Compilation

To compile stuff the comma "," is used (like most assembly languages I think). This works for both data and code. The comma will take a number off the stack and deposit it in the body, advancing the pointer by one cell (4 bytes).

The most important thing to understand about the logiqub is the difference between primitives and routines. Primitives are single code addresses or virtual instructions. They do not require an "_enter" primitive to be invoked, and they automatically dispatch to the next virtual instruction. Routines are composed of primitives and must be wrapped between "_enter" and "_leave" (or "_goto"). Both routines and primitives are represented by their address (execution token).

_enter will push the address of the next virtual instruction (2 cells away) on top of the return stack and jump into the definition (1 cell away)
_leave will drop the top of the return stack and jump into it

If there is an _enter before a _leave it is optimized as _goto and _leave is removed. That's tail call elimination. Compiling a number, means to put it on the stack. For this I need a "_load" instruction.

In summary, first put on the stack :

a number (automatically converted)
an execution token given by "'" (tick symbol)

Then, transfer it in the body :

for data use "," (primitive) or "load,"
for code use "," (primitive) or "call,"
with "exit," at the end of the definition

And that's all. If you know Forth, you can see I removed a lot of complicated words. Stuff like "compile", "compile,", "postpone", "literal", and their variations (with or without brackets/parens when they are immediate). Furthermore, there is an immediate translation between syntax and virtual instructions. This means, the compilation handler "[ ]" is not necessary to compile non-branching definitions. The execution handler can do it just fine, albeit less terse.

Okay, now onto the compilation handlers, cause we still need branches and loops. Branches and loops require addresses, and if we would compile those jumps with an executor, things would be messy. There are two compilers, the default one will do everything, _comma, _load, _enter, _leave, _goto but also branches (_jz, jnz, _js, _jns). The default compiler then passes control to a simple optimizer. The optimizer looks for opportunities to use super instructions for common sequences like _cmp_drop, _load_emit, etc. Using the same basic rule, it can perform tail-call optimization or dead code elimination.

The other compiler is the debugger which works like the tracer. I don't have a use for single-stepping, breakpoints and tracking values. When you use Forth, the definitions are so small, you can check each one very easily. Putting them together just works. Advanced features are more useful with really big programs, and several people working on the same code base. That is not a problem we have with the logiqub. Also, we are working with an operating system, which garantees a crash won't shut down the whole system. So yeah, I just trace until the segmentation fault and go from there.

I forgot a small implemenation detail, because it's not a huge deal anyway. Some Forth systems have a lot of insanity going on with double semantics. Compilation tokens are different from execution tokens, because a word can behave differently during compilation than just putting its execution token in the body. This becomes necessary with immediate / state-aware words and they cause a lot of trouble. In the logiqub, all symbols are compiled and executed exactly the same way.

Unknown symbol

When the search for a symbol fails, a _not_found routine is executed that looks for the current converter and invokes it. The converter then invokes the routine assigned to it. There is no internal radix variable. This unknown handler can be changed like this :

"&nine" : [ 999 ^ ] ;
'&nine &?? !

>13375p34k
999

Instead of panicking, 999 is left on the stack instead. The _not_found routine definition is :

[ word &in ? push ^ ]

... simply taking the parsed word and passing it to the input converter. Now let's say you want to add to the logiqub an octadecimal input converter. As you have seen, the converter only needs the address of the input word, which may be anywhere.

> deci
> "123" $ &in ? push
123

"deci" sets up the decimal converter, which definition you can inspect in the source code.

"_octa" : -- ( a - n ) ;
  [ ^ ] ;
"octa" : [ _octa &in ! ^ ] ;

You can have fun writing a definition for the _octa routine. In case the conversion can not be done, call the "panic" routine.

Definition

Output

All output goes to the standard output stream stdout. So unless redirected to a file or printer, everything ends up on the screen. Numeric output is configured by selecting an output conversion routine.

Number formatting

TODO

Section 5 : Application language

To get a better idea of what the logiqub can do, you can launch programs in the demo folder like this :

$ cd ~/logiqub/demo
$ logiqub (app).qub

Graphical user interface applications are more complex. I put their files in a folder. You can start them like this :

$ cd ~/logiqub/demo/(app)
$ logiqub main.qub

Memory layout

4 Kb Core primitives
64 Kb
  Body
  Head
xx Kb
  Main routines

You have 64 Kb of memory for headers and bodies of symbols. If you need more you should by all means use malloc / mfree.

By the way, the guys who invented kibibytes... what the #$%@?!

Layered design

A good way to build an application is to use layers. One well-known pattern is MVC, Model-View-Controller. Although, the view and controller are often merged.

Model and view

The way I like to build an application though, is to create and debug an engine first. The engine models the problem and provide commands to change its state. I then create a view taking data from the model, to display it on the screen with images. The view is a window and the window receives events, so the view is also the controller.

Events and animations

This is nice until you want animations and transitions in the view, while your model is purely sequential. In that case, the view must subscribe to state changes in the model, instead of being perfectly synchronized. That way entities do not jump around instantly.

Technique

Indirection

Abstraction

Those two concepts are different. We use abstraction to abstract details (obvious). An example of abstraction is a hardware driver. Because there can be many different manufacturers, we don't want have to know the details of communication with the device. It's useful to not have to worry about the details But using indirection alone doesn't mean we created a useful abstraction. The brain has limits. Proper use of abstraction is the method we use to deal with an overwhelming amount of detail, therefore it is paramount to master the technique. The less load you put on your brain, the more you can do with it.

Indirection without abstraction is used to create articulations, for when we need to modulate the program's behavior at runtime. Let's say we want to change the meaning of a symbol, without adding a new entry. Updating the link field will rebind the name to a new definition.

It's not much, but too many levels of indirection can cost performance. The indirect threading implementation technique is slower than direct threading for this exact reason. A minor loss of speed for increased flexibility.

TODO: rewrite above to be less confusing.

Factorisation

Metaprogramming

Optimization

This is the very last thing you would like to do. The logiqub is already fast, at least compared to any bytecode interpreter, fastened by a JIT compiler or not. Any performance problem is likely related to algorithms and data structures. That being said, when you understand assembly well, there are some things you can do yourself that do not require a complicated optimizing compiler. Notable examples being scaled arithmetic, and shifting.

Value of pi.

355 113 */
  or
31416 10000 */ -- less accurate

Division by 100.

41 * 12 >> -- less accurate
  or
100 / nip

Division by 10.

205 * 11 >> -- less accurate
  or
10 / nip

Like all interpreters, what makes the logiqub slow is the dispatch overhead. The smaller the primitives, the more important it is to have an efficient dispatch mechanism. Branch misprediction becomes a performance bottleneck, especially in hot loops. The solution is simple, and applies to any programming language : identify hotspots and rewrite them in assembly.

a [ [[ 5* 0] ^ ] ;
b [ [[ 5 * 0] ^ ] ;

ticks
1 10000000 a drop
ticks swap - . .eol

ticks
1 10000000 b drop
ticks swap - . .eol

"a" will be faster than "b", since it's a single operation. However, if you define them as :

a [ [[ 5* 5* 0] ^ ] ;
b [ [[ 25 * 0] ^ ] ;

"a" will likely be slower than "b", even though the dispatch is the same. There is no hard rule, it depends on the cpu and runtime characteristics of the algorithms. Always profile before you commit to optimization.

Postface

Licensing

I reserve all the rights on the software (source, media, documentation) published from this website, unless explicitely stated (imported work).

My intent is to guarantee the freedom of the users to use, modify and distribute the software in a way that is suitable to promote learning and teaching. For this reason, I choose the University of Illinois/NCSA open source license.

Packaging and versioning

I won't do it. Period. It's too much overhead for no value-added. The virtual machine should be simple and sufficiently well designed to require very little changes. Everything will be stored at logiqub.com.

Notice the project is intended to be used by individuals. I do not want to include outside contributions unless there is a mistake, oversight, typo or significant improvement to make in the core system. The whole thing must stay small and understandable.

git clone ? Forget it, you have wget. Debian package installer ? Forget it, you compile from source. Windows installer ? Forget it, you don't want to pollute the registry, program files and user profiles. It's beyond the amount of retardedness I can tolerate to go for such complex solutions, when the logiqub is a mere toy.

Now if I do change primitives, because I think it's a better design, I will break your applications that depend on it. Then, so be it, diff patch and regular expressions are your friends. Maintaining your own versions of the logiqub is also an option. Make the system yours for god's sake. I am only maintaining my version and sharing it with you, the rest is not my responsability. Take my ideas, not my code.

"Absorb what is useful, discard what is not, add what is uniquely your own." — Bruce Lee

Table of contents

Preface

Section 1 : Psychology

Cardinal virtues

Mindsets

3 elements theory

Sky [blue/white] decision

Flame [red/yellow] emotion

Earth [green/black] action

Theory of learning

Emotional struggles

Fear

Uncertainty

Doubt

Instability

Theory of teaching

A great programmer

Extreme minimalism

Section 2 : Theory

The tree

The root

Cells

Data

Code

Flow

Fitness

Section 3 : Syntax

Symbols

Definition

Scope

Numbers

Routines

Pointers

Data

Constant

Variable

Array

Linked list

Strings

Comments

Code

Sequence

Branching

Looping

Recursion

Lambda

Iterator

Closure

Generator

Coroutine

Higher-order function

Exception

Objects

Operating system interface

Console input/output

Foreign function interface

Dynamic memory allocation

File system

Clock

Section 4 : Control language

Overview

Input

Keyboard input

Pipe input

File input

Process

Parsing

Search

Dictionary structure

Execution

Threaded code

Execution token

Compilation

Unknown symbol

Definition

Output

Number formatting

Section 5 : Application language

Memory layout

Layered design