Developing Minim
Recently, I released version 0.2.0 of Minim, my hobby-language that I’ve been developing since last fall. It’s inspired by my time working with Racket (now more than a year). Despite having no formal experience with programming languages, I’ve made quite a bit of progress, although I still have lots to learn. Here are a few of my thoughts from developing the language.
From Static to Dynamic Types
My language of choice for developing Minim is C. Not the best choice for a smooth experience, but it’s been quite interesting. The most annoying part is obviousing dealing with pointers; nearly every bug I’ve run into is a segmentation fault. The most interesting concept, however, is the interaction of Minim’s dynamic type system with C’s static type system.
Mainly, how do you define a dynamic type system in a language
that is statically typed, especially without classes like
in C++ or Java?
The best solution that I’ve found is void pointers and enums.
Intuitively, store a void pointer and an enumerated value such
that the value hints at what is stored at the pointer.
With this method, we can store anything from an integer to
a string in a unified object, or the MinimObject
in the
case of the Minim language.
Minim can easily verify the type
of inputs when invoking a function and can throw an error when necessary.
Of course, all of this is abstracted away when running Minim.
The user can still identify the type of an object with procedures
such as number?
or string?
, but they need not worry about
its initialization, storage, and deletion.
An added benefit is multiple types of objects can be stored
in the same list such as (list 1 'a "str")
.
This concept may be trivial, but it is quite interesting to see
that a simple construct can do away with the restrictions
of statically-typed code.
Parsing, Parsing, Parsing
The worst experience, by far, has been figuring out how to parse an input string without any fancy libraries. For the most part, if we can munge the string, it’s not too awful. Split the string by looking for spaces and parentheses/brackets. New lines and tabs are really just spaces in Minim, so they’re not too useful like Python. All we have to do is keep track of quoted strings, Lisp-style quotes, and comments.
Unfortunately with version 0.2.0, Minim supports executing expressions from a file, and errors without syntax information are quite unhelpful. Therefore, we need to store the syntax information of where expressions and functions are located. We can’t munge the string before parsing since we lose information about row and column numbers of characters. We must (a) track row and column information, (b) ignore distracting whitespace, and (c) still tokenize words. I managed to pull it with a reader thats around 200 lines long and a parser of similar length, but it’s quite a mess.
Nevertheless, the results have been impressive.
Backtraces from errors are quite detailed and they
print out “stack frames” with the following format:
<file> <row>:<col> <name>
.
It’s crudeness will definitely be a problem in the future,
but for now it works.
Problems to Come
The most broken part of Minim is the blurry distinction between owners and references, and the lack of separation between mutable and immutable objects.
The first problem borrows a concept from C++. In brief, certain objects are the original owners of their information. It’s better to pass a reference of that object to a function rather than the entire copy since it takes less space and is less cumbersome than pointers that are dominant in C. Minim also implements this system, since it’s less resource intensive to not copy lists every time we use them for read-only purposes. However, without a static type system, the use for this is more subtle. Every built-in function in Minim needs to have two separate cases for owners and references, and this parallel strategy causes many issues. It’s not ideal, but it seems to work.
The second problem is a distinction that Racket makes clear: there are immutable objects and there are mutable objects. In Racket, the two different types of objects each have their own set of procedures. Initially, I chose not to care since it seemed cumbersome to have two sets of procedures. Invoking an in-place update of any hash table seemed reasonable, but it’s problematic with function calls and references. There are a number of examples that I can think of that will break Minim.
Conclusion and Future Work
Although, I spent much of this blog talking about what is bad, there has been a lot of good. Most importantly, I’ve learned quite a bit about developing something as complex as a “lightweight” programming language. As of the time I’m writing this, the repository is well over 10,000 lines of code and the language contains over 120 procedures and numerous types like lists, strings, hash tables, vectors, and more.
In the next update, there will be considerably more
procedures in the “standard library” I’ve been developing.
They will mostly include math functions like gcd
and lcm
.
More list procedures are a must since lists form the backbone
of any Lisp/Scheme language.
Additionally, I need to resolve the issues mentioned above
as well as making procedures proper closures
(storing the environment from which they were created).
This blog was long-winded mostly because there was a lot to talk about. I hope to write more about Minim in the near future as it develops from the small fledgling it is today to a language that is full and robust. Stay tuned for more.