Entries for tag "compilers", ordered from most recent. Entry count: 4.
# Compiler Development - A Higher-Order Hardcore
Sat, 31 Jan 2015
Of all the kinds of programs a programmer can work on, I think compilers are particularly difficult to develop. Here is why:
If your job is just to work with data (e.g. in Excel), the only thing that can be wrong is the data.
If you are a programmer, you write programs. Programs generally transform some input data into some output data. So if the output is wrong, you examine the input, and if you are convinced that the input is correct, there is probably a bug in your program.
If you develop a compiler, things get even more complicated. You write a program (the compiler) that takes the source code of another program as input and produces a compiled program as output. That program then processes some data. So if the output of that program is wrong, its input data may be wrong, its source code may have a bug or, if you are convinced that both are correct, there is probably a bug in the compiler. There are simply more “degrees of freedom” here. You examine what’s wrong in the output data, then you look at the compiled program to find the bug, and finally you examine the compiler to understand why it generated that program.
It’s not always easy even to determine which part has the bug. Even if your change to the compiler causes a program to produce invalid output, the bug may still be in the program’s source code. For example, the program may rely on undefined behavior (like the use of an uninitialized variable), so any change in the compiler can produce different output while the compiler is still correct.
When learning functional programming, you must understand how to work with higher-order functions: functions that take other functions as arguments or return them. I can see an analogy here. So if you are even considering working in compiler development, better think twice about whether you are ready for such a higher-order, hardcore level of debugging :)
# Parsing Numeric Constants
Fri, 16 Dec 2011
As a personal project I have started coding a scripting language. The first thing I want to do is parse integer and floating-point numeric constants. My decision about which syntax to support is based on the C++ language, but with some modifications.
An integer constant in C++ can be written as:

| Example | Notation | Rule |
|---|---|---|
| 123 | Decimal | Starts with a non-zero digit |
| 0x7B | Hexadecimal | Starts with "0x" |
| 0173 | Octal | Starts with "0" |

It can also have the suffix "u" for an unsigned type, and "l" for "long" or "ll" for "long long".
The "l" suffix makes little sense in Visual C++, because "long" has the same size as plain "int": 32 bits, even in 64-bit code. So in my language I'd prefer to use "long" as the type name and "ll" as the suffix for 64-bit numbers.
I also don't like the octal form. First, I can't see any use for it. In all of computer science I've seen only one situation where the octal system is used: file permissions in Unix. I have never seen the octal form used in C++ code. Second, I think that prefixing a number with zeros shouldn't change its meaning, so choosing "0" as the octal prefix (instead of, for example, "0o") is very unfortunate in my opinion.
It would be much more useful to be able to write binary numbers in code. Java 7 introduced such syntax with the "0b" prefix. It also has another interesting feature I like: it allows underscores in numeric literals, so you can make long constants more readable by grouping digits, as in "0b0011_1010".
I'd like to support decimal, hexadecimal and binary numbers in my language. Regular expressions that match these are:

- Decimal: `[0-9][0-9_]*[Uu]?[Ll]?`
- Hexadecimal: `0[Xx][0-9A-Fa-f_]+[Uu]?[Ll]?`
- Binary: `0[Bb][01_]+[Uu]?[Ll]?`

Floating-point numbers are more sophisticated. A constant that uses all possible features might look like this:
111.222e-3f
The question is: which parts are required and which are optional? It may seem that floating-point numbers and their representation in code are obvious, but there are actually subtle differences between programming languages. "111" is obviously an integer constant, but is a dot with no digits on the left, a dot with no digits on the right, an exponent part or an "f" suffix enough to make a proper floating-point constant?

| Constant | C++ | HLSL | C# |
|---|---|---|---|
| 111.222 | OK | OK | OK |
| 111. | OK | OK | Error |
| .222 | OK | OK | OK |
| 111e3 | OK | OK | OK |
| 111f | Error | Error | OK |

I want to support all these options, so regular expressions that match floating-point constants in my language are:
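
- Digits with an "f" suffix: `[0-9]+[Ff]`
- Digits with an exponent: `[0-9]+([eE][+-]?[0-9]+)[Ff]?`
- Digits, a dot and optional fraction digits: `[0-9]+\.[0-9]*([eE][+-]?[0-9]+)?[Ff]?`
- A dot followed by digits: `\.[0-9]+([eE][+-]?[0-9]+)?[Ff]?`
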
Tags: #languages #compilers
# RegScript - Bidirectional Type Inference
Sun, 03 Jan 2010
Coding my RegScript programming language is no longer easy now that the code is growing bigger, but it's still a lot of fun. In the last few days I've added support for numeric types of different sizes. Here is the full list: float, double, uint8, uint16, uint32 (uint), uint64, int8, int16, int32 (int), int64.
I try to keep the syntax as close to C/C++ as possible, but at the same time I introduce some interesting details like:

- `object[x,y,z]`
- `void Func(float x = sin(globalVar)) { ... }`
- `switch (val) { case 1, 2, 3: ...`
- `switch (val) { case someVar+1: ...`

I've also implemented function overloading and many compiler errors and warnings similar to those of a C++ compiler. But the most interesting feature so far is what I call "Bidirectional Type Inference" :) First I introduced the auto keyword to allow omitting the type name, and then I made literal constants like 123 typeless, so their type is deduced from the context (because I hate typing those f, u or ll suffixes everywhere in C++ code). For example:

    // Left to right - these numbers are int16
    int16 myShort = -32000 + 10;
    // Right to left - newVar is int16
    auto newVar = myShort;

Tags: #c++ #compilers #regscript
# RegScript - my Scripting Language
Wed, 30 Dec 2009
RegScript is a scripting language I started coding yesterday. I'm not a guru in formal language theory, but I'm quite fascinated by programming languages, parsers and interpreters. I don't know why, but I felt I'd love to write my own programming language, so I just started coding it, even if it doesn't make any sense. Maybe that's just because it's like being the creator of something that we, as developers, are usually only users of. The language syntax is similar to C++. Here are the features I've coded until now:
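
- Literal constants: `-10`, `0xFFFF`, `0b1010`, `1e-6f`, `false`, `true`, `'A'`, `"foo"`
- Operators: `+ - * / % < ?: ! && || ~ & | ^ << >>`
- Assignment, increment/decrement, comparison and other operators: `= += -= *= /= %= &= |= ^= <<= >>= ++ -- == != < > <= >= ( ) sizeof ,`
- Intrinsic functions: `sin`, `print`; intrinsic constants: `PI`.
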
The implementation is not optimal yet, as I don't generate any bytecode. I just execute statements and evaluate expressions directly from a tree of polymorphic nodes dynamically allocated by the parser. Sample program:

    print("Hello World!\n");
    for (auto i = 0; i < 10; i++)
        print("i = " + i + '\n');
