Compiler Internals Overview
This should become a short overview of the complier's internals.
Blueprint information
- Status:
- Not started
- Approver:
- None
- Priority:
- Essential
- Drafter:
- None
- Direction:
- Approved
- Assignee:
- None
- Definition:
- Discussion
- Series goal:
- Accepted for trunk
- Implementation:
- Informational
- Milestone target:
- None
- Started by
- Completed by
Related branches
Related bugs
Sprints
Whiteboard
The compiler reads a text file in the Siml language and outputs a Python program.
== Parsing ==
The parser returns a tree of objects that represents the syntax of the language; it is called Abstract Syntax Tree (AST). Each important element of the syntax is represented by a certain Python class.
The statement:
a = b + c
will be represented in the AST as:
assignment
+ variable access "a"
+ binary operator "+"
+ variable access "b"
+ variable access "c"
A slightly modified version of the library Pyparsing is used in the parser for the hard work.
http://
Pyparsing is a backtracking parser without a lexer. It can potentially look ahead an unlimited number of tokens. Simple things are very simple in Pyparsing and it is easy to get started. Documentation of advanced features is however bad. Pyparsing is slow, but the speed is acceptable for the expected small programs with about 1000 lines of code.
== AST Interpreter ==
To implement the template features (comparable to C++ templates) there needs to be an Interpreter.
- The interpreter builds the symbol table, which is tree structured due to the object oriented nature of the language.
- It computes constant expressions.
- It collects (and creates) the statements that will be part of the final program.
== Flattening ==
Convert the symbol table from a tree to a flat list. Each simulation object gets its own list.
== Equation Reordering ==
== Optimizations ==
- Experience shows that there are many trivial assignments like a.foo = b.foo in a bigger program. One of those variables could be optimized away.
- Variables that are used only once can be optimized away.
- Find common subexpressions and compute them only once. (Task for distant future.)
== Code Generation ==
= Hesse Matrix =
- Create derivatives of dynamic function. Many solver algorithms want this information to run faster. Needs a lib for symbolic algebra like SymPy. Medium term goal.
= Emit Code =
Emitting code from a tree is easy. Currently Python code is emitted; emitting Cython or Fortran is a long term goal.
The code generator is a much simplified version of the interpreter. Instead of doing computations it outputs different pieces of text for each node of the modified AST.