Pascal Warrior's Journey

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Friday, 24 December 2010

LLVM IR Builder in Object Pascal

Posted on 00:11 by Unknown

I'm about to graduate from my university (January 2011 if there's no more problems), and as a final assigment (though it's optional in my faculty, but it would be a great experience and honor to have one) I choose to implement a compiler. One of our labs, Formal Method in Software Engineering (or simple FMSE), is the lab where my supervisor gets involved. Therefore, the compiler I'm writing would probably based on a project they're (or have been) working on. Yep, I was given LinguSQL language to implement.

Previously, the language had a compiler, that generates Java code (bad choice IMO) which is then compiled by a Java compiler. The problem with this approach, as in normal Java application, is the HUGE runtime environment that must be distributed if someone wants to use the application. Furthermore, since Java uses interpreted bytecode (don't count GCJ, I even believe less than 10 persons in my campus know that thing exists), the performance is at maximum only 1/3 of native binaries (someone in OSDEV forum ever said). Last but not least, the execution isn't trivial. One must type "java xxx" in order to execute the program.

As a native application (deve)lov(|p)er (read: developer and lover), I decided to implement a compiler for this language that generates native binaries. However, I don't have enough experience in generating native binaries (or at least native assembly). So, remembering an option, I asked my supervisor what if the compiler generates LLVM assembly language (also called LLVM Intermediate Representation or IR)? Do you what he said? "What is LLVM?" (doh). OK, so after bla bla bla, he accepted my choice. The advantages of generating LLVM assembly instead of native assembly are:
  1. A LOT of optimizations for free
  2. It can be compiled to native assembly for MANY platforms
  3. Easy integration with existing libraries
Despite those advantages, there are also disadvantages:
  1. It uses SSA format
  2. Written in C++, more specifically, it officially only supports G++! (there are some hacks to use MSVC but... still it's not official)
  3. Most important one: it doesn't have Object Pascal frontend
The SSA format is not actually a disadvantage, but it's just a little harder to generate code for. But the last one is really a show stopper... or a challenge depending on how you look at it :)

So the work begins. I continue the previous research, the previous Java based compiler that generates Java code uses JavaCC, followed by JavaCUP, and finally UUAG. The first two are parser generators, with some differences, mainly JavaCC generates LL parsers, while JavaCUP generates LALR one. Both are BAD. I always find parser generators are bad since we have no idea whether it's correct or not, and the grammar can't be deduced from the code (except for recursive descent parser generators like Coco/R). The last one is a Haskell based product, which actually runs like recursive descent parser, only in functional languages they're called parser combinators. The last one is quite good, with one important problem: when a parsing error happens, the parser tries to find all possible corrections, therefore slows down the parsing and eats resources. This behavior can't be customized easily and that's what makes me writing the whole thing from scratch using classic approach: a true recursive descent parser. This is the best parser I've ever learned, since it's the most flexible one (there are tons of way to handle parsing error and that's totally up to you, with many methods possibly combined or used specifically for certain productions) and still shows the grammar in its code.

Come to the code generation part, the problem I stated above must be covered. I create my own LLVM IR Builder to generate LLVM assembly language. Due to the SSA structure, it's a bit difficult, but I managed to create it quite successful with beautiful modular architecture. It can now generate modules consisting of functions and global variables, where each functions can have local variables, labels (for branch and loop), arithmetic instructions, memory instructions, etc. It's not yet complete, but already capable of generating simple programs. I'll put it in my bitbucket account when I think it's quite production ready.

Wants some code? OK:

program llvmirbuildertest;

{$mode objfpc}{$H+}

uses
llvmirbuilder;

var
x,y,l,s,a,b: TLLVMSymbol;
c: TLLVMConstant;
cl: TLLVMCallInstruction;
begin
x := TLLVMSymbol.Create('x',lltInteger,true);
y := TLLVMSymbol.Create('y',lltInteger);
c := TLLVMConstant.Create('255',lltInteger);
l := TLLVMLoadInstruction.Create('tmp',lltInteger,x);
s := TLLVMStoreInstruction.Create('tmp',lltInteger,y,x);
cl:= TLLVMCallInstruction.Create('func',lltInteger);
a := TLLVMAddInstruction.Create('a',lltInteger,x,c);
b := TLLVMSubInstruction.Create('b',lltInteger,c,y);
WriteLn(l.GenerateCode);
WriteLn(s.GenerateCode);
WriteLn(cl.GenerateCode);
WriteLn(a.GenerateCode);
WriteLn(b.GenerateCode);
a.Free;
b.Free;
cl.Free;
s.Free;
l.Free;
c.Free;
y.Free;
x.Free;
end.
and the generated LLVM IR:
%tmp = load i32 * @x
store i32 %y, i32 * @x
call i32 @func()
%a = add i32 @x, 255
%b = sub i32 255, %y
Note that it's a partial code, so compiling this with llvm-as would absolutely produce an error.

Email ThisBlogThis!Share to XShare to FacebookShare to Pinterest
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Android Programming with Lazarus through Custom Drawn Interface
    OK, so you've read that it's possible to write Android applications with Lazarus and Free Pascal. You go straight to the  wiki , fol...
  • Encryption / Decryption and Asynchronous Socket Programming
    Back again, it's been a long time since my last post due to lack of free time and laziness :p Recently, I've got some posts in Lazar...
  • Brook Framework, a new web application framework for Free Pascal
    Recently, a new post in Lazarus forum surprised me. Somebody, OK, Silvio, announced his new web application framework for Free Pascal with o...
  • LLVM IR Builder in Object Pascal
    I'm about to graduate from my university (January 2011 if there's no more problems), and as a final assigment (though it's optio...
  • Using Google Maps service in a Pascal based Web Application
    My final team assignment of Information Retrieval (IR) class allows us to create any kind of IR system, and we decided to create a food ingr...
  • Big mistake: wrong process model
    This is a lesson learn for application developer. Remember my last post about compiler with LLVM backend? It doesn't end happily. I didn...
  • Top Down Agile Program Development
    This time I would like to post something out of coding world, but rather something about software engineering. Traditional agile program dev...

Blog Archive

  • ►  2013 (2)
    • ►  October (1)
    • ►  January (1)
  • ►  2012 (2)
    • ►  December (1)
    • ►  June (1)
  • ►  2011 (1)
    • ►  January (1)
  • ▼  2010 (2)
    • ▼  December (2)
      • LLVM IR Builder in Object Pascal
      • Using Google Maps service in a Pascal based Web Ap...
Powered by Blogger.

About Me

Unknown
View my complete profile