RPython is not a Language: An introduction to the RPython language

The exact definition is “RPython is everything that our translation toolchain can accept” :)

The above quote is from the coding guidelines for RPython. RPython is not a typical language, in that it is not described by a syntax, but is defined by whether or not a tool chain can compile the code.

RPython is a subset of the Python language. That is, any RPython code can run in a Python interpreter. The difference is that you can compile RPython code, with the RPython tool-chain down to C code. So the advantage of RPython is speed after compilation, and the disadvantage is that you cannot use all of Python's features.

The borders and rules of RPython can be blurry, context dependent, and very difficult to understand. Teaching RPython by example can be very tricky as sometimes code will work, and sometimes it wont! Therefore, tutorials for RPython are very different from other languages, e.g. the vague rules expressed in the coding guidelines.

RPython is a language that is largely learnt by trial and error. I have only just scraped the surface, but I will post what I have learnt so far.

Static Typing

To demonstrate RPython I will give a few examples. The main code will be wrapped inside the main function, with other functions defined outside, i.e.:

#FUNCTIONS HERE

def main(argv):
  #MAIN CODE HERE
  return 0

def target(*args):
  return main, None

if __name__ == '__main__':
  import sys
  main(sys.argv)

To set up the RPython tool-chain to compile these examples, please look at my other post.

The first example shows the dependence on context the code can have:

#FUNCTIONS
def add(x,y):
  return x + y

#MAIN CODE
print add(1,2)

This code is RPython code as it will compile and output the integer 3. However, if you add the code add('Graham','Jenson'), it will fail to compile with a large stack trace including the error:

[translation:ERROR] In <FunctionGraph of (rtest:9)main at 0x103f7dc58>:
[translation:ERROR] Happened at file rtest.py line 11
[translation:ERROR]
[translation:ERROR]       print add(1,2)
[translation:ERROR] ==>   print add('Graham','Jenson')

Now, if you delete the line add(1,2) it will compile again. This is because RPython will generate statically typed functions, and if its input or return value are different at different parts of your code, then it will throw an error.

HINT 1: Remember the types of your functions

Python Functions

Many functions that are defined in the core Python may work in RPython, may not work, or may work with certain arguments. For example 'Graham Jenson'.split() is not RPython, but 'Graham Jenson'.split(' ') is.

HINT 2: Be careful when using previously defined functions

The addition to this hint is that many of Python's built-in functions that will not work, e.g. map. The work-around to this is to redefine these functions yourself, there may be a RPython utility library somewhere with these defined.

HINT 3: Redefine built-in functions that you want to use

As an example, here is the function I use to read_lines from a file and return a list:

import os
def read_lines(inputfile):
  f = os.open(inputfile, os.O_RDONLY, 0777)
  x = os.read(f,1)
  lines = []
  tmpstr = ""
  while x != '':
    if x == '\n':
      lines.append(tmpstr)
      tmpstr = ""
    else:
      tmpstr += x
    x = os.read(f,1)

  lines.append(tmpstr)
  f.close()
  return lines

Higher Order Functions

Lambdas in RPython seem not to work. However, you can still define and use higher-order functions (functions that take functions as parameters). Here is an RPython example using a custom map:

#FUNCTIONS
def map(fun,ls):
  nls = []
  for l in ls:
    nls.append(fun(l))
  return nls

def add_one(x):
  return x + 1

#MAIN CODE
map(add_one, [1,2,3])

HINT 4: Don't use lambdas, use functions

Operators

Watch out when using operators as Python will attempt to call 'rich' methods that may not be defined e.g. for == operator call the eq method:

#FUNCTION
def hello()
  return 'hi'
#MAIN CODE
hello() == None

This will break with the error:

MissingRTypeOperation: unimplemented operation: 'eq' on (<CharRepr Char>, <NoneFrozenPBCRepr Void>)

You can fix it by replacing the == is an is, i.e. hello() is None. Note: for some reason 'a' == None is RPython, does anyone know why.

HINT 5: Use explicit functions instead of operators

Work Flow

I found the biggest advantage to working in RPython is the fact that at any point you can execute your code in the python interpreter and see what happens. This means tools like the assert statement can be used to help write valid Python code, but are ignored by the RPython compiler when optimisation is required.

My workflow quickly became

  1. Write and get it working in Python
  2. Compile with RPython tool-chain and fix any errors that occur
  3. Rinse and Repeat

HINT 6: Make it work in Python before worrying about making it work in RPython

SAT Solver: Case Study

Previously, I posted an article about RPython in which I compared the performance between interpreted python and compiled RPython when calculating Fibonacci numbers. I noted that this was a very basic experiment and further, more complicated examples were needed.

To compare Python against compiled RPython in a more complicated function I created an object oriented, conflict driven, learning SAT solver called SATRPy (SATrippy).

If you do not know what a Boolean Satisfiability (SAT) problem is, it is basically trying to answer the question

For a given Boolean formula, can you assign values to the variables to make the equation true?

If the answer is yes the problem is said to be satisfiable, if it is no then the problem is said to be unsatisfiable. A SAT solver is a function that tries to find if a problem is satisfiable or not. The SAT problem is most (in)famous as it is the first identified NP-Complete problem. I have studied SAT problems towards my research into Component Dependency Resolution and found the algorithms and heuristics around this domain very interesting.

RPython SAT Solver

I found it fun to implement a complex function using RPython . The lack of the dynamic aspects of Python can be worked around, and once you get used to not using them.

The biggest annoyance for me was the time it takes to compile the source. Generally, I would compile just to identify problems with the RPython code. This problem is a result of the lack of definition for the language. That is, to find out if my code is RPython, I have to ask RPython.

The most interesting RPython 'hack' I used to get SATRPy working involved the heap implementation I used. Previously when implementing a SAT solver I found the priority queue ( that is used to order literals in order to be chosen) to be incredibly slow. This was because I was using the built in priority queue that came in the Java standard library. To solve the issue I ended up borrowing the heap queue from SAT4J.

To get an efficient heap implementation I edited the heapq core implementation. First by removing the reference to the C implementation (I am going to compile the heap to C with RPython anyway). Then I changed the cmp_lt function from

return (x < y) if hasattr(x, '__lt__') else (not y <= x)

to the RPython friendly

return x.heur() < y.heur()

Where heur() returns the value of the heuristic that is used to sort the items.

Experiment & Results

I ran my SAT solver against 200 (100 satisfiable, and 100 unsatisfiable) Uniform Random 3-SAT problems, each have 75 variables and 325 clauses. This set was taken from the SATLIB Benchmark suite.

In addition to testing my solver using both the pypy interpreter and after compiling with the RPython tool-chain, I tested it against the MiniSat solver. To time the results I ran the solvers over all the files and timed them with the unix time function. I used the real time value as the measurement.

The results are:

pypy Interpreted Solver

  • Satisiable problems: 259.177s
  • Unsatisiable problems: 742.133s

Compiled RPython Solver

  • Satisiable problems: 10.514s
  • Unsatisiable problems: 59.729s

MiniSat Solver

  • Satisiable problems: 0.295s
  • Unsatisiable problems: 0.441s

These results show that for satisfiable problems the Compiled RPython code is about 25 times faster than the interpreted code, and 12 times faster for the unsatisfiable problems. It also shows how much room for improvement there is for my algorithms and heuristics, as minisat was 35 times faster in satisfiable problems, and 150 times faster for unsatisfiable problems.

Conclusion

The more complicated the algorithm the more is gained by compiling to RPython. The simple calculation of Fibonacci numbers in is 12 times faster after compilation, but finding if a SAT problem is satisfiable is 25 times faster after compiled RPython code.

This experiment measured the gain of using RPython as opposed to interpreted python. Although we compared my solver to minisat, this is not a fair test as minisat uses many different optimised algorithms, heuristics, and data structures. A more interesting experiment would be to re-implement minisat directly into RPython. Then we could see what is lost when using RPython instead of coding directly in C.

comments powered by Disqus. comments powered by Disqus