Saturday, April 19, 2008

Absolutely, Positively Crazy

My previous two posts revolved around space optimizations in LLVM. The presented technique is a bit unconventional, and paired with the confusing nature of def-use chains, it can leave one quite puzzled...

Today in the LLVM IRC channel, chandlerc wondered whether I am eliminating the Value* from the Use struct. I corrected him: no, it is all about eliminating the User*. But his question drove a nail into my head, making me think. After all, he was thinking of the def-use chain, while I was always dealing with user-use arrays. But deep inside it is the same problem, or rather its dual:
  • for a user-use array the User* for all array members is constant, while the defs (Value*-s) vary,
  • for a def-use chain on the other hand the defs (Value*-s) are invariant, while the User*-s vary.
I wondered whether it is feasible to use the tagging technique to get rid of Use::Val too. After all, the def-use chain is just a linked list, and I have already demonstrated something similar in my previous posts.

It turns out that it is perfectly feasible, and the rest of this post demonstrates how...
...but please do not misunderstand, I do not propose this for LLVM (yet?).

Let's recapitulate what a def->use chain is:
There is an instruction that generates a value, thus defining it:

Val = | Instr |

Then there are other instructions which consume this value, but the above instruction only sees the Use objects chained up:

| Instr |
    |
    v
+-----+    +-----+    +-----+
| Use | -> | Use | -> | Use | --X
+-----+    +-----+    +-----+

The X above means that Use::Next of the last Use is filled with NULL. Otherwise Use::Next obviously contains the pointer to the next Use object.
So to apply waymarking we have to overcome two obstacles:
  1. There is no way to store something around the Use object (layout-wise), as we did with user-use arrays. We cannot go outside to place the Value*.
  2. Def-use chains can be expanded and shrunk dynamically.
It turns out that 1) is pretty easy: we do not store NULL in the last Use's Next field, but the Value* itself, tagged with the ‹fullstop› waymark. So when we see ‹fullstop›, the Value* can be extracted from the upper bits.
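To make this concrete, here is a minimal sketch (my own names, not LLVM's actual code) of tagging the two low bits of the Next field and walking any Use back to its defining Value:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 2-bit waymark encoding in the low bits of Use::Next
// (Use objects are at least 4-byte aligned, so these bits are free).
enum Waymark { Zero = 0x0, One = 0x1, Stop = 0x2, FullStop = 0x3 };

struct Value { int id; };  // stand-in for llvm::Value

struct Use {
  uintptr_t NextAndMark = 0;  // Use* (or, at the end, Value*) plus 2 mark bits

  // Last Use of the chain: store the def itself, tagged <fullstop>.
  void setEnd(Value *V) { NextAndMark = reinterpret_cast<uintptr_t>(V) | FullStop; }
  // Interior Use: store the link to the next Use plus its waymark.
  void setNext(Use *U, Waymark M) { NextAndMark = reinterpret_cast<uintptr_t>(U) | M; }

  Waymark mark() const { return Waymark(NextAndMark & 0x3); }
  // The payload lives in the upper bits; masking the mark off recovers it.
  Use *next() const { return reinterpret_cast<Use *>(NextAndMark & ~uintptr_t(0x3)); }
  Value *endValue() const { return reinterpret_cast<Value *>(NextAndMark & ~uintptr_t(0x3)); }
};

// Walk from any Use to the defining Value. This is the linear fallback;
// the <stop>/digit waymarks would let us jump ahead instead.
Value *getValue(const Use *U) {
  while (U->mark() != FullStop)
    U = U->next();
  return U->endValue();
}
```

The only real requirement is the alignment guarantee that keeps the two low bits of every stored pointer zero.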
So, let's assume the ‹stop› and ‹0›, ‹1› waymarks are also tucked into the Use::Next fields. As usual, given any Use* we can pick up the digits, and at the ‹stop› we convert the digits to an offset that leads us to the ‹fullstop›, where the Value* awaits. This is nice and dandy, until we remember 2). That one appears a bit more tricky. Obviously insertions and deletions will remove waymarks or add new ones, and our algorithm may begin producing bogus Value*-s. So is there any solution to this problem?
I believe there is. I have no proof, but I conjecture that each destructive operation should simply place a ‹stop›, and if the algorithm is changed to accept at most 32 digits between two ‹stop›s, then we are on the safe side. The corruption gets detected, an uncorrupted segment can still be found downstream, and the corruption can be repaired later (e.g. when traversing the def-use chain).
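The conjectured safety rule can be sketched as a decoder over a string of marks, mirroring the Haskell pref/decode from my previous post (this is an illustration of the rule, not LLVM code):

```cpp
#include <cassert>
#include <string>

// One mark per Use, as in the posts: 'S' = <fullstop>, 's' = <stop>,
// '0'/'1' = digits. distanceToEnd mirrors the Haskell `pref`, but enforces
// the conjectured safety rule: more than 32 digits after a <stop> means the
// segment is corrupted. Returns the distance (in Uses, inclusive) to the
// end, or -1 when corruption is detected.
int distanceToEnd(const std::string &marks, size_t i, int budget = 32) {
  int walked = 0;
  while (i < marks.size()) {
    if (marks[i] == 'S')
      return walked + 1;                    // <fullstop>: we are at the end
    if (marks[i] == 's' && i + 1 < marks.size() && marks[i + 1] == '1') {
      long long acc = 1;                    // first digit after <stop> is always 1
      int consumed = 2, ndigits = 1;
      size_t j = i + 2;
      while (j < marks.size() && (marks[j] == '0' || marks[j] == '1')) {
        if (++ndigits > budget)
          return -1;                        // corruption detected; repair later
        acc = acc * 2 + (marks[j] == '1');
        ++consumed; ++j;
      }
      return walked + consumed + int(acc);  // the digits encode the offset
    }
    ++walked; ++i;                          // plain digit: keep walking
  }
  return -1;                                // ran off the end: corrupted chain
}
```

On the example string from below, "1s100000s11010s10100s1111s1010s110s11s1S", this yields 40 from position 0, while a bogus run of forty ‹0›s after a ‹stop› is rejected.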

Well, this is not a very efficient way of doing things, so it probably only pays off when space is extremely limited and compilation/optimization speed is not paramount. But in such applications an 8-byte Use struct (down from 16/12 bytes) is definitely tempting.

Monday, April 14, 2008

Use-diet Update

As of r49698 the use-diet branch passes the testsuite (about 2000 tests) without regressions. The User* member of the struct is not actually eliminated yet; instead it is deliberately set to NULL in various places, and where present, I assert that it equals the computed value.

It was quite a fight to get this far, but it is done! Time to open a good bottle of wine...

The next days will be devoted to cleaning up the sometimes messy details, gathering some statistics, and cautiously merging my work back to trunk.

Wish me luck.

Sunday, April 13, 2008

LLVM Data Structures, and Putting Use on Diet

There is a good deal of refactoring going on in the LLVM universe, to wit Dominic's renaming of LLVM*Builder to IRBuilder with assorted simplifications...
The train is moving fast, and getting on board is hard but rewarding. LLVM does not commit to API (much less binary) compatibility, so meaningful changes get in swiftly and without much bureaucracy. A simple mail warning people of the change in advance and, once it is done, instructions for the conversion are enough to keep folks happy.

There are reforms going more than skin deep, too. Clang is getting a data-structure rewrite with nice gains, and I myself am about to reduce the size of the Use struct by 4 bytes, from 16 to 12 (on a 32-bit system). A 25% saving is nothing to sneeze at, considering that this struct is the most frequently allocated one in LLVM. And now that I have the functionality basically implemented on a branch, I can say the idea is working! Consequently, I have the courage to blog about it :-)

So what is this Use-diet all about? The Use struct is the home of the pointers to all Values that an Instruction references. But each Value also has to track all of its Users (i.e. the Instructions), and Use provides forward and backward pointers to chain those up. These are the essential three pointers. So why is Use 16 bytes? Because in some situations it is important to get back from a Use to the User of the referred Value, so a fourth pointer is needed, 4 pointers in total.
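For orientation, the pre-diet layout looks roughly like this (a sketch from memory; member names and the exact Prev representation are approximate, not the actual LLVM declaration):

```cpp
#include <cassert>

struct Value;  // the def
struct User;   // the Instruction consuming it

// Rough shape of the 16-byte (on 32-bit) Use struct before the diet.
struct Use {
  Value *Val;   // the Value being used
  Use   *Next;  // forward link in Val's use chain
  Use  **Prev;  // backward link, pointer-to-pointer for O(1) unlinking
  User  *U;     // back-pointer to the owning Instruction: the 4 bytes to shave off
};

static_assert(sizeof(Use) == 4 * sizeof(void *), "four pointers in total");
```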

Seemingly this is how it works™ and there is nothing that can be shaved off. But wait! Don't all Uses belonging to an Instruction come lined up in a contiguous array? What if we could mark the last Use specially and put a pointer to the Instruction behind it? Or even allocate the Uses immediately in front of the Instruction (memory-layout wise)?
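The second idea can be pictured with a toy co-allocation (purely illustrative; names are mine, and a real implementation would do this inside a custom operator new rather than a struct):

```cpp
#include <cassert>

// Toy sketch of co-allocating the Use array immediately in front of the User.
struct Value;
struct Use { Value *Val; };   // stripped down to the essential def pointer
struct User { int opcode; };  // stand-in for an Instruction

// Hypothetical combined allocation: [ Use0 | Use1 | Use2 | User ]
struct UserWithUses {
  Use  ops[3];
  User user;
};

// Given the last Use of the array, the User sits right behind it in memory,
// so no per-Use back-pointer is needed at all.
User *userOf(Use *lastUse) {
  return reinterpret_cast<User *>(lastUse + 1);
}
```

Finding the *last* Use from an arbitrary one is exactly what the waymarks described next are for.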

These were the first ideas with which my brainstorming with Sabre began. After several exchanged emails we settled on a concept that is IMHO really beautiful. We use 2 bits (the least significant ones, which are normally always zero) of one of the pointers in Use to implement a serial-line-like protocol of waymarks that guides us to the end of the array in reasonably few steps. I will not detail the algorithm, since it is documented elsewhere; I will only say that there are four kinds of waymarks: ‹fullstop›, ‹stop›, ‹0› and ‹1›.
  • Fullstop means we are at the end already,
  • Stop means begin gathering digits, or if already done so, convert them to an offset, that brings us to the end,
  • 0 and 1 are the binary digits to be picked up.
It is easy to see that for small arrays only a small number of operations is needed to determine the end of the array. The complexity of the algorithm is O(log N). So we do not really have to get concerned about, say, 10000 predecessors to a PHI node :-). The description even contains a Haskell snippet to encode and decode such offset information.
For reference I shall present it here:
> import Test.QuickCheck
> digits :: Int -> [Char] -> [Char]
> digits 0 acc = '0' : acc
> digits 1 acc = '1' : acc
> digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc
> dist :: Int -> [Char] -> [Char]
> dist 0 [] = ['S']
> dist 0 acc = acc
> dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r
> dist n acc = dist (n - 1) $ dist 1 acc
> takeLast n ss = reverse $ take n $ reverse ss
> test = takeLast 40 $ dist 20 []

Printing gives: "1s100000s11010s10100s1111s1010s110s11s1S"

The reverse algorithm computes the length of the string just by examining a certain prefix:

> pref :: [Char] -> Int
> pref "S" = 1
> pref ('s':'1':rest) = decode 2 1 rest
> pref (_:rest) = 1 + pref rest
> decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest
> decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest
> decode walk acc _ = walk + acc

Now, as expected, printing gives 40.

We can quickCheck this with the following property:

> testcase = dist 2000 []
> testcaseLength = length testcase
> identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr
>     where arr = takeLast n testcase

As expected, this gives:

*Main> quickCheck identityProp
OK, passed 100 tests.
Btw., QuickCheck is awesome!

So where are the uses of this algorithm outside of LLVM?
Not much thinking is needed to generalize from arrays to other data structures that permit mutation of node contents but disallow insertions. Linked lists are an example, since one can only cons stuff onto the head. Double-ended arrays with a fast size() operation (given a pointer to any element) can be implemented if there is another pointer in each node that we can use for storing waymarks to the start of the array. Deques could work like this too, but they also need to be fully built up before the waymarks are put in.
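As a toy of the linked-list case (all names my own invention), here is a sketch of a list whose nodes each carry one waymark character, stamped in a single pass after the list is fully built; the length is then recovered by decoding marks from the head, in the spirit of pref above, instead of walking the whole list:

```cpp
#include <cassert>
#include <string>

// C++ ports of the Haskell `digits` and `dist` encoders from this post.
std::string digits(int n) {
  return n < 2 ? std::string(1, char('0' + n)) : digits(n / 2) + digits(n % 2);
}
std::string dist(int n) {
  if (n == 0) return "S";
  std::string r = dist(n - 1);
  return "s" + digits(int(r.size())) + r;
}

// Any suffix of a waymarked string decodes its own length (the quickCheck
// property above), so we may simply take the last n marks.
std::string marksFor(size_t n) {
  std::string s;
  for (int k = 0; s.size() < n; ++k) s = dist(k);
  return s.substr(s.size() - n);
}

struct Node { char mark; int payload; Node *next; };

// Build n nodes, then the marks are stamped at construction time
// (the "garbage collector puts in the waymarks" step).
Node *build(int n) {
  std::string ms = marksFor(size_t(n));
  Node *head = nullptr;
  for (int i = n - 1; i >= 0; --i)
    head = new Node{ms[size_t(i)], i, head};
  return head;
}

// Length of the list by inspecting marks from the head (mirrors `pref`):
// once a <stop> is seen, only O(log n) further nodes are touched.
int lengthOf(const Node *head) {
  int walked = 0;
  for (const Node *p = head; p; p = p->next) {
    if (p->mark == 'S') return walked + 1;
    if (p->mark == 's' && p->next && p->next->mark == '1') {
      long long acc = 1;
      int consumed = 2;
      for (const Node *q = p->next->next;
           q && (q->mark == '0' || q->mark == '1'); q = q->next) {
        acc = acc * 2 + (q->mark == '1');
        ++consumed;
      }
      return walked + consumed + int(acc);
    }
    ++walked;
  }
  return -1;  // no marks found: unmarked or corrupted list
}
```

Of course the marks must be laid down after the list is complete, which is precisely the restriction mentioned above.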

This all reminds me of cons-hashing, but it is really a more powerful concept. Let's call it waymarking! And then let the garbage collector put in the waymarks for us...
Down with O(n) complexity on linked-list's length operation!