Tuesday, June 8, 2010

My grief with out-of-tree code

This post has been a long-standing todo item in my brain, but it was this checkin that finally prompted me to write it.

A little bit of history first. As a software developer currently mostly active in the embedded space, I like solutions that allow me to save some CPU cycles or bytes of RAM here and there, as long as they still let me use the same interfaces. Exploiting the characteristics of the underlying hardware and algorithms is often the low-hanging fruit of optimization.
So I have this little agenda of about 10 items I wish to implement in the future to make the LLVM framework a little more efficient.

One of these was to reorder the operands inside the call instruction, to obtain faster access to the callee, but mainly to allow fast visitation of all instructions that have a certain callee. I explained all my motives in a separate mail, so I will spare you the gory details here. To make a long story short, it took me several iterations to catch all the places in the optimizers where the operand order was assumed to follow the old (callee, arg1, arg2, ...) scheme instead of the new (arg1, arg2, ..., callee) one, and some miscompilations were only revealed by running the nightly tests. It was a work of blood and sweat, because there are many intrinsics and many transformations on them, and they are often manipulated via the low-level interface getOperand(n). Actually, there is a nice helper interface called CallSite, which makes it easy to access a call instruction's arguments in a high-level fashion; this interface is probably the best choice for LLVM clients, since it also handles invoke instructions. However, I regard it as acceptable to use the low-level interface directly inside the LLVM tree, since there things can be changed consistently in one atomic commit.
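
To make the contrast concrete, here is a minimal sketch of the two access styles (the helper function names are mine; the include paths are the 2.7-era locations):

    #include "llvm/Instructions.h"
    #include "llvm/Support/CallSite.h"
    using namespace llvm;

    // Low-level style: hard-codes the (callee, arg1, arg2, ...) layout.
    // This is exactly the kind of code that breaks silently when the
    // operand order is flipped.
    Value *getCalleeLowLevel(CallInst *CI) {
      return CI->getOperand(0);   // assumes the callee sits in slot 0
    }

    // High-level style: states the intent and survives a reordering.
    // CallSite wraps invoke instructions just as well.
    Value *getCalleeHighLevel(CallInst *CI) {
      CallSite CS(CI);
      return CS.getCalledValue(); // "give me the callee", layout-agnostic
    }
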
Finally the day dawned when all regression and nightly tests succeeded. My patch seemed to stick, with all buildbots green. I left for downtown and returned late at night, only to discover that everything had been backed out, because my change wreaked havoc in an Apple project that obviously used the low-level interface. This was especially frustrating, since I cannot even submit a corrective patch against that project.

I received very little encouraging support, not even moral support. Some comments were even pretty dismissive, along the lines of "this patch has already caused many problems; it is not worth it for such a marginal gain". I have no problem with the comment itself, since I would utter such words in comparable situations too, but this time it was my investment that was at stake. I was pretty determined to keep fighting.

I wondered whether new measurements with higher-arity calls would show a significant speedup with my patch applied. So I did some benchmarking for cases where the change is expected to make a difference, and indeed found a speedup of roughly 3%. Clearly this number is only reached in specific situations, so the generic case would be well below that, but it could still compensate for the many little time eaters that an advanced optimization pass or analysis necessarily brings along.

In my conversation with the engineer involved I enumerated the following reasons why resorting to the low-level interface in out-of-tree projects is a bad idea:
  • they do not convey the intent
  • they depend on implementation details, reaching over abstraction barriers
  • they are an impediment to change
(these are mostly the same reasons you can find in the commit message above). He agreed with all of this and promised to nudge the OpenGL implementors. I also received a request to submit a patch that guarantees that no silent breakage can happen. Well, I acknowledged that this is a valid concern, so I did some brainstorming.

I managed to put together a small patch that detects all instances of get/setOperand(0), the major potential cause of breakage in external projects. Compiling with this patch would pinpoint all the places where getCalledValue() should be used instead.
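
To give a flavor of the mechanism (this is a standalone sketch of the C++ trick involved, with stand-in types, not my actual patch): adding a second getOperand overload makes the literal 0 ambiguous, so every getOperand(0) on a CallInst stops compiling, while runtime indices keep working.

    #include <vector>

    struct Value {};

    struct User {                        // stands in for llvm::User
      std::vector<Value *> Ops;
      Value *getOperand(unsigned i) const { return Ops[i]; }
    };

    struct CallInst : User {             // stands in for llvm::CallInst
      Value *getOperand(unsigned i) const { return User::getOperand(i); }
      Value *getOperand(void *) const;   // declared, never defined: only the
                                         // literal 0 converts to void *
      Value *getCalledValue() const { return User::getOperand(0); }
    };

    void demo(CallInst *CI, unsigned i) {
      // Value *V = CI->getOperand(0);   // error: ambiguous call -- exactly
                                         // what the detection is after
      Value *W = CI->getOperand(i);      // fine: runtime indices still work
      Value *C = CI->getCalledValue();   // the intended replacement
      (void)W; (void)C;
    }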

But I cannot promise more than that!

Why is it impossible to guarantee that, with my proposed change, either everything keeps working or there is a compilation error with a clear indication of the fix? Because the User base class provides the low-level getOperand interface too, and I cannot disallow that. C++ only lets me protect parts of the CallInst class...
Would a patch to make getOperand private in CallInst be accepted? Probably not now, but read on.
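
A tiny standalone example of the limitation (again with stand-in types, not the real headers):

    #include <vector>

    struct Value {};

    struct User {
      std::vector<Value *> Ops;
      Value *getOperand(unsigned i) const { return Ops[i]; }
    };

    struct CallInst : User {
    private:
      using User::getOperand;            // hide the low-level accessor here
    public:
      Value *getCalledValue() const { return User::getOperand(0); }
    };

    void demo(CallInst *CI) {
      // CI->getOperand(0);              // error: inaccessible -- good
      static_cast<User *>(CI)->getOperand(0);  // ...but this still compiles,
                                         // as does any code holding a User *
    }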

What aggravates the problem with private trees is file ownership. The engineer who detects the breakage is not entitled to fix even simple cases, but has to lobby the project/file owner first. This results in additional inertia. (Disclaimer: I am not sure whether Apple has a file-ownership model internally.)

Surprisingly, the same thing that happened to me could theoretically happen to any Apple engineer too. Imagine that some checkin to LLVM broke the dragonegg GCC plugin, which is effectively licensed as GPLv3, so no Apple engineer is allowed to inspect its sources. What would happen if the dragonegg maintainer backed out the change on the grounds that it "broke an important external project"?

What to do next? Whining is not one of the things I like to do, so let's advance somehow. Bill's patch, which I mentioned at the beginning, is a possible first step: I could rework a large portion of my patch in terms of getArgOperand(n-1) instead of getOperand(n), without actually changing the operand order for now. These kinds of incremental refactorings that do not change functionality are mostly welcome in the LLVM world. Then I would be dependent on the goodwill of some Apple engineer to make a similar change in that internal project too. Finally the switch (i.e. the operand order) could be flipped.
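
The point of the indirection, sketched (assuming the current callee-first layout; the exact helper bodies are illustrative):

    // Inside CallInst, with today's (callee, arg1, arg2, ...) layout:
    Value *CallInst::getArgOperand(unsigned i) const {
      return getOperand(i + 1);     // arguments currently start at slot 1
    }
    Value *CallInst::getCalledValue() const {
      return getOperand(0);         // the callee currently sits in slot 0
    }
    // Flipping the order to (arg1, arg2, ..., callee) later only means
    // changing these two bodies to getOperand(i) and
    // getOperand(getNumOperands() - 1); no caller has to be touched.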

Why am I reluctant to begin? Because it is a lot of work: many new intrinsics have been introduced in the meantime, I will definitely get a bunch of merge conflicts, and finally, who knows, there might be yet another internal project that chokes and the whole misery enters a new iteration.

Why do I feel that the change is urgent? Because LLVM is gaining popularity at an extraordinary speed. As more and more external projects use LLVM as a foundation, more and more code will exhibit the bad habit of using low-level interfaces. The few post-v2.7 months are probably the last chance to make the switch in argument order before things become cemented de facto. Maybe it is too late already. That would be a pity, though: LLVM as a compilation infrastructure should be as fast and nimble as possible, and every one of its clients would profit.

So, dear external tree developers, I hope you will get rid of the low-level calls and use the high-level ones instead. It should not cost you more than touching a couple of lines and retesting. I would be happy to assist you.
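
For a typical client the rewrite is mechanical; a sketch (again with the 2.7-era includes):

    #include "llvm/Instructions.h"
    #include "llvm/Support/CallSite.h"
    using namespace llvm;

    void processCall(CallInst *CI) {
      // Before: layout-dependent, breaks silently on reordering.
      //   Value *Callee = CI->getOperand(0);
      //   Value *Arg0   = CI->getOperand(1);

      // After: intent-revealing and layout-independent.
      CallSite CS(CI);
      Value *Callee = CS.getCalledValue();
      Value *Arg0   = CS.getArgument(0);
      (void)Callee; (void)Arg0;
    }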

Regarding development policy, I would welcome a clear statement about what amount of testing in the LLVM ecosystem is "sufficient" and excludes the risk of a patch being backed out.

Bottom line, I'd love to get this patch wrapped up, but I am dependent on the support of external tree owners. Are you willing to help me?
