[Scons-users] Reducing toposort time for Java builds

Greg Ward greg at gerg.ca
Wed Jul 4 21:15:12 EDT 2012


A couple of observations:

1) A textbook toposort is O(N+E), where N = number of nodes and
E = number of edges. I believe that SCons' toposort is O(N^2).
Reducing N is either a win or a big win.

2) The only sensible way (IMHO) to compile Java code is to compile
groups of related .java files together, not file-by-file as C/C++
builds typically work. This is a fundamental consequence of both
language design and the way javac works. Gradle, the only Java
build tool that doesn't suck, works this way. (It calls those
groups of related files "source sets".) Ant appears to work this
way. So that's how my build works. (I'm pretty much ignoring SCons'
built-in Java support. I've rolled my own by stealing some good
ideas from Gradle, and it seems to work.)

3) Java programmers, at least where I work, create new source files
like there's no tomorrow. "Oh, I need to factor out a function
here... NEW FILE!" N++ every time.

Thus: N is ever increasing, which makes O(N^2) really hurt. But
there's *no point* in having a node in the graph for every one of
those source files, since we compile whole groups of them together
(typically 100-500 files at a time). If any file in a particular
source set changes, we recompile the whole thing, and then all
downstream source sets. There's no other way to sanely handle Java
dependencies. (The problem, if you're curious, is that there is no
clear distinction between interface and implementation, either in the
source code or bytecode. So to be safe, any implementation change has
to be interpreted as a possible interface change.)

So I've been thinking, why not just represent each Java source set as
a Node? That is, instead of

com/example/lib1/*.java # 35 files
com/example/lib2/*.java # 17 files
com/example/app1/*.java # 483 files

being 35 + 17 + 483 nodes, it should be 3 nodes: one for lib1, one for
lib2, and one for app1. (Assuming each of those directories correspond
to one source set, of course.)

Has anyone tried anything like this? My one previous attempt led me to
the conclusion that subclassing Node directly is a hell-spawned
nightmare, whereas subclassing File is fairly straightforward. But I
don't want to subclass File, I need to subclass Node, because a
collection of related files is not a file. It's a different kind of
Node.

Would be curious to hear other opinions before I embark on this
possibly mad journey. ;-)

Greg
--
Greg Ward http://www.gerg.ca/
Are we THERE yet?


More information about the Scons-users mailing list