[Scons-users] modifying/extending SCons to run jobs on Torque cluster
bill at baddogconsulting.com
Fri Oct 3 12:13:00 EDT 2014
Does Torque allow you to request resources? (This is one way we select
which type of node to run jobs on in SGE.)
If so, you could just define a "BIGJOB" resource, mark only certain
nodes as having it, and request it when running the "big" jobs.
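A minimal Python sketch of Bill's suggestion, assuming the cluster uses Torque node properties (labels the admin attaches to nodes in the server's nodes file) in place of an SGE-style resource. The property name `bigjob` and the helper `qsub_command` are hypothetical. The sketch only builds the `qsub` argument list, so a "big" step asks for a tagged node and every other step accepts any node.

```python
def qsub_command(script, big_job=False, node_property="bigjob"):
    """Build a qsub argument list for a Torque job script (hypothetical helper).

    Assumption: the large nodes carry a node property (here "bigjob") in the
    Torque nodes file, so "nodes=1:bigjob" restricts the job to those nodes.
    """
    nodes = "nodes=1"
    if big_job:
        # Node properties are appended to the nodes request after a colon.
        nodes += ":" + node_property
    return ["qsub", "-l", nodes, script]


print(" ".join(qsub_command("build_main.sh", big_job=True)))
# qsub -l nodes=1:bigjob build_main.sh
```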
On Fri, Oct 3, 2014 at 2:57 AM, Dirk Bächle <tshortik at gmx.de> wrote:
> Hi Thomas,
> I'd like to basically second what Bill said. On a technical level, you can
> certainly subclass/rewrite the Node/Taskmaster classes...and there have
> been requests for more info about it in the past. But it's an awful lot of
> work, and all the people who wanted to try anyway seem to have given up
> at some point.
> (* switching to meta-level mode *)
> My understanding of your problem/project is that you're trying to use SCons as
> a "driver" for your scheduling system. In a way, you want to "traffic-shape"
> the individual build processes, to let them run on a multiprocessor
> machine/cluster (I have tinkered with openPBS on a 48-core Linux cluster
> some years ago).
> If your build process is based on files and their dependencies, the
> current Node class and the Taskmaster should provide all the information
> you need to decide whether a single part of the project has to be
> rebuilt or not. The Taskmaster already prepares info packets for you, in
> the form of the Job class instances, that then only have to be executed.
> And this is probably the best place where your extension could come into
> play. You could try to derive from the "Job" class and extend it, such that
> it is also able to run a single build step via your scheduler system (you
> seem to have that ready in your custom Builder).
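To make the "derive and extend" idea concrete without depending on SCons internals, here is a sketch with hypothetical classes: `LocalRunner` stands in for whatever SCons object executes a single build step (the real class names and method signatures differ), and `SchedulerRunner` overrides `run()` to hand selected steps to a scheduler. The `should_submit` and `submit` callables are assumptions you would implement yourself.

```python
import subprocess
import sys


class LocalRunner:
    """Hypothetical stand-in for the object that executes one build step."""

    def run(self, argv):
        # Blocks until the command exits and returns its exit status.
        return subprocess.call(argv)


class SchedulerRunner(LocalRunner):
    """Routes selected steps through a scheduler, everything else locally."""

    def __init__(self, should_submit, submit):
        self.should_submit = should_submit  # callable(argv) -> bool
        self.submit = submit                # callable(argv) -> exit status, must block

    def run(self, argv):
        if self.should_submit(argv):
            return self.submit(argv)
        # Fall back to the normal local execution path.
        return super().run(argv)


# Demo: steps containing "--big" go to a fake submit hook, the rest run locally.
submitted = []
runner = SchedulerRunner(
    should_submit=lambda argv: "--big" in argv,
    submit=lambda argv: submitted.append(argv) or 0,
)
print(runner.run([sys.executable, "-c", "pass"]), runner.run(["build_step", "--big"]), submitted)
# 0 0 [['build_step', '--big']]
```

Keeping the routing decision in injected callables means the subclass itself does not need to know anything about Torque.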
> Quoting part of your original email:
> On 02.10.2014 21:24, Thomas Lippincott wrote:
>> I would like to do something like subclassing the TaskManager to be able
>> to examine the dependency tree and choose subtrees to submit as Torque
> This is where the real problem is: deciding which nodes to build via
> Torque (or another scheduler), and which not, is super hard. You're trying
> to implement a second scheduler...which requires you to "add more
> information" to the system. I don't think you'll be able to compute an
> efficient schedule (taking the actual cluster/machine where things are
> executed into account) just by traversing the dependency tree.
> I'll go one step further and state that I wouldn't touch this problem with
> a ten foot pole.
> (* meta-level mode off *)
> Instead, I would stick to manually marking the nodes that are eligible for
> scheduling within the SConscripts. You could write a
> wrapper/decorator method like:
> prog = TorqueJob(env.Program('main', Glob('*.cpp')),
>                  required_mem="2GB", ..., other keys)
> for "tagging" the target node "main" in this case. The Node class already
> has the member "attributes" which you can use to store meta-information
> about it (this is what the Java builder does, for example).
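A sketch of such a `TorqueJob` wrapper, assuming the tags are stored on each node's `attributes` object under a hypothetical `torque` key. `FakeNode` and `_Attrs` are only stand-ins so the snippet runs without SCons; in an SConscript you would pass the NodeList that `env.Program(...)` returns.

```python
class _Attrs:
    """Stand-in for the per-node attribute holder that SCons nodes carry."""


class FakeNode:
    """Minimal stand-in for an SCons Node, for running this sketch alone."""

    def __init__(self, name):
        self.name = name
        self.attributes = _Attrs()


def TorqueJob(targets, **resources):
    """Hypothetical SConscript helper: tag targets as eligible for Torque.

    Accepts a single node or an iterable of nodes (such as a NodeList),
    stores a copy of the requested resources on each node's `attributes`,
    and returns `targets` unchanged so it can wrap a builder call.
    """
    nodes = targets if hasattr(targets, "__iter__") else [targets]
    for node in nodes:
        node.attributes.torque = dict(resources)
    return targets


prog = TorqueJob([FakeNode("main")], required_mem="2GB")
print(prog[0].attributes.torque)  # {'required_mem': '2GB'}
```

In an SConscript this would be used as `prog = TorqueJob(env.Program('main', Glob('*.cpp')), required_mem="2GB")`; because the wrapper returns its argument, it can wrap any builder call.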
> In your custom Job class you can then take the final steps to check
> whether the current target should be built via Torque (or locally, if the
> overall load on the cluster is too high already), based on your meta-infos
> as given by the user. Then set up the correct environment for this before
> scheduling the actual command-line action.
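The decision step might look like the following sketch. The policy (tagged steps go to Torque unless the cluster load, a hypothetical value between 0 and 1 that you measure yourself, reaches `max_load`), the `torque` attribute key, and the mapping of `required_mem` onto Torque's `mem` resource are all assumptions. The generated job script changes into `$PBS_O_WORKDIR`, the directory `qsub` was invoked from, so the command runs where SCons expects its files.

```python
def plan_step(attributes, command, cluster_load, max_load=0.9):
    """Decide how to run one build step (hypothetical policy).

    Returns {"backend": "local", "command": ...} or
    {"backend": "torque", "qsub_args": [...], "script": ...}.
    """
    meta = getattr(attributes, "torque", None)
    # Untagged steps, or any step while the cluster is busy, stay local.
    if meta is None or cluster_load >= max_load:
        return {"backend": "local", "command": command}

    qsub_args = ["qsub"]
    if "required_mem" in meta:
        # Assumption: pass the user's value through as Torque's "mem" resource.
        qsub_args += ["-l", "mem=" + meta["required_mem"].lower()]

    # Job script: run the command from the directory qsub was called in.
    script = '#!/bin/sh\ncd "$PBS_O_WORKDIR"\n' + command + "\n"
    return {"backend": "torque", "qsub_args": qsub_args, "script": script}
```

Since `qsub` reads the job script from standard input when no script file is given, a `torque` plan could be submitted with `subprocess.run(plan["qsub_args"], input=plan["script"], text=True)`. That call returns as soon as the job is queued, so waiting for completion still has to be handled separately.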
> Just keep in mind that the Taskmaster expects these single Job executions to be
> blocking: you start a build step, and once it has finished executing, it has
> either completed (the target got built) or failed. This information and behaviour
> are crucial to the currently implemented algorithm...
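Because each step must block, a scheduler-backed step cannot simply return once `qsub` has queued the job. A minimal sketch of the waiting side, assuming two hypothetical hooks: `submit()` returns a Torque job id, and `poll(job_id)` returns `None` while the job is still queued or running and its exit status once it has finished (how you obtain that, for example from `qstat`, is left open). A nonzero status raises, so the step is reported as failed rather than silently succeeding.

```python
import time


def run_blocking(submit, poll, interval=5.0, sleep=time.sleep):
    """Submit a job and block until it finishes, as the Taskmaster expects.

    `submit` and `poll` are hypothetical hooks built on top of the scheduler's
    command-line tools; `sleep` is injectable so the loop can be tested.
    """
    job_id = submit()
    while True:
        status = poll(job_id)
        if status is not None:
            break
        sleep(interval)
    if status != 0:
        raise RuntimeError(f"job {job_id} failed with exit status {status}")
    return job_id


# Demo with fake hooks: the job stays "running" for two polls, then exits with 0.
states = iter([None, None, 0])
print(run_blocking(lambda: "123.server", lambda jid: next(states), sleep=lambda s: None))
# 123.server
```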
> So much for my thoughts; I hope they give you a few new ideas.
> Best regards,