[Scons-users] AddPostAction memoization problem

Fri May 31 18:54:06 EDT 2024

MIke,

The manpage is the published API.
The reasoning for this is:
1) It's guaranteed to be stable between releases with requirements to note
any changes with a deprecation cycle.
2) The internal SCons API has no such guarantees.  This means we can
fix/change this API between even minor releases as needed without a
deprecation cycle. It does mean if you reach into SCons further than what
the manpage describes you may be impacted by such changes. On the upside,
we can fix your issue without requiring a deprecation cycle.

All the above said, thus far, since SCons' inception (20 ish years), you're
the first to report an issue with the memoization of some node information
between action calls..

Also, note that all the memoization was added due to previous performance
benchmarking, and not done globally without consideration.

So given all the above, we're just trying to figure out what's the proper
way to resolve your issue.
Hopefully in a way which doesn't negatively impact any existing build.
Since Actions don't individually inform SCons which nodes they query and/or
which files they actually modify, it'd likely have to be done between each
Action() in the list of actions held by the executor for each target.

Anyway, for now, you've got a workaround and we'll get a fix in the near
term.
May take a couple weeks and then we have a couple other PR's we'd like to
pull in before pushing a release.

-Bill

On Fri, May 31, 2024 at 7:33 AM Mats Wichmann <mats at wichmann.us> wrote:

> On 5/31/24 04:20, Mike Haboustak wrote:
> > On Wed, May 29, 2024 at 5:41 PM Mats Wichmann <mats at wichmann.us> wrote:
> >>
> >> On 5/29/24 15:27, Mike Haboustak wrote:
> >>> On Wed, May 29, 2024 at 9:49 AM Mats Wichmann <mats at wichmann.us>
> wrote:
> >>>>
> >>>> I'd personally agree as to correct behavior: if a step that makes
> >>>> changes has completed, information related to the node should be up to
> >>>> date and not old.  I guess it's worth mentioning that the getsize()
> >>>> method isn't *officially* exposed as a public interface, it's possible
> >>>> there was an actual reason for that back in the mists of time.
> >>>>
> >>>
> >>> Is it true that a function isn't officially exposed if it doesn't have
> >>> a docstring in the API docs? The documentation includes
> >>> SCons.Node.FS.File.get_size() right next to get_relpath().
> >>>
> https://scons.org/doc/latest/HTML/scons-api/SCons.Node/#SCons.Node.FS.File.get_size
> >>
> >> No, the "public API" is the manpage.  Unfortunately, the "API Docs"
> >> don't distinguish between public and internal reliably (we eventually
> >> added that notation to the main page).  Every so often there's a
> >> docstring annotation to help out, but it's not frequent.
> >>
> >
> > I spent some time yesterday reading the man page, which I haven't done
> > in years, and also the notes at the beginning of the SCons API
> > documentation to better understand your perspective on the public
> > SCons interface. What's clear to me is that my build systems extend
> > SCons with many custom Builders and Tools to encapsulate reusable
> > build components. My role as a tool and builder author involves
> > understanding and calling SCons API functions.
> >
> > When I wrote my initial post, I thought that the caching/memoization
> > feature of SCons was part of an infrastructure layer that lived below
> > the SCons API and that in my role as a tool author I should not need
> > to be aware that SCons was performing this optimization internally.
> > Unfortunately, I'm no longer convinced that this is correct or
> > possible.
>
> You should not need to.  We explained how the current implementation
> causes the behavior you've observed, but that's all... from my
> perspective, memoizing the stat structure must have been tempting
> because it's comparatively expensive - the filesystem layer has to
> initiate a call to top operating system to fetch file information from
> wherever it comes from, and if it's the result of a just-changed file
> it's probably not in the OS cache layer either. When i quickly
> instrumented the node layer and ran an example very similar to yours,
> still just one source file and one target file, the stat method gets
> called five times for the source file and eight for the target; the
> other memozing method that's affected (get_size) is called three times
> for each.
>
> Unfortunately, IMO, stat() is a poor candidate for memoization because
> the return of the Node's stat() call is not invariant (when they teach
> the memoization concept it's nearly always Fibonacci, which is prefect:
> fibonacci(4) is always the same value no matter what you do). Here, any
> time the disk file changes, whether from non-existing->existing, or
> old-contents->new-contents, the call to that function with its node
> object as the argument would return something different if not memoized.
> And as you've mentioned, any action in a chain could cause a change. The
> executor, which fires off those actions, is in a poor position to know
> whether any particular action did cause a change to an underlying disk
> file - it just knows how to invoke the action, not what its results will
> be.  It's also impractical to have each action figure that out - if
> they're function actions it's possible but would have to be coded, but
> in the more common case of a command action, where a command line is
> handed off to the operating system via a subprocess call it's outside
> the SCons process so opaque.  So the executor could invalidate on each
> step whether or not it's needed, or just stop memoizing the stat
> structure.  So much for internal details. I have no idea whether either
> approach is palatable to SCons.
>
> But you as the build maintainer are not supposed to have to know about
> those details, because behind-the-scenes caching for performance reasons
> really should work as you initially expected.
>
>
> > The only way this ActionFunction can get correct information from
> > Node.FS.get_size() is to have a Node.invalidate() method that allows
> > the author to clear a target's cached values. Every API function that
> > memoizes its return value needs to be marked in the documentation, so
> > that we know to invalidate the Node any time we modify the file on
> > disk. The Action executor would use the same API on targets and side
> > effects either before or after running an Action.
>
> I suppose that's an approach that would reduce the amount of recoding
> needed in SCons itself - but it seems poor to have to paper around a bad
> assumption in the SCons implementation (namely that a whole action chain
> can be treated as a single unit, and you only invalidate when it's done)
> by exposing through documentation a whole bunch more information that
> should be internal. But I'm only one opinion...  If it's not, in the
> end, agreed to make changes internally, we'll need to document for
> AddPostAction that actions may not see changes to node details made by
> previous actions on the same target.
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://pairlist4.pair.net/pipermail/scons-users/attachments/20240531/9908d4d4/attachment-0001.htm>