[Scons-users] AddPostAction memoization problem

Fri May 31 06:20:41 EDT 2024

On Wed, May 29, 2024 at 5:41 PM Mats Wichmann <mats at wichmann.us> wrote:
>
> On 5/29/24 15:27, Mike Haboustak wrote:
> > On Wed, May 29, 2024 at 9:49 AM Mats Wichmann <mats at wichmann.us> wrote:
> >>
> >> I'd personally agree as to correct behavior: if a step that makes
> >> changes has completed, information related to the node should be up to
> >> date and not old.  I guess it's worth mentioning that the getsize()
> >> method isn't *officially* exposed as a public interface, it's possible
> >> there was an actual reason for that back in the mists of time.
> >>
> >
> > Is it true that a function isn't officially exposed if it doesn't have
> > a docstring in the API docs? The documentation includes
> > SCons.Node.FS.File.get_size() right next to get_relpath().
> > https://scons.org/doc/latest/HTML/scons-api/SCons.Node/#SCons.Node.FS.File.get_size
>
> No, the "public API" is the manpage.  Unfortunately, the "API Docs"
> don't distinguish between public and internal reliably (we eventually
> added that notation to the main page).  Every so often there's a
> docstring annotation to help out, but it's not frequent.
>

I spent some time yesterday reading the man page, which I haven't done
in years, and also the notes at the beginning of the SCons API
documentation to better understand your perspective on the public
SCons interface. What's clear to me is that my build systems extend
SCons with many custom Builders and Tools to encapsulate reusable
build components. My role as a tool and builder author involves
understanding and calling SCons API functions.

When I wrote my initial post, I thought that the caching/memoization
feature of SCons was part of an infrastructure layer that lived below
the SCons API and that in my role as a tool author I should not need
to be aware that SCons was performing this optimization internally.
Unfortunately, I'm no longer convinced that this is correct or
possible. My EnsureMaxSize action only needed to call
target[0].get_size() once, and I expected to see an up-to-date result
based on the previous actions performed by the current builder. There
are cases where an action author might need to call
target[0].get_size() more than once or, worse, make two separate Node
API calls that share a single memoized function in their call graphs.
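
To illustrate that second case, here is a minimal sketch using a toy
class rather than SCons itself. ToyFileNode, _stat(), get_timestamp()
and demo() are hypothetical names made up for this example; the only
assumption is that two public calls sit on top of one memoized helper.

```python
import os
import tempfile


class ToyFileNode:
    """Hypothetical stand-in for a file node; this is not SCons code."""

    def __init__(self, path):
        self.path = path
        self._memo = {}  # per-node cache of computed values

    def _stat(self):
        # Memoized: the first result is reused for the node's lifetime.
        if "stat" not in self._memo:
            self._memo["stat"] = os.stat(self.path)
        return self._memo["stat"]

    # Two public calls that share the memoized _stat() in their call graphs.
    def get_size(self):
        return self._stat().st_size

    def get_timestamp(self):
        return self._stat().st_mtime_ns


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.o")
        with open(path, "wb") as f:
            f.write(b"\0" * 800)

        node = ToyFileNode(path)
        node.get_timestamp()  # populates the shared cache

        os.truncate(path, 64 * 1024)  # modify the file behind the node's back

        # get_size() has not been called before, yet it reports the old
        # size because get_timestamp() already cached the stat result.
        return node.get_size(), os.path.getsize(path)
```

In this toy, the caller never asked for the size before modifying the
file, so nothing in the calling code hints that the answer is cached.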

Consider this contrived example that prints console messages while
expanding a file to 64K.

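--- ActionFunction
import os

def expand_file(target, source, env):
    tgt = target[0].abspath
    # Sizes before the change: the OS and the SCons node agree.
    print('Before: ', os.path.getsize(tgt))
    print('Before (scons): ', target[0].get_size())
    # Grow the file to 64K on disk.
    os.truncate(tgt, 64*1024)
    # Sizes after the change: the node still reports its memoized value.
    print('New size:', os.path.getsize(tgt))
    print('New size (scons):', target[0].get_size())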
---

--- Build Output
gcc -o test.o -c test.c
expand_file(["test.o"], ["test.c"])
Before:  800
Before (scons):  800
New size: 65536
New size (scons): 800
---

The only way this ActionFunction can get correct information from
SCons.Node.FS.File.get_size() is for SCons to provide something like a
Node.invalidate() method that lets the author clear a target's cached
values. Every API function that memoizes its return value would also
need to be marked in the documentation, so that we know to invalidate
the Node any time we modify the file on disk. The Action executor
could use the same API on targets and side effects either before or
after running an Action.
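
A rough sketch of what I have in mind, again with a toy class and not
SCons code. invalidate(), run_actions() and the other names here are
hypothetical; the sketch assumes the executor clears each target's
cache after every action, which is only one of the two options above.

```python
import os
import tempfile


class ToyFileNode:
    """Hypothetical file node with a memoized size; this is not SCons code."""

    def __init__(self, path):
        self.path = path
        self._memo = {}

    def get_size(self):
        if "size" not in self._memo:
            self._memo["size"] = os.path.getsize(self.path)
        return self._memo["size"]

    def invalidate(self):
        # Proposed hook: forget everything cached about the on-disk file.
        self._memo.clear()


def run_actions(actions, targets):
    """Toy executor: invalidate every target after each action runs."""
    for action in actions:
        action(targets)
        for node in targets:
            node.invalidate()


def demo():
    sizes = []

    def expand(targets):
        os.truncate(targets[0].path, 64 * 1024)

    def record_size(targets):
        sizes.append(targets[0].get_size())

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.o")
        with open(path, "wb") as f:
            f.write(b"\0" * 800)
        node = ToyFileNode(path)
        run_actions([record_size, expand, record_size], [node])
    return sizes
```

With the executor doing the invalidation between actions, the second
record_size call sees the new size without the action author having
to know which calls are memoized. An action that modifies the file
and reads the size within the same function would still need to call
invalidate() itself.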