[Scons-users] CacheDir entries

Andrew C. Morrow andrew.c.morrow at gmail.com
Sat Oct 25 14:49:01 EDT 2014


On Fri, Oct 24, 2014 at 7:13 PM, William Blevins <wblevins001 at gmail.com>
wrote:

> Andrew,
>
> I'm not sure I am qualified to answer this question, but since no one else
> has attempted, I will try to help the best I can :)
>

Thanks!


>
> What pieces of information are combined to identify a CacheDir entry?
>
>
> I imagine that all the information written into .sconsign.dblite for a
> target is used.
>

OK. That is somewhat helpful as it means one less thing to understand.
However, it also means that now I have the same question but about the
sconsign file. Are the inputs used to derive an entry in .sconsign.dblite
documented somewhere?


>
> I ask because I'm interested in setting up a CacheDir on a network share,
>> but it is not clear to me what I need to do to make this work with
>> heterogeneous client systems.
>
>
> I wonder if CacheDir is thread safe.  Can more than 1 SCons instance
> interact with CacheDir files at a time?  That might be the biggest
> question.
>

Based on the comment here:

https://bitbucket.org/scons/scons/src/cff67baa7ab35fc262bfc08298fe365e537422fc/src/engine/SCons/CacheDir.py?at=default#cl-82

it seems that it is intended to be possible to share CacheDirs between
concurrent builds. Thinking about it abstractly it seems like it should
work: if you look for the file in the cache and don't find it, you rebuild
it. If you look for it and find it, you use it. The only race I can see
would be if you were to consume an entry that was in the process of being
written by another process and read an incomplete object file, but it
appears from a quick look at CacheDir.py that new cache entries are first
written to a temporary file and only then renamed to the cached name.
Assuming the underlying FS provides atomic file renames, I think this is
expected to work.
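
A minimal sketch of the write-to-temporary-then-rename pattern described
above, in plain Python. This is not the code in CacheDir.py; the function
names cache_store and cache_fetch are hypothetical, and it assumes the
filesystem holding the cache gives rename the usual POSIX atomicity.

```python
import os
import tempfile


def cache_store(cache_dir, key, data):
    """Publish `data` under `key` so that readers never see a partial file."""
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, key)
    # Create the temporary file inside the cache directory so that the
    # rename below never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, prefix=key + ".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # On POSIX the rename is atomic: a concurrent reader sees either
        # no entry at all or the complete file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file if the write or rename failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


def cache_fetch(cache_dir, key):
    """Return the cached bytes, or None on a miss (the caller rebuilds)."""
    try:
        with open(os.path.join(cache_dir, key), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

If two builders store the same key at the same time, the later rename
simply replaces the earlier one, which is harmless as long as both
produced equivalent content for that key.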

Some further evidence that this is intended to work correctly comes from
this comment in the man page:

"The --random option is useful to prevent multiple builds from trying to
update the cache simultaneously."


>
> As a practical example, consider two otherwise identical linux
>> workstations, where the only difference is that one has GCC 4.8 and the
>> other has GCC 4.9, but both installed as /usr/bin/gcc. Would these
>> workstations share cache entries, assuming identical source files and tool
>> invocations? Why, or why not? Similar question for two machines with
>> varying contents of /usr/include/stdio.h or other system C and C++ headers
>> reached by translation units? For /usr/bin/ld?
>
>
> A SCons target essentially depends on every artifact that goes into a
> build command which includes the compiler and/or linker.  Looking at the
> dependency tree for an object compiled with gcc will show the gcc
> executable as a dependency, so the cached objects will (most likely) differ
> (based on the Decider) since the gcc binaries will not be equivalent.  Of
> course this relates to ALL target traits including compiler, compiler
> arguments, library paths, include paths, etc.
>

I don't think it actually includes the hash of the contents of the compiler
binary. As an experiment, I built my project with /opt/local/bin/clang
being a symlink to /opt/local/bin/clang-3.5. I then updated the symlink to
point to /opt/local/bin/clang-3.4. After switching the symlink, rebuilding
was a no-op.

I also tried editing a system header (/usr/include/stdio.h), by adding a
new #define to it. SCons did not attempt to rebuild targets after this
edit. I verified that the file was in fact included by adding an
intentional error to stdio.h and manually rebuilding a target, which failed
with a compiler error as expected.

So, unless I'm mistaken in assuming that an input SCons ignores when
deciding whether to rebuild a target is also ignored when computing that
target's cache entry, I think the CacheDir mechanism cannot differentiate
between identical build invocations in the same source tree across systems
with varying system headers or compiler versions.
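
To make the concern concrete, here is a toy model of the behaviour the
two experiments suggest. It is emphatically not SCons's real signature
algorithm; toy_signature and the machine dictionaries are hypothetical,
and the model simply assumes that only the command string and the tracked
source contents are hashed.

```python
import hashlib


def toy_signature(machine):
    """Hash the command string and tracked sources, and nothing else."""
    h = hashlib.md5()
    h.update(machine["command"].encode("utf-8"))
    for name in sorted(machine["sources"]):
        h.update(name.encode("utf-8"))
        h.update(machine["sources"][name])
    return h.hexdigest()


machine_a = {
    "command": "/opt/local/bin/clang -o foo.o -c foo.c",
    "sources": {"foo.c": b"#include <stdio.h>\nint main(void) { return 0; }\n"},
    # Present on the machine, but never read by toy_signature:
    "compiler_binary": b"bytes of clang-3.5",
    "stdio_h": b"/* stock stdio.h */",
}

# Same source tree and command line, different compiler and system header.
machine_b = dict(
    machine_a,
    compiler_binary=b"bytes of clang-3.4",
    stdio_h=b"/* stock stdio.h */\n#define EXTRA 1\n",
)

# Under this model both machines map to the same cache entry.
shared_entry = toy_signature(machine_a) == toy_signature(machine_b)
```

In this model a change to foo.c yields a new signature, while a change to
the compiler binary or to stdio.h does not, which is exactly the false
sharing at issue.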


>
> The system I'm considering would have a few populations of mostly
>> homogeneous systems, so if necessary I could capture some information from
>> the external environment (contents of env['CC'] --version, contents of
>> /etc/lsb-release, etc.) to identify which population a given machine
>> belongs to. If I had such a machine fingerprint, what would be the best way
>> to inject it into the SCons view of the world such that cache entries with
>> different fingerprints were differentiated?
>
>
> In this case, I don't think that the platform will have an effect as long
> as the tools and dependencies are the same.  A different linux OS may have
> different system headers, etc, which would cause cache misses potentially.
> Also, I know that the md5sum implementation for linux changed between RHEL5
> and RHEL6, so that may also cause issues if python uses a 3rd-party library
> for this?  I imagine python uses its own implementation though...
>

Based on the two experiments above, my current view is that variations
across system headers and compiler versions will not be detected, so I
think some sort of system fingerprint is still required to avoid false
sharing of cache entries. Given that, do you have any thoughts on the best
way to get such a system fingerprint (assuming I can construct a correct
one) to affect every cache entry computation? Essentially, I want to use
the system fingerprint as a salt for the cache entry hash.
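
For illustration, a sketch of the kind of salting meant here, independent
of SCons. The names machine_fingerprint and salted_cache_key are
hypothetical, the fact strings are made-up examples of what one might
collect (compiler version output, distribution release), and nothing here
is an existing SCons hook.

```python
import hashlib


def machine_fingerprint(facts):
    """Digest an ordered list of strings that describe the machine."""
    h = hashlib.sha1()
    for fact in facts:
        data = fact.encode("utf-8")
        # A length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
        h.update(str(len(data)).encode("ascii") + b":")
        h.update(data)
    return h.hexdigest()


def salted_cache_key(fingerprint, unsalted_key):
    """Mix the fingerprint into an existing cache key (a hex string)."""
    salted = (fingerprint + ":" + unsalted_key).encode("utf-8")
    return hashlib.md5(salted).hexdigest()


# Example facts; in practice these would be gathered from the machine.
fp_gcc48 = machine_fingerprint(["gcc (GCC) 4.8.2", "Ubuntu 14.04"])
fp_gcc49 = machine_fingerprint(["gcc (GCC) 4.9.1", "Ubuntu 14.04"])

# The same unsalted key lands in different cache entries per population.
unsalted = "d41d8cd98f00b204e9800998ecf8427e"
key_gcc48 = salted_cache_key(fp_gcc48, unsalted)
key_gcc49 = salted_cache_key(fp_gcc49, unsalted)
```

The open question is where such a salt could be applied so that it reaches
every cache entry computation.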


>
> Sorry I couldn't be more helpful,
> William
>

I appreciate your taking the time to get the conversation going. Hopefully
some other people will chime in with more details.

Thanks,
Andrew



>
>
> On Fri, Oct 24, 2014 at 12:47 PM, Andrew C. Morrow <
> andrew.c.morrow at gmail.com> wrote:
>
>>
>> Hi All -
>>
>> What pieces of information are combined to identify a CacheDir entry?
>>
>> I ask because I'm interested in setting up a CacheDir on a network share,
>> but it is not clear to me what I need to do to make this work with
>> heterogeneous client systems.
>>
>> As a practical example, consider two otherwise identical linux
>> workstations, where the only difference is that one has GCC 4.8 and the
>> other has GCC 4.9, but both installed as /usr/bin/gcc. Would these
>> workstations share cache entries, assuming identical source files and tool
>> invocations? Why, or why not? Similar question for two machines with
>> varying contents of /usr/include/stdio.h or other system C and C++ headers
>> reached by translation units? For /usr/bin/ld?
>>
>> The system I'm considering would have a few populations of mostly
>> homogeneous systems, so if necessary I could capture some information from
>> the external environment (contents of env['CC'] --version, contents of
>> /etc/lsb-release, etc.) to identify which population a given machine
>> belongs to. If I had such a machine fingerprint, what would be the best way
>> to inject it into the SCons view of the world such that cache entries with
>> different fingerprints were differentiated?
>>
>> One obvious solution would be to have different subdirectories of the
>> network mount for each client fingerprint, but that seems clunky. Are any
>> contents of the Environment hashed into the cache id? If not, can I request
>> that an Environment entry be incorporated, so that I could have
>> env['MACHINE_FINGERPRINT'] = get_machine_fingerprint() and make that affect
>> the cache id? Are there other techniques for injecting this sort of
>> auxiliary information into the cache id for a node?
>>
>> Thoughts on the above questions, comments from others who have attempted
>> this, or pointers to the relevant subsets of the SCons source would be
>> greatly appreciated.
>>
>> Thanks, and apologies for the wall of questions,
>> Andrew
>>
>>
>> _______________________________________________
>> Scons-users mailing list
>> Scons-users at scons.org
>> https://pairlist4.pair.net/mailman/listinfo/scons-users
>>
>>
>