[Scons-users] CacheDir entries
William Blevins
wblevins001 at gmail.com
Sat Oct 25 15:06:12 EDT 2014
>
> I don't think it actually includes the hash of the contents of the
> compiler binary. As an experiment, I built my project with
> /opt/local/bin/clang being a symlink to /opt/local/bin/clang-3.5. I then
> updated the symlink to point to /opt/local/bin/clang-3.4. After switching
> the symlink, rebuilding was a no-op.
>
> I also tried editing a system header (/usr/include/stdio.h), by adding a
> new #define to it. SCons did not attempt to rebuild targets after this
> edit. I verified that the file was in fact included by adding an
> intentional error to stdio.h and manually rebuilding a target, which failed
> with a compiler error as expected.
>
> So, unless I'm mistaken in expecting that if SCons would not rebuild
> targets on the basis of a change to a system header that it would also not
> use the contents of that system header when producing a cache entry, I
> think the CacheDir mechanism cannot differentiate between identical build
> invocations in the same source tree across systems with varying system
> headers or compiler versions.
>
What Decider type are you using? In this case, I imagine that any of the
above changes SHOULD trigger a rebuild unless SCons doesn't track
information on dependencies outside the SConstruct tree (other than
existence). That brings up a set of concerns regarding build consistency.
I don't use CacheDir functionality generally, but I know that under a basic
build structure SCons rebuilds when these external dependencies change.
V/R,
William
On Sat, Oct 25, 2014 at 2:49 PM, Andrew C. Morrow <andrew.c.morrow at gmail.com> wrote:
>
>
> On Fri, Oct 24, 2014 at 7:13 PM, William Blevins <wblevins001 at gmail.com>
> wrote:
>
>> Andrew,
>>
>> I'm not sure I am qualified to answer this question, but since no one
>> else has attempted, I will try to help the best I can :)
>>
>
> Thanks!
>
>
>>
>>> What pieces of information are combined to identify a CacheDir entry?
>>
>>
>> I imagine that all the information written into .sconsign.dblite for a
>> target is used.
>>
>
> OK. That is somewhat helpful as it means one less thing to understand.
> However, it also means that now I have the same question but about the
> sconsign file. Is a complete description of the inputs used to derive an
> entry in .sconsign.dblite described somewhere?
>
>
>>
>>> I ask because I'm interested in setting up a CacheDir on a network share,
>>> but it is not clear to me what I need to do to make this work with
>>> heterogeneous client systems.
>>
>>
>> I wonder if CacheDir is thread safe. Can more than 1 SCons instance
>> interact with CacheDir files at a time? That might be the biggest
>> question.
>>
>
> Based on the comment here:
>
>
> https://bitbucket.org/scons/scons/src/cff67baa7ab35fc262bfc08298fe365e537422fc/src/engine/SCons/CacheDir.py?at=default#cl-82
>
> it seems that it is intended to be possible to share CacheDirs between
> concurrent builds. Thinking about it abstractly it seems like it should
> work: if you look for the file in the cache and don't find it, you rebuild
> it. If you look for it and find it, you use it. The only race I can see
> would be if you were to consume an entry that was in the process of being
> written by another process and read an incomplete object file, but it
> appears from a quick look at CacheDir.py that new cache entries are first
> written to a temporary file and only then renamed to the cached name.
> Assuming the underlying FS provides atomic file renames, I think this is
> expected to work.
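
The write-to-a-temporary-file-then-rename pattern described above can be sketched in plain Python. This is a minimal illustration of the general technique, not a copy of what CacheDir.py does; the function name publish_atomically is hypothetical, and the atomicity guarantee assumes a POSIX-style filesystem (network shares may behave differently).

```python
import os
import tempfile


def publish_atomically(data: bytes, final_path: str) -> None:
    """Write data to a temp file next to final_path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    os.makedirs(directory, exist_ok=True)
    # The temp file must live on the same filesystem as the destination,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # On POSIX filesystems os.replace is atomic: a concurrent reader sees
        # either no entry or the complete one, never a partial file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If two builds publish the same entry at once, the last rename wins, which is harmless as long as both produced equivalent content.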
>
> Some further evidence that this is intended to work correctly comes from
> this comment in the man page:
>
> "The --random option is useful to prevent multiple builds from trying
> to update the cache simultaneously."
>
>
>>
>>> As a practical example, consider two otherwise identical linux
>>> workstations, where the only difference is that one has GCC 4.8 and the
>>> other has GCC 4.9, but both installed as /usr/bin/gcc. Would these
>>> workstations share cache entries, assuming identical source files and tool
>>> invocations? Why, or why not? Similar question for two machines with
>>> varying contents of /usr/include/stdio.h or other system C and C++ headers
>>> reached by translation units? For /usr/bin/ld?
>>
>>
>> A SCons target essentially depends on every artifact that goes into a
>> build command which includes the compiler and/or linker. Looking at the
>> dependency tree for an object compiled with gcc will show the gcc
>> executable as a dependency, so the cached objects will (most likely) differ
>> (based on the Decider) since the gcc binaries will not be equivalent. Of
>> course this relates to ALL target traits including compiler, compiler
>> arguments, library paths, include paths, etc.
>>
>
> I don't think it actually includes the hash of the contents of the
> compiler binary. As an experiment, I built my project with
> /opt/local/bin/clang being a symlink to /opt/local/bin/clang-3.5. I then
> updated the symlink to point to /opt/local/bin/clang-3.4. After switching
> the symlink, rebuilding was a no-op.
>
> I also tried editing a system header (/usr/include/stdio.h), by adding a
> new #define to it. SCons did not attempt to rebuild targets after this
> edit. I verified that the file was in fact included by adding an
> intentional error to stdio.h and manually rebuilding a target, which failed
> with a compiler error as expected.
>
> So, unless I'm mistaken in expecting that if SCons would not rebuild
> targets on the basis of a change to a system header that it would also not
> use the contents of that system header when producing a cache entry, I
> think the CacheDir mechanism cannot differentiate between identical build
> invocations in the same source tree across systems with varying system
> headers or compiler versions.
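
In the symlink experiment above, the compiler's path stayed the same while the binary behind it changed. One way to make that kind of change visible is to hash the contents of whatever the path resolves to. The sketch below uses only the Python standard library; tool_content_digest is a hypothetical helper, not something SCons provides.

```python
import hashlib
import os


def tool_content_digest(tool_path: str) -> str:
    """Return the SHA-256 hex digest of the file that tool_path resolves to."""
    real_path = os.path.realpath(tool_path)  # follow symlinks to the real binary
    digest = hashlib.sha256()
    with open(real_path, "rb") as f:
        # Read in chunks so large binaries are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Called on /opt/local/bin/clang before and after the symlink is re-pointed, this would return two different digests even though the path is unchanged.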
>
>
>>
>>> The system I'm considering would have a few populations of mostly
>>> homogeneous systems, so if necessary I could capture some information from
>>> the external environment (contents of env['CC'] --version, contents of
>>> /etc/lsb_release, etc.) to identify which population a given machine
>>> belongs to. If I had such a machine fingerprint, what would be the best way
>>> to inject it into the SCons view of the world such that cache entries with
>>> different fingerprints were differentiated?
>>
>>
>> In this case, I don't think that the platform will have an effect as long
>> as the tools and dependencies are the same. A different linux OS may have
>> different system headers, etc, which would cause cache misses potentially.
>> Also, I know that the md5sum implementation for linux changed between RHEL5
>> and RHEL6, so that may also cause issues if python uses a 3rd-party library
>> for this? I imagine python uses its own implementation though...
>>
>
> Based on the two experiments above, my current view is that variations
> across system headers and compiler versions will not be detected, so I
> think some sort of system fingerprint is still required to avoid false
> sharing of cache entries. Given that, do you have any thoughts on the best
> way to get such a system fingerprint (assuming I can construct a correct
> one) to affect every cache entry computation? Essentially, I want to use
> the system fingerprint as a salt for the cache entry hash.
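
The "fingerprint as a salt" idea can be illustrated independently of SCons. In the sketch below, machine_fingerprint and salted_cache_key are hypothetical names, and the facts passed in (for example the compiler's version output) are assumed inputs; how such a salt would be wired into SCons's own cache key computation is exactly the open question in this thread and is not shown.

```python
import hashlib


def machine_fingerprint(facts):
    """Hash an ordered list of strings that describe the build machine."""
    h = hashlib.sha256()
    for fact in facts:
        encoded = fact.encode("utf-8")
        # Length-prefix each fact so ["ab", "c"] and ["a", "bc"] hash differently.
        h.update(str(len(encoded)).encode("ascii") + b":" + encoded)
    return h.hexdigest()


def salted_cache_key(build_signature, fingerprint):
    """Combine a build signature with a machine fingerprint into one key."""
    salted = (fingerprint + ":" + build_signature).encode("utf-8")
    return hashlib.sha256(salted).hexdigest()
```

Two machines with identical sources and command lines would then get the same key only if their fingerprints also match.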
>
>
>>
>> Sorry I couldn't be more helpful,
>> William
>>
>
> I appreciate your taking the time to get the conversation going. Hopefully
> some other people will chime in with more details.
>
> Thanks,
> Andrew
>
>
>
>>
>>
>> On Fri, Oct 24, 2014 at 12:47 PM, Andrew C. Morrow <
>> andrew.c.morrow at gmail.com> wrote:
>>
>>>
>>> Hi All -
>>>
>>> What pieces of information are combined to identify a CacheDir entry?
>>>
>>> I ask because I'm interested in setting up a CacheDir on a network
>>> share, but it is not clear to me what I need to do to make this work with
>>> heterogeneous client systems.
>>>
>>> As a practical example, consider two otherwise identical linux
>>> workstations, where the only difference is that one has GCC 4.8 and the
>>> other has GCC 4.9, but both installed as /usr/bin/gcc. Would these
>>> workstations share cache entries, assuming identical source files and tool
>>> invocations? Why, or why not? Similar question for two machines with
>>> varying contents of /usr/include/stdio.h or other system C and C++ headers
>>> reached by translation units? For /usr/bin/ld?
>>>
>>> The system I'm considering would have a few populations of mostly
>>> homogeneous systems, so if necessary I could capture some information from
>>> the external environment (contents of env['CC'] --version, contents of
>>> /etc/lsb_release, etc.) to identify which population a given machine
>>> belongs to. If I had such a machine fingerprint, what would be the best way
>>> to inject it into the SCons view of the world such that cache entries with
>>> different fingerprints were differentiated?
>>>
>>> One obvious solution would be to have different subdirectories of the
>>> network mount for each client fingerprint, but that seems clunky. Are any
>>> contents of the Environment hashed into the cache id? If not, can I request
>>> that an Environment entry be incorporated, so that I could have
>>> env['MACHINE_FINGERPRINT'] = get_machine_fingerprint() and make that affect
>>> the cache id? Are there other techniques for injecting this sort of
>>> auxiliary information into the cache id for a node?
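
The per-fingerprint subdirectory approach mentioned above amounts to deriving the cache path from the fingerprint. A minimal sketch, assuming the fingerprint is already available as a hex string; cache_dir_for and the "fp-" prefix are hypothetical choices.

```python
import os


def cache_dir_for(cache_root, fingerprint):
    """Return the per-population cache directory under a shared root."""
    # A short prefix of the fingerprint keeps directory names readable.
    return os.path.join(cache_root, "fp-" + fingerprint[:16])
```

In an SConstruct, the returned path could be passed to CacheDir() so that machines with different fingerprints never see each other's entries.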
>>>
>>> Thoughts on the above questions, comments from others who have attempted
>>> this, or pointers to the relevant subsets of the SCons source would be
>>> greatly appreciated.
>>>
>>> Thanks, and apologies for the wall of questions,
>>> Andrew
>>>
>>>
>>> _______________________________________________
>>> Scons-users mailing list
>>> Scons-users at scons.org
>>> https://pairlist4.pair.net/mailman/listinfo/scons-users
>>>
>>>
>>
>>
>>
>
>
>