[Scons-users] CacheDir race during parallel Windows builds?
Gary Oberbrunner
garyo at oberbrunner.com
Sat Aug 6 20:29:29 EDT 2016
Sysinternals procmon should help:
https://technet.microsoft.com/en-us/sysinternals/processmonitor
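
A small probe run alongside the build could complement procmon by recording when the object file stops being openable. This is only a sketch under assumptions: the name `probe_open`, the retry count, and the delay are hypothetical choices and are not part of SCons.

```python
import time


def probe_open(path, attempts=5, delay=0.2):
    """Try to open `path` for reading, retrying on OSError.

    Returns a list of (attempt_number, error_or_None) tuples so the
    caller can log when, and how, each open attempt failed.
    """
    results = []
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "rb") as fp:
                fp.read(1)  # force a real read, not just a handle
            results.append((attempt, None))
            break  # success: stop retrying
        except OSError as exc:
            results.append((attempt, exc))
            time.sleep(delay)
    return results
```

If the first attempt fails but a later one succeeds, that would suggest another process held the file briefly, and procmon could then be used to identify which one.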
On Sat, Aug 6, 2016 at 7:06 PM, William Blevins <wblevins001 at gmail.com>
wrote:
> Andrew,
>
> I'm not sure honestly. At this point, it sounds like you need to be
> debugging file handle interactions since we aren't even sure what processes
> are causing the problem.
>
> I've never had to do anything like that on Windows. Perhaps something like
> this would be useful:
> http://serverfault.com/questions/1966/how-do-you-find-what-process-is-holding-a-file-open-in-windows
>
> V/R,
> William
>
> On Sat, Aug 6, 2016 at 11:50 PM, Andrew C. Morrow <
> andrew.c.morrow at gmail.com> wrote:
>
>>
>> Hi William -
>>
>> Thanks for the suggestion. I picked the relevant changes from that pull
>> request into the MongoDB vendored copy of SCons:
>>
>> $ git diff
>> diff --git a/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Node/__init__.py b/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Node/__init__.py
>> index 3ce481b..0b980a8 100644
>> --- a/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Node/__init__.py
>> +++ b/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Node/__init__.py
>> @@ -210,7 +210,8 @@ def get_contents_file(node):
>>          return ''
>>      fname = node.rfile().get_abspath()
>>      try:
>> -        contents = open(fname, "rb").read()
>> +        with open(fname, "rb") as fp:
>> +            contents = fp.read()
>>      except EnvironmentError, e:
>>          if not e.filename:
>>              e.filename = fname
>> diff --git a/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Scanner/C.py b/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Scanner/C.py
>> index 4c61187..57c8b99 100644
>> --- a/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Scanner/C.py
>> +++ b/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Scanner/C.py
>> @@ -58,12 +58,11 @@ class SConsCPPScanner(SCons.cpp.PreProcessor):
>>          return result
>>      def read_file(self, file):
>>          try:
>> -            fp = open(str(file.rfile()))
>> +            with open(str(file.rfile())) as fp:
>> +                return fp.read()
>>          except EnvironmentError, e:
>>              self.missing.append((file, self.current_file))
>>              return ''
>> -        else:
>> -            return fp.read()
>>
>>  def dictify_CPPDEFINES(env):
>>      cppdefines = env.get('CPPDEFINES', {})
>>
>> However, the same error occurs after this change, so I think there is
>> something else going on here. Whether that is a bug in our SCons setup,
>> SCons itself, Windows and/or NTFS, or somehow AWS, I have no idea. I can
>> consistently reproduce this issue on an AWS instance I have set up, and we
>> would really like CacheDir to work. Is there some additional information I
>> can provide to help debug this?
>>
>> Thanks,
>> Andrew
>>
>>
>> On Thu, Aug 4, 2016 at 4:12 PM, William Blevins <wblevins001 at gmail.com>
>> wrote:
>>
>>> Andrew,
>>>
>>> I haven't gone through the links in detail, but something that *might*
>>> be related:
>>> https://bitbucket.org/scons/scons/pull-requests/347/avoid-using-__slots__-on-node-and-executor/diff
>>>
>>> The above link is to a recent patch that caught several cases of files
>>> being opened without using the "with <file> as <name>" construct to
>>> explicitly close files after use, in SCons/Node/__init__.py and
>>> SCons/Scanner/C.py. This might cause problems with timely file handle
>>> cleanup (especially on Windows, which tends to do some odd file buffering,
>>> IMHO). You may want to clone the latest and see if that makes a difference.
>>> Ideally, the latest is functional with all the 2->3 code changes. Or
>>> consider just doing a direct monkey patch for brevity's sake.
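
The difference between the two file-reading styles discussed above can be sketched as follows. This is an illustration in Python 3 syntax (the vendored SCons 2.5.0 is Python 2), and the function names `read_leaky` and `read_scoped` are hypothetical, not SCons APIs.

```python
def read_leaky(path):
    # The handle is closed only when the file object is garbage
    # collected; exactly when that happens depends on the interpreter.
    return open(path, "rb").read()


def read_scoped(path):
    # The handle is closed as soon as the with-block exits, even if
    # read() raises an exception.
    with open(path, "rb") as fp:
        data = fp.read()
    assert fp.closed  # the handle is already released at this point
    return data
```

Both functions return the same bytes; only the moment at which the operating system handle is released differs, which is what matters if another process needs to open the same file.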
>>>
>>> Hope that helps,
>>> William
>>>
>>> On Thu, Aug 4, 2016 at 3:20 PM, Andrew C. Morrow <
>>> andrew.c.morrow at gmail.com> wrote:
>>>
>>>>
>>>> Hi -
>>>>
>>>> At MongoDB, we recently started using CacheDir in our CI system. This
>>>> has been a big success for reducing rebuild times for our Linux builds.
>>>> However, we were surprised to find that our Windows builds started failing
>>>> in a very alarming way.
>>>>
>>>> Please see the following log file:
>>>> https://evergreen.mongodb.com/task_log_raw/mongodb_mongo_master_windows_64_2k8_debug_compile_81185a50aeed5b2beed2c0a81b381a482489fdb7_16_08_02_20_24_46/0?type=T
>>>>
>>>> The log lines of interest are:
>>>>
>>>> [2016/08/02 17:31:09.642] Retrieved `build\cached\mongo\base\data_type_terminated_test.obj' from cache
>>>>
>>>> Here, we see that we retrieved this .obj file from the cache. Nine
>>>> seconds later, we try to use that object in a link step:
>>>>
>>>> [2016/08/02 17:31:18.921] link /nologo /DEBUG /INCREMENTAL:NO
>>>> /LARGEADDRESSAWARE /OPT:REF /OUT:build\cached\mongo\base\base_test.exe
>>>> build\cached\mongo\base\data_range.obj ...
>>>> build\cached\mongo\base\data_type_terminated_test.obj ...
>>>>
>>>> The link fails, claiming that the data_type_terminated_test.obj file
>>>> cannot be opened:
>>>>
>>>> [2016/08/02 17:31:20.363] LINK : fatal error LNK1104: cannot open file 'build\cached\mongo\base\data_type_terminated_test.obj'
>>>> [2016/08/02 17:31:20.506] scons: *** [build\cached\mongo\base\base_test.exe] Error 1104
>>>>
>>>> We are using a vendored copy of SCons 2.5.0. The only modification is
>>>> this:
>>>>
>>>> https://github.com/mongodb/mongo/commit/bc7e4e6821639ee766ada83483975668af98f367#diff-cc7aec1739634ca2a857a4d4227663aa
>>>>
>>>> This change was made so that the atime of files in the cache stays
>>>> accurate at fine granularity, even if the underlying filesystem is mounted
>>>> noatime or relatime, so that we can prune the cache based on access time.
>>>> We would like to propose upstreaming this change, but that is a separate
>>>> email.
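
To make the idea of atime-based pruning concrete, here is a rough sketch. It is an assumption about how such a scheme could work and is not the content of the linked commit; `touch_atime` and `prune_by_atime` are hypothetical names.

```python
import os
import time


def touch_atime(path, now=None):
    """Set the access time of `path` to `now`, keeping its mtime."""
    now = time.time() if now is None else now
    st = os.stat(path)
    os.utime(path, (now, st.st_mtime))


def prune_by_atime(cache_root, max_age_seconds, now=None):
    """Delete files under `cache_root` not accessed within the window.

    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_atime > max_age_seconds:
                os.remove(path)
                removed.append(path)
    return removed
```

Calling `touch_atime` explicitly on each cache retrieval would keep the access time meaningful even when the filesystem itself does not update it, and a periodic `prune_by_atime` could then drop entries that have not been used recently.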
>>>>
>>>> SCons was invoked as follows from within an SSH session into cygwin
>>>> (you can see it at the top of the build log as well):
>>>>
>>>> python ./buildscripts/scons.py --dbg=on --opt=on
>>>> --win-version-min=ws08r2 -j$(( $(grep -c ^processor /proc/cpuinfo) / 2 ))
>>>> MONGO_DISTMOD=2008plus --cache
>>>> --cache-dir='z:\data\scons-cache\9d73adcd-19eb-46f2-9988-b8594ba5a3d1'
>>>> --use-new-tools all dist dist-debugsymbols distsrc-zip
>>>> MONGO_VERSION=3.3.10-250-g81185a5
>>>>
>>>> The 'python' here is Windows Python, not the Cygwin one, and PyWin32 is
>>>> installed.
>>>>
>>>> The system on which this build ran is running Windows 2012 on a
>>>> dedicated spot AWS c3.4xlarge instance, and the toolchain is Visual
>>>> Studio 2015. The Z drive, where the cache directory is located, is locally
>>>> connected NTFS via AWS ephemeral/instance storage.
>>>>
>>>> We have since backed out using the Cache on our Windows builds, which
>>>> is disappointing - Windows builds take forever compared to others, and we
>>>> were really hoping that CacheDir would be a big help here.
>>>>
>>>> Has anyone seen anything like this, or have any ideas about what may be
>>>> going wrong here? I know there have been some other recent threads about
>>>> problems with Windows and build ordering, but this seems different: the
>>>> retrieval of the file from the cache was correctly ordered, but it doesn't
>>>> appear to have been effective.
>>>>
>>>> I'm happy to provide any additional information if it will help us get
>>>> Windows CacheDir enabled builds working.
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Scons-users mailing list
>>>> Scons-users at scons.org
>>>> https://pairlist4.pair.net/mailman/listinfo/scons-users
--
Gary