[Scons-users] CacheDir race during parallel Windows builds?

William Blevins wblevins001 at gmail.com
Sat Aug 6 19:06:42 EDT 2016


Andrew,

I'm not sure honestly. At this point, it sounds like you need to be
debugging file handle interactions since we aren't even sure what processes
are causing the problem.

I've never had to do anything like that on Windows. Perhaps something like
this would be useful:
http://serverfault.com/questions/1966/how-do-you-find-what-process-is-holding-a-file-open-in-windows
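
To narrow down when the file is unavailable, a small probe could be run against the .obj right after it is retrieved from the cache. The following is only a rough sketch under stated assumptions: `probe_open` and its parameters are hypothetical names, not part of SCons, and a successful Python open() does not prove that link.exe would succeed, since the linker may open the file with different sharing flags.

```python
import errno
import os
import tempfile
import time


def probe_open(path, timeout=5.0, interval=0.05):
    """Try to open `path` for reading until it works or `timeout` elapses.

    Hypothetical diagnostic helper (not an SCons API). Returns a tuple
    (succeeded, failures), where failures holds one
    (seconds_since_start, errno, strerror) entry per failed attempt.
    """
    start = time.time()
    failures = []
    while True:
        try:
            # The with block closes the handle again immediately.
            with open(path, "rb"):
                return True, failures
        except EnvironmentError as e:
            failures.append((time.time() - start, e.errno, e.strerror))
        if time.time() - start >= timeout:
            return False, failures
        time.sleep(interval)


if __name__ == "__main__":
    # Demonstration on a temporary file; in a real build the path would be
    # the object file that the linker reports it cannot open.
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    try:
        ok, failures = probe_open(demo_path, timeout=1.0)
        print("openable:", ok, "failed attempts:", len(failures))
    finally:
        os.remove(demo_path)
```

If the recorded failures show a permission or sharing error that clears after a short delay, that would point at another process briefly holding the file, which is where the handle tools from the link above come in.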

V/R,
William

On Sat, Aug 6, 2016 at 11:50 PM, Andrew C. Morrow <andrew.c.morrow at gmail.com> wrote:

>
> Hi William -
>
> Thanks for the suggestion. I picked the relevant changes from that pull
> request into the MongoDB vendored copy of SCons:
>
> $ git diff
> diff --git a/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Node/__init__.py
> b/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Node/__init__.py
> index 3ce481b..0b980a8 100644
> --- a/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Node/__init__.py
> +++ b/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Node/__init__.py
> @@ -210,7 +210,8 @@ def get_contents_file(node):
>          return ''
>      fname = node.rfile().get_abspath()
>      try:
> -        contents = open(fname, "rb").read()
> +        with open(fname, "rb") as fp:
> +            contents = fp.read()
>      except EnvironmentError, e:
>          if not e.filename:
>              e.filename = fname
> diff --git a/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Scanner/C.py
> b/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Scanner/C.py
> index 4c61187..57c8b99 100644
> --- a/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Scanner/C.py
> +++ b/src/third_party/scons-2.5.0/scons-local-2.5.0/SCons/Scanner/C.py
> @@ -58,12 +58,11 @@ class SConsCPPScanner(SCons.cpp.PreProcessor):
>          return result
>      def read_file(self, file):
>          try:
> -            fp = open(str(file.rfile()))
> +            with open(str(file.rfile())) as fp:
> +                return fp.read()
>          except EnvironmentError, e:
>              self.missing.append((file, self.current_file))
>              return ''
> -        else:
> -            return fp.read()
>
>  def dictify_CPPDEFINES(env):
>      cppdefines = env.get('CPPDEFINES', {})
>
> However, the same error occurs after this change, so I think there is
> something else going on here. Whether that is a bug in our SCons setup,
> SCons itself, Windows and/or NTFS, or somehow AWS, I have no idea. I can
> consistently reproduce this issue on an AWS instance I have set up, and we
> would really like CacheDir to work. Is there some additional information I
> can provide to help debug this?
>
> Thanks,
> Andrew
>
>
> On Thu, Aug 4, 2016 at 4:12 PM, William Blevins <wblevins001 at gmail.com>
> wrote:
>
>> Andrew,
>>
>> I haven't gone through the links in detail, but something that *might* be
>> related:
>> https://bitbucket.org/scons/scons/pull-requests/347/avoid-using-__slots__-on-node-and-executor/diff
>>
>> The above link is to a recent patch that caught several cases of files
>> being opened without using the "with <file> as <name>" construct to
>> explicitly close files after use, in SCons/Node/__init__.py and
>> SCons/Scanner/C.py. This might cause problems with timely file handle
>> cleanup (especially on Windows, which tends to do some odd file buffering
>> IMHO). You may want to clone the latest and see if that makes a difference.
>> Ideally, the latest is functional with all the 2->3 code changes. Or
>> consider just doing a direct monkey patch for brevity's sake.
>>
>> Hope that helps,
>> William
>>
>> On Thu, Aug 4, 2016 at 3:20 PM, Andrew C. Morrow <
>> andrew.c.morrow at gmail.com> wrote:
>>
>>>
>>> Hi -
>>>
>>> At MongoDB, we recently started using CacheDir in our CI system. This
>>> has been a big success for reducing rebuild times for our Linux builds,
>>> however, we were surprised to find that our Windows builds started failing
>>> in a very alarming way:
>>>
>>> Please see the following log file:
>>> https://evergreen.mongodb.com/task_log_raw/mongodb_mongo_master_windows_64_2k8_debug_compile_81185a50aeed5b2beed2c0a81b381a482489fdb7_16_08_02_20_24_46/0?type=T
>>>
>>> The log lines of interest are:
>>>
>>> [2016/08/02 17:31:09.642] Retrieved `build\cached\mongo\base\data_type_terminated_test.obj' from cache
>>>
>>> Here, we see that we retrieved this .obj file from the cache. Nine
>>> seconds later, we try to use that object in a link step:
>>>
>>> [2016/08/02 17:31:18.921] link /nologo /DEBUG /INCREMENTAL:NO
>>> /LARGEADDRESSAWARE /OPT:REF /OUT:build\cached\mongo\base\base_test.exe
>>> build\cached\mongo\base\data_range.obj ...
>>>   build\cached\mongo\base\data_type_terminated_test.obj ...
>>>
>>> The link fails, claiming that the data_type_terminated_test.obj file
>>> cannot be opened:
>>>
>>> [2016/08/02 17:31:20.363] LINK : fatal error LNK1104: cannot open file 'build\cached\mongo\base\data_type_terminated_test.obj'
>>> [2016/08/02 17:31:20.506] scons: *** [build\cached\mongo\base\base_test.exe] Error 1104
>>>
>>> We are using a vendored copy of SCons 2.5.0. The only modification is
>>> this:
>>>
>>> https://github.com/mongodb/mongo/commit/bc7e4e6821639ee766ada83483975668af98f367#diff-cc7aec1739634ca2a857a4d4227663aa
>>>
>>> This change was made so that the atime of files in the cache stays
>>> accurate at fine granularity, even if the underlying filesystem is mounted
>>> noatime or relatime, so that we can prune the cache based on access time.
>>> We would like to propose this change for upstreaming, but that is a
>>> separate email.
>>>
>>> SCons was invoked as follows from within an SSH session into cygwin (you
>>> can see it at the top of the build log as well):
>>>
>>> python ./buildscripts/scons.py --dbg=on --opt=on
>>> --win-version-min=ws08r2 -j$(( $(grep -c ^processor /proc/cpuinfo) / 2 ))
>>> MONGO_DISTMOD=2008plus --cache
>>> --cache-dir='z:\data\scons-cache\9d73adcd-19eb-46f2-9988-b8594ba5a3d1'
>>> --use-new-tools all dist dist-debugsymbols distsrc-zip
>>> MONGO_VERSION=3.3.10-250-g81185a5
>>>
>>> The 'python' here is Windows python, not cygwin, and PyWin32 is
>>> installed.
>>>
>>> The system on which this build ran is running Windows 2012 on a
>>> dedicated spot AWS c3.4xlarge instance, and the toolchain is Visual
>>> Studio 2015. The Z drive, where the cache directory is located, is locally
>>> connected NTFS via AWS ephemeral/instance storage.
>>>
>>> We have since backed out using the Cache on our Windows builds, which is
>>> disappointing - Windows builds take forever compared to others, and we were
>>> really hoping that CacheDir would be a big help here.
>>>
>>> Has anyone seen anything like this, or does anyone have ideas about what
>>> may be going wrong here? I know there have been some other recent threads
>>> about problems with Windows and build ordering, but this seems different:
>>> the retrieval of the file from the cache was correctly ordered, but it
>>> doesn't appear to have been effective.
>>>
>>> I'm happy to provide any additional information if it will help us get
>>> Windows CacheDir enabled builds working.
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>
>>> _______________________________________________
>>> Scons-users mailing list
>>> Scons-users at scons.org
>>> https://pairlist4.pair.net/mailman/listinfo/scons-users
>>>
>>>
>>
>>
>>
>
>
>