[Scons-users] CacheDir race during parallel Windows builds?

Andrew C. Morrow andrew.c.morrow at gmail.com
Thu Aug 4 10:20:14 EDT 2016


Hi -

At MongoDB, we recently started using CacheDir in our CI system. This has
been a big success for reducing rebuild times for our Linux builds.
However, we were surprised to find that our Windows builds started failing
in a very alarming way.

Please see the following log file:
 https://evergreen.mongodb.com/task_log_raw/mongodb_mongo_master_windows_64_2k8_debug_compile_81185a50aeed5b2beed2c0a81b381a482489fdb7_16_08_02_20_24_46/0?type=T

The log lines of interest are:

[2016/08/02 17:31:09.642] Retrieved
`build\cached\mongo\base\data_type_terminated_test.obj' from cache

Here, we see that we retrieved this .obj file from the cache. Nine seconds
later, we try to use that object in a link step:

[2016/08/02 17:31:18.921] link /nologo /DEBUG /INCREMENTAL:NO
/LARGEADDRESSAWARE /OPT:REF /OUT:build\cached\mongo\base\base_test.exe
build\cached\mongo\base\data_range.obj ...
  build\cached\mongo\base\data_type_terminated_test.obj ...

The link fails, claiming that the data_type_terminated_test.obj file cannot
be opened:

[2016/08/02 17:31:20.363] LINK : fatal error LNK1104: cannot open file
'build\cached\mongo\base\data_type_terminated_test.obj'
[2016/08/02 17:31:20.506] scons: ***
[build\cached\mongo\base\base_test.exe] Error 1104

We are using a vendored copy of SCons 2.5.0. The only modification is this:

https://github.com/mongodb/mongo/commit/bc7e4e6821639ee766ada83483975668af98f367#diff-cc7aec1739634ca2a857a4d4227663aa

This change was made so that the atime of files in the cache is accurate
at fine granularity, even if the underlying filesystem is mounted noatime
or relatime, so that we can prune the cache based on access time. We would
like to propose upstreaming this change, but that is a separate email.
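
To illustrate the idea of atime-based pruning, here is a minimal sketch. It
is not the code from the commit linked above: the helper names touch_atime
and prune_by_atime and the max_age_seconds parameter are hypothetical, and
it assumes a cache laid out as ordinary files under a single directory.

```python
import os
import time


def touch_atime(path, now=None):
    """Explicitly set the access time of `path`, leaving its mtime unchanged.

    Setting atime by hand means the value does not depend on whether the
    filesystem is mounted noatime or relatime.
    """
    now = time.time() if now is None else now
    st = os.stat(path)
    os.utime(path, (now, st.st_mtime))


def prune_by_atime(cache_dir, max_age_seconds, now=None):
    """Delete files under `cache_dir` not accessed within `max_age_seconds`.

    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if now - os.stat(path).st_atime > max_age_seconds:
                os.remove(path)
                removed.append(path)
    return removed
```

In this sketch, touch_atime would be called each time an entry is retrieved
from the cache, and prune_by_atime would be run periodically, outside of any
build.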

SCons was invoked as follows from within an SSH session into cygwin (you
can see it at the top of the build log as well):

python ./buildscripts/scons.py --dbg=on --opt=on --win-version-min=ws08r2
-j$(( $(grep -c ^processor /proc/cpuinfo) / 2 )) MONGO_DISTMOD=2008plus
--cache
--cache-dir='z:\data\scons-cache\9d73adcd-19eb-46f2-9988-b8594ba5a3d1'
--use-new-tools all dist dist-debugsymbols distsrc-zip
 MONGO_VERSION=3.3.10-250-g81185a5

The 'python' here is Windows python, not cygwin, and PyWin32 is installed.

The system on which this build ran is running Windows 2012 on a dedicated
spot AWS c3.4xlarge instance, and the toolchain is Visual Studio 2015. The
Z drive, where the cache directory is located, is locally attached NTFS via
AWS ephemeral/instance storage.

We have since backed out use of the cache on our Windows builds, which is
disappointing - Windows builds take far longer than the others, and we were
really hoping that CacheDir would be a big help here.

Has anyone seen anything like this, or have any ideas about what may be
going wrong here? I know there have been some other recent threads about
problems with Windows and build ordering, but this seems different - the
retrieval of the file from the cache was correctly ordered, but it doesn't
appear to have been effective.
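
One way to narrow this down might be to probe the retrieved file just
before the link step, to tell apart "the file is not there" from "the file
is there but cannot be opened". The following is only a diagnostic sketch:
probe_open, attempts, and delay are hypothetical names, and nothing here is
part of SCons.

```python
import os
import time


def probe_open(path, attempts=5, delay=0.5):
    """Try to open `path` for reading up to `attempts` times.

    Returns a list of (attempt_number, exists, error) tuples, where `error`
    is None on success. Stops at the first successful open.
    """
    results = []
    for attempt in range(1, attempts + 1):
        exists = os.path.exists(path)
        try:
            with open(path, "rb") as handle:
                handle.read(1)
            error = None
        # EnvironmentError covers IOError/OSError on Python 2 and is an
        # alias of OSError on Python 3.
        except EnvironmentError as exc:
            error = exc
        results.append((attempt, exists, error))
        if error is None:
            break
        if attempt < attempts:
            time.sleep(delay)
    return results
```

If the file exists but the first few attempts fail and a later one
succeeds, that would suggest something transient is holding the file
rather than the retrieval having failed outright.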

I'm happy to provide any additional information if it will help us get
Windows CacheDir enabled builds working.

Thanks,
Andrew