[Scons-users] Intermittent Install() failure

Hill, Steve (FP COM) Steve.Hill at cobham.com
Thu Aug 18 06:53:08 EDT 2016


Thanks Vasily. I do have a potential lead. It turns out that the Decider that I mentioned initially is actually duplicated (grrr... code duplication) and I had only hacked out one instance. I've now hacked them both out and successfully performed a build. In itself, that doesn't show that the problem has gone so I'll leave it for a few days; if I don't see the issue anymore, I will try looking at the callstack as you suggest. It is possible that the Decider can be called with the node or it's dependency still being open...

Watch this space...

S. 

-----Original Message-----
From: Scons-users [mailto:scons-users-bounces at scons.org] On Behalf Of Vasily
Sent: 18 August 2016 11:03
To: SCons users mailing list
Subject: Re: [Scons-users] Intermittent Install() failure

Hi Steve, 
I can suggest you to monkey patch built-in file() and open() and log all of them with timestamps. That may give you a clue if this file is being held open by normal Python. Next step could be logging callstacks (e.g. via traceback.format_tb() if I'm not mistaken) for offending open() or file() call, this should give you an idea of what part of SCons code is doing the bad thing... 
Thanks, 
Vasily
18 авг. 2016 г. 10:31 пользователь "Hill, Steve (FP COM)" <Steve.Hill at cobham.com> написал:
OK, I've dug a bit further and have some more information on this. I've monkey patched SCons.Tool.install.copyFunc with a wrapper to peform retries and mapped SCons.Tool.install.shutil.copy2 onto win32api.CopyFile and I can see that copyFunc is *never* called for the failing Install() actions and, when it is called, it always seems to succeed at the first try.

This leads me to conclude that it is the dependency checking - or some part of it like the MD5 calculation - that is where the problem is occuring. Investigations continue; suggests/observations gratefully received...

S.

-----Original Message-----
From: Scons-users [mailto:scons-users-bounces at scons.org] On Behalf Of Hill, Steve (FP COM)
Sent: 22 July 2016 08:42
To: SCons users mailing list
Subject: Re: [Scons-users] Intermittent Install() failure

* PGP Bad Signature, Signed: 07/22/2016 at 03:41:30 AM

Hi Andrew,

This could certainly be related, although in our case it is usually a leaf file (i.e. a file that has no following actions) that exhibits the problem. However, like you I am only aware of having seen this with parallel builds - though that is the default so it may just be that we only see it doing what we do most often.

I'm still wondering whether this is some sort of interaction with an external process - like a virus checker, Windows search indexing, etc. - since the Install() does a shutil.copy2 to copy the file followed by a os.chmod on the same file. When Googling, I noticed that Mozilla have a function to wrap various file operations for Windows that retries a number of times with a sleep in between (https://hg.mozilla.org/mozilla-central/file/e7e69cc8c07b/testing/mozbase/mozfile/mozfile/mozfile.py#l157). I might try patching env['INSTALL'] to do something similar and see if that works around the problem...

Cheers,

S.

-----Original Message-----

We have also had an issue, so far not entirely understood, in the MongoDB Windows CI build where we get a similar error during a highly parallel build. The situation there is:

- an env.Program call generates the file <build>/mongo/mongod.exe
- A hand rolled MSI generator is invoked, which accesses <build>/mongo/mongod.exe from the program 'light.exe'
- An env.Install call copies <build>/mongo/mongod.exe to <install>/mongod.exe

What we find is that sometimes (rarely), the call to light.exe will fail with an error:

[2016/06/13 17:58:16.347] light.exe : error LGHT0001 : The process cannot access the file 'C:\data\mci\e0c4bbc6610beafed74ddcea6eb880bb\src\build\win32\mongo\mongod.exe'
because it is being used by another process.

Our belief is that this error occurs when the light.exe execution runs concurrently with the env.Install task, so that both are trying to concurrently read from <build>/mongo/mongod.exe. I would normally expect that to be OK, as long as both light.exe and the Install rule were opening the file only for read, but that is based in UNIX experience.

I can see how this situation could possibly lead to an error if light.exe were opening <build>/mongo/mongod.exe for write, or with some sort of exclusive lock, and that is one avenue we plan to explore (I hope it isn't doing that though - why would it need to?). I guess the other hypothesis is that the env.Install action is somehow opening the file for write or with some sort of file locking? Is that a possibility?

Thanks,
Andrew


On Wed, Jul 20, 2016 at 10:19 AM, Hill, Steve (FP COM) <Steve.Hill at cobham.com> wrote:
> No, just using env.Install(); it is that operation that fails – but
> will usually succeed if retried…
>
>
>
>
>
>
>
> Copying via what means? shutils?
>
>
>
> On Wed, Jul 20, 2016 at 9:18 AM, Hill, Steve (FP COM)
> <Steve.Hill at cobham.com> wrote:
>
> We do have builders like this, though I hope that they all use the
> ‘with file(“file”, “w”) as f:…’ construct. It is possible that one has
> slipped in without that – but not on the files that are causing this
> problem as some of them are literally copying a file under source
> control to another location…
>
>
>
> S.
>
>
>
>
>
>
>
> William,
>
> Indeed..
>
> -Bill
>
>
>
> On Tue, Jul 19, 2016 at 1:48 PM, William Blevins
> <wblevins001 at gmail.com>
> wrote:
>
> Bill,
>
> I assume you think this may be a problem with not explicitly closing
> the file? Relying on python garbage collection in this case would
> certainly be dangerous.
>
> V/R,
>
> William
>
>
>
> On Tue, Jul 19, 2016 at 6:22 PM, Bill Deegan
> <bill at baddogconsulting.com>
> wrote:
>
> Steve,
>
> Do you have any python logic creating the files in the SCons process?
> (a builder where the builder actuall does open('file','w').. to create
> the file?
>
> -Bill
>
>
>
> On Tue, Jul 19, 2016 at 7:55 AM, William Blevins
> <wblevins001 at gmail.com>
> wrote:
>
> Steve,
>
> As a final thought, I have used SCons to build HT 24-core server
> blades (so
> 48 w/ HT) that RAM boot (no harddrive at RT). If it was specific to
> the core package, I imagine I would have seen this issue tons.
>
> V/R,
>
> William
>
>
>
> On Tue, Jul 19, 2016 at 12:52 PM, William Blevins
> <wblevins001 at gmail.com>
> wrote:
>
> Vasily,
>
> I think you are referring to Precious.
>
> V/R,
>
> William
>
>
>
> On Tue, Jul 19, 2016 at 9:51 AM, Vasily <just.one.man at yandex.ru> wrote:
>
> Hi Steve,
>
> What does your build do with this file except installing it? Is it
> used by a compiler or some other tool?
>
> Also, if I'm not mistaken, default SCons behavior is to remove the
> target before performing any action to regenerate it, so this might be
> the source of the exception you're seeing. There seems to be a way to
> turn off the removal part of the action, so you may want to check the
> manual and see if it helps you.
>
> P.S. Based on file path I assume you're working on Windows, is this
> path a network one?
>
> Thanks,
> Vasily
>
> 19 июля 2016 г. 11:02 пользователь "Hill, Steve (FP COM)"
> <Steve.Hill at cobham.com> написал:
>
>
>
> Thanks William.
>
>
>
> I’ve checked the dependency tree and, as far as I can tell, it looks OK.
>
> We do have custom scanners – using env.Scanner(_scan_domain_header) –
> but not for the files that are affected.
>
> As far as I can see, the fix for that Python bug never made it into
> 2.6.x but it appears always to result in a “No child processes”
> OSError so I don’t believe that is what I am seeing.
>
>
>
> Note that I have (temporarily) changed the decider below to simply
> return True (without doing anything with the file) and I still see the
> issue so the decider doesn’t seem to be relevant to the issue.
>
>
>
> Does anyone have any more ideas? This is becoming a major issue for
> our automated builds…
>
>
>
> S.
>
>
>
>
>
> Steve,
>
> I of course should have asked the obvious question, do you know if you
> dependency tree has missing dependencies? This tends to be a common issue.
>
> V/R,
>
> William
>
>
>
> On Fri, Jul 15, 2016 at 5:14 PM, William Blevins
> <wblevins001 at gmail.com>
> wrote:
>
> Steve,
>
> I'm not aware of any specific issue with install, but there are some
> possible issues that I am aware:
>
> If you have custom scanners, make sure they implement from
> SCons.Scanner.Current and not SCons.Scanner.Base; otherwise, you might
> have concurrent file access between implicit scanning operations and
> other processes.
> There was a big subprocess bug in python 2.6 that carried through
> several other major versions: https://bugs.python.org/issue1731717. I
> would check to see that your version of 2.6 contains the patch for this issue.
>
> V/R,
>
> William
>
>
>
> On Fri, Jul 15, 2016 at 9:30 AM, Hill, Steve (FP COM)
> <Steve.Hill at cobham.com> wrote:
>
> I’m having a problem where, with parallel builds (most people use 8,
> 12 or
> 16 threads), we occasionally get failures like the following:
>
>
>
> F:\<directory path>\hw_cfgs\1Server_1TM_6C66_2U.cfg: The process
> cannot access the file because it is being used by another process
> scons: building terminated because of errors.
>
>
>
> We are running Python 2.6.5 (with pywin32) and SCons 2.3.6. This file
> is being copied due to a Install() call. Note that, for various
> historical reasons, we have the following decider for these Installs:
>
>
>
> def _copy_decider(dependency, target, prev_ni):
>
>     target = str(target.abspath)
>
>     dependency = str(dependency.abspath)
>
>     if os.path.isfile(target) and os.path.isfile(dependency):
>
>         # By default, filecmp.cmp assumes that files with identical
> os.stat signatures
>
>         # (which includes the inode) are the same file and, hence,
> must be the same.
>
>         # However, on Windows, there is no inode - it appears to be
> set to zero - so
>
>         # any two files with the same size and
> access/creation/modification times
>
>         # will have the same os.stat signature, leading to a false positive.
> For this
>
>         # reason, we must force it to do an actual file comparison by
> setting shallow
>
>         # to False
>
>         return not filecmp.cmp(target, dependency, shallow = False)
>
>     else:
>
>         # Either one of the dependency or target isn't a file or one
> of the files
>
>         # (presumably the target) isn't there so do the copy
>
>         return True
>
>
>
> Note also that our IT department claims that virus checkers are
> disabled within the directory where the build is being performed (and
> we certainly have not seen any indication in the virus checker console
> to suggest that to be incorrect).
>
>
>
> Does anyone have any thoughts as to what the problem might be?
>
>
>
> Thanks in advance,
>
>
>
> S.
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
_______________________________________________
Scons-users mailing list
Scons-users at scons.org
https://pairlist4.pair.net/mailman/listinfo/scons-users

* Hill, Steve <Steve.Hill at cobham.com>
* 0xFA34C3FC

_______________________________________________
Scons-users mailing list
Scons-users at scons.org
https://pairlist4.pair.net/mailman/listinfo/scons-users
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 526 bytes
Desc: not available
URL: <https://pairlist4.pair.net/pipermail/scons-users/attachments/20160818/1ae19fc9/attachment.pgp>


More information about the Scons-users mailing list