[Scons-users] Intermittent Install() failure

Hill, Steve (FP COM) Steve.Hill at cobham.com
Mon Sep 5 06:43:23 EDT 2016


I did try procmon but I was unable to see the wood for the trees - I ended up with 100,000s of entries even with some filters. Of course, I didn't really know what I was doing with it, which was likely the real problem! However, my suspicion is that we would find that the python process already has the file open and we would have to do this anyway. If you can provide any guidance on using procmon for this sort of thing (or links to guidance), I'm more than happy to run it...

S.

-----Original Message----- 
From: Scons-users [mailto:scons-users-bounces at scons.org] On Behalf Of Gary Oberbrunner
Sent: 05 September 2016 11:37
To: just.one.man at yandex.ru; SCons users mailing list
Subject: Re: [Scons-users] Intermittent Install() failure

I may have missed something here, but is it true you didn't find anything useful from monitoring execution with Sysinternals procmon? If you haven't tried that yet, it's much simpler than what you're proposing.

On Mon, Sep 5, 2016 at 6:07 AM, Vasily <just.one.man at yandex.ru> wrote:
Yes, that should be enough. Also I'd suggest patching "open" as well. 
Thanks, 
Vasily
05 сент. 2016 г. 12:20 пользователь "Hill, Steve (FP COM)" <Steve.Hill at cobham.com> написал:

OK, I've a bit more information. I've reinstated the filecmp.cmp in the decider but with a try/catch BaseException around it. I now see the problem again but the filecmp.cmp is not throwing the exception. This leads me to conclude that there is some side-effect of calling filecmp.cmp that is causing (occasionally) an issue with the SCons code after the decider is invoked but before the copy is performed. From what I can determine, filecmp.cmp uses:

    with open(f1, 'rb') as fp1, open(f2, 'rb') as fp2:
        <do compare>

so I do not think that it is leaving files open and a Google search didn't yield any reports of it doing so. Hence, I'm not really any wiser as to what the root cause of the problem is. I think that I need to do this:

>> monkey patch built-in file() and open() and log all of them with timestamps

Am I right in thinking that I just need to patch sys.modules["__builtin__"].file with my wrapper to patch this globally?

Cheers,

S.

-----Original Message----- 
From: Scons-users [mailto:scons-users-bounces at scons.org] On Behalf Of Vasily
Sent: 18 August 2016 11:03
To: SCons users mailing list
Subject: Re: [Scons-users] Intermittent Install() failure

Hi Steve,
I can suggest you to monkey patch built-in file() and open() and log all of them with timestamps. That may give you a clue if this file is being held open by normal Python. Next step could be logging callstacks (e.g. via traceback.format_tb() if I'm not mistaken) for offending open() or file() call, this should give you an idea of what part of SCons code is doing the bad thing...
Thanks,
Vasily
18 авг. 2016 г. 10:31 пользователь "Hill, Steve (FP COM)" <Steve.Hill at cobham.com> написал:
OK, I've dug a bit further and have some more information on this. I've monkey patched SCons.Tool.install.copyFunc with a wrapper to peform retries and mapped SCons.Tool.install.shutil.copy2 onto win32api.CopyFile and I can see that copyFunc is *never* called for the failing Install() actions and, when it is called, it always seems to succeed at the first try.

This leads me to conclude that it is the dependency checking - or some part of it like the MD5 calculation - that is where the problem is occuring. Investigations continue; suggests/observations gratefully received...

S.

-----Original Message-----
From: Scons-users [mailto:scons-users-bounces at scons.org] On Behalf Of Hill, Steve (FP COM)
Sent: 22 July 2016 08:42
To: SCons users mailing list
Subject: Re: [Scons-users] Intermittent Install() failure

* PGP Bad Signature, Signed: 07/22/2016 at 03:41:30 AM

Hi Andrew,

This could certainly be related, although in our case it is usually a leaf file (i.e. a file that has no following actions) that exhibits the problem. However, like you I am only aware of having seen this with parallel builds - though that is the default so it may just be that we only see it doing what we do most often.

I'm still wondering whether this is some sort of interaction with an external process - like a virus checker, Windows search indexing, etc. - since the Install() does a shutil.copy2 to copy the file followed by a os.chmod on the same file. When Googling, I noticed that Mozilla have a function to wrap various file operations for Windows that retries a number of times with a sleep in between (https://hg.mozilla.org/mozilla-central/file/e7e69cc8c07b/testing/mozbase/mozfile/mozfile/mozfile.py#l157). I might try patching env['INSTALL'] to do something similar and see if that works around the problem...

Cheers,

S.

-----Original Message-----

We have also had an issue, so far not entirely understood, in the MongoDB Windows CI build where we get a similar error during a highly parallel build. The situation there is:

- an env.Program call generates the file <build>/mongo/mongod.exe
- A hand rolled MSI generator is invoked, which accesses <build>/mongo/mongod.exe from the program 'light.exe'
- An env.Install call copies <build>/mongo/mongod.exe to <install>/mongod.exe

What we find is that sometimes (rarely), the call to light.exe will fail with an error:

[2016/06/13 17:58:16.347] light.exe : error LGHT0001 : The process cannot access the file 'C:\data\mci\e0c4bbc6610beafed74ddcea6eb880bb\src\build\win32\mongo\mongod.exe'
because it is being used by another process.

Our belief is that this error occurs when the light.exe execution runs concurrently with the env.Install task, so that both are trying to concurrently read from <build>/mongo/mongod.exe. I would normally expect that to be OK, as long as both light.exe and the Install rule were opening the file only for read, but that is based in UNIX experience.

I can see how this situation could possibly lead to an error if light.exe were opening <build>/mongo/mongod.exe for write, or with some sort of exclusive lock, and that is one avenue we plan to explore (I hope it isn't doing that though - why would it need to?). I guess the other hypothesis is that the env.Install action is somehow opening the file for write or with some sort of file locking? Is that a possibility?

Thanks,
Andrew


On Wed, Jul 20, 2016 at 10:19 AM, Hill, Steve (FP COM) <Steve.Hill at cobham.com> wrote:
> No, just using env.Install(); it is that operation that fails – but
> will usually succeed if retried…
>
>
>
>
>
>
>
> Copying via what means? shutils?
>
>
>
> On Wed, Jul 20, 2016 at 9:18 AM, Hill, Steve (FP COM)
> <Steve.Hill at cobham.com> wrote:
>
> We do have builders like this, though I hope that they all use the
> ‘with file(“file”, “w”) as f:…’ construct. It is possible that one has
> slipped in without that – but not on the files that are causing this
> problem as some of them are literally copying a file under source
> control to another location…
>
>
>
> S.
>
>
>
>
>
>
>
> William,
>
> Indeed..
>
> -Bill
>
>
>
> On Tue, Jul 19, 2016 at 1:48 PM, William Blevins
> <wblevins001 at gmail.com>
> wrote:
>
> Bill,
>
> I assume you think this may be a problem with not explicitly closing
> the file? Relying on python garbage collection in this case would
> certainly be dangerous.
>
> V/R,
>
> William
>
>
>
> On Tue, Jul 19, 2016 at 6:22 PM, Bill Deegan
> <bill at baddogconsulting.com>
> wrote:
>
> Steve,
>
> Do you have any python logic creating the files in the SCons process?
> (a builder where the builder actuall does open('file','w').. to create
> the file?
>
> -Bill
>
>
>
> On Tue, Jul 19, 2016 at 7:55 AM, William Blevins
> <wblevins001 at gmail.com>
> wrote:
>
> Steve,
>
> As a final thought, I have used SCons to build HT 24-core server
> blades (so
> 48 w/ HT) that RAM boot (no harddrive at RT). If it was specific to
> the core package, I imagine I would have seen this issue tons.
>
> V/R,
>
> William
>
>
>
> On Tue, Jul 19, 2016 at 12:52 PM, William Blevins
> <wblevins001 at gmail.com>
> wrote:
>
> Vasily,
>
> I think you are referring to Precious.
>
> V/R,
>
> William
>
>
>
> On Tue, Jul 19, 2016 at 9:51 AM, Vasily <just.one.man at yandex.ru> wrote:
>
> Hi Steve,
>
> What does your build do with this file except installing it? Is it
> used by a compiler or some other tool?
>
> Also, if I'm not mistaken, default SCons behavior is to remove the
> target before performing any action to regenerate it, so this might be
> the source of the exception you're seeing. There seems to be a way to
> turn off the removal part of the action, so you may want to check the
> manual and see if it helps you.
>
> P.S. Based on file path I assume you're working on Windows, is this
> path a network one?
>
> Thanks,
> Vasily
>
> 19 июля 2016 г. 11:02 пользователь "Hill, Steve (FP COM)"
> <Steve.Hill at cobham.com> написал:
>
>
>
> Thanks William.
>
>
>
> I’ve checked the dependency tree and, as far as I can tell, it looks OK.
>
> We do have custom scanners – using env.Scanner(_scan_domain_header) –
> but not for the files that are affected.
>
> As far as I can see, the fix for that Python bug never made it into
> 2.6.x but it appears always to result in a “No child processes”
> OSError so I don’t believe that is what I am seeing.
>
>
>
> Note that I have (temporarily) changed the decider below to simply
> return True (without doing anything with the file) and I still see the
> issue so the decider doesn’t seem to be relevant to the issue.
>
>
>
> Does anyone have any more ideas? This is becoming a major issue for
> our automated builds…
>
>
>
> S.
>
>
>
>
>
> Steve,
>
> I of course should have asked the obvious question, do you know if you
> dependency tree has missing dependencies? This tends to be a common issue.
>
> V/R,
>
> William
>
>
>
> On Fri, Jul 15, 2016 at 5:14 PM, William Blevins
> <wblevins001 at gmail.com>
> wrote:
>
> Steve,
>
> I'm not aware of any specific issue with install, but there are some
> possible issues that I am aware:
>
> If you have custom scanners, make sure they implement from
> SCons.Scanner.Current and not SCons.Scanner.Base; otherwise, you might
> have concurrent file access between implicit scanning operations and
> other processes.
> There was a big subprocess bug in python 2.6 that carried through
> several other major versions: https://bugs.python.org/issue1731717. I
> would check to see that your version of 2.6 contains the patch for this issue.
>
> V/R,
>
> William
>
>
>
> On Fri, Jul 15, 2016 at 9:30 AM, Hill, Steve (FP COM)
> <Steve.Hill at cobham.com> wrote:
>
> I’m having a problem where, with parallel builds (most people use 8,
> 12 or
> 16 threads), we occasionally get failures like the following:
>
>
>
> F:\<directory path>\hw_cfgs\1Server_1TM_6C66_2U.cfg: The process
> cannot access the file because it is being used by another process
> scons: building terminated because of errors.
>
>
>
> We are running Python 2.6.5 (with pywin32) and SCons 2.3.6. This file
> is being copied due to a Install() call. Note that, for various
> historical reasons, we have the following decider for these Installs:
>
>
>
> def _copy_decider(dependency, target, prev_ni):
>
>     target = str(target.abspath)
>
>     dependency = str(dependency.abspath)
>
>     if os.path.isfile(target) and os.path.isfile(dependency):
>
>         # By default, filecmp.cmp assumes that files with identical
> os.stat signatures
>
>         # (which includes the inode) are the same file and, hence,
> must be the same.
>
>         # However, on Windows, there is no inode - it appears to be
> set to zero - so
>
>         # any two files with the same size and
> access/creation/modification times
>
>         # will have the same os.stat signature, leading to a false positive.
> For this
>
>         # reason, we must force it to do an actual file comparison by
> setting shallow
>
>         # to False
>
>         return not filecmp.cmp(target, dependency, shallow = False)
>
>     else:
>
>         # Either one of the dependency or target isn't a file or one
> of the files
>
>         # (presumably the target) isn't there so do the copy
>
>         return True
>
>
>
> Note also that our IT department claims that virus checkers are
> disabled within the directory where the build is being performed (and
> we certainly have not seen any indication in the virus checker console
> to suggest that to be incorrect).
>
>
>
> Does anyone have any thoughts as to what the problem might be?
>
>
>
> Thanks in advance,
>
>
>
> S.
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
>
>
>
> _______________________________________________
> Scons-users mailing list
> Scons-users at scons.org
> https://pairlist4.pair.net/mailman/listinfo/scons-users
>
_______________________________________________
Scons-users mailing list
Scons-users at scons.org
https://pairlist4.pair.net/mailman/listinfo/scons-users

* Hill, Steve <Steve.Hill at cobham.com>
* 0xFA34C3FC

_______________________________________________
Scons-users mailing list
Scons-users at scons.org
https://pairlist4.pair.net/mailman/listinfo/scons-users

_______________________________________________
Scons-users mailing list
Scons-users at scons.org
https://pairlist4.pair.net/mailman/listinfo/scons-users

_______________________________________________
Scons-users mailing list
Scons-users at scons.org
https://pairlist4.pair.net/mailman/listinfo/scons-users




-- 
Gary
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 526 bytes
Desc: not available
URL: <https://pairlist4.pair.net/pipermail/scons-users/attachments/20160905/e3fd2b61/attachment.pgp>


More information about the Scons-users mailing list