[Scons-users] Intermittent Install() failure

Hill, Steve (FP COM) Steve.Hill at cobham.com
Wed Sep 7 11:30:05 EDT 2016


That is quite possibly true, but I've ruled out virus checkers, Windows search indexing and SCM tool interactions as a root cause. I've had a chat with someone in IT to see what else they are running on our PCs that could be causing it.

 

Interestingly, I just added a patched version of os.unlink that will retry up to 5 times, with a delay of half a second between each attempt. I then saw a couple of potential failures that succeeded after a few retries, but some failed for all 5 retries (so 2.5 seconds!). I've changed the code to double the sleep duration each time and I'll see if that works.
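For anyone wanting to try the same workaround, here is a minimal sketch of a retrying os.unlink with a doubling delay. It is not the actual patch described above: the helper name with_retries and its parameters are made up for illustration, and it retries on any OSError rather than only on error 32.

```python
import functools
import os
import time


def with_retries(func, attempts=5, initial_delay=0.5, sleep=time.sleep):
    """Wrap func so that an OSError triggers a retry with a doubling delay.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        delay = initial_delay
        for attempt in range(1, attempts + 1):
            try:
                return func(*args, **kwargs)
            except OSError:
                if attempt == attempts:
                    raise  # out of retries: let the caller see the error
                sleep(delay)
                delay *= 2  # 0.5, 1, 2, 4 seconds between the 5 attempts
    return wrapper


# Monkey patch: this has to run before SCons first calls os.unlink,
# for example near the top of the SConstruct.
os.unlink = with_retries(os.unlink)
```

With the defaults, a deletion that never succeeds is given up on after 0.5 + 1 + 2 + 4 = 7.5 seconds of waiting, and the original exception is re-raised.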

 

Does anyone know how to produce a procmon filter that will capture this specific case: when a file deletion fails with Windows error 32? That would allow me to pinpoint the precise cause of the failure.
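This is not a procmon filter, but a Python-side sketch that may help narrow a trace down: it logs the time, process ID and full path whenever a deletion fails with Windows error 32 (ERROR_SHARING_VIOLATION), so those details can be matched against whatever trace is captured. The function names and the log format are assumptions made for illustration.

```python
import os
import sys
import time

ERROR_SHARING_VIOLATION = 32  # Windows: file is in use by another process


def is_sharing_violation(exc):
    """True if exc carries Windows error 32; always False on other platforms."""
    return getattr(exc, "winerror", None) == ERROR_SHARING_VIOLATION


def logged_unlink(path, unlink=os.unlink, log=sys.stderr):
    """Call unlink(path); on error 32, log when and where, then re-raise.

    Hypothetical helper: the unlink and log parameters exist only so the
    behaviour can be exercised without a real locked file.
    """
    try:
        unlink(path)
    except OSError as exc:
        if is_sharing_violation(exc):
            log.write("%s pid=%d sharing violation on %s\n"
                      % (time.strftime("%H:%M:%S"), os.getpid(),
                         os.path.abspath(path)))
        raise
```

The exception is always re-raised, so this only adds logging and does not change whether the build fails.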

 

S.

 

 

From: Scons-users [mailto:scons-users-bounces at scons.org] On Behalf Of Arvid Rosén
Sent: 07 September 2016 13:37
To: SCons users mailing list
Subject: Re: [Scons-users] Intermittent Install() failure

 

I bet another process is reading the file which prevents it from being deleted.

 

TortoiseSVN updating those icons? Or indexing of some sort...

 

Cheers,

Arvid


 

________________________________

From: Scons-users <scons-users-bounces at scons.org> on behalf of Hill, Steve (FP COM) <Steve.Hill at cobham.com>
Sent: Wednesday, September 7, 2016 12:09:52 PM
To: SCons users mailing list
Subject: Re: [Scons-users] Intermittent Install() failure 

 

OK, I've looked at the (Python) source code for filecmp.cmp and the (C) source code for the file class and, as far as I can see, the call into the run-time to close the file will take place as soon as the with statement completes.

I've also spotted that SCons has a --debug=stacktrace option (can't believe that I didn't spot this before...) so I can now see the stacktrace for the failure:

scons: internal stack trace:

  File "C:\Python26\Scripts\..\Lib\site-packages\scons-2.3.6\SCons\Job.py", line 387, in start
    task.prepare()
  File "C:\Python26\Scripts\..\Lib\site-packages\scons-2.3.6\SCons\Script\Main.py", line 173, in prepare
    return SCons.Taskmaster.OutOfDateTask.prepare(self)
  File "C:\Python26\Scripts\..\Lib\site-packages\scons-2.3.6\SCons\Taskmaster.py", line 197, in prepare
    t.prepare()
  File "C:\Python26\Scripts\..\Lib\site-packages\scons-2.3.6\SCons\Node\FS.py", line 2899, in prepare
    self._rmv_existing()
  File "C:\Python26\Scripts\..\Lib\site-packages\scons-2.3.6\SCons\Node\FS.py", line 2882, in _rmv_existing
    raise e

Looking at FS.py, I think that this boils down to an exception being thrown during os.unlink(), used to remove the existing target file before copying the source to the target. This leads me to suspect that the issue here is that, under some circumstances (remember that we've only seen this with parallel builds), the OS can delay actually closing the file until some time after fclose() has returned.

I'm going to try monkey patching os.unlink to see whether retrying will allow me to work around this. Any further thoughts/suggestions gratefully received...

S.

-----Original Message-----
From: Scons-users [mailto:scons-users-bounces at scons.org] On Behalf Of Thomas Berg
Sent: 05 September 2016 13:34
To: SCons users mailing list
Subject: Re: [Scons-users] Intermittent Install() failure

On Mon, Sep 5, 2016 at 11:20 AM, Hill, Steve (FP COM) <Steve.Hill at cobham.com> wrote:
> OK, I've a bit more information. I've reinstated the filecmp.cmp in the decider but with a try/except BaseException around it. I now see the problem again but the filecmp.cmp is not throwing the exception. This leads me to conclude that there is some side-effect of calling filecmp.cmp that is causing (occasionally) an issue with the SCons code after the decider is invoked but before the copy is performed. From what I can determine, filecmp.cmp uses:
>
>     with open(f1, 'rb') as fp1, open(f2, 'rb') as fp2:
>         <do compare>
>
> so I do not think that it is leaving files open and a Google search didn't yield any reports of it doing so. Hence, I'm not really any wiser as to what the root cause of the problem is. I think that I need to do this:

In our build we've had some indication that 'with open' can't be trusted to close the file in time. At least that's our main suspicion.
So it would be very interesting if you could test this assumption (that the problem is not with open), and try the win32 implementation here and check whether it helps.
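One way to test that assumption without Python file objects at all is to compare the files through raw OS-level descriptors that are closed explicitly. The sketch below is not the win32 implementation referred to above; files_equal is a made-up name, and it is meant only as a stand-in for filecmp.cmp in the decider for the duration of the experiment.

```python
import os


def files_equal(path1, path2, chunk_size=64 * 1024):
    """Byte-for-byte comparison using raw descriptors, closed explicitly."""
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False
    # os.O_BINARY exists only on Windows; fall back to 0 elsewhere.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd1 = os.open(path1, flags)
    try:
        fd2 = os.open(path2, flags)
        try:
            while True:
                block1 = os.read(fd1, chunk_size)
                block2 = os.read(fd2, chunk_size)
                if block1 != block2:
                    return False
                if not block1:
                    return True  # both files ended at the same point
        finally:
            os.close(fd2)
    finally:
        os.close(fd1)
```

If the intermittent failure still shows up with this in place, the closing behaviour of the with statement is unlikely to be the culprit.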

- Thomas
