[Scons-users] SCons 3.0.0.alpha.20170614 available on testpypi

Mon Jul 31 18:17:58 EDT 2017

I just found Bill’s replies to me in the mail archive, but I don’t remember receiving them. Sorry about that.

I’m not sure how to write a test case as such. An example file from boost::python that fails is http://www.boost.org/doc/libs/1_64_0/boost/python/detail/dealloc.hpp (although I imagine you can’t copy and paste from that link and would need to get the file from the source distribution). This file has a bad character in the author’s name that can’t be decoded, so get_text_contents() hands back bytes instead of a string and that triggers errors later on. The test should include a file with that problem and check whether the scanner can read it.
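
As a rough starting point, here is a minimal standalone sketch of what such a test could check. It does not use the SCons test framework or the real scanner; the helper names (make_bad_header, strict_decode_fails, read_text_lenient) and the sample header contents are hypothetical, and the explicit 'utf-8' codec is my assumption (the quoted code below calls decode() with no arguments). It only shows that a stray Latin-1 byte makes a strict decode fail, while the errors="ignore" variant suggested in the quoted message below still returns a string.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for a header whose author comment contains a
# Latin-1 byte (0xE9) that is not valid UTF-8.
BAD_HEADER = b"// Copyright Ren\xe9 Example\n#include <foo.h>\n"


def make_bad_header():
    """Write the sample header to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".hpp")
    with os.fdopen(fd, "wb") as f:
        f.write(BAD_HEADER)
    return path


def strict_decode_fails(path):
    """Return True if a strict UTF-8 decode of the file raises."""
    with open(path, "rb") as f:
        contents = f.read()
    try:
        contents.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def read_text_lenient(path):
    """Always return str: strip a known BOM, else drop undecodable bytes.

    This mirrors the shape of the quoted method, but it is a sketch and
    not the SCons implementation.
    """
    with open(path, "rb") as f:
        contents = f.read()
    for bom, encoding in (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ):
        if contents.startswith(bom):
            return contents[len(bom):].decode(encoding)
    # Assumption: UTF-8 as the fallback codec; bad bytes are discarded.
    return contents.decode("utf-8", errors="ignore")
```

A real test would presumably feed such a file through the C scanner and confirm that the #include line is still found; the point here is only that the lenient read returns text even when the file contains a byte that cannot be decoded.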

— 
Tim Jenness

> On Jul 31, 2017, at 14:57 , Tim Jenness <tjenness at lsst.org> wrote:
> 
> 
>> On Jul 21, 2017, at 15:19 , Tim Jenness <tjenness at lsst.org> wrote:
>> 
>> Now that I’ve thought about it a bit more I think the underlying problem is in engine/SCons/Node/FS.py around line 2630:
>> 
>>    2608     def get_text_contents(self):
>>    2609         """
>>    2610         This attempts to figure out what the encoding of the text is
>>    2611         based upon the BOM bytes, and then decodes the contents so that
>>    2612         it's a valid python string.
>>    2613         """
>>    2614         contents = self.get_contents()
>>    2615         # The behavior of various decode() methods and functions
>>    2616         # w.r.t. the initial BOM bytes is different for different
>>    2617         # encodings and/or Python versions.  ('utf-8' does not strip
>>    2618         # them, but has a 'utf-8-sig' which does; 'utf-16' seems to
>>    2619         # strip them; etc.)  Just sidestep all the complication by
>>    2620         # explicitly stripping the BOM before we decode().
>>    2621         if contents[:len(codecs.BOM_UTF8)] == codecs.BOM_UTF8:
>>    2622             return contents[len(codecs.BOM_UTF8):].decode('utf-8')
>>    2623         if contents[:len(codecs.BOM_UTF16_LE)] == codecs.BOM_UTF16_LE:
>>    2624             return contents[len(codecs.BOM_UTF16_LE):].decode('utf-16-le')
>>    2625         if contents[:len(codecs.BOM_UTF16_BE)] == codecs.BOM_UTF16_BE:
>>    2626             return contents[len(codecs.BOM_UTF16_BE):].decode('utf-16-be')
>>    2627         try:
>>    2628             return contents.decode()
>>    2629         except (UnicodeDecodeError, AttributeError) as e:
>>    2630             return contents
>> 
>> The problem is that if we fail to convert the bytes to Unicode the method returns the “text” contents in bytes. This breaks the contract of get_text_contents() promising to return a string.
>> 
>> Removing the try block at line 2627 and instead using “return contents.decode(errors="ignore")” fixes my boost.python build problem.
>> 
> 
> This problem still exists at https://bitbucket.org/scons/scons/src/cfbc036995c8669e296cc0427655345241a0097e/src/engine/SCons/Node/FS.py?at=default&fileviewer=file-view-default#FS.py-2630
> 
> Should I be discussing this on the dev list? Sorry, but I don’t have time to learn how to do a bitbucket PR using hg. The good news is our 500,000 lines of python and C++ code built fine with my suggested fix in place.
> 
> Patch is at https://github.com/lsst/scons/blob/tickets/DM-8560/patches/0001-always-decode-contents.patch
> 
> — 
> Tim Jenness
> 
