[Scons-users] Filename characters outside of latin-1

Plunket, Tom tom.plunket at aristocrat-inc.com
Tue Jan 26 22:32:33 EST 2016


I continue to struggle with non-ASCII characters in filenames in the workflow I'm developing to replace a legacy system. Seeing that SCons is now on Bitbucket, I thought I'd pull it and try to figure out how to fix this once and for all. I'm still making my way through the getting-started info, but I also opened the test\file-names.py file and see a big comment discussing the challenges with quoting.

I previously discovered that the Unicode problem begins with the fact that every filename manipulation includes a 'str(...)' conversion, which only accepts 7-bit ASCII characters when it is handed a Unicode string. I thought about globally search-and-replacing 'str(...)' with 'unicode(...)' but didn't want to crack that nut at the time.
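As a minimal sketch of that failure mode (not SCons code): in Python 2, calling str() on a unicode object implicitly encodes it with the ASCII codec. The snippet below spells that encode out explicitly so it behaves the same on Python 3. The helper names and sample filenames are made up for illustration.

```python
def to_ascii_bytes(name):
    """Mimic the implicit ASCII encode that Python 2's str() applies to unicode."""
    return name.encode("ascii")


def survives_ascii(name):
    """Return True if the name gets through an ASCII-only conversion."""
    try:
        to_ascii_bytes(name)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical filenames: one plain ASCII, one Latin-1, one outside Latin-1.
plain = u"report.txt"
accented = u"r\u00e9sum\u00e9.txt"
wide = u"\u0142\u00f3d\u017a.txt"

for name in (plain, accented, wide):
    print(repr(name), "ok" if survives_ascii(name) else "UnicodeEncodeError")
```

Only the first name survives; anything beyond 7-bit ASCII fails, whether or not it fits in Latin-1.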

Today I found the test\file-names.py test script. There is a giant comment in it discussing the hassles of quoting for the command line. What prompted me to look into this today was getting tripped up by ampersands (&) in filenames: the shell was interpreting them as a command separator of sorts, which is obviously not what I wanted. (The actual name handling and command-line generation was a problem in my own code, but it got me motivated.)

So my first question: is there a particular reason to avoid using subprocess? It handles all of the quoting for you and spawns processes as appropriate for the platform. You can just give it a list of strings, where each string is one argument, and it does the Right Thing(tm). That seems a lot easier than fighting with shells over command-line interpretation. At the very least, is there any reason not to use the quoting facility that is (undocumented but) part of that module?
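A minimal sketch of the list-of-arguments approach, assuming nothing about how SCons itself spawns commands. The filename and "tool.exe" are hypothetical. The undocumented quoting helper referred to above is presumably subprocess.list2cmdline, which follows the Microsoft C runtime quoting rules rather than those of any particular shell.

```python
import subprocess
import sys

# Hypothetical filename containing characters a shell would treat specially.
tricky_name = "fish & chips (draft).txt"

# The child process simply writes its first argument back to stdout.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.argv[1])",
    tricky_name,
]

# No shell=True: each list item is handed over as exactly one argument,
# so the '&' is never seen by a shell.
echoed = subprocess.check_output(child).decode("ascii")
print(echoed == tricky_name)

# list2cmdline joins an argument list into one Windows-style command line,
# adding double quotes around arguments that contain whitespace.
quoted = subprocess.list2cmdline(["tool.exe", tricky_name])
print(quoted)
```

The argument arrives in the child unchanged, with no escaping written by hand.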

My second question is, has anyone done any work on supporting non-ASCII characters or any investigation into what it would take to do that? I don't believe that the Unicode support work requires any specific knowledge of Unicode beyond the fact that '\uXXYY' is not equivalent to '\xXX\xYY'. (Regardless, I have come to have a pretty deep knowledge of Unicode conversion to and from the various UTF specifications so anything curious that popped up would be understood.)
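To make the '\uXXYY' versus '\xXX\xYY' distinction concrete, here is a small sketch using U+0142 as an arbitrary example. The first literal is one code point. The second is two unrelated code points, whose values only happen to match the bytes of the UTF-16 big-endian encoding of the first.

```python
# One code point: U+0142 (LATIN SMALL LETTER L WITH STROKE).
one_char = u"\u0142"

# Two code points: U+0001 followed by U+0042 ('B').
two_chars = u"\x01\x42"

print(len(one_char), len(two_chars))
print(one_char == two_chars)

# The byte sequence depends entirely on which encoding is chosen.
utf8 = one_char.encode("utf-8")
utf16be = one_char.encode("utf-16-be")
print(repr(utf8), repr(utf16be))
```

So a filename has to stay a sequence of code points for as long as possible, and be encoded only at the boundary where the target encoding is known.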

At the end of the day, anyone using SCons should be able to transparently handle whatever format of filenames their system gives them, be it UTF-8, UTF-16, MBCS, or otherwise.



