|
Post by w3wilkes on Feb 27, 2020 12:41:50 GMT -5
As an old retired IBM mainframe sysprog I love using SPFLite on my PC for some types of editing, as I’ve found nothing else that can do these types of things on a PC! I do have one little problem though: the inability to disable the BOM for UTF8 files. The Unicode Standard (www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf), in the last paragraph on page 40, states “Use of a BOM is neither required nor recommended for UTF-8”. I’ve found that some applications I use on my PC will bypass the first record in a file if it is prefixed with a UTF8 BOM after editing the file with SPFLite. It would be nice if SPFLite gave users the option to shut off the insertion of a BOM at the start of a UTF8 encoded file, especially if the file didn't have a BOM prefix prior to editing with SPFLite. Perhaps in the “File Data Options” add a “UTF8 –BOM” choice for the “Encoding used”.
|
|
|
Post by George on Feb 27, 2020 14:19:54 GMT -5
Got this via email earlier. A new option for Profile has been added - BOM [ON | OFF] to control whether the writing of the BOM should occur. No change was made to the file loading, SPFLite will always check for and honor any BOM that is present.
Testing now, if OK this will be in the next release.
George
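For readers who want to see what the BOM ON / OFF choice means at the byte level, here is a minimal Python sketch. It is not SPFLite's code; the function name `encode_utf8` and its `bom` parameter are made up for illustration. It only shows that the UTF8 BOM is the three bytes EF BB BF placed in front of otherwise identical data.

```python
import codecs


def encode_utf8(text: str, bom: bool = True) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the 3-byte BOM.

    'bom' plays the role of a BOM ON / OFF switch (hypothetical name).
    """
    data = text.encode("utf-8")
    # codecs.BOM_UTF8 is b"\xef\xbb\xbf"
    return codecs.BOM_UTF8 + data if bom else data


with_bom = encode_utf8("first record\n", bom=True)
without_bom = encode_utf8("first record\n", bom=False)
```

An application that does not expect a BOM sees the three extra bytes as part of the first record, which is probably why some tools mishandle that record.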
|
|
|
Post by George on Feb 27, 2020 16:17:50 GMT -5
Robert: I don't see any easy way to handle people who want to look at UTF files in 'raw' mode. The file loading routine is pretty hairy as it is, as it tries to handle all the different combinations of EOL, SOURCE, RECFM, LRECL etc.
This option is really for people like the current user, who know what they want. The default of BOM ON will continue to 'do the right thing' for the majority of users.
Boy! reading that file reading routine, it really is a complicated 'don't touch' routine.
George
|
|
|
Post by w3wilkes on Feb 28, 2020 0:43:04 GMT -5
Just make it so that "BOM OFF" can only be set for UTF8 encoded files. This way you're well within the Unicode standard of “Use of a BOM is neither required nor recommended for UTF-8”. As it is SPFLite has no trouble opening a UTF8 file regardless of whether it has BOM prefix or not. What SPFLite does do is not give us a choice in saving UTF8 without BOM like the standard recommends for UTF8.
|
|
|
Post by George on Feb 28, 2020 12:41:27 GMT -5
w3wilkes: I'll ensure BOM OFF is only for UTF8.
Robert: As to handling 'weird' files (Binary etc.) I'm afraid my position is going to be that SPFLite is a TEXT editor, and ANY use of it with non-text files is "if it works - great!", if it doesn't work, or messes up your file "look for a binary editor, there's lots".
George
|
|
|
Post by ManfredU on Sept 3, 2020 6:26:59 GMT -5
... As it is SPFLite has no trouble opening a UTF8 file regardless of whether it has BOM prefix or not. ...
With the current version I can only open UTF8 files with BOM as UTF8. Without BOM I get the warning "Current SOURCE is UTF8, file loaded was ANSI" and the file is corrupted (Unicode characters displayed as two characters).
|
|
|
Post by George on Sept 3, 2020 11:19:10 GMT -5
BOM OFF only affects writing of the BOM. I'm not sure why SOURCE UTF8 is overridden by ANSI, it shouldn't. So as Robert said, sounds like a bug. I'll go have a look.
George
[UPDATE]
OK, I have a test file that shows the problem. No need for Manfred to send a sample file.
[\UPDATE]
[UPDATE2]
OK, UTF is corrected. Here's a test version.
SPFLite22.exe (478.5 KB)
[\UPDATE2]
|
|
|
Post by ManfredU on Sept 3, 2020 12:39:04 GMT -5
BOM OFF only affects writing of the BOM. I'm not sure why SOURCE UTF8 is overridden by ANSI, it shouldn't. So as Robert said, sounds like a bug. I'll go have a look. George [UPDATE] OK, I have a test file that shows the problem. No need for Manfred to send a sample file. [\UPDATE] [UPDATE2] OK, UTF is corrected. Here's a test version. [\UPDATE2]
Thanks a lot for the fast response! I will test this tomorrow morning when I'm back at the office.
|
|
|
Post by ManfredU on Sept 3, 2020 12:45:03 GMT -5
... I don't see "corruption", ...
You see this only for double-byte characters, e.g. umlauts like öäü ÖÜÄ. If the UTF8 file is interpreted as ANSI, each of those characters is displayed as two characters. If your test file does not contain any characters > 127, there is no difference between UTF8 without BOM and ANSI.
Manfred
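The effect described here can be reproduced in a few lines of Python. This is only an illustration, and it assumes that "ANSI" means the Western European Windows code page (cp1252), which the thread does not state; the helper name `misread_as_ansi` is made up.

```python
def misread_as_ansi(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as cp1252.

    cp1252 stands in for "ANSI" here; that is an assumption.
    """
    return text.encode("utf-8").decode("cp1252")


umlauts = "öäü"
garbled = misread_as_ansi(umlauts)   # each umlaut becomes two characters
plain = misread_as_ansi("abc")       # ASCII-only text is unchanged

print(garbled)  # Ã¶Ã¤Ã¼
print(plain)    # abc
```

Each umlaut occupies two bytes in UTF8, and a single-byte code page shows each byte as its own character, so the three umlauts come out as six characters while pure ASCII text looks the same either way.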
|
|
|
Post by George on Sept 3, 2020 16:09:06 GMT -5
Robert: Manfred:
BOM does only affect the BOM writing.
And what I found was one of those "what on earth is THAT doing in there" type pieces of code. So whether a UTF8 file has the BOM or not should not matter.
I altered all my various UTF test files to make sure they had some double byte characters in them, and (for me at least) they all opened properly.
So hopefully, this should be corrected.
George
|
|
|
Post by George on Sept 4, 2020 10:02:07 GMT -5
Robert:
OK, BOM ON/OFF only affects writing UTF8 files, it has nothing to do with reading.
For reading UTF8 files:
If a BOM exists, it will force SOURCE to UTF8 regardless of its Profile setting.
If a BOM doesn't exist, then the Profile must specify UTF8 or it will not read correctly.
Same for all the UTF16 variations.
A BOM will force SOURCE to whatever the BOM says.
No BOM will use whatever the Profile says.
So in all UTF cases the BOM is optional: if seen, the BOM wins. If no BOM, the Profile SOURCE wins.
When writing, all UTF16 forms will write the BOM, UTF8 writing will honor the BOM ON/OFF setting.
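The rules above can be sketched in Python as follows. This is not SPFLite's actual routine; the function names and the SOURCE labels "UTF16LE" and "UTF16BE" are assumptions for illustration, since the post only speaks of "all the UTF16 variations".

```python
import codecs

# BOM byte signatures mapped to illustrative SOURCE names (assumed labels).
_BOMS = [
    (codecs.BOM_UTF8, "UTF8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF16BE"),  # FE FF
]


def resolve_source(data: bytes, profile_source: str) -> str:
    """Reading: if a BOM is seen it wins, otherwise the Profile SOURCE wins."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return profile_source


def should_write_bom(source: str, bom_on: bool) -> bool:
    """Writing: UTF16 always gets a BOM, UTF8 honors BOM ON/OFF."""
    if source.startswith("UTF16"):
        return True
    if source == "UTF8":
        return bom_on
    return False  # non-UTF sources have no BOM
```

For example, `resolve_source(b"\xef\xbb\xbfabc", "ANSI")` gives "UTF8" because the BOM overrides the Profile, while a UTF8 file without a BOM is only read as UTF8 when the Profile says so.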
|
|
|
Post by George on Sept 4, 2020 11:10:50 GMT -5
Robert: The overriding of SOURCE to UTF based on the presence of BOM markers has been in SPFLite ever since the UTF reading support was added, many, many years ago, it is not some recent change.
And if the designers of the BOM values figured it was a reasonable approach, and would not likely collide with any normal files, that decision is fine by me.
Also, SPFLite is a TEXT editor; anyone loading weird binary files that may just accidentally have some kind of valid BOM indicator at the beginning is on their own.
The overriding has been like that for years, without complaints. No reason to change it now. This latest bug was just that - a bug - it occurred when the file read routine was re-structured to handle the new XFORM support.
|
|
|
Post by ManfredU on Sept 8, 2020 8:46:06 GMT -5
[UPDATE2]
OK, UTF is corrected. Here's a test version.
SPFLite22.exe (478.5 KB)
[\UPDATE2]
Yes, this does open UTF8 Files without BOM properly. Thanks for fixing. Sorry for the delay, I was busy with a different project in the meantime.
Manfred
|
|
|
Post by George on Sept 8, 2020 12:46:53 GMT -5
No problem Manfred. Glad it works better.
George
|
|