Datashift enhancement

Stefan
Administrator

Posts: 990

Datashift enhancement Jan 7, 2023 5:20:45 GMT -5

Post by Stefan on Jan 7, 2023 5:20:45 GMT -5

George,

There's always been an issue with "Datashift" - even back in the ISPF/PDF days.

It reduces (or increases) multiple blanks to the right of the changed string in a non-smart way.

90% of the time, this is exactly what the user wants.

Problems occur when there is a quoted string which includes one or more consecutive blanks to the right of the changed string.

An innocent CHANGE ALL .... command can quickly mess up your code in this situation, especially as it changes occurrences 'not within view'.

Stepping through afterwards to check every changed entry doesn't help because the user probably won't notice that a "  " was reduced to " ".

So you cannot reliably use CHANGE ALL without risk, unless you manually step through via RCHANGE.

What would it take for SPFLite to consider a quoted string as a "single object" on a line and not compress/expand multiple blanks within?
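To make the failure concrete, here is a minimal Python sketch of the behaviour described above. It is not SPFLite's code: the function name and the simplified rule (absorb the length difference in the first run of two or more blanks to the right of the change) are assumptions for illustration only.

```python
import re

def naive_datashift_change(line: str, old: str, new: str) -> str:
    """Change the first `old` to `new` and absorb the length difference
    in the first run of 2+ blanks to the right (simplified, assumed model)."""
    pos = line.find(old)
    if pos < 0:
        return line
    head = line[:pos] + new
    tail = line[pos + len(old):]
    delta = len(new) - len(old)
    run = re.search(r" {2,}", tail)
    if run is None or delta == 0:
        return head + tail
    blanks = run.end() - run.start()
    # Longer text eats blanks (always leaving one); shorter text adds blanks.
    new_blanks = max(1, blanks - delta)
    return head + tail[:run.start()] + " " * new_blanks + tail[run.end():]

# The only multi-blank run to the right of MSG is inside the literal,
# so the literal silently loses two blanks.
print(naive_datashift_change("MSG = 'A   B'", "MSG", "MSG01"))
```

In this sketch the three blanks inside the quoted literal shrink to one, which is the kind of change that is easy to miss when stepping through afterwards.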

George
Administrator

Posts: 4,057

Datashift enhancement Jan 7, 2023 11:18:44 GMT -5

Post by George on Jan 7, 2023 11:18:44 GMT -5

Stefan: Not too hard actually, but it still leaves a very slight 'gap' where accidental changes might happen.

The quoted string (using any of the 3 quotes ' " and `) must be AFTER the found string, and COMPLETE. i.e. both starting and ending quotes. Obviously if the found string is inside a quoted string, all bets are off.

I'll try and get a Beta out so you can test it and see if it works the way you expect.
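As a rough illustration of the rule above (not the actual SPFLite change), the following Python sketch finds complete quoted strings to the right of the found string and picks the first multi-blank run that lies outside them. The function names are hypothetical, and the pairing logic is deliberately simple: an opening quote with no matching close is ignored.

```python
import re

QUOTES = "'\"`"  # the three quote characters mentioned above

def quoted_spans(text):
    """Return (start, end) spans of complete quoted strings in text."""
    spans, i = [], 0
    while i < len(text):
        if text[i] in QUOTES:
            close = text.find(text[i], i + 1)
            if close == -1:
                i += 1  # no closing quote: not treated as a string
                continue
            spans.append((i, close + 1))
            i = close + 1
        else:
            i += 1
    return spans

def first_unquoted_blank_run(text):
    """Return the span of the first run of 2+ blanks outside any quoted string."""
    spans = quoted_spans(text)
    for m in re.finditer(r" {2,}", text):
        if not any(s <= m.start() and m.end() <= e for s, e in spans):
            return m.span()
    return None

# Text to the right of the found string: the blanks inside 'A   B' are skipped.
print(first_unquoted_blank_run(" = 'A   B'    /* x */"))  # (10, 14)
```

The 'gap' is visible here too: two unrelated apostrophes on one line (for example in "don't" and "can't") would pair up and protect whatever lies between them.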

George

Stefan
Administrator

Posts: 990

Datashift enhancement Jan 8, 2023 5:24:15 GMT -5 via mobile

Post by Stefan on Jan 8, 2023 5:24:15 GMT -5

George,

Thank you, that was quick action.
I'll download v23007 and will report back asap.

I understand the limitations you describe.

Question...
Does your change apply to "DS logic" in general or the CHANGE command in particular?
For instance, does it also work with >>, << line commands and/or the (Datashift) insert primitive?

George
Administrator

Posts: 4,057

Datashift enhancement Jan 8, 2023 10:04:38 GMT -5

Post by George on Jan 8, 2023 10:04:38 GMT -5

Stefan: Right now just CHANGE, I'd like your comments before adding to the other ones.

George

[UPDATE]

Had a look, the > >> and < << line commands should be fine. Robert wrote that chunk of code, did his usual fine job.

The KB primitive I'll look after.

[/UPDATE]

Last Edit: Jan 8, 2023 11:11:54 GMT -5 by George

Robert
Graduate Member

Posts: 3,151

Datashift enhancement Jan 8, 2023 11:55:04 GMT -5

Post by Robert on Jan 8, 2023 11:55:04 GMT -5

Too bad you didn't ask me.

George
Administrator

Posts: 4,057

Datashift enhancement Jan 8, 2023 12:25:20 GMT -5

Post by George on Jan 8, 2023 12:25:20 GMT -5

Sorry Robert, next time I'll ask you first. It was nice to see you'd already handled it.

George

Robert
Graduate Member

Posts: 3,151

Datashift enhancement Jan 8, 2023 12:46:47 GMT -5

Post by Robert on Jan 8, 2023 12:46:47 GMT -5

I would have done this particular thing differently, but, oh well.

George
Administrator

Posts: 4,057

Datashift enhancement Jan 8, 2023 13:19:11 GMT -5

Post by George on Jan 8, 2023 13:19:11 GMT -5

Robert: OK, why would you do it differently? Is there a problem with just skipping quoted strings? It was a simple fix (only 6 lines). I'm not aware of any other problems with those functions.

George

Robert
Graduate Member

Posts: 3,151

Datashift enhancement Jan 8, 2023 13:40:06 GMT -5

Post by Robert on Jan 8, 2023 13:40:06 GMT -5

The problem comes from the definition of a string. The simple fix undoubtedly makes assumptions about what a string is. Those assumptions might work sometimes but not always. For instance, C strings allow escaped quotes, and PL/1 and modern COBOL allow doubled quotes as literal quotes. That could confuse the quote-skipping logic. There's also the issue of strings within comments and vice versa. Not impossible to overcome, but the process is vulnerable to bugs.
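The escaped-quote case can be shown with a short Python sketch. The scanner below is a hypothetical stand-in for any simple "pair up the quote characters" logic, not SPFLite's implementation; it knows nothing about backslash escapes.

```python
def naive_quote_spans(text, quote='"'):
    """Pair quote characters left to right, ignoring language escape rules."""
    spans, start = [], None
    for i, ch in enumerate(text):
        if ch == quote:
            if start is None:
                start = i
            else:
                spans.append((start, i + 1))
                start = None
    return spans

# A C statement whose string literal contains escaped quotes and two blanks.
line = r'printf("say \"hi  there\"");'
spans = naive_quote_spans(line)
gap = line.index("  ")
protected = any(s <= gap < e for s, e in spans)

print(spans)      # [(7, 14), (24, 26)]
print(protected)  # False
```

The scanner closes the string at the first escaped quote, so the two blanks that really sit inside the C literal are seen as ordinary, unprotected blanks.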

SPFLite already has a mechanism to recognize strings. It's called auto colorization. The more robust way to address this would be to use the colorization parser to detect spans of strings. That way, each language can define its own strings consistent with their own language rules.

To make that possible, you would have to create a new class of colorization. Instead of Red or Green, there would be a new color called "String", with its own highlight color-code. For laughs, you could call it code S. Then, any time you were scanning a line and potentially compressing a blank, if its color were S, then it doesn't get compressed.

Comments could also have blanks that you might want protected, so similar remarks apply to them. Perhaps you can see why having this action depend on the language, and thus the profile, is important. I would give comments a color code of C.

To enable these new color codes, two existing colors would have to be retired. S is not currently used, but C is (CYAN), so that would be an impact on existing users. You could still use C and S as real colors, but they would be reserved internally for space-protection purposes. That way, any subsequent scanning code would only have to look at the color attribute line to see if a blank should be protected or not.

To maintain backward compatibility, these actions should be conditional. A global option checkbox for each would do the job, like this:
[x] Protect blanks within strings from being compressed
[x] Protect blanks within comments from being compressed
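A minimal Python sketch of the idea, assuming a per-character attribute line that runs parallel to the text. The codes S and C follow the suggestion above; the function name, the "." code for "no special color" and the `protected` parameter (standing in for the two checkboxes) are hypothetical.

```python
def unprotected_blank_runs(text, attrs, protected="SC", min_len=2):
    """Return (start, end) runs of blanks whose color codes are not protected."""
    if len(text) != len(attrs):
        raise ValueError("attribute line must match text length")
    runs, i = [], 0
    while i < len(text):
        if text[i] == " " and attrs[i] not in protected:
            j = i
            while j < len(text) and text[j] == " " and attrs[j] not in protected:
                j += 1
            if j - i >= min_len:
                runs.append((i, j))
            i = j
        else:
            i += 1
    return runs

text  = 'X = "A  B"   Y'
attrs = "...." + "S" * 6 + "...."  # the colorizer marked the literal as S

print(unprotected_blank_runs(text, attrs))                # [(10, 13)]
print(unprotected_blank_runs(text, attrs, protected=""))  # [(6, 8), (10, 13)]
```

With both options unchecked (`protected=""`) the scan behaves as before, which is how backward compatibility would be kept in this sketch.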

I am sure this is more than you wanted.

George
Administrator

Posts: 4,057

Datashift enhancement Jan 8, 2023 15:40:09 GMT -5

Post by George on Jan 8, 2023 15:40:09 GMT -5

You know me so well. Yes, considerably more. The color parser already handles keywords, delimiters, numeric strings, etc. all of which trigger a scheme value to be set. Now somehow this would require some overriding attribute called STRING, on top of those, which would encompass perhaps all or some of those, blanks and perhaps others. This is a significant change to the color parser as it now means looking for STRINGS which may encompass multiple lower level items already parsed out.

The next problem is that the Attribute bits for a character are all used; there simply isn't any way to jam in a new STRING attribute. And the SCHEME assignment is a numeric value, not a bunch of bit flags, so there's no way to superimpose a new bit flag for STRING. We can't just reduce the number of Schemes, and re-use the freed values for STRING.

I've revised the handling of the Attribute field once before to accommodate changes; it's not something you'd wish on anybody.

Also, your solution presupposes having HILITE AUTO ON. What about normal text files, table files etc.? Do we have to maintain two sets of logic, one for real Source files with AUTO support, and another for non-AUTO support? Even ISPF didn't handle that.

It's a wonderful solution, that solves a lot of problems, but it's a significant change to solve what is not a significant problem.

George

Robert
Graduate Member

Posts: 3,151

Datashift enhancement Jan 8, 2023 17:07:16 GMT -5

Post by Robert on Jan 8, 2023 17:07:16 GMT -5

The idea is that the attributes would not dramatically change. When you colorize a line, it assigns a color attribute to each of the symbols you deem important. This is not superimposing anything. You describe the syntax of a string, and any color that is associated with it is always going to use color code S, which will have some fixed scheme number. Suppose we steal scheme 13 for String and scheme 14 for Comment. There won't be any "jamming", but there would be "stealing" so to speak. We would have to live with 12 general purpose colors, and two colors dedicated for this special purpose. As I see it, there would be *no* new attribute bits. You would use the same ones in the same way. It's just that some of those scheme numbers would have an implied special meaning.

You said, "We can't just reduce the number of Schemes, and re-use the freed values for STRING" but I believe that is exactly what you would do. Perhaps you could explain why that would not be possible. The String property would be identically the same as a color; it would not be an "overriding attribute" but a simple color code. The only part that makes it different is when you would be doing Change and Shift commands, and then you'd look at the color codes for "guidance" to help you decide how to handle spans of spaces.

You mentioned, "What about normal text files, table files etc.?" Well, what about them? Do you want SPFLite to "guess" what a "string" is in raw data - data that the user *never* described in terms of having string tokens (much less, tokens of any kind)?

My feeling is that if users fail to define any syntax or structure whatsoever for their raw data, they have no reasonable expectation that SPFLite should treat their spans of spaces in some magical way. Why would it? No other editor does that, that I know of. Even Notepad++ won't do that unless you declare a language type and associate it with a file extension.

Yes. ISPF didn't do all this, but then there's lots that ISPF didn't do.

---

There is another way to do this that is less elegant but simpler.

For any given file type, there would be some kind of process that would scan a file for spans of blanks. Whatever needs to be protected from being compressed would be turned into a Non-Breaking Space. You might have to adjust some of your find/change logic to make Space and Non-Breaking Space be treated as equivalent. Whenever the file is saved, the NBSP chars would be converted back to real spaces.

It is common in many editors to get around editing limitations by doing temporary conversions between SP and NBSP.

Not a perfect solution, but it is a solution. And, it leaves open the possibility of protecting different kinds of files in different ways.

(The NBSP is just an idea. Any alternative code would be OK as long as it wasn't present in user data, like X'01' or something.)
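The round trip could look something like this Python sketch. It assumes the spans to protect have already been found by some other means; the function names are hypothetical, and `compress_blanks` is only a stand-in for whatever blank-squeezing the editor performs.

```python
import re

NBSP = "\u00a0"  # stand-in code; any character absent from user data would do

def protect(text, spans):
    """Turn blanks inside the given (start, end) spans into NBSP."""
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            if chars[i] == " ":
                chars[i] = NBSP
    return "".join(chars)

def unprotect(text):
    """Convert NBSP back to real spaces, as would happen on save."""
    return text.replace(NBSP, " ")

def compress_blanks(text):
    """Squeeze runs of ordinary blanks; NBSP is not matched by the pattern."""
    return re.sub(r" {2,}", " ", text)

text = 'X = "A  B"   Y'
guarded = protect(text, [(4, 10)])  # span of the quoted literal
result = unprotect(compress_blanks(guarded))
print(result)  # X = "A  B" Y
```

One visible cost in this sketch: any NBSP already present in the user's data would come back as an ordinary space on save, which is why a code known to be absent from the data is the safer choice.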

---

Just so you know that all of the above is not just pie-in-the-sky, take a look at the ANSI chart for > X'80'. There are 'curved' quotes like ‘ ’ ‚ “ ” „ as well as what are called "guillemet" quotes like ‹ › « » used for things like French. It is quite likely that a user typing in French might want THEIR spaces protected too. Are you really going to hard-code tests for ALL of these quoted string types? Or is this concept only intended for legacy programming source-code text? Merely checking for ASCII quotation and apostrophe isn't all there is to it. There are also exotic string constants, such as in modern releases of C++, that are truly weird. If your processing isn't based on the profile and colorization tokenizing, such users will have no way to define and protect their space data in the way you are making available to plain-ASCII users.

---

Is any of this trivial? No. It might help, or it might not.

Or, go with your 4-line code change. That would be my vote. Don't waste your remaining days listening to me. You've already wasted too much time doing that.

---

Whatever you do, it's important to allow users to disable this new feature if they don't want spans of spaces treated specially. Don't break backward compatibility. Anything else is up to you.

Last Edit: Jan 8, 2023 21:49:23 GMT -5 by Robert

George
Administrator

Posts: 4,057

Datashift enhancement Jan 9, 2023 10:23:22 GMT -5

Post by George on Jan 9, 2023 10:23:22 GMT -5

Robert: Users can already 'turn off' this feature - CHANGE CS / DS.

Stealing an existing SCHEME to assign to STRING basically takes away most of colorization support. All keywords, numeric strings, punctuation etc. are all part of STRING, so their individual colorization is taken away from them. I doubt that's your intention. That's what I meant when I said it has to be a new attribute, in addition to the existing attributes.

QUOTES - users can already specify stuff like ‘ ’ ‚ “ ” „ , ‹ ›, and « ». Type HELP AUTO and read QUOTED.

Non-Breaking Spaces are ok, but another non-starter. I'd have to create my own version of most of the string functions that are used, from simple compares, to INSTR, VERIFY, STRDELETE and on and on. Let alone how to do the 1st part, determining what spaces SHOULD be protected.

To do all this 'properly' means SPFLite would need to be able to accurately and properly parse almost every source type, along with all the varieties of escape characters, double-quoting and any other peculiarities of the language. Never going to happen till someone else takes over SPFLite support.

Remember, this thread started as a bug report in the DS support. All we were trying to do is to duplicate what good old ISPF used to do, not some major code enhancement. So I feel adding a few lines of code to correct the problem is appropriate.

George

Robert
Graduate Member

Posts: 3,151

Datashift enhancement Jan 9, 2023 11:20:59 GMT -5

Post by Robert on Jan 9, 2023 11:20:59 GMT -5

The code fix should be sufficient, I have no further objections.

Robert
Graduate Member

Posts: 3,151

Datashift enhancement Jan 9, 2023 15:34:38 GMT -5

Post by Robert on Jan 9, 2023 15:34:38 GMT -5

BTW, not that it matters, but you are kind of arguing my own point for me by mentioning HELP AUTO and QUOTED. My original post was about using auto coloring to parse user data to accurately determine quoted strings, so they could be found more easily for the data shift issue. You rejected that idea, so then why go back and mention AUTO and QUOTED when they are part of the very suggestion I made that you didn't want to do? That doesn't really make sense.

But again, this is a moot point. Just saying.

George
Administrator

Posts: 4,057

Datashift enhancement Jan 9, 2023 16:35:08 GMT -5

Post by George on Jan 9, 2023 16:35:08 GMT -5

I mentioned it because you brought up the issue of other potential quote characters, like ‘ ’ ‚ “ ” „ , ‹ ›, and « ». And I wanted to remind you that the issue is already covered off. You mentioned the 'other' quote characters as if they were a problem for us, I assumed you'd forgotten that we already handled it.

George
