Post by prino on Jun 3, 2020 1:54:07 GMT -5
I'm doing traces of Pascal programs, printing every executed procedure, and the resulting files are big - currently about 60 MB. My editor (RE from www.re.ravitz.us/) can handle them, but sadly it occasionally crashes with null-pointer assignments. Its best feature compared to e.g. Notepad++ (an editor that's marginally OK, but with about the most awful, useless on-line manual you can imagine), which I also use on occasion, is that it allows ISPF-like changes to specified columns by marking them. I suspect/expect that SPFLite will do this natively, but the question is: will it handle files of this size, and even bigger, up to 120 MB?

Why? I'm currently in the process of writing another REXX exec to merge the calling tree with data extracted from the sources about the x86 registers used - most of the code in my programs has been converted to in-line assembler, as the Pascal compiler I'm using, Virtual Pascal, generates very poorly optimized code (as, for that matter, does FreePascal...). If I add the registers used/saved at all 9 levels of subroutines that the main program uses, even in a short format like "-*-+--" for ebx/esi/edi/eax/ecx/edx ("-": not used, "*": saved and used, "+": used but never saved, which can only be eax/ecx/edx), and align these columns, I'm looking at extending each line to at least 44 bytes, upping the file size to around 120 MB.

Sure, I could upload this to my z/OS system and use ISPF proper, but IND$FILE is kind of slow, and that's all I have for now. So if anyone has any experience with such huge files, feel free to share your experience!
Post by George on Jun 3, 2020 11:37:39 GMT -5
Prino: Robert: I just dragged out my largest test file, it's about 180 MB, with about 160,000 lines.
It takes ~ 5-10 seconds to load.
I did a FIND ALL "xxx" (a string that occurs throughout the file) and the FIND took about 2 seconds.
So yes, I believe SPFLite will handle your file. Just be prepared for some commands to perhaps take a few seconds.
For reference, my system is a 4-core i5 running at 3.0 GHz, with 8 GB of memory.
George
Post by mueh on Jun 3, 2020 13:20:47 GMT -5
George: My experience is that the MB are not what matters. I have a file of 45 MB with 1.25 million lines and it takes 8 minutes to load. I agree with your time for 160,000 lines. One core of my i3 at 3.6 GHz is completely used. Tested now with an XFORM macro: Add_Line is also slow with a large number of lines, since it also uses the METHOD LInsertEmpty(ln AS LONG, n AS LONG, IFlag AS LONG). Are you sure that the following code is not the bottleneck?

'----- Expand L() if running out of room
IF n + 5 > LCtr THEN                            ' If running out of room in L() make it bigger
  ol = UBOUND(L()): nl = ol + n + (ol * 2)      ' Save old array size, calc new size
  REDIM PRESERVE L(nl) AS INSTANCE DataLine     ' Get room for another bunch of lines in L() array
Thanks
Post by George on Jun 3, 2020 14:19:21 GMT -5
mueh: Yes, that routine could be the culprit, but (from memory) it starts initially with room for 10,000 lines, and each time it runs out it reallocates to Old-UBound + #-requested + (2 x Old-UBound), so it goes from 10,000 to 30,000 to 90,000 to 270,000 etc. So it never does that re-allocation very often, even for a very large file.
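In round numbers (a back-of-the-envelope sketch; the variable names here are made up, not the real source):

' Count reallocations under the nl = ol + n + (ol * 2) policy,
' assuming lines are added one at a time (n = 1)
DIM sz AS LONG, reallocs AS LONG
sz = 10000                      ' initial allocation
DO WHILE sz < 2600000           ' e.g. a 2.6-million-line file
  sz = sz + 1 + (sz * 2)        ' roughly triples each time
  INCR reallocs
LOOP
' sz goes 10,000 -> 30,001 -> 90,004 -> 270,013 -> 810,040 -> 2,430,121 -> 7,290,364
' reallocs = 6, i.e. L() is only rebuilt six times for a 2.6M-line file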
Hmmm, just went to verify what I said. The above is correct, but the reallocation of the two text arrays does not use the same algorithm, so it would be invoked more often. I will try adjusting it and see whether it makes a difference.
It's nice that PB handles all the string allocation stuff, but we still have to be aware of what it does since it really shows up with large files.
George
[UPDATE]
Altered the algorithm, barely any noticeable difference. We must look elsewhere.
G.
[/UPDATE]
Post by prino on Jun 4, 2020 4:33:13 GMT -5
OK, I'll install it and give it a go, but given the loading time of 8 minutes for a 45 MB file with 1.25 million lines, I am afraid, very afraid!
My file isn't much bigger at the moment, at 65 MB, but it contains 2.7 million lines, which would suggest a loading time of 15 or so minutes - I'm running W7-64 Pro on a 2.5 GHz Intel i7-4710MQ with 24 GB (and no paging file). The size of the file, assuming the Pascal code remains unchanged (highly unlikely, as I play around with it very frequently), tends to increase by around 150,000 lines per year - assuming those small spiky balls of DNA doing the rounds right now stop interfering.
FWIW, RE (written in FreePascal) loads the file in about 6 seconds, but is not reliable enough, and NP++ loads it nearly instantaneously, but its inability to edit specific columns makes it useless for what I want to do.
Post by mueh on Jun 4, 2020 7:16:46 GMT -5
George: prino: Doubled my file and it takes 30 minutes to load with the following XFORM read routine (i and j were globals in my original; they are declared locally below). The screen shot showed the number of lines followed by the seconds since the start of the run; 1840 sec is 30 minutes and 40 seconds. The Change command and UNDO worked in the loaded file. I used XFORM since I can also put the time measurement in the macro; the tBasic file read causes an additional delay of about 3 sec per 100,000 lines.

George: What do you think about a test that increases the REDIM by 2 million lines at a time? A REDIM of a large array may be slow, so measure it again. Maybe the code could be improved. Thanks

'----- READ routine
function ReadFile()
  local recs as long = 0
  local fnum as long
  local RC as long
  local i as long
  local j as long
  fnum = FILE_OPEN(fname, "INPUT")
  i = GetTickCount
  SPF_Debug(Time$)
  for recs = 1 to 2600000
    RC = Add_Line(0, FILE_LineInput(fnum))       ' Pass to SPFLite
    ' RC = SPF_XFORM_Put(FILE_LineInput(fnum))   ' Pass to SPFLite
    ' chunk = FILE_LineInput(fnum)
    if FILE_EOF(fnum) then exit for
    if recs \ 100000 * 100000 = recs then        ' every 100,000 lines ...
      j = (GetTickCount - i) / 1000
      SPF_Debug(recs + "|" + j)                  ' ... log line count | elapsed seconds
    endif
  next
  j = (GetTickCount - i) / 1000
  SPF_Debug(recs + "|" + j)
  SPF_Debug(Time$)
  FILE_Close(fnum)                               ' close the trace file again
end function
Post by George on Jun 4, 2020 9:49:35 GMT -5
Prino: Mueh: I'll be looking into this. I know in the past I've always stuck with the mantra that this is a Source Text editor and that for any 'normal' files, it's not a problem.
It now seems that it bears a much closer look, if such large files are becoming more common.
George
Post by George on Jun 4, 2020 11:00:41 GMT -5
Robert: Nothing in particular can be done with Add_Line, it's just a macro wrapper for existing functions. It's THOSE functions that have to be looked at.
George
Post by mueh on Jun 5, 2020 0:44:24 GMT -5
George: I hope we will end up with a new function, Add_File, where we can pass as a parameter the array of records built by ErosOlmi's famous tBasic statement
PARSE(FILE_Load(fname), MyMatrix(), $crlf)
in one call, instead of doing Add_Line for every record in an XFORM macro.
Thanks
Post by George on Jun 5, 2020 9:03:17 GMT -5
mueh: So you want ME to code the FOR/NEXT loop so that YOU don't have to? I've put it on my 'someday' list.
George
Post by George on Jun 5, 2020 11:30:42 GMT -5
Robert: How does that help? Mueh was trying to avoid coding a loop. Reading a file into a StringBuilder means looping, doing file reads and StringBuilder.ADD calls. He can already read an entire file into an array in one statement (that's just a wrapper for the original PB function that does the same thing).
With either solution, it still means I have to loop myself to extract the lines from the array, or from the file, or from the StringBuilder object (if we can even pass that in a macro function).
Both of us are lazy and looking for the easiest way out.
George
Post by mueh on Jun 5, 2020 13:22:45 GMT -5
George: Sorry for the misunderstanding.
I meant that by passing the array you would know the number of lines, which is what you need for sizing your arrays, so you could build your arrays only once.
If you really want to improve the performance (SPF/SE loads the file in seconds) you must avoid rebuilding the arrays on every line.
I know that might be a lot of work, so it's only a suggestion - roughly what I mean is sketched below.
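A sketch only (BulkAddSlots is a made-up name, not real SPFLite code; L() and DataLine are from the snippet I quoted above):

' Sketch: when the line count is known up front, grow L() in ONE step
SUB BulkAddSlots(BYVAL nLines AS LONG)
  DIM ol AS LONG
  ol = UBOUND(L())
  REDIM PRESERVE L(ol + nLines) AS INSTANCE DataLine  ' one REDIM for the whole file
END SUB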
Thanks
Post by George on Jun 5, 2020 14:45:42 GMT -5
mueh: I realize the way insertions are handled is bad, but for the majority of line insertions there is no way to know how many extra 'slots' are needed. I'm looking at creating a parallel routine to the existing file-loading routine.
The problem is in calculating how many slots a file load will need. Right now, because the file reader handles all kinds of files, for some it is impossible to obtain a number (like mainframe VB files, or those custom VBI types), or when the user specifies EOL=AUTONL, where some lines are inserted, or overlaid, etc.
Because the insertion slots are the bottleneck, I may even try the route of pre-parsing the file just to get the count, then bulk-adding the slots and re-parsing to do the real additions.
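Roughly like this (just a sketch of the idea for the plain-text case, not the real loader):

' Pass 1: pre-parse only to count the lines
OPEN fname$ FOR INPUT AS #1
WHILE NOT EOF(#1)
  LINE INPUT #1, txt$
  INCR nRecs&
WEND
SEEK #1, 1                              ' rewind for the real pass
REDIM L(nRecs&) AS INSTANCE DataLine    ' bulk-add the slots in one REDIM
' Pass 2: re-parse and do the real additions
FOR i& = 1 TO nRecs&
  LINE INPUT #1, txt$
  ' ... store txt$ into L(i&) ...
NEXT
CLOSE #1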
Obviously these solutions are not trivial to 'try out'. Are multiple passes through the data going to cost more than the savings in expanding the array? My bet is yes, but ....
I'll get this beaten up, but it's neither a quick nor an easy fix. We have warned users for years about the performance hit of large files; when we started doing the warnings, large files were those bigger than 50,000 to 75,000 lines. Nowadays you guys are tossing around mega-line files. It's a whole new world.
I'll just have to stop responding to the "it would be nice if ..." type requests for a while till these more important requirements are resolved.
George
Post by mueh on Jun 5, 2020 15:44:55 GMT -5
George: I would first implement it with the following XFORM read routine to load a text file. nLines is the number of lines in the array. Add_File should use your new routine instead of the LInsertEmpty METHOD that Add_Line uses.
'----- READ routine
function ReadFile()
  dim MyMatrix() as string
  dim nLines as long
  dim cx as long
  dim RC as long
  '--- Just one line does the job of loading the file data, parsing the text lines,
  '--- and dimensioning and filling the matrix
  PARSE(FILE_Load(fname), MyMatrix(), $crlf)
  '--- Now get the number of lines parsed
  nLines = ubound(MyMatrix(1))
  for cx = 1 to nLines
    RC = Add_Line(0, MyMatrix(cx))   ' Pass to SPFLite
  next
end function
I'm becoming a fan of the XFORM macro feature. Thanks.
Post by George on Jun 5, 2020 15:58:42 GMT -5
MUEH: Getting the record count for normal files is brain-dead simple: just use FILESCAN and you know the answer. But that only works for plain, ordinary text files, and the CopyFile routine is tasked with being able to handle ANY file type.
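For that simple case it really is just (sketch for a plain text file):

' Count the records of an ordinary text file in one statement
OPEN "trace.txt" FOR INPUT AS #1
FILESCAN #1, RECORDS TO nRecs&
CLOSE #1
' nRecs& now holds the line count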
It's never as simple as you'd think.
Robert: Approximation is not a lot of help. I'll look at it, but then it gets into handling (a) I didn't allocate enough, or (b) I allocated too many. One means doing single allocations in the middle of a loop, the other means going back and freeing unused items. Both are messy. Would it still be better? It's a lot of coding just to be able to benchmark it and find out.
As I said, I'll get there.
George