
FILTER vastly outperforming DELETE


I’m far from an experienced programmer and I feel remiss that I can’t objectively contribute to the discussions posted on the forum.  But I recently modified the application I develop, and it dramatically sped up the process of ‘removing’ redundant rows from a datastore by using a filter rather than a delete, a discard, or a move of rows to the delete buffer.  This technique isn’t in any PB text or reference book I’ve encountered, so I thought I’d explain it in the hope that someone else may benefit as I did.  Moreover, experienced programmers may be able to add value to my comments.

By way of background, the application reads in data from a text or spreadsheet file.  Each row of data includes fields such as names and addresses, telephone numbers, times, dates, etc.  The application parses and validates the data and orders it in sequential rows in a datawindow.

However, many of the rows carry replicated, or near-replicated, data.  Once the replicated data is consolidated into one row, the residual redundant rows, often up to 60% of the total rows, have to be ‘discarded’.

I found that copying all the rows to be processed into a datastore was quicker than working directly in the datawindow, and, when it came to removing the redundant rows, using RowsCopy to the Delete! buffer was much quicker than DeleteRow.
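
To make that concrete, here is a minimal sketch of the two per-row approaches.  The datastore and DataWindow object names are made up for illustration, not taken from my application, and in practice RowsMove is the call that also removes the row from the Primary! buffer; RowsCopy on its own would leave the original row in place.

// Hypothetical datastore holding the imported rows
datastore lds_work
lds_work = CREATE datastore
lds_work.DataObject = "d_import_rows"   // made-up DataWindow object name

long ll_row
ll_row = 10   // a row the merge logic has decided is redundant

// Alternative 1: delete the row directly from the Primary! buffer
lds_work.DeleteRow(ll_row)

// Alternative 2: shift the row into the Delete! buffer, one row at a time
lds_work.RowsMove(ll_row, ll_row, Primary!, lds_work, 1, Delete!)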

Processing time was acceptable when the source files were only several thousand rows.  But occasionally there’d be files with 70,000 rows or more.  The hidden overheads of the ostensibly simple ‘rows copy to delete’ process, albeit one row at a time, resulted in an unacceptable three to four hours of processing time.

I was aware that the Filter process worked quickly when filtering large datastores.  With this in mind, I ‘flagged’ each redundant row by putting an arbitrary number in an invisible column that, up to this point, had been used to colour-highlight replicated row-sets for the user’s information in the datawindow: i.e. a value of zero did not highlight, one highlighted green, two blue, three yellow, and so on.
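
For context, the colour-highlighting itself is just a DataWindow expression on the columns’ Background.Color property, keyed to that invisible column.  Something along these lines, where the column names, the datawindow control dw_results and the RGB values are only illustrative (the same expression can equally be entered in the DataWindow painter):

// Illustrative only: highlight a column based on the hidden flag column
dw_results.Modify("cust_name.Background.Color = '0~t" + &
    "If(highlight_flag = 1, RGB(0,200,0), If(highlight_flag = 2, RGB(0,0,200), " + &
    "If(highlight_flag = 3, RGB(255,255,0), RGB(255,255,255))))'")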

As a row was marked ‘redundant’ in the ‘crunching’ loop, instead of copying it to the Delete! buffer, I set a value of 500 in this colour-highlight column.  Once all the rows had been processed through the merging loop, I set a filter to retain rows where the column did not equal 500, and filtered the datastore.  The process then copied the datastore’s Primary! buffer to the datawindow for presentation.
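
Put together, the flag-and-filter pass looks roughly like the sketch below, continuing with the lds_work datastore from the earlier sketch.  Here f_is_redundant() is only a stand-in for whatever test the merging loop applies, highlight_flag is the hidden colour column, and dw_results is the visible datawindow control.

long ll_row, ll_total
ll_total = lds_work.RowCount()

FOR ll_row = 1 TO ll_total
    // f_is_redundant() stands in for the application's merge/consolidation test
    IF f_is_redundant(lds_work, ll_row) THEN
        // flag the row instead of touching the Delete! buffer
        lds_work.SetItem(ll_row, "highlight_flag", 500)
    END IF
NEXT

// One filter pass drops every flagged row out of the Primary! buffer
lds_work.SetFilter("highlight_flag <> 500")
lds_work.Filter()

// Copy the surviving rows to the datawindow for presentation
lds_work.RowsCopy(1, lds_work.RowCount(), Primary!, dw_results, 1, Primary!)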

What had previously taken over three hours went through in, literally, seconds.   

