1527 ObjectPAL: Efficient String Concatenation
VERSIONS: All

SUBJECT: Efficient String Concatenation

BY: gfk < gfk05ATgmx.net >
Date: 3 February 2001

This article deals about the rather special case that large strings have to be constructed with the means of OPAL. Now you are warned that this might become a little bit boring.

1. Task

A string shall be computed by concatanation of a large number (let's say: 1000) of very short strings. For simplicity, we assume that these short strings consist of a single character.

For this discussion, I call the length of the result N. The operation which computes the i-th character is called f(i). (In practice, this may be any string expression.)

Practical example: Convert a string from one character set to another. f(i) computes the converted character at the i-th position.


2. Simple approach
r = ""
for i from 1 to N
    r = r + f(i)
endFor
return r
Cost of this algorithm:
Memory management: O(N) temporary result strings have to be allocated.
Copying: O(N*N) (roughly N*N/2) times a character is copied.


3. Advanced approach
r = ""
s = ""
for i from 1 to N
    s = s + f(i)
    if s.size()>r.size() then
       r = r + s
       s = ""
    endIf
endFor
return r + s
Cost of this algorithm:
I have failed to find a formula (maybe someone can who is a better at math than me), but I assume the following for all N>k, whereby k is some positive constant:
Memory managment: MM(N) < O(N)
Copying: O(N*ld(N)) <= CP(N) < O(N*N)

With other words: For a sufficently large N, the cost of the additional logic is lower than saved cost for MM and copying.


4. Benchmark results

For comparing the two algorithms, I have used a single-character string (".") expression for f(i). Thus the computed string simply consists of N dots. There are overheads and a measurement inaccuracy involved, but these are independent of N and the same with both variants of the algorithm.
NT1T2
25006020
500026060
100001150210
200005390780

T1 is the time spent inside the simple algorithm, while T2 is the time spent with the advanced approach. Both times are in milliseconds. As one can see quite easily, the advanced algorithm is significantly faster, and its cost is growing much slower.


5. Final remarks

f(i) may as well return strings larger than 1 character, which was the case with my application (converting a string into HTML, where some characters have representations like """).

The proposed algorithm increases the complexity of the code. It helps to encapsulate the logic, for example in a 'method put(const x String)' and a 'method get() String'.
and a 'method get() String'.


6. Disclaimer

The basic ideas of the proposed algorithm are well-known. I do *not* claim having invented it. Thus everybody is free to use or modify it.


VERSIONS: All

SUBJECT: String concatenation

BY: VLADIMIR MENKIN
Date: 14 June 2001

I found an interesting effect, which have a practical value in some cases (I found it when investigated why my program generates some of HTML pages with tables dangerously slow). When you add (concatenate) a short string to the resulting string many times, this operation becomes more slow from step to step.

Here's a simple illustration. Create a form with two pushbuttons and assign to them the following code:

Button 1:
method pushButton(var eventInfo Event)
var
 st1,st2 string
 i,j longint
 dt1,dt2 datetime
endvar
st1=""
st2="1234567890"
dt1=datetime()
for i from 1 to 100
 for j from 1 to 200
  st1=st1+st2
 endfor
endfor
dt2=datetime()
view((number(dt2)-number(dt1))/1000)
endMethod

Button 2:
method pushButton(var eventInfo Event)
var
 st1,st2,st0 string
 i,j longint
 dt1,dt2 datetime
endvar
st1=""
st2="1234567890"
dt1=datetime()
for i from 1 to 100
  st0=""
  for j from 1 to 200
    st0=st0+st2
  endfor
  st1=st1+st0
endfor
dt2=datetime()
view((number(dt2)-number(dt1))/1000)
endMethod
The difference between these buttons is a temporary variable used for the inner cycle in Button2. The code for both buttons generates the same result string, but...
For Button1 the code executes in 80 sec.
For Button2 the code executes in 0.8 sec. (!)
P800/384MB RAM Paradox 9/SP3/Win98

The dependence of execution time from the cycle count limit (effectively, the resulting string size) for Button1 is absolutely non-linear - it's closer to exponential.

I guess that this issue is related to the memory allocation/reallocation algorithm used by Paradox (Windows?).


Comments from: Liz < lizATparadoxcommunity.com >
Really interesting! I'm going to review all my code now!

Button1: 71.15
Button2: 0.60

P3 450MHz, 128 MB RAM
Paradox 8 SP1
Win98 SE


Comments from: DAN ALDER (Corel)
Bizarre. 102.47 vs 1.04 on my P600/256 MB running Win2000.


To index