[Python Hack]: How to append characters to a string in the most efficient way?

2 minute read

Hello!

Instinctively, we tend to use str += char. It's worked and explicit. But sometimes it gives Time Limit Exceed(TLE) when doing algorithm-coding. I'm gonna explain why, so take a seat and fasten the seat-belt.
First, take a brief look at problem here if you want. and here is my all submitted solutions.
These idea are exactly identical, yet different from implementation. One is using str+=char and the other is using an array then join at the end. Taking a glance at `diff` comment section.
 1class Solution(object):
 2    def decodeCiphertext(self, s, rows):
 3        """
 4        :type encodedText: str
 5        :type rows: int
 6        :rtype: str
 7        """
 8        cols=len(s)/rows
 9        res="" # diff
10        for i in range(cols-rows+2):
11            while i<len(s):
12                res+=s[i] # diff
13                i+=(cols+1)
14        return res.rstrip() # diff
15     
 1class Solution(object):
 2    def decodeCiphertext(self, s, rows):
 3        """
 4        :type encodedText: str
 5        :type rows: int
 6        :rtype: str
 7        """
 8        cols=len(s)/rows
 9        res=[] # diff
10        for i in range(cols-rows+2):
11            while i<len(s):
12                res.append(s[i]) # diff
13                i+=(cols+1)
14        return ''.join(res).rstrip() # diff
15      

Result

The left one returns Time limit exceeded.
The right one, It worked like a charm..

What's going on?

Both are doing the same thing, but the different key is how it deals with extending the memory when exceeding the current one.

For += operator:


Since string is an immutable object whose state cannot be modified after it is created. So whenever appending a char to it, It is planned to create a new string with the length advanced by 1, then copy old data to new array one by one, finally adding new char at the end. Hence, with each of adding operation, the cost is always O(N).

For array:

When exceeding the current memory, it generates new array with the length multiplied by 2. Then moving old data to the new house and add the new char at the end. But by multiplying the length by 2, for next adding time, it only costs O(1). In amortized time complexity term, it's O(1). Lets prove it by simple math.
Occupied/Length Adding Creating new array Cost
0/1 1 1
1/1 1 x 1
1/2 1 1
2/2 1 x 2
2/4 1 1
3/4 1 1
4/4 1 x 4
4/8 1 1
4 times...
8/8 1 x 8
8 times...

Note: when creating new array, the most expensive cost is moving old data to new array.

Given $n$ adding operations, the formula for total operations will be:

$$\begin{aligned} \sum_{i=1}^n &= n + (1+2+4+...+k) \: with \; k \le n \le 2*k \; \wedge \; min{(2*k-n)} \\ &= n+2*k-1\\ &\le 3n-1 = O(N) \end{aligned}$$ Note: $$\begin{align} proof: 2k-1 &= 1+2+4+...+k\\ assign: s&=1+2+4+...+k\\ s&=1+2*(1+2+..+\frac{k}{2})\\ s&=1+2*(s-k)\\ s&=2k-1 \\ \end{align}$$
The averaged value for one adding will be: $$\begin{align} \frac{\sum_{i=1}^n}{n} = \frac{O(N)}{n}=O(1) . \end{align}$$

Now, it's obvious that the performance of using array is more than efficient than += operator.

Help me water it