
3.6 Combining Strings

Credit: Luther Blissett

3.6.1 Problem

You have several small strings that you need to combine into one larger string.

3.6.2 Solution

The + operator concatenates strings and therefore offers seemingly obvious solutions for putting small strings together into a larger one. For example, when you have all the pieces at once, in a few variables:

largeString = small1 + small2 + ' something ' + small3 + ' yet more'

Or when you have a sequence of small string pieces:

largeString = ''
for piece in pieces:
    largeString += piece

Or, equivalently, but a bit more compactly:

import operator
from functools import reduce  # a built-in before Python 3; this import works from Python 2.6 on
largeString = reduce(operator.add, pieces, '')

However, none of these solutions is generally optimal. To put together pieces stored in a few variables, the string-formatting operator % is often best:

largeString = '%s%s something %s yet more' % (small1, small2, small3)

To join a sequence of small strings into one large string, the string method join is invariably best:

largeString = ''.join(pieces)

3.6.3 Discussion

In Python, string objects are immutable. Therefore, any operation on a string, including string concatenation, produces a new string object rather than modifying an existing one. Concatenating N strings thus involves building and then immediately throwing away each of N-1 intermediate results. Performance is therefore quite a bit better for operations that build no intermediate results, but rather produce the desired end result at once.

The string-formatting operator % is one such operation. It is particularly suitable when you have a few pieces (for example, each bound to a different variable) that you want to put together, perhaps along with some constant text. Besides performance, which is never a major issue for this kind of task, the % operator has several potential advantages over an expression that uses multiple + operations on strings. One is readability, once you get used to it. Another is that you don't have to call str on pieces that aren't already strings (e.g., numbers), because the format specifier %s does so implicitly. A third is that you can use format specifiers other than %s, so that, for example, you can control how many significant digits the string form of a floating-point number should display.
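As a minimal sketch of these advantages, the following compares + with % when some pieces are not strings. The names name, count, and ratio and their values are illustrative assumptions, not part of the recipe.

```python
# Hypothetical pieces; the names and values are illustrative only.
name = 'spam'
count = 3
ratio = 2.0 / 3.0

# With +, every non-string piece needs an explicit str() call.
with_plus = name + ' x' + str(count)

# With %, the %s specifier converts each piece to a string implicitly.
with_percent = '%s x%s' % (name, count)

# Other specifiers control the formatting: %.3g keeps three significant digits.
formatted = '%s ratio %.3g' % (name, ratio)
```

Here with_plus and with_percent hold the same text, while formatted shows the floating-point value trimmed to three significant digits.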

When you have many small string pieces in a sequence, performance can become a truly important issue. The time needed for a loop using + or += (or a fancier but equivalent approach using the reduce function) tends to grow with the square of the number of characters you are accumulating: each step allocates and fills a new string holding everything accumulated so far, and the time to do that is roughly proportional to the length of that string. Fortunately, Python offers an excellent alternative. The join method of a string object s takes as its only argument a sequence of strings and produces a string result obtained by joining all items in the sequence, with a copy of s separating each item from its neighbors. For example, ''.join(pieces) concatenates all the items of pieces in a single gulp, without interposing anything between them. It's the fastest, neatest, and most elegant and readable way to put a large string together.
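A short sketch of how the separator works, using a made-up list of pieces (the data and variable names are assumptions for illustration):

```python
pieces = ['alpha', 'beta', 'gamma']  # illustrative data only

# The string that join is called on is placed between neighboring items.
no_separator = ''.join(pieces)
comma_separated = ', '.join(pieces)

# A loop with += builds the same text as ''.join(pieces), but creates
# a new intermediate string on every pass.
looped = ''
for piece in pieces:
    looped += piece
```

With an empty separator the items are simply run together; with ', ' the separator appears only between items, never at either end.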

Even when your pieces come in sequentially from input or computation, and are not already available as a sequence, you should use a list to hold the pieces. You can prepare that list with a list comprehension or by calling the append or extend methods. At the end, when the list of pieces is complete, you can build the string you want, typically with ''.join(pieces). Of all the handy tips and tricks I could give you about Python strings, I would call this one the most significant.
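The accumulate-then-join idiom might look like the sketch below. The functions squares_report and squares_report_compact are hypothetical examples invented here to stand in for any computation that yields pieces one at a time.

```python
def squares_report(limit):
    """Collect each computed line in a list, then join once at the end."""
    pieces = []
    for n in range(limit):
        # Each piece is produced sequentially and appended to the list.
        pieces.append('%d squared is %d\n' % (n, n * n))
    return ''.join(pieces)


def squares_report_compact(limit):
    """Same result, preparing the list with a list comprehension."""
    return ''.join(['%d squared is %d\n' % (n, n * n) for n in range(limit)])
```

Both functions build the list of pieces first and create the large string only once, so no intermediate strings of growing length are built along the way.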

3.6.4 See Also

The Library Reference sections on string methods, string-formatting operations, and the operator module.
