This page is devoted to various tips and tricks that help improve the performance of your Python programs. Wherever the information comes from someone else, I've tried to identify the source. Note: I originally wrote this in (I think) 1996 and haven't done a lot to keep it updated since then. Python has changed in some significant ways since then, which means that some of the orderings will have changed. You should always test these tips with your application and the version of Python you intend to use, rather than blindly accepting that one method is faster than another.
Also new since this was originally written are packages like Pyrex, Psyco, Weave, and PyInline, which can dramatically improve your application's performance by making it easier to push performance-critical code into C or machine language.
If you have any light to shed on this subject, let me know.
The first step to speeding up your program is learning where the bottlenecks lie. It hardly makes sense to optimize code that is never executed or that already runs fast. I use two modules to help locate the hotspots in my code, profile and trace. In later examples I also use the timeit module, which is new in Python 2.3.
The profile module is included as a standard module in the Python distribution. Using it to profile the execution of a set of functions is quite easy. Suppose your main function is called main, takes no arguments, and you want to execute it under the control of the profile module. In its simplest form you just execute

import profile
profile.run('main()')

When main() returns, the profile module will print a table of function calls and execution times. The output can be tweaked using the Stats class included with the module. In Python 2.4, profile will allow the time consumed by Python builtins and functions in extension modules to be profiled as well.
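If the default report is too noisy, you can save the profile data to a file and trim it with the pstats module. Here's a minimal sketch; the file name and the ten-line cutoff are arbitrary choices, and main() is assumed to be your program's entry point as above:

import profile
import pstats

profile.run('main()', 'main.prof')   # save the raw timing data to a file

stats = pstats.Stats('main.prof')
stats.sort_stats('cumulative').print_stats(10)   # ten most expensive call chains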
New in Python 2.2, the hotshot package is intended as a replacement for the profile module. The underlying module is written in C, so using hotshot should result in a much smaller performance hit, and thus a more accurate idea of how your application is performing. There is also a hotshotmain.py program in the distribution's Tools/scripts directory which makes it easy to run your program under hotshot control from the command line.
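A minimal hotshot session, again assuming the same main() entry point, looks something like this (the log file name is an arbitrary choice):

import hotshot
import hotshot.stats

prof = hotshot.Profile('main.hotshot')   # log file for the raw timing data
prof.runcall(main)                       # main is your program's entry point, as above
prof.close()

stats = hotshot.stats.load('main.hotshot')   # returns a pstats.Stats object
stats.sort_stats('cumulative').print_stats(10)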
The trace module is a spin-off of the profile module I wrote originally to perform some crude statement level test coverage. It's been heavily modified by several other people since I released my initial crude effort. As of Python 2.0 you should find trace.py in the Tools/scripts directory of the Python distribution. Starting with Python 2.3 it's in the standard library (the Lib directory). You can copy it to your local bin directory and set the execute permission, then execute it directly. It's easy to run from the command line to trace execution of whole scripts:
% trace.py -t spam.py eggs
There's no separate documentation, but you can execute "pydoc trace" to view the inline documentation.
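The trace module can also be driven from Python code instead of the command line. Here's a rough sketch; the coverage output directory name is an arbitrary choice:

import trace

tracer = trace.Trace(count=1, trace=0)   # count line executions, don't echo each line
tracer.run('main()')                     # main is your program's entry point, as above

results = tracer.results()
results.write_results(show_missing=True, coverdir='coverage')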
From Guido van Rossum <guido@python.org>
Sorting lists of basic Python objects is generally pretty efficient. The sort method for lists takes an optional comparison function as an argument that can be used to change the sorting behavior. This is quite convenient, though it can really slow down your sorts.
An alternative way to speed up sorts is to construct a list of tuples whose first element is a sort key that will sort properly using the default comparison, and whose second element is the original list element. This is the so-called Schwartzian Transform.
Suppose, for example, you have a list of tuples that you want to sort by the n-th field of each tuple. The following function will do that.
def sortby(somelist, n):
    nlist = [(x[n], x) for x in somelist]
    nlist.sort()
    return [val for (key, val) in nlist]
Matching the behavior of the current list sort method (sorting in place) is easily achieved as well:
def sortby_inplace(somelist, n):
    somelist[:] = [(x[n], x) for x in somelist]
    somelist.sort()
    somelist[:] = [val for (key, val) in somelist]
    return
Here's an example use:
>>> somelist = [(1, 2, 'def'), (2, -4, 'ghi'), (3, 6, 'abc')]
>>> somelist.sort()
>>> somelist
[(1, 2, 'def'), (2, -4, 'ghi'), (3, 6, 'abc')]
>>> nlist = sortby(somelist, 2)
>>> sortby_inplace(somelist, 2)
>>> nlist == somelist
True
>>> nlist = sortby(somelist, 1)
>>> sortby_inplace(somelist, 1)
>>> nlist == somelist
True
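To get a feel for how much a Python-level comparison function costs compared with the Schwartzian Transform, here is a rough timeit sketch. The random test data and the repeat counts are arbitrary choices, and it simply reuses the sortby function defined above:

import timeit

setup = """
import random
def sortby(somelist, n):
    nlist = [(x[n], x) for x in somelist]
    nlist.sort()
    return [val for (key, val) in nlist]
data = [(random.random(), random.random(), random.random()) for i in range(1000)]
"""

# Sort with a Python-level comparison function (called O(n log n) times).
t_cmp = timeit.Timer("d = data[:]; d.sort(lambda a, b: cmp(a[2], b[2]))", setup)

# Schwartzian Transform: decorate, sort with the default comparison, undecorate.
t_dsu = timeit.Timer("sortby(data, 2)", setup)

print "cmp function:", min(t_cmp.repeat(3, 100))
print "sortby (DSU):", min(t_dsu.repeat(3, 100))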
Strings in Python are immutable. This fact frequently sneaks up and bites novice Python programmers on the rump. Immutability confers some advantages and disadvantages. In the plus column, strings can be used as keys in dictionaries and individual copies can be shared among multiple variable bindings. (Python automatically shares one- and two-character strings.) In the minus column, you can't say something like, "change all the 'a's to 'b's" in any given string. Instead, you have to create a new string with the desired properties. This continual copying can lead to significant inefficiencies in Python programs.
Avoid this:
s = "" for substring in list: s += substring
Use s = "".join(list) instead. The former is a very common and catastrophic mistake when building large strings. Similarly, if you are generating bits of a string sequentially, instead of:
s = "" for x list: s += some_function(x)
use
slist = [some_function(elt) for elt in somelist]
s = "".join(slist)
Avoid:
out = "<html>" + head + prologue + query + tail + "</html>"
Instead, use
out = "<html>%s%s%s%s</html>" % (head, prologue, query, tail)
Even better, for readability (this has nothing to do with efficiency other than yours as a programmer), use dictionary substitution:
out = "<html>%(head)s%(prologue)s%(query)s%(tail)s</html>" % locals()
These last two are going to be much faster, especially when piled up over many CGI script executions, and easier to modify to boot. In addition, the slow way of doing things got slower in Python 2.0 with the addition of rich comparisons to the language. It now takes the Python virtual machine a lot longer to figure out how to concatenate two strings. (Don't forget that Python does all method lookup at runtime.)
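A quick, admittedly artificial, timeit comparison of repeated concatenation versus a single join might look like this; the piece count and repeat counts are arbitrary choices:

import timeit

setup = "pieces = ['spam and eggs '] * 1000"

# Repeated += builds and throws away many intermediate strings.
t_concat = timeit.Timer("""
s = ""
for p in pieces:
    s += p
""", setup)

# A single join allocates the result string once.
t_join = timeit.Timer('s = "".join(pieces)', setup)

print "+=  :", min(t_concat.repeat(3, 1000))
print "join:", min(t_join.repeat(3, 1000))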
Python supports a couple of looping constructs. The for statement is most commonly used. It loops over the elements of a sequence, assigning each to the loop variable. If the body of your loop is simple, the interpreter overhead of the for loop itself can be a substantial amount of the overhead. This is where the map function is handy. You can think of map as a for moved into C code. The only restriction is that the "loop body" of map must be a function call.
Here's a straightforward example. Instead of looping over a list of words and converting them to upper case:
newlist = []
for word in oldlist:
    newlist.append(word.upper())
you can use map to push the loop from the interpreter into compiled C code:
newlist = map(str.upper, oldlist)
List comprehensions were added to Python in version 2.0 as well. They provide a syntactically more compact way of writing the above for loop:
newlist = [s.upper() for s in oldlist]
It's generally not any faster than the for loop version, however.
Guido van Rossum wrote a much more detailed examination of loop optimization that is definitely worth reading.
Suppose you can't use map or a list comprehension? You may be stuck with the for loop. The for loop example has another inefficiency. Both newlist.append and word.upper are function references that are reevaluated each time through the loop. The original loop can be replaced with:
upper = str.upper
newlist = []
append = newlist.append
for word in oldlist:
    append(upper(word))
This technique should be used with caution. It gets more difficult to maintain if the loop is large. Unless you are intimately familiar with that piece of code you will find yourself scanning up to check the definitions of append and upper.
The final speedup available to us for the non-map version of the for loop is to use local variables wherever possible. If the above loop is cast as a function, append and upper become local variables. Python accesses local variables much more efficiently than global variables.
def func():
    upper = str.upper
    newlist = []
    append = newlist.append
    for word in oldlist:
        append(upper(word))
    return newlist
At the time I originally wrote this I was using a 100MHz Pentium running BSDI. I got the following times for converting the list of words in /usr/share/dict/words (38,470 words at that time) to upper case:
Version | Time (seconds) |
---|---|
Basic loop | 3.47 |
Eliminate dots | 2.45 |
Local variable & no dots | 1.79 |
Using map function | 0.54 |
Eliminating the loop overhead by using map is often going to be the most efficient option. When the complexity of your loop precludes its use, however, other techniques are available to speed up your loops.
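If you want to reproduce the comparison on your own machine, a rough timeit sketch along these lines should do; the dictionary path is the one mentioned above and may differ on your system, and the repeat counts are arbitrary:

import timeit

setup = """
words = [w.strip() for w in open('/usr/share/dict/words')]

def basic_loop():
    newlist = []
    for word in words:
        newlist.append(word.upper())
    return newlist

def local_no_dots():
    upper = str.upper
    newlist = []
    append = newlist.append
    for word in words:
        append(upper(word))
    return newlist

def with_map():
    return map(str.upper, words)
"""

for name in ('basic_loop', 'local_no_dots', 'with_map'):
    t = timeit.Timer('%s()' % name, setup)
    print name, min(t.repeat(3, 10))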
Suppose you are building a dictionary of word frequencies and you've already broken your text up into a list of words. You might execute something like:
wdict = {}
has_key = wdict.has_key
for word in words:
    if not has_key(word):
        wdict[word] = 0
    wdict[word] = wdict[word] + 1
Except for the first time, each time a word is seen the if statement's test fails. If you are counting a large number of words, many will probably occur multiple times. In a situation where the initialization of a value is only going to occur once and the augmentation of that value will occur many times, it is cheaper to use a try statement:
wdict = {}
for word in words:
    try:
        wdict[word] += 1
    except KeyError:
        wdict[word] = 1
It's important to catch the expected KeyError exception and not use a default except clause, to avoid trying to recover from an exception raised by the statements in the try clause that you really can't handle.
A third alternative became available with the release of Python 2.x. Dictionaries now have a get() method which will return a default value if the desired key isn't found in the dictionary. This simplifies the loop:
wdict = {}
for word in words:
    wdict[word] = wdict.get(word, 0) + 1
When I originally wrote this section, there were clear situations where one of the first two approaches was faster. It seems that all three approaches now exhibit similar performance (within about 10% of each other), more or less independent of the properties of the list of words.
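If you want to check that claim for your own data, a rough timeit sketch comparing the three idioms might look like this; the sample word list is an arbitrary stand-in for real text:

import timeit

setup = """
words = ['spam', 'eggs', 'ham', 'spam', 'spam'] * 2000

def count_if(words):
    wdict = {}
    has_key = wdict.has_key
    for word in words:
        if not has_key(word):
            wdict[word] = 0
        wdict[word] = wdict[word] + 1
    return wdict

def count_try(words):
    wdict = {}
    for word in words:
        try:
            wdict[word] += 1
        except KeyError:
            wdict[word] = 1
    return wdict

def count_get(words):
    wdict = {}
    for word in words:
        wdict[word] = wdict.get(word, 0) + 1
    return wdict
"""

for name in ('count_if', 'count_try', 'count_get'):
    t = timeit.Timer('%s(words)' % name, setup)
    print name, min(t.repeat(3, 100))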
import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.
Consider the following two snippets of code (originally from Greg McFarlane, I believe - I found it unattributed in a comp.lang.python/python-list@python.org posting and later attributed to him in another source):
def doit1():
    import string ###### import statement inside function
    string.lower('Python')

for num in range(100000):
    doit1()
or:
import string ###### import statement outside function

def doit2():
    string.lower('Python')

for num in range(100000):
    doit2()
doit2 will run much faster than doit1, even though the reference to the string module is global in doit2. Here's a Python interpreter session run using Python 2.3 and the new timeit module, which shows how much faster the second is than the first:
>>> def doit1():
...     import string
...     string.lower('Python')
...
>>> import string
>>> def doit2():
...     string.lower('Python')
...
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
>>> t.timeit()
11.479144930839539
>>> t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
>>> t.timeit()
4.6661689281463623
String methods were introduced to the language in Python 2.0. These provide a version that avoids the import completely and runs even faster:
def doit3():
    'Python'.lower()

for num in range(100000):
    doit3()
Here's the proof from timeit:
>>> def doit3():
...     'Python'.lower()
...
>>> t = timeit.Timer(setup='from __main__ import doit3', stmt='doit3()')
>>> t.timeit()
2.5606080293655396
The above example is obviously a bit contrived, but the general principle holds.
map with Dictionaries

I found it frustrating that to use map to eliminate simple for loops like:
dict = {}
nil = []
for s in list:
    dict[s] = nil
I had to use a lambda form or define a named function that would probably negate any speedup I was getting by using map in the first place. I decided I needed some functions to allow me to set, get or delete dictionary keys and values en masse. I proposed a change to Python's dictionary object and used it for awhile. However, a more general solution appears in the form of the operator module in Python 1.4. Suppose you have a list and you want to eliminate its duplicates (ignoring the presence of set objects, new in Python 2.3). Instead of the code above, you can execute:
import operator

dict = {}
map(operator.setitem, [dict]*len(list), list, [])
list = dict.keys()

This moves the for loop into C, where it executes much faster.
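As a rough sanity check, you could time the two versions like this; the duplicate-heavy test list and the repeat counts are arbitrary choices:

import timeit

setup = """
import operator
data = range(1000) * 5   # a list with plenty of duplicates
"""

# Plain for loop filling the dictionary.
t_loop = timeit.Timer("""
d = {}
nil = []
for s in data:
    d[s] = nil
""", setup)

# map pushes the loop into C; the short third sequence pads values with None.
t_map = timeit.Timer("d = {}; map(operator.setitem, [d]*len(data), data, [])", setup)

print "for loop:", min(t_loop.repeat(3, 100))
print "map     :", min(t_map.repeat(3, 100))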
Function call overhead in Python is relatively high, especially compared with the execution speed of a builtin function. This strongly suggests that where appropriate, functions should handle data aggregates. Here's a contrived example written in Python.
import time

x = 0
def doit1(i):
    global x
    x = x + i

list = range(100000)
t = time.time()
for i in list:
    doit1(i)
print "%.3f" % (time.time()-t)
vs.
import time

x = 0
def doit2(list):
    global x
    for i in list:
        x = x + i

list = range(100000)
t = time.time()
doit2(list)
print "%.3f" % (time.time()-t)
Here's the proof in the pudding using an interactive session:
>>> t = time.time()
>>> doit2(list)
>>> print "%.3f" % (time.time()-t)
0.204
>>> t = time.time()
>>> for i in list:
...     doit1(i)
...
>>> print "%.3f" % (time.time()-t)
0.758
Even written in Python, the second example runs about four times faster than the first. Had doit been written in C the difference would likely have been even greater (exchanging a Python for loop for a C for loop as well as removing most of the function calls).
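Measuring with time.time() around a single run is fairly coarse. If you want a steadier comparison, a timeit sketch of the same two functions might look like this (repeat counts are arbitrary):

import timeit

x = 0

def doit1(i):
    global x
    x = x + i

def doit2(data):
    global x
    for i in data:
        x = x + i

data = range(100000)

t_per_item = timeit.Timer("for i in data: doit1(i)",
                          "from __main__ import doit1, data")
t_aggregate = timeit.Timer("doit2(data)",
                           "from __main__ import doit2, data")

print "one call per item:", min(t_per_item.repeat(3, 10))
print "one call per list:", min(t_aggregate.repeat(3, 10))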
The Python interpreter performs some periodic checks. In particular, it decides whether or not to let another thread run and whether or not to run a pending call (typically a call established by a signal handler). Most of the time there's nothing to do, so performing these checks each pass around the interpreter loop can slow things down. There is a function in the sys module, setcheckinterval, which you can call to tell the interpreter how often to perform these periodic checks. Prior to the release of Python 2.3 it defaulted to 10. In 2.3 this was raised to 100. If you aren't running with threads and you don't expect to be catching many signals, setting this to a larger value can improve the interpreter's performance, sometimes substantially.
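For example, a single-threaded script might raise the interval near the top of the program; the value 1000 here is just an illustrative choice, not a recommendation:

import sys

# Check for thread switches and pending signal handlers less often.
sys.setcheckinterval(1000)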
Python is not C. It is also not Perl, Java, C++ or Haskell. Be careful when transferring your knowledge of how other languages perform to Python. A simple example serves to demonstrate:
% timeit.py -s 'x = 47' 'x * 2'
1000000 loops, best of 3: 0.574 usec per loop
% timeit.py -s 'x = 47' 'x << 1'
1000000 loops, best of 3: 0.524 usec per loop
% timeit.py -s 'x = 47' 'x + x'
1000000 loops, best of 3: 0.382 usec per loop
Now consider the similar C programs (only the add version is shown):
#include <stdio.h>

int main (int argc, char **argv) {
    int i = 47;
    int loop;
    for (loop = 0; loop < 500000000; loop++)
        i + i;
}
and the execution times:
% for prog in mult add shift ; do
< for i in 1 2 3 ; do
< echo -n "$prog: "
< /usr/bin/time ./$prog
< done
< echo
< done
mult: 6.12 real 5.64 user 0.01 sys
mult: 6.08 real 5.50 user 0.04 sys
mult: 6.10 real 5.45 user 0.03 sys

add: 6.07 real 5.54 user 0.00 sys
add: 6.08 real 5.60 user 0.00 sys
add: 6.07 real 5.58 user 0.01 sys

shift: 6.09 real 5.55 user 0.01 sys
shift: 6.10 real 5.62 user 0.01 sys
shift: 6.06 real 5.50 user 0.01 sys
Note that there is a significant advantage in Python to adding a number to itself instead of multiplying it by two or shifting it left by one bit. In C on all modern computer architectures, each of the three arithmetic operations is translated into a single machine instruction that executes in one cycle, so it doesn't really matter which one you choose.
A common "test" new Python programmers often perform is to translate the common Perl idiom
while (<>) { print; }
into Python code that looks something like
#!/usr/bin/env python
import fileinput

for line in fileinput.input():
    print line,
and use it to conclude that Python must be much slower than Perl. As others have pointed out numerous times, Python is slower than Perl for some things and faster for others. Relative performance also often depends on your experience with the two languages.
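It's also worth noting that the fileinput module adds Python-level bookkeeping for every line it hands back; a plainer translation that reads sys.stdin directly is typically faster, though unlike the Perl idiom it only reads standard input, not files named on the command line:

#!/usr/bin/env python
import sys

# Read stdin directly instead of going through the fileinput module.
for line in sys.stdin:
    sys.stdout.write(line)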
Last modified: Fri Mar 26 09:08:18 CST 2004