Download Performance Comparison of Different Exponentiation Methods in Python and more Summaries Advanced Computer Programming in PDF only on Docsity!
Timing Tests
Expression Disassembly
Multiplication
math.pow()
Exponentiation
BINARY_MULTIPLY versus BINARY_POWER
BINARY_MULTIPLY
BINARY_POWER
Charting Performance Differences
Generating Functions
math.pow() and Exponentiation
Chained Multiplication
Finding the Crossover
Charting the Performance
More Performance Testing
Conclusions
Recently, I was writing an algorithm to solve a coding challenge that involved
finding a point in a Cartesian plane that had the minimum distance from all of the
other points. In Python, the distance function would be expressed as
math.sqrt(dx ** 2 + dy ** 2). However, there are several different ways to
express each term: dx ** 2 , math.pow(dx, 2) , and dx * dx. Interestingly,
these all perform differently, and I wanted to understand how and why.
Timing Tests
Python provides a module called timeit to test performance, which makes testing
these timings rather simple. With x set to 2 , we can run timing tests on all three of
our options above:
Expression Timing (100k iterations) x * x 3.87 ms x ** 2 80.97 ms
math.pow(x, 2) 83.60 ms
Expression Disassembly
Python also provides a model called dis that disassembles code so we can see
what each of these expressions are doing under the hood, which helps in
understanding the performance differences.
Multiplication
Using dis.dis(lambda x: x * x) , we can see that the following code gets
executed:
0 LOAD_FAST 0 (x) 2 LOAD_FAST 0 (x) 4 BINARY_MULTIPLY 6 RETURN_VALUE
The program loads x twice, runs BINARY_MULTIPLY , and return s the value.
math.pow()
Using dis.dis(lambda x: math.pow(x, 2)) , we can see the following code gets
executed:
0 LOAD_GLOBAL 0 (math) 2 LOAD_ATTR 1 (pow) 4 LOAD_FAST 0 (x) 6 LOAD_CONST 1 (2) 8 CALL_FUNCTION 2 10 RETURN_VALUE
The math module loads from the global space, and then the pow attribute loads.
Next, both arguments are loaded and the pow function is called, which return s
the value.
if (((Py_SIZE(a) ^ Py_SIZE(b)) < 0) && z) { _PyLong_Negate(&z); if (z == NULL) return NULL; } return (PyObject *)z; }
For small numbers, this uses binary multiplication. For larger values, the function
uses Karatsuba multiplication, which is a fast multiplication algorithm for larger
numbers.
We can see how this function gets called in ceval.c :
case TARGET(BINARY_MULTIPLY): { PyObject *right = POP(); PyObject *left = TOP(); PyObject *res = PyNumber_Multiply(left, right); Py_DECREF(left); Py_DECREF(right); SET_TOP(res); if (res == NULL) goto error; DISPATCH(); }
BINARY_POWER
This function is located here in the Python source code. It also does several
interesting things:
The source code is too long to fully include, which partially explains the detrimental
performance. Here are some interesting snippets:
if (Py_SIZE(b) < 0) { /* if exponent is negative */ if (c) { PyErr_SetString(PyExc_ValueError, "pow() 2nd argument " "cannot be negative when 3rd argument
specified"); goto Error; } else { /* else return a float. This works because we know that this calls float_pow() which converts its arguments to double. */ Py_DECREF(a); Py_DECREF(b); return PyFloat_Type.tp_as_number->nb_power(v, w, x); } }
After creating some pointers, the function checks if the power given is a float or is
negative, where it either errors or calls a different function to handle
exponentiation.
If neither cases hit, it checks for a third argument, which is always None according
to ceval.c :
case TARGET(BINARY_POWER): { PyObject *exp = POP(); PyObject *base = TOP(); PyObject *res = PyNumber_Power(base, exp, Py_None); Py_DECREF(base); Py_DECREF(exp); SET_TOP(res); if (res == NULL) goto error; DISPATCH(); }
Finally, the function defines two routines: REDUCE for modular reduction and MULT
for multiplication and reduction. The multiplication function uses long_mul for
both values, which is the same function used in BINARY_MULTIPLY.
#define REDUCE(X)
1
We can use the timeit library above to profile code at different values and see
how the performance changes over time.
Generating Functions
To test the performance at different power values, we need to generate some
functions.
math.pow() and Exponentiation
Since both of these are already in the Python source, all we need to do is define a
function for exponentiation we can call from inside a timeit call:
exponent = lambda base, power: base ** power
Chained Multiplication
Since this changes each time the power changes , we need to generate a new
multiplication function each time the base changes. To do this, we can generate a
string like xxx and call eval() on it to return a function:
def generate_mult_func(n): mult_steps = '*'.join(['q'] * n) func_string = f'lambda q: {mult_steps}' # Keep this so we can print later return eval(func_string), func_string
Thus, we can make a multiply function like so:
multiply, func_string = generate_mult_func(power)
If we call generate_mult_func(4) , multiply will be a lambda function that
looks like this:
lambda q: qqq*q 3
Finding the Crossover
Using the code posted here, we can determine at what point multiply becomes
less efficient than exponent.
Staring with these values:
base = 2 power = 2
We loop until the time it takes to execute 100,000 iterations of multiply is
slower than executing 100,000 iterations of exponent. Initially, here are the
timings, with math.pow() serving as a point of comparison:
Starting speeds: Multiply time 11.83 ms Exponent time 86.52 ms math.pow time 73.90 ms
When running on repl.it, Python finds the crossover in 1.2s:
Crossover found in 1.2 s: Base, power 2, 15 Multiply time 110.09 ms Exponent time 108.20 ms math.pow time 79.82 ms Multiply func lambda q: qqqqqqqqqqqqqqq
Thus, chaining multiplication together is faster until our expression gets to 2^14 ; at
2^15 exponentiation becomes faster.
Charting the Performance
Using Pandas, we can keep track of the timing at each power:
Power multiply exponent math.pow
Interestingly, math.pow() and exponent mostly perform at the same rate, while
our multiply functions vary wildly. Unsurprisingly, the longer the multiplication
chain, the longer it takes to execute.
More Performance Testing
While the crossover is interesting, this doesn’t show what happens at powers larger
than 15. Going up through 1000 , we get the following trend:
When we zoom in so that math.pow() and exponent are more pronounced, we
see the same performance trend continue:
While using ** the time gradually increases, math.pow() generally has executes
at around the same speed.
Conclusions
When writing algorithms that use small exponents, here proved less than 15 , it is
faster to chain multiplication together than to use the ** exponentiation operator.
Additionally, math.pow() is more efficient than chained multiplication at powers
larger than 10 and always more efficient than the ** operator, so there is never a
reason to use **.
Additionally, this is also true in JavaScript. Thanks @julaincoleman for this
comparison!
Discussion: <a
href="https://www.reddit.com/r/Python/comments/bv1ez2/performance_of_variou
s_python_exponentiation/“>r/Python, Hacker News | View as: PDF, Markdown
4