Performance Comparison of Different Exponentiation Methods in Python, Summaries of Advanced Computer Programming

This document examines the performance differences between various ways of expressing exponentiation in Python, including multiplication and the math.pow() function. The author uses Python's timeit module and disassembler to analyze the bytecode and timing of each method. The document also discusses the implementation of Python's BINARY_MULTIPLY and BINARY_POWER functions and their performance for small and large numbers.

What you will learn

  • How do Python's BINARY_MULTIPLY and BINARY_POWER functions differ in their implementation and performance?
  • How does the performance of exponentiation in Python change as the power value increases?
  • What is the role of Python's timeit and disassemblers in analyzing the performance of different Python expressions?
  • At what point does chained multiplication become less efficient than exponentiation in Python?

Typology: Summaries

2021/2022

Uploaded on 09/27/2022

queenmary 🇬🇧


Timing Tests

Expression Disassembly

Multiplication

math.pow()

Exponentiation

BINARY_MULTIPLY versus BINARY_POWER

BINARY_MULTIPLY

BINARY_POWER

Charting Performance Differences

Generating Functions

math.pow() and Exponentiation

Chained Multiplication

Finding the Crossover

Charting the Performance

More Performance Testing

Conclusions

Recently, I was writing an algorithm to solve a coding challenge that involved finding a point in a Cartesian plane that had the minimum distance from all of the other points. In Python, the distance function would be expressed as math.sqrt(dx ** 2 + dy ** 2). However, there are several different ways to express each term: dx ** 2, math.pow(dx, 2), and dx * dx. Interestingly, these all perform differently, and I wanted to understand how and why.

Timing Tests

Python provides a module called timeit to test performance, which makes testing these timings rather simple. With x set to 2, we can run timing tests on all three of our options above:

    Expression        Timing (100k iterations)
    x * x             3.87 ms
    x ** 2            80.97 ms
    math.pow(x, 2)    83.60 ms
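
The article does not show the exact timeit calls behind this table, so the following is only a minimal sketch of how such a measurement might be set up. The names SETUP, STATEMENTS, and time_expressions are illustrative assumptions, not the author's code, and absolute timings will differ by machine and Python version.

```python
import timeit

# Each statement runs with x bound to 2; math is imported once in setup.
SETUP = "import math; x = 2"
STATEMENTS = ["x * x", "x ** 2", "math.pow(x, 2)"]


def time_expressions(number=100_000):
    """Return {expression: elapsed milliseconds} for `number` runs of each."""
    return {
        stmt: timeit.timeit(stmt, setup=SETUP, number=number) * 1000
        for stmt in STATEMENTS
    }


if __name__ == "__main__":
    for stmt, ms in time_expressions().items():
        print(f"{stmt:<18}{ms:8.2f} ms")
```

Passing the expression as a string keeps the call overhead of a wrapper function out of the measurement, which matters when the operation being timed is this small.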

Expression Disassembly

Python also provides a module called dis that disassembles code so we can see what each of these expressions is doing under the hood, which helps in understanding the performance differences.

Multiplication

Using dis.dis(lambda x: x * x), we can see that the following code gets executed:

    0 LOAD_FAST                0 (x)
    2 LOAD_FAST                0 (x)
    4 BINARY_MULTIPLY
    6 RETURN_VALUE

The program loads x twice, runs BINARY_MULTIPLY, and returns the value.
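
To inspect the instruction names programmatically rather than reading the printed listing, something like the sketch below may help. The helper name opnames is an assumption made for illustration. Note that on CPython 3.11 and later the per-operator opcodes such as BINARY_MULTIPLY were merged into a single BINARY_OP instruction, so the names returned depend on the interpreter version.

```python
import dis


def opnames(func):
    """Return the bytecode instruction names for `func`, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


square = lambda x: x * x

if __name__ == "__main__":
    # Older CPython shows BINARY_MULTIPLY here; 3.11+ shows BINARY_OP.
    print(opnames(square))
```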

math.pow()

Using dis.dis(lambda x: math.pow(x, 2)), we can see the following code gets executed:

     0 LOAD_GLOBAL              0 (math)
     2 LOAD_ATTR                1 (pow)
     4 LOAD_FAST                0 (x)
     6 LOAD_CONST               1 (2)
     8 CALL_FUNCTION            2
    10 RETURN_VALUE

The math module loads from the global space, and then the pow attribute loads. Next, both arguments are loaded and the pow function is called, which returns the value.

        if (((Py_SIZE(a) ^ Py_SIZE(b)) < 0) && z) {
            _PyLong_Negate(&z);
            if (z == NULL)
                return NULL;
        }
        return (PyObject *)z;
    }

For small numbers, this uses binary multiplication. For larger values, the function

uses Karatsuba multiplication, which is a fast multiplication algorithm for larger

numbers.
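
To make the Karatsuba idea concrete, here is a minimal Python sketch. It is not CPython's implementation, which works in C on the integer's internal digit array; the function name karatsuba and the cutoff value are assumptions chosen for illustration. The point is that splitting each operand in half lets one large product be built from three smaller products instead of four.

```python
def karatsuba(a, b, cutoff=2 ** 64):
    """Multiply non-negative integers a and b using Karatsuba's method."""
    # Small operands: fall back to the built-in multiply.
    if a < cutoff or b < cutoff:
        return a * b

    # Split both numbers around half the bit length of the larger one.
    half = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask

    # Three recursive products instead of four.
    low = karatsuba(a_lo, b_lo, cutoff)
    high = karatsuba(a_hi, b_hi, cutoff)
    cross = karatsuba(a_lo + a_hi, b_lo + b_hi, cutoff) - low - high

    # Recombine: a*b = high * 2^(2*half) + cross * 2^half + low
    return (high << (2 * half)) + (cross << half) + low
```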

We can see how this function gets called in ceval.c:

    case TARGET(BINARY_MULTIPLY): {
        PyObject *right = POP();
        PyObject *left = TOP();
        PyObject *res = PyNumber_Multiply(left, right);
        Py_DECREF(left);
        Py_DECREF(right);
        SET_TOP(res);
        if (res == NULL)
            goto error;
        DISPATCH();
    }

BINARY_POWER

This function is located here in the Python source code. It also does several

interesting things:

The source code is too long to include in full, which partially explains the slower performance. Here are some interesting snippets:

    if (Py_SIZE(b) < 0) {  /* if exponent is negative */
        if (c) {
            PyErr_SetString(PyExc_ValueError, "pow() 2nd argument "
                            "cannot be negative when 3rd argument specified");
            goto Error;
        }
        else {
            /* else return a float.  This works because we know
               that this calls float_pow() which converts its
               arguments to double. */
            Py_DECREF(a);
            Py_DECREF(b);
            return PyFloat_Type.tp_as_number->nb_power(v, w, x);
        }
    }

After creating some pointers, the function checks whether the power given is a float or is negative, in which case it either raises an error or calls a different function to handle exponentiation.

If neither case applies, it checks for a third argument, which is always None according to ceval.c:

    case TARGET(BINARY_POWER): {
        PyObject *exp = POP();
        PyObject *base = TOP();
        PyObject *res = PyNumber_Power(base, exp, Py_None);
        Py_DECREF(base);
        Py_DECREF(exp);
        SET_TOP(res);
        if (res == NULL)
            goto error;
        DISPATCH();
    }

Finally, the function defines two routines: REDUCE for modular reduction and MULT

for multiplication and reduction. The multiplication function uses long_mul for

both values, which is the same function used in BINARY_MULTIPLY.

#define REDUCE(X)
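
The REDUCE and MULT routines described above can be pictured with a small Python sketch of exponentiation by repeated squaring. This is an illustration under stated assumptions, not a translation of the C source: the names int_pow, reduce, and mult are hypothetical, and CPython's actual routine differs in detail.

```python
def int_pow(base, exponent, modulus=None):
    """Raise base to a non-negative integer exponent by repeated squaring."""
    if exponent < 0:
        raise ValueError("this sketch only handles non-negative exponents")

    def reduce(value):
        # Plays the role of REDUCE: apply the modulus only if one was given.
        return value % modulus if modulus is not None else value

    def mult(x, y):
        # Plays the role of MULT: multiply, then reduce.
        return reduce(x * y)

    result = reduce(1)
    base = reduce(base)
    while exponent:
        if exponent & 1:         # this bit of the exponent is set
            result = mult(result, base)
        base = mult(base, base)  # square for the next bit
        exponent >>= 1
    return result
```

Because the loop runs once per bit of the exponent, raising to the 15th power takes a handful of multiplications rather than 14, at the cost of extra bookkeeping that a short multiplication chain avoids.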

Charting Performance Differences

We can use the timeit library above to profile code at different power values and see how the performance changes as the power grows.

Generating Functions

To test the performance at different power values, we need to generate some

functions.

math.pow() and Exponentiation

Since both of these are already in the Python source, all we need to do is define a

function for exponentiation we can call from inside a timeit call:

exponent = lambda base, power: base ** power

Chained Multiplication

Since the multiplication chain changes each time the power changes, we need to generate a new multiplication function for each power. To do this, we can generate a string like x*x*x and call eval() on it to return a function:

    def generate_mult_func(n):
        mult_steps = '*'.join(['q'] * n)
        func_string = f'lambda q: {mult_steps}'  # Keep this so we can print later
        return eval(func_string), func_string

Thus, we can make a multiply function like so:

multiply, func_string = generate_mult_func(power)

If we call generate_mult_func(4), multiply will be a lambda function that looks like this:

    lambda q: q*q*q*q

Finding the Crossover

Using the code posted here, we can determine at what point multiply becomes

less efficient than exponent.

Starting with these values:

    base = 2
    power = 2

We loop until executing 100,000 iterations of multiply takes longer than executing 100,000 iterations of exponent. Initially, here are the timings, with math.pow() serving as a point of comparison:

    Starting speeds:
    Multiply time   11.83 ms
    Exponent time   86.52 ms
    math.pow time   73.90 ms

When running on repl.it, Python finds the crossover in 1.2 s:

    Crossover found in 1.2 s:
    Base, power     2, 15
    Multiply time   110.09 ms
    Exponent time   108.20 ms
    math.pow time   79.82 ms
    Multiply func   lambda q: q*q*q*q*q*q*q*q*q*q*q*q*q*q*q

Thus, chaining multiplication together is faster until our expression gets to 2^14; at 2^15, exponentiation becomes faster.
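
The author's search code is linked rather than reproduced, so the loop below is only a hedged reconstruction of the approach described. The function find_crossover and its parameters are assumptions, and generate_mult_func is simplified here to return only the function. A single timing per power is noisy, so the reported crossover can shift between runs and machines.

```python
import timeit


def generate_mult_func(n):
    """Build a lambda that computes q**n as a chain of n factors."""
    return eval("lambda q: " + "*".join(["q"] * n))


def exponent(base, power):
    return base ** power


def find_crossover(base=2, start_power=2, max_power=64, number=100_000):
    """Return the first power where chained multiplication is slower than **.

    Returns None if no crossover is seen up to max_power.
    """
    for power in range(start_power, max_power + 1):
        multiply = generate_mult_func(power)
        mult_time = timeit.timeit(lambda: multiply(base), number=number)
        exp_time = timeit.timeit(lambda: exponent(base, power), number=number)
        if mult_time > exp_time:
            return power
    return None


if __name__ == "__main__":
    print("Crossover power:", find_crossover())
```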

Charting the Performance

Using Pandas, we can keep track of the timing at each power:

Power multiply exponent math.pow
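
The per-power measurements did not survive in this copy, and they are not reconstructed here. As a sketch of how rows with these four columns could be gathered, the hypothetical timing_rows helper below builds a list of dicts using only the standard library; a list like this can be passed to pandas.DataFrame if Pandas is available.

```python
import math
import timeit


def timing_rows(base=2, powers=range(2, 6), number=10_000):
    """Return one dict per power, with timings in milliseconds."""

    def ms(fn):
        return timeit.timeit(fn, number=number) * 1000

    rows = []
    for power in powers:
        # Chained multiplication, e.g. "q*q*q" for power 3.
        multiply = eval("lambda q: " + "*".join(["q"] * power))
        rows.append({
            "Power": power,
            "multiply": ms(lambda: multiply(base)),
            "exponent": ms(lambda: base ** power),
            "math.pow": ms(lambda: math.pow(base, power)),
        })
    return rows


if __name__ == "__main__":
    for row in timing_rows():
        print(row)
```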

Interestingly, math.pow() and exponent mostly perform at the same rate, while

our multiply functions vary wildly. Unsurprisingly, the longer the multiplication

chain, the longer it takes to execute.

More Performance Testing

While the crossover is interesting, this doesn’t show what happens at powers larger

than 15. Going up through 1000 , we get the following trend:

When we zoom in so that math.pow() and exponent are more pronounced, we

see the same performance trend continue:

While the time for ** gradually increases, math.pow() generally executes at around the same speed.

Conclusions

When writing algorithms that use small exponents (shown here to be those below 15), it is faster to chain multiplication together than to use the ** exponentiation operator.

Additionally, math.pow() is more efficient than chained multiplication at powers larger than 10 and always more efficient than the ** operator, so, for speed alone, there is never a reason to use **.

The same is true in JavaScript. Thanks @julaincoleman for this comparison!
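
One thing the timings above do not capture is that the two approaches return different kinds of values: ** on integers stays in exact integer arithmetic, while math.pow() converts its arguments to floats. The short sketch below illustrates the difference; the names exact, approx, and pow_overflows are chosen here for illustration. This may also be part of why the ** timings grow with the power while math.pow() stays flat, though the article does not test that.

```python
import math

exact = 3 ** 40            # integer arithmetic, arbitrary precision
approx = math.pow(3, 40)   # arguments converted to floats, returns a float


def pow_overflows(base, power):
    """Return True if math.pow raises OverflowError for these arguments."""
    try:
        math.pow(base, power)
    except OverflowError:
        return True
    return False


if __name__ == "__main__":
    print(type(exact).__name__, type(approx).__name__)
    print("float result matches exact int:", int(approx) == exact)
    print("math.pow(2, 1024) overflows:", pow_overflows(2, 1024))
```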

Discussion: r/Python (https://www.reddit.com/r/Python/comments/bv1ez2/performance_of_various_python_exponentiation/), Hacker News