Modular Exponentiation in Python
Python is my favorite programming/scripting language for mathematics, and there has been extensive work in applying it to various fields of study. For this post, I will apply it to modular exponentiation, also known as "power modulus", and I use it in primality tests. There have been many implementations of the function  some of which may be seen on the linked Wikipedia page.
Implementations
I'll denote the function as \(expmod\), and it can be expressed as:
where a, b, and c are all of integral value.
After some quick googling, I found three efficient implementations of calculating \(expmod\):

Python's builtin
pow
function 
Simple modular multiplication

Iterative modular binary exponentiation

Recursive modular binary exponentation
The reason why such "clever" and more optimized approaches are necessary is because naive methods, like the one listed below, simply take too much memory.
def expmod(a,b,c):
return (a**b)%c
They first calculate \(a^{b}\), and then they take the modulus of the result, with both being extremely memorydemanding operations when used with large numbers. To get around this, the first step would be to exploit the following property of modular multiplication:
In plain English, we would say to take the modulus of a and b, then multiply them together, and take the modulus of the product.
Applying this to modular exponentiation yields the following:
# Modular exponentiation with simple modular multiplication
def expmod(a,b,c):
x = 1;
for i in xrange(0,b):
x = (x*a)%c
return x
Computers, however, operate on binary numbers, not base10 numbers that humans use. This is when binary exponentiation comes into use.
We do this by multiplying the temporary result by a when b is odd. When b is even, square the base a, and keep repeating these steps. After each multiplication operation, take the temporary value mod 'c'. Taken from Wikipedia, the recursive equation can be formally expressed as:
The following Python function is an iterative implementation of modular binary exponentiation:
# Modular exponentiation with iterative binary exponentiation
def expmod_iter(a,b,c):
x = 1
while(b>0):
if(b&1==1): x = (x*a)%c
a=(a*a)%c
b >>= 1
return x%c
The primary reason I included both iterative and recursive implementations of binary exponentiation in the beginning of the post is that Zhihua Lai's page listed that using recursion was faster than using iterations (though just slightly). This strange result actually pushed me to make some of this post.
Translating the iterative function above into a recursive function (which Zhihua Lai claimed was slightly faster) yields the following code:
# Modular exponentiation with recursive binary exponentiation
def expmod_recur(a,b,c):
if(b==1): return a%c
x = expmod_recur(a,b>>1,c)
x = (x*x)%c
if(b&1==1)==1: # if odd number
x = (x*a)%c
return x
Runtime
Ultimately, algorithms need to be fast in terms of runtime  not just number of steps. Luckily, Python provides the timeit
module for easily timing repeated operations.
Below, you can see the code I use for doing a simple benchmark for each of the four algorithms above:
# Test speeds of the four implementations
a = 153
b = 99999999
c = 147
setup_vars = "a = %d;b = %d;c = %d" % (a,b,c)
iterations = 100
print "Builtin"
t1 = timeit.Timer("pow(a,b,c)",setup_vars)
print t1.timeit(iterations)
print "Multiplicative"
t2 = timeit.Timer("expmod(a,b,c)",setup_vars+"; from __main__ import expmod")
print t2.timeit(iterations)
print "Binary: iterative"
t3 = timeit.Timer("expmod_iter(a,b,c)",setup_vars+"; from __main__ import expmod_iter")
print t3.timeit(iterations)
print "Binary: recursive"
t4 = timeit.Timer("expmod_recur(a,b,c)",setup_vars+"; from __main__ import expmod_recur")
print t4.timeit(iterations)
I ran each of the four algorithms 100 times for a = 153
, b = 99999999
, and c = 147
. a and c were chosen arbitrarily, and b was simply set to a large number.
This is the output of the program, with the algorithm name above the total runtime (in seconds):
Builtin
0.000188827514648
Multiplicative
1490.67567086
Binary: iterative
0.0010290145874
Binary: recursive
0.00142788887024
As you can tell, the simple multiplicative method is significantly slower than the other three methods  nearly 20 minutes compared to hundreths of a second. Since a and c, in particular, were chosen arbitrarily, I decided to rerun the three fastest implementations (multiplicative was excrutiatingly slow) for a range of numbers. In mathematical notation,
. Every combination of a and c within this range was tried.
# Test speeds of three implementations over a range of inputs
def wrapper(func, start, end, b):
def wrapped():
for a in xrange(start,end):
for c in xrange(start,end):
func(a,b,c)
return wrapped
# snip
b = 99999999
testrange = [2,1000]
iterations = 5
t1 = timeit.Timer(wrapper(pow, testrange[0], testrange[1], b))
t2 = timeit.Timer(wrapper(expmod, testrange[0], testrange[1], b))
t3 = timeit.Timer(wrapper(expmod_iter, testrange[0], testrange[1], b))
t4 = timeit.Timer(wrapper(expmod_recur, testrange[0], testrange[1], b))
print "Builtin"
print t1.timeit(iterations)
print "Binary: iterative"
print t3.timeit(iterations)
print "Binary: recursive"
print t4.timeit(iterations)
The results are as follows (again, in seconds):
Builtin
9.57767701149
Binary: iterative
58.8572740555
Binary: recursive
92.6197078228
After looking at these results, you may be wondering: how is the builtin function so fast? What algorithm is there that could be quicker than binary exponentiation?
Well, the answer is that there really isn't a better algorithm. On a stackoverflow post, it was pointed out that the pow()
function actually points to underlying C libraries, and that the long_pow()
method for longs actually uses iterative binary exponentiation.
Conclusion
From the results above, it's very easy to tell that exponentiation using simple modular multiplication will not suffice for programs needing efficiency. Instead, binary exponentiation can be employed to greatly speed up the process.
As I mentioned earlier, one of the reasons I wrote this post was to validate Mr. Lai's claim that the recursive method performed better than the iterative version, and that was proved to be false (by quite a significant degree).
Out of the four implementations of modular exponentiation, the builtin pow()
method was observed as the fastest, as it is a C implementation of iterative binary exponentation.
References
 Wikipedia article on modular exponentiation
 Python Central article "Time a Python Fuction"
 Zhihua Lai's post Compute PowerMod a^b%n