## Digit counting in Python

Question: what’s the fastest way to count the number of digits (base 10) in an integer (an integral type, could be a long) in Python?

This came up when I was working on a Project Euler problem. There are three options I came up with, and we can rule the first out immediately:

- Divide (as an integer) the number by ten and keep a counter of how many times we divided it, until the number is 0.
- Use log(n, 10) and properly account for floating point black magic
- Convert to a string, get the length
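For completeness, the ruled-out first option looks something like this (a minimal sketch; the function name is mine, and it assumes a non-negative integer):

```python
def ndigits_divide(n):
    """Count base-10 digits by repeated integer division.

    Ruled out above: it does one Python-level loop iteration per digit.
    """
    if n == 0:
        return 1  # 0 has one digit
    count = 0
    while n > 0:
        n //= 10
        count += 1
    return count
```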

## Approach: log(n, 10)

This one’s a bit tricky. Observe:

```python
>>> from math import log
>>> log(999, 10)
2.9995654882259823
>>> log(1000, 10)
2.9999999999999996
```

And to get an integer, I figured:

```python
>>> 1 + int(log(999, 10))
3
>>> 1 + int(log(1000, 10))
3
```

Nope. But I’ve played with log before, and the answer seems to be “eh, use an offset.” Of course, as the number grows, the closeness of 999999999999 (and so on) to 1000000000000 will eventually overtake the precision any fixed offset can paper over. Skipping over a bunch of trial and error, since I was too lazy to reason through the math, I ended up using floor(log10(n + 0.5)) + 1. The offset is applied before taking the log, so its effect shrinks as n grows instead of acting as a constant fudge factor, and for large n the floating-point error dominates it anyway. Also, use floor and not int.
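A quick check of that expression at the boundary that broke the naive version:

```python
from math import floor, log10

# The offset goes inside the log, then floor (not int) the result.
print(floor(log10(999 + 0.5)) + 1)   # 3
print(floor(log10(1000 + 0.5)) + 1)  # 4 -- the naive 1 + int(log(1000, 10)) gave 3
```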

## Approach: convert to string and get the length

len(str(n)). Yep, that’s it.

## Testing and Data:

The two functions:

```python
from math import floor, log10

def ndigits_log(n):
    return 1 + floor(log10(n + 0.5))

def ndigits_str(n):
    return len(str(n))
```

As for the pictures, some notes:

1) timeit recommends using the min across runs, not the average or max. The first graph is min, the second graph is max.

2) I kept gc on. I wasn’t sure how that would impact things, but it seemed safe to assume it would be on during normal use.

X axis: number of digits

Y axis: ms to calculate (per 10,000 runs)

Each data point covers 50 runs of 10,000 calculations each.
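The harness looked roughly like this (a sketch reconstructed from the notes above; the test value and exact setup are my assumptions, not the original script):

```python
import timeit
from math import floor, log10

def ndigits_log(n):
    return 1 + floor(log10(n + 0.5))

def ndigits_str(n):
    return len(str(n))

# One data point: 50 runs of 10,000 calls each; the graphs use
# the min (and max) across runs.
n = 10**12 + 7  # a 13-digit test value (my choice)
best_log = min(timeit.repeat(lambda: ndigits_log(n), repeat=50, number=10_000))
best_str = min(timeit.repeat(lambda: ndigits_str(n), repeat=50, number=10_000))
print(f"log10: {best_log * 1000:.1f} ms  str: {best_str * 1000:.1f} ms")
```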

String starts off great, and builds steadily. log10 starts off a bit behind, and there’s a spike around 11 digits where it jumps to ~63ms, but it overtakes string around 23 digits.

The max graph shows roughly the same behavior, but, again, since max isn’t a reliable measure, it’s mostly noise. If we pulled anything from it, it would be that string can fall behind by ~100ms, while log10’s worst difference was ~35ms. Really, it’s only here because graphs are fun and it’s colorful.

## Conclusions:

My original interest in the question was because of Project Euler Problem 35, and to that end, len(str(n)) wins out convincingly. Which confuses me: are strings really that much better? At least for small numbers. That would mean either the str(n) conversion slows down badly for large values, or the number is no longer a plain int and *that* conversion takes longer. For Problem 35 you’re only looking at numbers with five or fewer digits, and therefore, to my surprise, len(str(n)) is much better.

## Update

I was right about the precision eventually becoming a problem: log10(9999999999999999) (sixteen 9s) is 16.0.

Using Wolfram Alpha, we get 15.999999999999999956570551809674815063414698592080208750088…
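The root cause is visible before log10 even runs: sixteen 9s can’t be represented exactly as a double, so it rounds up to 1E16 on conversion. A quick demonstration:

```python
import math

n = 9999999999999999       # sixteen 9s: a 16-digit number
print(float(n))            # 1e+16 -- rounds up on conversion to double
print(math.log10(n))       # 16.0, so 1 + floor(...) would report 17 digits
```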

If we had that precision, flooring would give 15, and the + 1 would give the correct 16. However, we can’t just drop the + 1 for values of sixteen 9s and up, because 1 + floor(log10(1E16)) should be 17; without the 1 +, we’d be wrong for every number from 1E16 until 1E17 - 1, where we hit the same problem again. So I tried a hybrid method: if the number is 1E16 - 1 or above, use len(str(n)); otherwise use 1 + floor(log10(n)). However, this seems to have destroyed the performance.

One note first: a version without the 0.5 offset turned out to be a bit faster, so the combined method drops it. Here are the results:

Combined method code:

```python
from math import floor, log10

def method_combined(n):
    # At sixteen 9s and above, log10 can no longer be trusted,
    # so fall back to the string method.
    if n >= 9999999999999999:
        return len(str(n))
    return 1 + floor(log10(n))
```
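A quick sanity check of the hybrid against len(str(n)) around the boundary (the definition is repeated here so the snippet runs on its own; note the cutoff is inclusive, since sixteen 9s itself is where log10 breaks):

```python
from math import floor, log10

def method_combined(n):
    if n >= 9999999999999999:  # sixteen 9s and up: use the string method
        return len(str(n))
    return 1 + floor(log10(n))

# Agreement with the always-correct string method across the boundary
# (the test values are my choice):
for n in (1, 9, 10, 999, 1000, 123456789, 10**16 - 1, 10**16, 10**18):
    assert method_combined(n) == len(str(n)), n
```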

Results: 10 runs, 10,000 calculations per run:

So the combined method, eh. Not really worth it.

## Updated conclusions:

I’m going to stick with 1 + floor(log10(n)) and leave a note in the method that it’s inaccurate for values close to, but less than, 1Ex for x >= 16.
