Coding: Bring the Noise

%26lt;0011 Em

Digit counting in Python

leave a comment »

Question: what’s the fastest way to count the number of digits (base 10) in an integer (integral type, could be long) in python?

This came up when I was working on a ProjectEuler problem.  There are three options I came up with, and we can rule the first out immediately:

  1. Divide (as an integer) the number by ten and keep a counter of how many times we divided it, until the number is 0.
  2. Use log(n, 10) and properly account for floating point black magic
  3. Convert to a string, get the length

Approach: log(n, 10)

This one’s a bit tricky.  Observe:


>>> log(999,10)
2.9995654882259823
>>> log(1000,10)
2.9999999999999996

And to get an integer, I figured:

>>> 1 + int(log(999,10))
3
>>> 1 + int(log(1000,10))
3

Nope.  But I’ve played with log before, and the answer seems to be “eh, use an offset.”  Of course as the number grows, the closeness of 999999999999 etc to 1000000000000 will soon overtake the precision an arbitrary offset.  Skipping over a bunch of trial and error since I was too lazy to reason the math, I ended up using floor(log10(n + 0.5)) + 1.  The offset is applied before taking the log, so FPE should dominate log10(0.5) for large n.  Also, use floor and not int.

Approach: convert to string and get the length

len(str(n)).  Yep, that’s it.

Testing and Data:

The two functions:


def ndigits_log(n):
    return 1 + floor(log10(n + 0.5))

def ndigits_str(n):
    return len(str(n))

As for the pictures, some notes:

1) timeit recommends using the min of runs, not the avg or max.  The first graph is min, the second graph is max.

2) I kept gc on, since I wasn’t sure how that’d impact things, but it was safe to assume it would be on during normal use.

X axis: number of digits

Y axis: ms to calculate (per 10,000 runs)

Each data point is an average of 50 runs, 10,000 calculations per run.

Minimum calculation time y (in ms) for 10,000 runs on a number with x digits

String starts off great, and builds steadily.  log10 starts off a bit behind, and there’s a spike around 11 digits where it jumps to ~63ms, but it overtakes string around 23 digits.

Maximum calculation time y (in ms) for 10,000 runs on a number with x digits

Here we see roughly the same behavior, but, again, as max isn’t a reliable measure, it’s mostly noise.  On the whole, if we pulled anything from this, it would be that string can fall behind by ~100ms, while log10’s worst difference was ~35ms.  But this is really only here because graphs are fun and it’s colorful.

Conclusions:

My original interest in the question was because of Project Euler Problem 35.  And to that end, len(str(n)) wins out convincingly.  Which confuses me- strings are really that much better?  At least, for small numbers.  Which means that either the str(Numeric) conversion is ff’d for large values, or the actual number isn’t an int anymore, and that cast takes longer.  For problem 35, you’re only looking at numbers with 5 or less digits- and therefore, to my surprise, len(str(n)) is much better.


Update

I was right about the precision eventually becoming a problem- log10(9999999999999999)- sixteen 9s- is 16.0.

Using Wolfram Alpha, we get 15.999999999999999956570551809674815063414698592080208750088…

If we had that precision, it would have been 15 when floored, and then + 1 gives us the correct 16.  However, we can’t just drop the +1 for values greater than 16 9s, because 1+floor(log10(1E16)) should be 17- if we drop the 1+, we’re down for every number after 1E16 – 1 until we get to 1E17 – 1, when we hit the same problem.  So I tried a hybrid method- If the number was above 1E16 – 1, use len(str)- otherwise use 1+floor(log10(n)).  However, this seems to have destroyed the performance.

First, a method that doesn’t use the 0.5 offset was a bit faster.  Here are the results:

Minimum calculation time y (in ms) for 10,000 runs on a number with x digits

Combined method code:

def method_combined(n):
    if n > 9999999999999999:
        return len(str(n))
    else:
        return 1 + floor(log10(n))

Results: 10 runs, 10,000 calculations per run:

Minimum calculation time y (in ms) for 10,000 runs on a number with x digits

So the combined method, eh.  Not really worth it.

Updated conclusions:

I’m going to stick with 1+floor(log(n)) and leave a note in the method that it’s inaccurate for values close to but less than 1Ex for x >= 16.

Advertisements

Written by delwinna

January 27, 2012 at 2:40 am

Posted in Puzzles, Python

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: