Number 1 and Benford’s Law – Numberphile


STEVE MOULD: This is
Benford’s Law. And it’s about numbers, but it’s
about the leading digit. For example, you could look at
the populations of all the countries in the world and
look at the leading digits of all those. So for example, if it was 1,269,
then the leading digit in that case is the one. Benford’s Law works on a
distribution of numbers if that distribution spans quite
a few orders of magnitude. And the brilliant thing about
populations of countries is that it actually goes from
tens up to billions. If you were to think about
that, OK, what is the distribution of leading
digits? So some of the populations will
start with the one, some will start with two, three,
four, five, six, seven, eight, or nine. And so there are nine possible
leading digits. And you might imagine that each
one of those possible leading digits is equally
likely to appear. So that’s one in nine– 11%. And if I was to plot that on a
graph, you might expect that to fluctuate around 11%. So it’s going to go like that. So what actually happens is
that a third of the time, that’s up here. A third of the time the
number you choose will start with a one. And it will hardly ever
start with a nine. So nine is down here– tiny number. And then you get this
brilliant curve that goes up like that. Isn’t that crazy?

BRADY HARAN: I know you talk
about this sometimes in talks and things you do. What’s the reaction to
that normally when you tell people this?

STEVE MOULD: The reaction? The noise is sort
of like this– ohh. And there’s a certain
amount of disbelief sometimes as well. And the way we do it actually
in the show is that we get people to tweet numbers to us. So we’re collecting numbers, and
I try to give them ideas. So maybe like, take the distance
from the venue to where they live and convert that
into some strange units. Or something like that. The interesting thing is, like I
was saying, it works so long as the distribution you’re
choosing from spans loads of orders of magnitude. But if you’re picking numbers
from lots of different distributions, the individual
distributions don’t have to span lots of orders
of magnitude. The meta-distribution of
individual things picked from different distributions follows
Benford’s Law anyway. So it works brilliantly well.

BRADY HARAN: What clump
of numbers will this not work for?

STEVE MOULD: Human
height in meters. So humans are between one
meter and three meters. So it doesn’t work for that. You get a massive load
around there. And no one’s nine meters tall. Anything that has that short
distribution, it doesn’t work for. But it does work for several
distributions put together that don’t necessarily
individually follow the rule. So I did it for populations. I did it for areas of countries in kilometers squared. If you take one number and
convert it to loads of different units, that
will tend to follow Benford’s Law as well. You can do it for the
Financial Times. Look at all the numbers
on the front cover of the Financial Times. They will tend to follow
Benford’s Law as well.

BRADY HARAN: Just a quick
interjection: you can also apply this to the
number of times you watch Numberphile videos or leave
comments underneath. More information at the
end of the video.

STEVE MOULD: So the explanation
is to do with scale invariance, which
I’m just getting my head around now. But there are a couple
of intuitive ways of understanding it. One of them is to use the
idea of a raffle. To begin with, it’s a
very small raffle. So there are only two tickets
in this raffle. What are the chances of the
winning ticket in this raffle having a leading digit of one? Well, that’s this one. So it’s one in two. It’s 50%. But then if you increase the
size of the raffle, so there are now three tickets in the
raffle, the chance now is one in three or about 33%. If you add a fourth ticket, then
the probability of the leading digit of the winning
ticket being a one is now 25%, and then 20%, and so on and so
on until you have a raffle with nine tickets in it. And then the probability of the
winning ticket having a leading digit of one
is one in nine. It’s 11%, which was the
intuitive thing that you might think. But then you add your
tenth ticket. And now there are two tickets
that start with a one. So now the probability
is 2 in 10 or 1 in 5. So it would go back up to 20%. The probability will go up, and
up, and up as you add more tickets that start with a one. And once you have a raffle with
19 tickets in it, you’re up to something like 58%. And then you add the
20th ticket. And so the probability
goes down again. So the probability of the
winning ticket having a leading digit of one will go
down, and down, and down through the 20s. It will go down through the
30s, down through the 40s, 50s, 60s, 70s, 80s, 90s, until you
add the hundredth ticket. And then the probability will
start to go up again. And then the probability will go
up, and up, and up, all the way through the 100s. And then you get to the 200s,
and it goes down, and down, and down through all the 200s,
300s, 400s, 500s, 600s, 700s, 800s, 900s. And you’ll be back to
11% again then. Then you add the thousandth
ticket. And the probability will
start to go up again. So the probability goes up,
and up, and up through the thousands and then down through
the 2000s, 3000s, blah, blah, blah. And then you get to 10,000
and it goes up. And so basically the probability
of the winning ticket having a leading digit of one
fluctuates as the size of the raffle increases. And so this is a log scale of
the raffle increasing in size. So you might have a 10, 100,
1,000, 10,000, and so on. And then this is the probability
here of having a leading digit of one. It goes like that. What Frank Benford realized was
that if you pick a number from a distribution that spans
loads of orders of magnitude, or if you pick a number from
the world and you don’t necessarily know what the
distribution is in advance, then it’s like picking a ticket
from a raffle when you don’t know the size
of the raffle. So you have to take the average
of this wiggly line, which is what he did. So that’s the average there. And it’s around 30%. There’s a formula for it, which
is the probability of picking a number with a
particular leading digit of d is equal to log to base 10
of 1 plus 1/d, like that. And so that’s how you do it. And if you plug one into
there, then it’s log base 10 of two. And it ends up being
about 30%. The beauty is that you can
do it in any base. So this doesn’t have
to be base 10. It could be base five, base 16,
whatever you want to do. You can apply Benford’s Law
to different bases. This is a formula that a
forensic accountant would use as a tax formula or something
like that. If you’re making up numbers in
your accounts and the numbers you make up don’t follow
Benford’s Law, then that’s a clue that you might
be cheating. So this is a formula you need to
remember if you’re going to cheat on your tax return.

BRADY HARAN: A lot of things
that mathematically inclined people like yourself tell me
when I hear about them seem counter-intuitive. And then you cleverly explain
why it works the way it works. This is one of the few things
that when I’ve heard about it, this just seems logical to me. When someone says one will come
up more often, to me that just seems like, of course
that would happen.

STEVE MOULD: Yes. Funny, isn’t it? Some people are like you. I would say you’re in the
minority of people that go, well, yeah. And I wonder if there is another
intuitive way of looking at it that you’ve tapped
into, which is that if you imagine something like
the NASDAQ index or something like that– and I don’t know what the NASDAQ
index is size-wise– but imagine that the NASDAQ
index is at 1,000. To change that to 2,000, you’d
have to double it. So the NASDAQ index would have
to increase by 100% to get from something that starts with
a one to something that starts with a two. So that’s quite a big change. But if the NASDAQ index was
on 9,000 and you wanted to increase it to 10,000, then
that’s an 11% increase. So it’s hardly anything. So basically, you don’t really
hang around the nines. As things are growing and
shrinking, you don’t hang around, whereas you do
hang around the ones. And maybe that’s intuitive
to you. So you’re like, yeah
obviously.

BRADY HARAN: If you’d like to
see even more about Benford’s Law, we’ve done a bit of a
statistical analysis to find out whether or not your viewing
habits and the number of times you comment on
Numberphile videos are following the Benford curve. The link is below this video
or here on the screen. So why don’t you check it out?
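
The raffle argument in the transcript can be checked with a short Python sketch. This is only an illustration under simple assumptions: tickets are numbered 1 to N and the winner is drawn uniformly. The names `leading_digit` and `prob_leading_one` are made up for this example and do not come from the video.

```python
def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def prob_leading_one(raffle_size: int) -> float:
    """Chance that a ticket drawn uniformly from 1..raffle_size starts with a 1."""
    hits = sum(
        1 for ticket in range(1, raffle_size + 1) if leading_digit(ticket) == 1
    )
    return hits / raffle_size


if __name__ == "__main__":
    # Sizes just before and after the points where the transcript says the
    # probability turns around (9 -> 10, 19 -> 20, 99 -> 100, and so on).
    for size in (2, 3, 9, 10, 19, 20, 99, 100, 199, 999, 1999):
        print(f"{size:>5} tickets: {prob_leading_one(size):.1%}")
```

Running it shows the wiggle Steve describes: the probability falls to about 11% at 9, 99 and 999 tickets and climbs to between 55% and 58% at 19, 199 and 1999 tickets.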

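The formula quoted in the transcript, P(d) = log10(1 + 1/d), and the remark that it carries over to other bases can be tabulated as follows. This is a minimal sketch; the function name `benford` and the choice of bases 5, 10 and 16 (the ones Steve mentions) are just for illustration.

```python
import math


def benford(d: int, base: int = 10) -> float:
    """Probability of leading digit d in the given base: log_base(1 + 1/d)."""
    if not 1 <= d < base:
        raise ValueError("d must satisfy 1 <= d < base")
    return math.log(1 + 1 / d, base)


if __name__ == "__main__":
    # Base 10: a leading 1 comes out at about 30%, a leading 9 under 5%.
    for d in range(1, 10):
        print(f"P({d}) = {benford(d):.1%}")

    # In any base, the probabilities over all possible leading digits sum to 1.
    for base in (5, 10, 16):
        total = sum(benford(d, base) for d in range(1, base))
        print(f"base {base}: P(1) = {benford(1, base):.1%}, total = {total:.3f}")
```

The sum comes out to 1 because the terms telescope: log(2/1) + log(3/2) + ... + log(base/(base-1)) = log(base), which is 1 in that base.
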
100 Comments

  • Steven McKeating

    https://oeis.org/webcam Will probably show that the terms of random sequences on average will follow Benford's law.

  • Simon Chan

    This is very interesting but I have a question – if I were to buy a raffle ticket, would it be wiser for me to buy one that starts with a one given Benford's law if I assume that approximately 150 people in the office buy the raffle tickets?

  • Carbon Scythe

    So what does it mean if something doesn't follow Benford's Law? I know it can be used to find fraud but what else? Can you somehow use it to figure out the possibility that it is actually fraud that is going on?

  • Austin Liu

    Does Benford's Law remain true for different number bases? If you took data that conformed to Benford's law in Base 10, and converted it to Base 7, or Base 9, would it still conform to the law? What about higher bases?

  • Zack

    I feel that it happens because zero goes unnoticed in the first round, and the 1 takes the credit on every cycle of increment. Example: 0,1,2,3 - 10,11,12,13
    just a humble opinion

  • Garrett Van Cleef

    County-by-county votes by state in the 2016 US election? What should it show? Please do a Numberphile video on this relative to Benford's Law! Great stuff!

  • AbstrctIgwana84

    Benford's Law makes sense, because when you go up a place, it starts with 1, so it gets the boost in percentage from the new digit first.

  • The Real Flenuan

    It makes sense once you visualise a logarithmic scale, where the 1s take up a third of the space and the other numbers gradually compress until the 9 is practically a tenth the size of the next 1 that will follow, etc.

  • Daniel Arrizza

    The way I reason about it is that when you stop counting, you're chopping off the later numbers, leaving you with more early numbers, especially 1.

  • Анатолий Петровский

    Hi! Here ( https://youtu.be/XXjlR2OK1kM?t=357 ) you try to describe Benford's law, but we can see the same image for 2, 3, 4, 5, 6…. and we can do the same calculations for the other digits' probabilities in a number sequence and get the same conclusion, so I don't understand why this law works.

    Sorry for my English, this is not my native language 🙂

  • rbapf

    I tried the formula, and I noticed that, in base 2^n, the probability that a number will start with a 1 is 1/n. I guess it's just the reciprocal of logs in base 2.

  • dwaltrip77

    Another way of looking at it: Each of the numerals only has an equal shot at being the leading digit if the distribution stops RIGHT before the next power of 10. Some examples of this would be if the distribution was from 1 to 9, or from 1 to 99, or 1 to 999, etc.

    If there is any extra "spare change" added to the size of the distribution, such as 1 to 99 being increased so that it now goes from 1 to 135, then that means there are 35 extra spots in the distribution that start with 1 (i.e. 100, 101, 102… 134, 135). These new options now make the number 1 a much more likely choice to be the leading digit.

    Distributions never end nicely right before the next order of magnitude, as the point right between orders of magnitudes is just an arbitrary spot on the number line [1]. This means there is almost always some "spare change".

    As 1 is the first number, it is the most likely to be part of the "spare change". When we go from one order of magnitude to the next, we go from the 9's to the 1's (99 –> 100, 999 –> 1000, etc). So, if the distribution crosses multiple orders of magnitudes, 1 always has the best shot at being the leading digit.

    2 is next in line after we leave the 1's — 199 goes to 200, 1999 goes to 2000, etc. Thus, 2 has the 2nd best odds of being the leading digit. And then of course the same logic applies for 3, all the way down to 9 being the least likely.

    This gives us the shape of the graph in the video!

    [1] If we switch from base 10 to some other base, the points on the number line that mark the crossover from one order of magnitude to the next will all switch! But the number line itself hasn't changed — we are simply relabeling the positions.

  • Taylor Sabbag

    Why can't the same principle be applied to other numbers through 9? For example, 2s, 20s, 200s…etc, why wouldn't that be just as applicable?

  • AlxndrJG

    this doesn't work with memes…..

    "choose a number"
    (☞゚∀゚)☞

    "ITS OVER 9000 !!!"
    ( ಠ ͜ʖರೃ)

    "…"
    ༼ つ ಥ_ಥ ༽つ

  • Northfan42

    Of course 1 is more common. The simple fact of its closer proximity to the origin than any other single-digit integer means it will naturally occur more frequently as a leading digit than others. Every other leading digit is dependent on all 1-lead numbers preceding them before they can occur, regardless of the order of magnitude. Likewise, all numbers with the leading digit of 3 are dependent on being preceded by all 1-lead and 2-lead numbers and so on.

    That said, this is all dependent on the number set beginning at the origin and working its way up. What if an arbitrarily large number with all digits being 9s was the starting point and the number set counted down? At what point would Benford's Law cease to be inverted and take effect in normal fashion as explained here?

  • Alex

    OMG I got even more impressive results! I wanted to put this to the test, so I went around asking everyone I knew what year they were born, and, shockingly, 100% of the answers started with a 1!! I mean, wow! What could be causing that?

  • planksunit

    I noticed this a long time ago, I wonder how many other people figured this out and never bothered to write it all out let alone publish.

  • antiantiderivative

    Has anyone thought about looking at patterns in different bases? Maybe looking at the numbers in a different base will show some more interesting results.

  • Jim DeCamp

    I once read that someone noticed that in tables of logarithms in libraries, pages with lower numbers exhibited more wear than pages with higher numbers. This would explain it.

  • David Wilkie

    Time is prime.., because the connection of information has probability one, infinity/infinity = *, so all identity is some selection of characteristic proportion in reciprocal proportion, which-when the selection process is exponential, leads to an intersection of natural logarithm (including every and all numerical bases in "e", continuous).
    An actual Mathematician could deduce the probability of the natural occurrence of identities in other number bases, but I'm only committed to comments.
    Professor John D Barrow has presented a very impressive lecture on this topic.
    ___

    In a manner of (QM-Time mathematical Intuition) speaking.., if this aspect of here-now is the metastable tip of the topological iceberg, the incident cause-effect of e-Pi rationalisation i-reflection modulation, then the natural probability occurrence of potential possibilities in the Universal Quantum Computation is a continuous expression of the number proportions perceived as Benford's Law…

  • I need no channel youtube!

    What happens if you do the same probability distribution for 2? You end up with the same graph. And it only takes the same 10 change to get from 1k down to 9k.
    I still don't understand this. Do numbers have a natural tendency to increase more often than decrease?

  • Jonadab the Unsightly One

    This only works if the numbers are chosen in a way that makes lower-magnitude numbers more common than would be expected in an even distribution.

    Powers of two (or of any number) are a very nice demonstration, because they increase in magnitude at a fairly smooth rate as they go.

    If the space of available numbers is finite and all numbers in that space are equally likely, you instead get something more like the sort of leading-digit probabilities people would tend to expect. So for example, in a sampling of genuinely random numbers between 1 and a billion, about 90% of them will (in base 10) be between 100 million and 1 billion; roughly 9% will be between 10 million and 100 million; and so on. With this sort of distribution, all leading digits are just about equally likely. You can make this more obvious by including the leading zeros, at which point it is straightforward that each leading digit, including 0, occurs 10% of the time.
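
The powers-of-two remark in the comment above is easy to try out. The sketch below is an illustration only: the function name `leading_digit_counts` and the choice of the first 1,000 powers are arbitrary assumptions. It compares the observed share of each leading digit with the log10(1 + 1/d) formula from the transcript.

```python
import math
from collections import Counter


def leading_digit_counts(count: int) -> Counter:
    """Count the first decimal digit of 2**0, 2**1, ..., 2**(count - 1)."""
    return Counter(int(str(2 ** k)[0]) for k in range(count))


if __name__ == "__main__":
    n = 1000
    counts = leading_digit_counts(n)
    for d in range(1, 10):
        observed = counts[d] / n
        expected = math.log10(1 + 1 / d)
        print(f"{d}: observed {observed:.3f}, Benford {expected:.3f}")
```

Swapping the generator for uniformly random integers from a fixed range should instead give roughly equal shares, which is the contrast the comment draws.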

  • Kaczankuku

    If we first start from one, the frequency of one appearing is really as was shown in the video, but we can also start from nine and go to lower and lower numbers; then the frequency will equal the frequency of one under the first condition.

  • Tim Lewis

    So would it be the mean proportion of leading values in lists of natural numbers starting from 1. e.g. the average leading value in 1; 1,2; 1,2,3; ………… 1,2,3,4,5,……,1000 etc. This would give you 1+1/2+1/3+…..+1/9+2/10+3/11+4/12+…….+11/19 and then divide that sum by the number of sets e.g. 19 in this case.
    Or would it also include all possible sets that don't start with one? Maybe you don't even have to increase 1 at a time, it can be random sets of any natural numbers.

  • Paul Shin

    Wait, isn’t it because of some connection between population growth? Like, the time it takes to go from 1 to 2 is greater than the time it takes to go from 9 to 10, and 10-20 vs 90-100, and so on? I think someone here made a connection to rivers as well… it’s easier for a river to go from 900 to 1000 meters long than it is for it to go from 100 to 200.

    I think that if the system at hand is a growing dynamic system, it will show this kind of behavior. But if we’re dealing with random numbers, this behavior will not hold.

  • Shane Clough

    How I thought about it was, using the raffle analogy, The probability of leading digit being 1 when there is only 1 ticket (tickets starting at 1) is 1, then with 2 it's 1/2, with 3 1/3 etc up till when you have 1, 2, 3, 4, 5, 6, 7, 8 and 9, at which point you have a 1/8 chance. If you add 1/n from n=1 to n=8, then divide by the number of different raffles you get (2.717…/9) = 0.3019

    Not sure if it's just coincidental though.

  • Julian Barber

    You're adding 1 more often than anything; for instance, there are 9 2-digit numbers that lead with one, 999 4-digit numbers, 9999 5-digit numbers, etc. Makes perfect sense.

  • Drew Berry

    The numbers on this video

    (5) 72624 views
    (9) 600 likes
    (1) 36 dislikes
    (1) 238 comments

    The likes will tick over to 1 soon enough so this video is a fitting example.

  • Tom Kerruish

    If you're old enough, picture a slide rule. This is equivalent to a uniform distribution on one. It's especially clear if you picture a circular one; a rotation preserves the probabilities.

  • Ivan Mirisola

    Would this work for forgery detection on electronic voting? After the election you would have the number of candidate votes per region or ballot, etc. Would it be possible to figure out if, in the universe of total votes, some have been tampered with?

  • lidoz

    If you start with #20 and move upward you will get the same percentage… #1 is 30% only because it's the first number you are starting with and #10-19 follows shortly after

  • Rico Cordova

    This was intuitive for me. I just thought "well, you are counting UP from 0, you're going to hit 1s before you hit 9s." It would be the exact opposite if you were counting DOWN from some arbitrarily large number.

  • Zach Krakower

    Benford’s Law just says that we start counting with the digit 1, and go up from there, to 2 and then 3. Each new power of 10 (our base) begins again with 1.

    To generate a distribution that directly contradicts the “law” start counting backwards from any power of ten. You won’t see many leading 1’s for a while. Or for extra fun just switch out the 1’s for 9’s, etc. every time you count; they’re just symbols after all, placeholders for the “real thing”

    Edit: for extra fun switch out 0’s for the 9’s, 1’s for 8’s, etc… or else things will get all out of order…
