Compute Rand M from Rand N

How to get a random number in [0, M) by a random number generator that randomly throws a number in [0, N) with equal probability ?

\(Rand(M)\) by \(Rand(N)\)

In this post, \(Rand \; K\) or \(Rand(K)\) denotes a random number generator that produce a integer randomly in \([0, K)\) (or \([0, K - 1]\), from \(0\) to \(K-1\) inclusively) with equal probability.

We can simply divide this problem into two cases: \(M \leq N\) or \(M \gt N\).

\(M \leq N\)

If \(M \leq N\), then the range \([0, M)\) is in the range \([0, N)\). Therefore, the probability of a number choosed by \(Rand(N)\) in \([0, M)\) is same. All of them are \(\frac{1}{N}\).

number probability
\(0\) \(\frac{1}{N}\)
\(1\) \(\frac{1}{N}\)
\(2\) \(\frac{1}{N}\)
\(\vdots\) \(\vdots\)
\(M-1\) \(\frac{1}{N}\)
\(\vdots\) \(\vdots\)
\(N-1\) \(\frac{1}{N}\)

What we want is to develop a method that can randomly choose a number in \([0, M)\) with equal probability. We can leverage this fact.

The simplest way is to re-produce a number if the number produced is out of the range we want (\([0, M)\)). That is, if the number produced is in \([M, N)\), we re-generate a number.

Does it work? That’s check the probability for each number. For each round, the probability we need to re-generate a number is \(\frac{N - M}{N}\). Hence, the probability to get a number \(i\) in the \(K\) round can be organized as the following table, where \(i \in [0, M)\) and \(K \in \mathbb{N}\).

round probability
\(1\) \(\frac{1}{N}\)
\(2\) \(\frac{N - M}{N} \cdot \frac{1}{N}\)
\(3\) \({(\frac{N - M}{N})}^{2} \cdot \frac{1}{N}\)
\(\vdots\) \(\vdots\)
\(K\) \({(\frac{N - M}{N})}^{K-1} \cdot \frac{1}{N}\)

As a result, the probability to get the number \(i\) is

\[\sum_{k=1}^\infty {(\frac{N - M}{N})}^{k-1} \cdot \frac{1}{N} = \frac{1}{N} + \frac{N - M}{N} \cdot \frac{1}{N} + {(\frac{N - M}{N})}^{2} \cdot \frac{1}{N} + \ldots + {(\frac{N - M}{N})}^{K-1} \cdot \frac{1}{N}\]

This is same for all number \(i\), where \(i \in [0, M)\).

number probability
\(0\) \(\sum_{k=1}^\infty {(\frac{N - M}{N})}^{k-1} \cdot \frac{1}{N}\)
\(1\) \(\sum_{k=1}^\infty {(\frac{N - M}{N})}^{k-1} \cdot \frac{1}{N}\)
\(2\) \(\sum_{k=1}^\infty {(\frac{N - M}{N})}^{k-1} \cdot \frac{1}{N}\)
\(\vdots\) \(\vdots\)
\(M-1\) \(\sum_{k=1}^\infty {(\frac{N - M}{N})}^{k-1} \cdot \frac{1}{N}\)

By now, we alreay know how to randomly generate a number in a smaller range from a random number generator in a bigger range.

In brief, the algorithm is

  1. Get a random number \(x\) in \([0, N)\)
  2. If \(x\) is in \([0, M)\), then return \(x\)
  3. Otherwise, repeat from 1

The following Rust program is the method we developed above:

// Get a random number in [0, s) by a random number generator in [0, b),
// where both s, b are integers and s > 1, b > 1, s <= b
fn rand_small_from_big(s: u32, b: u32) -> u32 {
    assert!(s > 1 && b > 1 && s <= b);
    loop {
        // Get a random number in [0, b)
        let x = rand(b);
        // Return the random number if it is in [0, s)
        if x < s {
            return x;
        }
        // Repeating if the random number is in [s, b)
    }
}

See the live demo here.

Actually, we could make some improvement for this approach. By the above approach, it’s very likely to re-produce the random number again and again when \(N\) is much bigger than \(M\). The problem is, the number is only valid when it’s in \([0, M)\). When \(M \ll N\), the chance to get a valid random number is very small. For example, if \(N = 101, M = 2\), the probability to get a valid number is lower than \(2\%(2/101)\). The produced number is only valid when it’s \(0\) or \(1\). It’s inefficient.

A simple solution is to make the number valid in \([0, k \cdot M)\), where the \(k \in \mathbb{N}\). The \(k \cdot M\) is the maximal multiple of \(M\) in \([0, N)\). If the number is in \([0, k \cdot M)\), then we can take the remainder of the produced number divided by \(M\) as the produced random number.

number probability evaluated
\(0\) \(\frac{1}{N}\) \(0\)
\(1\) \(\frac{1}{N}\) \(1\)
\(2\) \(\frac{1}{N}\) \(2\)
\(\vdots\) \(\vdots\) \(\vdots\)
\(M-1\) \(\frac{1}{N}\) \(M-1\)
\(M\) \(\frac{1}{N}\) \(0\)
\(M+1\) \(\frac{1}{N}\) \(1\)
\(M+2\) \(\frac{1}{N}\) \(2\)
\(\vdots\) \(\vdots\) \(\vdots\)
\(2M-1\) \(\frac{1}{N}\) \(M-1\)
\(2M\) \(\frac{1}{N}\) \(0\)
\(2M+1\) \(\frac{1}{N}\) \(1\)
\(2M+2\) \(\frac{1}{N}\) \(2\)
\(\vdots\) \(\vdots\) \(\vdots\)
\(kM-1\) \(\frac{1}{N}\) \(M-1\)
\(kM\) \(\frac{1}{N}\) \(0\)
\(kM+1\) \(\frac{1}{N}\) \(kM+1\)
\(kM+2\) \(\frac{1}{N}\) \(kM+2\)
\(\vdots\) \(\vdots\) \(\vdots\)
\(N-1\) \(\frac{1}{N}\) \(N-1\)
evaluated probability
\(0\) \(\frac{k}{N}\)
\(1\) \(\frac{k}{N}\)
\(2\) \(\frac{k}{N}\)
\(\vdots\) \(\vdots\)
\(M-1\) \(\frac{k}{N}\)
others \(\frac{N - k \cdot M}{N}\)

As a result, the probability to get the number \(i\) is

\[\sum_{j=1}^\infty {(\frac{N - k \cdot M}{N})}^{j-1} \cdot \frac{k}{N} = \frac{k}{N} + \frac{N - k \cdot M}{N} \cdot \frac{k}{N} + {(\frac{N - k \cdot M}{N})}^{2} \cdot \frac{k}{N} + \ldots + {(\frac{N - k \cdot M}{N})}^{j-1} \cdot \frac{k}{N}\]

This is same for all number \(i\), where \(i \in [0, M)\).

number probability
\(0\) \(\sum_{j=1}^\infty {(\frac{N - k \cdot M}{N})}^{j-1} \cdot \frac{k}{N}\)
\(1\) \(\sum_{j=1}^\infty {(\frac{N - k \cdot M}{N})}^{j-1} \cdot \frac{k}{N}\)
\(2\) \(\sum_{j=1}^\infty {(\frac{N - k \cdot M}{N})}^{j-1} \cdot \frac{k}{N}\)
\(\vdots\) \(\vdots\)
\(M-1\) \(\sum_{j=1}^\infty {(\frac{N - k \cdot M}{N})}^{j-1} \cdot \frac{k}{N}\)

The higher \(k\) is, the higher probability to get a valid number. In the above example, if \(N = 101, M = 2\), then \(k = 50\). The probability to get a valid number is \(100/101\) since the produced number is valid from \(0\) to \(99\). Every even number in \([0, 100)\) (\(0, 2, 4, \dots, 98\)) will be evaluated to \(0\) and every odd number in \([0, 100)\) (\(1, 3, 5, \dots, 99\)) will be evaluated to \(1\), since the remainders of the produced even numbers divided by \(M\) and odd numbers divided by \(M\) are \(0\) and \(1\) respectively. That is, the probability becomes \(k\) times.

To sum up, the algorithm is

  1. Define \(K\) to the max multiple of \(M\) in \(N\)
  2. Get a random number \(x\) in \([0, N)\)
  3. If \(x\) is in \([0, K)\), then return \(x % M\)
  4. Otherwise, repeat from 2
// Get a random number in [0, s) by a random number generator in [0, b),
// where both s, b are integers and s > 1, b > 1, s <= b
fn rand_small_from_big(s: u32, b: u32) -> u32 {
    assert!(s > 1 && b > 1 && s <= b);
    // Get the max multiple of s in [0, b)
    let max = (b / s) * s;
    assert_eq!(max % s, 0);
    loop {
        // Get a random number in [0, b)
        let x = rand(b);
        // Return the random number if it is in [0, max)
        if x < max {
            return x % s;
        }
        // Repeating if the random number is in [max, b)
    }
}

See the live demo here.

\(M \gt N\)

However, if \(M \gt N\), our method above doesn’t work, since the range of \([0, M)\) is bigger than \([0, N)\) (really? keep reading!).

To get a random number in \([0, M)\), we need to enlarge the range of the numbers produced by the given generator. How to do that ?

OK, the number of results from the generator is \(N\) now. How to enlarge the range of results ? Should we generate two numbers and do some magic math ?

How many results we have if we generate two numbers ? If we see the results as permutations, it’s \(N^2\).

When \(N = 2\), the results are \((0, 0), (0, 1), (1, 0), (1, 1)\). The probability of each result is \(1/4\).

1st 2nd probability
0 0 \(\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}\)
0 1 \(\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}\)
1 0 \(\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}\)
1 1 \(\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}\)

If the results can be mapped from \((0, 0), (0, 1), (1, 0), (1, 1)\) to \(0, 1, 2, 3\), it means we have a way to enlarge the range of results from \([0, 1]\) to \([0, 1, 2, 3]\).

Well, is it possible to do that ? If we observe carefully, it’s natural to map \((0, 0), (0, 1), (1, 0), (1, 1)\) to \(0, 1, 2, 3\). If we see \(00, 01, 10, 11\) as binary numbers \({00}_{2}, {01}_{2}, {10}_{2}, {11}_{2}\), then they are naturally \(0, 1, 2, 3\).

The same approach also works when we produce two numbers from random number generator whose range is \([0, 3)\). There are \(3 \cdot 3 = 9\) results: \((0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)\).

1st 2nd probability
0 0 \(\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}\)
0 1 \(\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}\)
0 2 \(\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}\)
1 0 \(\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}\)
1 1 \(\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}\)
1 2 \(\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}\)
2 0 \(\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}\)
2 1 \(\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}\)
2 2 \(\frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}\)

If we see them as the base-\(3\) numbers, they are naturally \(0, 1, 2, 3, 4, 5, 6, 7, 8\).

By applying this generating-two-numbers approach for a random number generator with range in \([0, N)\), the sequential results \((i, j)\), where \(i, j \in [0, N)\), can be mapped to \(0, 1, 2, \ldots, N^2 - 1\) (range in \([0, N^2)\)) by treating them as base-\(N\) numbers \({ij}_{N}\).

In general, for the random number generator with range in \([0, N)\), there are \(N^k\) sequential results for producing \(k\) numbers. The sequential results are \((x_0, x_1, \ldots, x_{k-1})\), where \(x_i \in [0, N)\) and \(i \in [0, k)\). By taking the sequential results in base-\(N\) : \({(x_0 x_1 \ldots x_{k-1})}_{N}\), their valus are naturally \(0, 1, 2, \ldots, N^k\). It is how the base-\(N\) number works.

1st 2nd \(\ldots\) kth probability
0 0 \(\ldots\) 0 \(\frac{1}{N^k}\)
0 0 \(\ldots\) 1 \(\frac{1}{N^k}\)
0 0 \(\ldots\) 2 \(\frac{1}{N^k}\)
\(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\)
0 0 \(\ldots\) N-1 \(\frac{1}{N^k}\)
\(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\)
N-1 N-1 \(\ldots\) 0 \(\frac{1}{N^k}\)
N-1 N-1 \(\ldots\) 1 \(\frac{1}{N^k}\)
N-1 N-1 \(\ldots\) 2 \(\frac{1}{N^k}\)
\(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\)
N-1 N-1 \(\ldots\) N-1 \(\frac{1}{N^k}\)

When producing \(k\) numbers by the random number generator with range in \([0, N)\), the sequential results \(x_0, x_1, \ldots, x_{k-1}\) can be mapped to a number in \([0, N^k)\) by encoding them as a base-\(N\) number \({(x_0 x_1 \ldots x_{k-1})}_{N}\).

The idea can be implemented as the following program:

// Map the result of (x_0, x_1, ..., x_{k-1})
// to (x_0 x_1 ... x_{k-1}) in base-N number
let mut x = 0;
for _ in 0..k {
    // Repeat k times
    x = x * N + rand(N);
}

There are \(N^k\) kinds of sequential results for producing \(k\) numbers. The probability for each one is \(\frac{1}{N^k}\). By this approach, we have a way to get one result in the \(N^k\) kinds of sequential results. That is, we have a way to generate a random number in \([0, N^k)\) by the random number generator in \([0, N)\).

The original problem is to find a way to generate a random number in \([0, M)\) by the random number generator in \([0, N)\), where \(M \gt N\). Since we alreay know how to get a random number in a smaller range by a random number generator in a bigger range (the method developped in the \(M \leq N\) case), if we can find a \(k\) such that \(N^k \geq M\), then the problem can be solved!

The \(k\) can be calculated by logarithm:

\[N^k \geq M \Rightarrow k \geq \log_N M\]

The corresponding program is:

// Return minimal k such that N^k >= M
fn min_pow(N: u32, M: u32) -> u32 {
    let M_f64 = f64::from(M);
    let N_f64 = f64::from(N);
    M_f64.log(N_f64).ceil()) as u32
}

To sum up, if \(M \gt N\), then

  1. Find a \(k\) such that \(N^k \geq M\)
  2. Define \(Y\) to the max multiple of \(M\) in \(N^k\)
  3. Get the random number generator in \([0, N^k)\)
  4. Generate a random number in \([0, M)\):
    1. Get a random number \(x\) in \([0, N^k)\)
    2. If \(x\) is in \([0, Y)\), then return \(x % M\)
    3. Otherwise, repeat from 3-1
// Get a random number in [0, b) by a random number generator in [0, s),
// where both s, b are integers and s > 1, b > 1, s <= b
fn rand_big_from_small(b: u32, s: u32) -> u32 {
    assert!(s > 1 && b > 1 && s <= b);
    // Get a minimal number p such that s^p >= b
    let p = min_pow(s, b);
    assert!(s.pow(p) >= b);
    // Get the max multiple of b in [0, s^p)
    let max = (s.pow(p) / b) * b;
    assert_eq!(max % b, 0);
    loop {
        // Get a random number in [0, s^p)
        let r = rand_pow(s, p);
        // Return the random number if it is in [0, max)
        if r < max {
            return r % b;
        }
        // Repeating if the random number is in [max, s^p)
    }
}

// Get a random from [0, x^y)
fn rand_pow(x: u32, y: u32) -> u32 {
    assert!(x > 1 && y > 0);
    let mut r = 0;
    for _ in 0..y {
        // Repeat y times
        r = r * x + rand(x);
    }
    r
}

// Return minimal k such that x^k >= y
fn min_pow(x: u32, y: u32) -> u32 {
    (f64::from(y).log(f64::from(x)).ceil()) as u32
}

See the live demo here.

Enlarge from \([0, 1, \ldots, N-1]\) to \([0, 1, \ldots, N^k-1]\)

An interesting view to look the line \(x * N + rand(N)\) is to think \(x\) is a \(l\) numbers sequence from \(a\) to \(a + l - 1\), \([a, a + 1, a + 2, \ldots, a + l - 1]\) and \(rand(N)\) is a list from \(0\) to \(N-1\), \([0, 1, 2, \ldots, N - 1]\). In this case, \(x * N + rand(N)\) is something like

// x is [a, a + 1, a + 2, ..., a + l - 1]
fn enlarge(x: Vec<u32>, N: u32) -> Vec<u32> {
    let offsets: Vec<u32> = (0..N).collect();
    let mut out = Vec::new();
    for i in x.iter() {
        for j in offsets.iter() {
            out.push(N * i + j);
        }
    }
    out
}

What enlarge does is to enlarge a sequence from \([a, a + 1, a + 2, \ldots, a + l - 1]\) to \([a \cdot N, a \cdot N + 1, a \cdot N + 2, \ldots, (a + l) \cdot N - 1)]\).

By multiplying \(N\) on each value in \([a, a + 1, a + 2, \ldots, a + l - 1]\), the difference between \(i\) and \(i + 1\) is enlarged to \(N\), where \(i \in [0, a + l - 1)\). The sequence becomes \([a \cdot N, (a + 1) \cdot N, (a + 2) \cdot N, \ldots, (a + l - 1) \cdot N]\). By padding \(0\), \(1, 2, \ldots , N-1\) to the gaps between each value in \([a \cdot N, (a + 1) \cdot N, (a + 2) \cdot N, \ldots, (a + l - 1) \cdot N]\), The sequence becomes \([a \cdot N, a \cdot N + 1, a \cdot N + 2, \ldots, (a + l) \cdot N - 1)]\). This is all the values in \([a, (a + l) \cdot N)\). In other words, the sequence containing all the values in \([a, a + l)\) can be enlarged to a sequence containing all the values in \([a, (a + l) \cdot N)\) by applying enlarge.

If \(a\) is \(0\) and \(l\) is \(N^p\), then we can enlarge a \([0, N^p)\)-sequence to a \([0, N^{p+1})\)-sequence.

Based on this idea, the following code

let mut x = 0;
for _ in 0..k {
    // Repeat k times
    x = x * N + rand(N);
}

is similar to call k-1-times enlarge

// x is [0, 1, ..., N-1]
for _ in 0..k-1 {
    // Repeat k-1 times
    x = enlarge(x, N);
}

to enlarge a list from \([0, 1, ..., N-1]\) to \([0, 1, \ldots, N^k-1]\). The first x = x * N + rand(N) is to generate a \([0, 1, \ldots, N-1]\) to x.

See the live demo here.

Compute Rand M from Rand N

In fact, the method developped for \(M \leq N\) case is a special case in the method for \(M \gt N\) case.

Recall the algorithm for \(M \gt N\) case:

  1. Find a \(k\) such that \(N^k \geq M\)
  2. Define \(Y\) to the max multiple of \(M\) in \(N^k\)
  3. Get the random number generator in \([0, N^k)\)
  4. Generate a random number in \([0, M)\):
    1. Get a random number \(x\) in \([0, N^k)\)
    2. If \(x\) is in \([0, Y)\), then return \(x % M\)
    3. Otherwise, repeat from 3-1

If \(M \leq N\), then \(k\) is \(1\)!

Thus, the algorithm can be summarized as:

// Get a random number in [0, m) by a random number generator in [0, n)
fn rand_m_from_n(m: u32, n: u32) -> u32 {
    assert!(m > 1 && n > 1);
    // Get a minimal number p such that n^p >= m
    let p = min_pow(n, m);
    // Get the max multiple of m in [0, n^p)
    let max = (n.pow(p) / m) * m;
    loop {
        // Get a random number in [0, n^p)
        let r = rand_pow(n, p);
        // Return the random number if it is in [0, max)
        if r < max {
            return r % m;
        }
        // Repeating if the random number is in [max, n^p)
    }
}

// Get a random from [0, x^y)
fn rand_pow(x: u32, y: u32) -> u32 {
    assert!(x > 1 && y > 0);
    let mut r = 0;
    for _ in 0..y {
        // Repeat y times
        r = r * x + rand(x);
    }
    r
}

// Return minimal k such that x^k >= y
fn min_pow(x: u32, y: u32) -> u32 {
    (f64::from(y).log(f64::from(x)).ceil()) as u32
}

See the live demo here.

If N is 2

To get the random number in \([0, 2^k)\) where \(k \in \mathbb{N}\), one mentionable trick is to replace

x = x * 2 + rand(2);

by

x = x << 1 | rand(2);

when \(N = 2\).

On the other hand, we don’t need to find the max multiple of \(x\) in \([0, 2^p)\), where \(p\) is the minimal number such that \(2^p \geq x\). It doesn’t exist! No multiple of \(x\) is smaller than \(2^p\). If it exists, it is at least \(2x\), and it implies there is a \(2x \leq 2^p\). However, if \(2x \leq 2^p\), then \(x \leq 2^{p-1}\) rather than \(x \leq 2^p\)!

As a result, the program is:

// Get a random number in [0, x) by a random number generator in [0, 1]
fn rand_from_2(x: u32) -> u32 {
    assert!(x > 1);
    // Get a minimal number p such that 2^p >= x
    let p = min_pow(x);
    loop {
        // Get a random number in [0, 2^p)
        let r = rand_pow(p);
        // Return the random number if it is in [0, x)
        if r < x {
            return r;
        }
        // Repeating if the random number is in [x, 2^p)
    }
}

// Get a random from [0, 2^k)
fn rand_pow(k: u32) -> u32 {
    assert!(k > 0);
    let mut r = 0;
    for _ in 0..k {
        // Repeat k times
        r = r << 1 | rand(2);
    }
    r
}

// Return minimal k such that 2^k >= x
fn min_pow(x: u32) -> u32 {
    (f64::from(x).log(f64::from(2)).ceil()) as u32
}

See the live demo here.

How to check if the distribution is uniform

One way to check if the distrubution of a random number generator is uniform is to apply chi square test. The discussion can be found here. The Testing a Random Number Generator chapter in John D. Cook’s Beautiful Testing is also a great reference to read.