Calculating Fibonacci Numbers by Fast Doubling

In previous post, we learned how to calculate Fibonacci numbers by Fast Doubling in math. Today, we will apply it in programming and optimize it step by step.

Fast Doubling

It’s natural to write a recursive implementation by the above definition. In the following steps, we will implement recursive versions first, then try converting it into iterative versions.

Recursive (Top-down) Approach

Given a , we could calculate Fibonacci numbers by:

if (n % 2) { // n is odd: F(n) = F(((n-1) / 2) + 1)^2 + F((n-1) / 2)^2
  unsigned int k = (n - 1) / 2;
  return fib(k) * fib(k) + fib(k + 1) * fib(k + 1);
} else { // n is even: F(n) = F(n/2) * [ 2 * F(n/2 + 1) - F(n/2) ]
  unsigned int k = n / 2;
  return fib(k) * [ 2 * fib(k + 1) - fib(k) ];
}

From above code, we can know that the code stack will be entered again and again, so we need to define when to stop it.

if (n == 0) {
  return 0; // F(0) = 0.
} else if (n <= 2) {
  return 1; // F(1) = F(2) = 0.
} else {
  // Keep calling itself recursively to get the answer.
  // Put the main body here.
}

We only calculate Fibonacci numbers from , so we need to stop when . The fib(0) may be asked from calculating fib(1) = fib(0)*fib(0) + fib(1)*fib(1) (by setting to , so we also need to define fib(1) = 1 directly, or it will cause an endless recursion.

Similarly, the fib(1) may be asked from calculating fib(2) = fib(1) * [2 * fib(2) - fib(1)] (by setting to , so fib(2) = 1 also needs to be returned directly.

As the result, the code can be written into:

///////////////////////////////////////////////////////////////////////////////
// Fast doubling: O(log(n))
//   Using 2n to the Fibonacci matrix above, we can derive that:
//     F(2n)   = F(n) * [ 2 * F(n+1) – F(n) ]
//     F(2n+1) = F(n+1)^2 + F(n)^2
//     (and F(2n-1) = F(n)^2 + F(n-1)^2)
uint64_t fib(unsigned int n)
{
  if (n == 0) {
    return 0; // F(0) = 0.
  } else if (n <= 2) {
    return 1; // F(1) = F(2) = 0.
  }

  unsigned int k = 0;
  if (n % 2) { // By F(n) = F(2k+1) = F(k+1)^2 + F(k)^2
    k = (n - 1) / 2;
    return fib(k) * fib(k) + fib(k + 1) * fib(k + 1);
  } else { // By F(n) = F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
    k = n / 2;
    return fib(k) * (2 * fib(k + 1) - fib(k));
  }
}

Now, let we look where we could improve from this simple version. We use duplicated fib(k) and fib(k + 1) to calculate fib(n). That is, we will have two duplicated recursive processes to do the same work. It would be a waste of the time.

Another trick is that we could use n = n / 2 in both cases ( is odd or even) since the result of n = (n - 1) / 2 is same as n = n / 2 in C/C++’s world if n is an odd integer.

Thus, we can rewrite the code into:

uint64_t fib(unsigned int n)
{
  // When n = 2: k = 1 and we want to use F(k+1) to calculate F(2k),
  // However, F(2k) = F(k+1) = F(2) is unknown then.
  if (n <= 2) {
    return n ? 1 : 0; // F(0) = 0, F(1) = F(2) = 1.
  }

  unsigned int k = n / 2; // k = n/2 if n is even. k = (n-1)/2 if n is odd.
  uint64_t a = fib1(k);
  uint64_t b = fib1(k + 1);

  if (n % 2) { // By F(n) = F(2k+1) = F(k+1)^2 + F(k)^2
    return a * a + b * b;
  }
  // By F(n) = F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
  return a * (2 * b - a);
}

Memoization

Do we save all duplicated task now? No. Suppose we need to find , then we need to get ….

It’s clear that we have a duplicated on above figure. can return value directly, while can not. Therefore, the sub-tree(sub-process) whose root is will be executed twice.

The larger the is, the more duplicated sub-process will be executed. To avoid the waste, we can add an array to save all the calculated value. We check the array first when is calculated. If there is already a saved value in the array at , then we can use it directly. Otherwise, it will be calculated as usual. It’s called memoization. We will save as the element in the array.

In this case, the is not calculated successively. For example, to get , we only need . We don’t need , so there is no value at the element in the array. You can use hash map instead of array to avoid the waste of memory. However, retrieving data from array is faster than hash map, so we apply array in our sample code:

const unsigned int SIZE = 1000;
// 4 is not a fibonacci number, so using it as initialized value.
const uint64_t INIT = 4;
// In this case, F is not calculated successively. For example,
// To get F(6), we only need F(4), F(3), F(2), F(1), F(0) (no F(5)),
// so the other elements in F is still INIT.
// Another way is to use hash map(std::unordered_map), however,
// it will be slower.
uint64_t MEM[SIZE] = { [0 ... SIZE-1] = INIT };
uint64_t fib(unsigned int n)
{
  if (MEM[n] != INIT) {
    return MEM[n];
  }

  if (n == 0) {
    return (MEM[n] = 0); // F(0) = 0.
  } else if (n <= 2) {
    return (MEM[n] = 1); // F(1) = F(2) = 1.
  }

  unsigned int k = n / 2; // k = n/2 if n is even. k = (n-1)/2 if n is odd.
  uint64_t a = fib(k);
  uint64_t b = fib(k + 1);

  // By F(n) = F(2k+1) = F(k+1)^2 + F(k)^2, if n is odd.
  //    F(n) = F(2k) = F(k) * [ 2 * F(k+1) – F(k) ], if n is even.
  return (MEM[n] = (n % 2) ? a * a + b * b : a * (2 * b - a));
}

State vector

Although we can speed up the calculating by applying memoization above, the memory consumption with this approach grows with . Is it possible to use a fixed memory no matter how big is? The answer is yes. Actually, we could just use a two-elements array to do it.

From the formula, we can calculate from . For example, to calculate , we need . To calculate , we need . To calculate , we need . To calculate , we need (so we need to stop here since is the dead end).

However, how do we get when we only have to calculate ? Or how to get when we only have to calculate …?

By applying to formula, we can use to get . Then we can get .

Thus, we are able to get by the following procedure:

Thus, we could keep using two-elements array for to compute what we want and update it step by step.

But how to determine the state we should update from , or ?

It’s simple. If is even, we need to find , where since . Then we can use to calculate .

Otherwise, if is odd, we need to find , where since . Then we can use to calculate and then get by .

In summary, the procedure can be organized as follows:

10 5 2 1 0
is odd   v   v  
     
 

The last two rows, , are the state vector that contains our answer.

The first row , is the index of the first element of the state vector . The second row indicates that whether is odd or not. If is odd(recall what we discuss above), then we need to update state from from to since . The third row is used to record if we need get from . Otherwise, if is even, updating state from to directly.

From the top-down perspective, to get , we need . To get , we need . To get , we need . To get , we need . We will demonstrate how we do it recursively below.

From the bottom-up perspective, we can use to get , then to get , to get , to get . We will demonstrate how we do it in iterative section.

The recursive approach is easier to understand. By what we summarized above, the simplest implementation will be:

// Set f[0], f[1] to F(n), F(n+1).
void fib_helper(unsigned int n, uint64_t f[]);

// 4 is not a fibonacci number, so using it as initialized value.
const uint64_t INIT = 4;

uint64_t fib(unsigned int n)
{
  uint64_t f[2] = { INIT, INIT };
  fib_helper(n, f);
  return f[0];
}

void fib_helper(unsigned int n, uint64_t f[])
{
  if (n == 0) {
    f[0] = 0; f[1] = 1;
    return;
  }

  unsigned int k = 0;
  if (n % 2) {
    k = (n - 1) / 2;
    fib_helper(k, f);
    uint64_t a = f[0];            // F(k) = F((n-1)/2)
    uint64_t b = f[1];            // F(k + 1) = F((n- )/2 + 1)
    uint64_t c = a * (2 * b - a); // F(n-1) = F(2k) = F(k) * [2 * F(k + 1) - F(k)]
    uint64_t d = a * a + b * b;   // F(n) = F(2k + 1) = F(k)^2 + F(k+1)^2
    f[0] = d;                     // F(n)
    f[1] = c + d;                 // F(n+1) = F(n-1) + F(n)
  } else {
    k = n / 2;
    fib_helper(k, f);
    uint64_t a = f[0];            // F(k) = F(n/2)
    uint64_t b = f[1];            // F(k + 1) = F(n/2 + 1)
    f[0] = a * (2 * b - a);       // F(n) = F(2k) = F(k) * [2 * F(k + 1) - F(k)]
    f[1] = a * a + b * b;         // F(n + 1) = F(2k + 1) = F(k)^2 + F(k+1)^2
  }
}

The above fib_helper is quite tedious, we can be simplify it into:

// Set f[0], f[1] to F(n), F(n+1).
void fib_helper(unsigned int n, uint64_t f[])
{
  if (!n) {
    f[0] = 0;
    f[1] = 1;
    return;
  }

  fib_helper(n / 2, f);
  // k = floor(n/2), so k = n / 2 if n is even, k = (n - 1) / 2 if n is odd.
  uint64_t a = f[0]; // F(k)
  uint64_t b = f[1]; // F(k+1)

  uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
  uint64_t d = a * a + b * b;   // F(2k+1) = F(k+1)^2 + F(k)^2

  if (n % 2) {    // k = (n - 1) / 2, so F(2k) = F(n-1), F(2k+1) = F(n).
    f[0] = d;     // F(n) = F(2k+1).
    f[1] = c + d; // F(n+1) = F(n-1) + F(n) = F(2k) + F(2k+1).
  } else {        // k = n / 2, so F(2k) = F(n), F(2k+1) = F(n+1).
    f[0] = c;     // F(n) = F(2k).
    f[1] = d;     // F(n+1) = F(2k).
  }
}

You could also replace array with std::vector, so the code will looks more elegant. However, it will be slower than using array directly.

// Return vector [ F(n), F(n+1) ].
std::vector<uint64_t> fib_helper(unsigned int n);

uint64_t fib(unsigned int n)
{
  return fib_helper(n)[0];
}

std::vector<uint64_t> fib_helper(unsigned int n)
{
  if (!n) {
    // [F(0), F(1)] = [0 , 1]
    return { 0 , 1 };
  }

  std::vector<uint64_t> f(fib_helper(n / 2));
  // k = floor(n/2), so k = n / 2 if n is even, k = (n - 1) / 2 if n is odd.
  uint64_t a = f[0]; // F(k)
  uint64_t b = f[1]; // F(k+1)

  uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
  uint64_t d = a * a + b * b;   // F(2k+1) = F(k+1)^2 + F(k)^2

  if (n % 2) { // k = (n - 1) / 2, so F(2k) = F(n-1), F(2k+1) = F(n).
    // [F(n), F(n+1)] = [F(2k+1), F(2k+2)] = [F(2k+1), F(2k) + F(2k+1)]
    return { d, c + d };
  } else { // k = n / 2, so F(2k) = F(n), F(2k+1) = F(n+1).
    // [F(n), F(n+1)] = [F(2k), F(2k+1)].
    return { c, d };
  }
}

Iterative (Bottom-up) Approach

The recursive approach is implemented from the top-down perspective. We could also do it in bottom-up way.

To convert the recursive steps into an iterative loop, we need to find the initialized state and the stop condition. In the recursive approach, no matter what is, the final state vector (when the recursive steps stops) is always , , and it must be called from calculating the state [F_1, F_2]. Recall how we calculate :

  • We recursively calculate from ,
    • then ,
    • then ,
    • then ,
    • then stop recursive steps when .
  • Next, we get the state vector for ,
    • then return on the same track with opposite direction to calculate the state vector for ,
    • then for
    • then for ,
    • and finally get the answer for .

The recursive steps are used to get the track from to , then calculate for each .

To remove the recursive steps, we need to have a way to compute the track. We can use a stack to track the change for , starting push from , then , , , , then the track can be get from popping them from to .

Thus, the initialized state is and the stop condition is to check whether the stack is empty.

(Using stack is one common approach to convert recursive code into the iterative one.)

uint64_t fib(unsigned int n)
{
  // To compute the track from n, n/2, ..., 1, 0.
  std::stack<unsigned int> s;
  while(n) {
    s.push(n);
    n /= 2; // n = floor(n/2)
  }
  s.push(n); // n = 0 now.

  uint64_t a; // F(n)
  uint64_t b; // F(n+1)
  while (!s.empty()) {
    unsigned int m = s.top(); s.pop();

    if (m == 0) { // Initializing a, b.
      a = 0; // F(0) = 0
      b = 1; // F(1) = 1
      continue;
    }

    // Let k = floor(m/2), so `a` is F(k) and `b` is F(k+1) now.
    // k = m/2, if m is even. k = (m-1)/2, if m is odd.
    uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
    uint64_t d = a * a + b * b;   // F(2k+1) = F(k)^2 + F(k+1)^2

    if (m % 2) {  // m = 2k+1:
      a = d;      //  F(m) = F(2k+1)
      b = c + d;  //  F(m+1) = F(m) + F(m-1) = F(2k+1) + F(2k)
    } else {      // m = 2k:
      a = c;      //  F(m) = F(2k)
      b = d;      //  F(m+1) = F(2k+1)
    }
  }

  return a;
}

The above code is a bit ugly for simulating the recursive steps like:

The initialized state is usually set outside of the loop directly like below:

...
uint64_t a = 0; // F(0) = 0
uint64_t b = 1; // F(1) = 1
while (!s.empty()) {
  ...
}
...

Since initialized state is set before the loop, we should start the track from to :

std::stack<unsigned int> s;
while (n) {
  s.push(n);
  n /= 2;
}
// No `s.push(n); // n = 0 now.` here!

Therefore, the code will be:

uint64_t fib(unsigned int n)
{
  std::stack<unsigned int> s;
  while (n) {
    s.push(n);
    /*n /= 2*/n >>= 1;
  }

  uint64_t a = 0; // F(0) = 0
  uint64_t b = 1; // F(1) = 1
  while (!s.empty()) {
    unsigned int m = s.top(); s.pop();

    // Let k = floor(m/2), so `a` is F(k) and `b` is F(k+1) now.
    // k = m/2, if m is even. k = (m-1)/2, if m is odd.
    uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
    uint64_t d = a * a + b * b;   // F(2k+1) = F(k)^2 + F(k+1)^2

    if (/*m % 2*/m & 1) { // m = 2k+1:
      a = d;              //  F(m) = F(2k+1)
      b = c + d;          //  F(m+1) = F(m) + F(m-1) = F(2k+1) + F(2k)
    } else {              // m = 2k:
      a = c;              //  F(m) = F(2k)
      b = d;              //  F(m+1) = F(2k+1)
    }
  }

  return a;
}

Another trick above is to replace n /= 2 by n >>= 1 and m % 2 by m & 1. It will be faster a little bit.

Non-stack approach

Since applying std::stack will pay for memory allocation, so we should try not using it for better performance.

The reason we need the stack is to get the track for each , where until . And the track is used to determine what state we should update from , to or , by the given is even or odd.

In the above implementation, we put the , where denotes is right shifted by bits(n_j = n >> j) and is an integer, to the stack, and then iteratively check is odd or even. We could do it without stack! Assume the highest 1-bit in is the th bit from right side, then the loop will execute times. (so the time complexity is ) Therefore, we could loop times to calculate from to . As the result, the code will be:

uint64_t fib(unsigned int n)
{
  // The position of the highest bit of n.
  // So we need to loop `h` times to get the answer.
  // Example: n = (Dec)50 = (Bin)00110010, then h = 6.
  //                               ^ 6th bit from right side
  unsigned int h = 0;
  for (unsigned int i = n ; i ; ++h, i >>= 1);

  uint64_t a = 0; // F(0) = 0
  uint64_t b = 1; // F(1) = 1
  for (int j = h - 1 ; j >= 0 ; --j) {
    // n_j = floor(n / 2^j) = n >> j, k = floor(n_j / 2), (n_j = n when j = 0)
    // then a = F(k), b = F(k+1) now.
    uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
    uint64_t d = a * a + b * b;   // F(2k+1) = F(k)^2 + F(k+1)^2

    if ((n >> j) & 1) { // n_j is odd: k = (n_j-1)/2 => n_j = 2k + 1
      a = d;            //   F(n_j) = F(2k+1)
      b = c + d;        //   F(n_j + 1) = F(2k + 2) = F(2k) + F(2k+1)
    } else {            // n_j is even: k = n_j/2 => n_j = 2k
      a = c;            //   F(n_j) = F(2k)
      b = d;            //   F(n_j + 1) = F(2k + 1)
    }
  }

  return a;
}
By Bit-mask

Doing AND operation(&) to the last bit of above is same as doing AND operation(&) from the highest bit to the lowest bit of the . Thus, we could also rewrite the code into:

uint64_t fib(unsigned int n)
{
  // The position of the highest bit of n.
  // So we need to loop `h` times to get the answer.
  // Example: n = (Dec)50 = (Bin)00110010, then h = 6.
  //                               ^ 6th bit from right side
  unsigned int h = 0;
  for (unsigned int i = n ; i ; ++h, i >>= 1);

  uint64_t a = 0; // F(0) = 0
  uint64_t b = 1; // F(1) = 1
  // There is only one `1` in the bits of `mask`. The `1`'s position is same as
  // the highest bit of n(mask = 2^(h-1) at first), and it will be shifted right
  // iteratively to do `AND` operation with `n` to check `n_j` is odd or even,
  // where n_j is defined below.
  for (unsigned int mask = 1 << (h - 1) ; mask ; mask >>= 1) { // Run h times!
    // Let j = h-i (looping from i = 1 to i = h), n_j = floor(n / 2^j) = n >> j
    // (n_j = n when j = 0), k = floor(n_j / 2), then a = F(k), b = F(k+1) now.
    uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
    uint64_t d = a * a + b * b;   // F(2k+1) = F(k)^2 + F(k+1)^2

    if (mask & n) { // n_j is odd: k = (n_j-1)/2 => n_j = 2k + 1
      a = d;        //   F(n_j) = F(2k + 1)
      b = c + d;    //   F(n_j + 1) = F(2k + 2) = F(2k) + F(2k + 1)
    } else {        // n_j is even: k = n_j/2 => n_j = 2k
      a = c;        //   F(n_j) = F(2k)
      b = d;        //   F(n_j + 1) = F(2k + 1)
    }
  }

  return a;
}

All the above code are on gist here.