In
previous post,
we learned how to calculate *Fibonacci* numbers by *Fast Doubling* in math.
Today, we will apply it in programming and optimize it step by step.

## Fast Doubling

It’s natural to write a recursive implementation by the above definition. In the following steps, we will implement recursive versions first, then try converting it into iterative versions.

### Recursive (Top-down) Approach

Given a , we could calculate *Fibonacci* numbers by:

```
if (n % 2) { // n is odd: F(n) = F(((n-1) / 2) + 1)^2 + F((n-1) / 2)^2
unsigned int k = (n - 1) / 2;
return fib(k) * fib(k) + fib(k + 1) * fib(k + 1);
} else { // n is even: F(n) = F(n/2) * [ 2 * F(n/2 + 1) - F(n/2) ]
unsigned int k = n / 2;
return fib(k) * [ 2 * fib(k + 1) - fib(k) ];
}
```

From above code, we can know that the code stack will be entered again and again, so we need to define when to stop it.

```
if (n == 0) {
return 0; // F(0) = 0.
} else if (n <= 2) {
return 1; // F(1) = F(2) = 0.
} else {
// Keep calling itself recursively to get the answer.
// Put the main body here.
}
```

We only calculate *Fibonacci* numbers from ,
so we need to stop when .
The `fib(0)`

may be asked from calculating `fib(1) = fib(0)*fib(0) + fib(1)*fib(1)`

(by setting to ,
so we also need to define `fib(1) = 1`

directly,
or it will cause an endless recursion.

Similarly, the `fib(1)`

may be asked from calculating `fib(2) = fib(1) * [2 * fib(2) - fib(1)]`

(by setting to ,
so `fib(2) = 1`

also needs to be returned directly.

As the result, the code can be written into:

```
///////////////////////////////////////////////////////////////////////////////
// Fast doubling: O(log(n))
// Using 2n to the Fibonacci matrix above, we can derive that:
// F(2n) = F(n) * [ 2 * F(n+1) – F(n) ]
// F(2n+1) = F(n+1)^2 + F(n)^2
// (and F(2n-1) = F(n)^2 + F(n-1)^2)
uint64_t fib(unsigned int n)
{
if (n == 0) {
return 0; // F(0) = 0.
} else if (n <= 2) {
return 1; // F(1) = F(2) = 0.
}
unsigned int k = 0;
if (n % 2) { // By F(n) = F(2k+1) = F(k+1)^2 + F(k)^2
k = (n - 1) / 2;
return fib(k) * fib(k) + fib(k + 1) * fib(k + 1);
} else { // By F(n) = F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
k = n / 2;
return fib(k) * (2 * fib(k + 1) - fib(k));
}
}
```

Now, let we look where we could improve from this simple version.
We use duplicated `fib(k)`

and `fib(k + 1)`

to calculate `fib(n)`

.
That is, we will have two duplicated recursive processes to do the same work.
It would be a waste of the time.

Another trick is that we could use `n = n / 2`

in both cases
( is odd or even) since the result of `n = (n - 1) / 2`

is same
as `n = n / 2`

in *C/C++*’s world if `n`

is an **odd** integer.

Thus, we can rewrite the code into:

```
uint64_t fib(unsigned int n)
{
// When n = 2: k = 1 and we want to use F(k+1) to calculate F(2k),
// However, F(2k) = F(k+1) = F(2) is unknown then.
if (n <= 2) {
return n ? 1 : 0; // F(0) = 0, F(1) = F(2) = 1.
}
unsigned int k = n / 2; // k = n/2 if n is even. k = (n-1)/2 if n is odd.
uint64_t a = fib1(k);
uint64_t b = fib1(k + 1);
if (n % 2) { // By F(n) = F(2k+1) = F(k+1)^2 + F(k)^2
return a * a + b * b;
}
// By F(n) = F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
return a * (2 * b - a);
}
```

#### Memoization

Do we save all duplicated task now? No. Suppose we need to find , then we need to get ….

It’s clear that we have a duplicated on above figure. can return value directly, while can not. Therefore, the sub-tree(sub-process) whose root is will be executed twice.

The larger the is, the more duplicated sub-process will be executed.
To avoid the waste, we can add an *array* to save all the calculated value.
We check the *array* first when is calculated.
If there is already a saved value in the *array* at ,
then we can use it directly.
Otherwise, it will be calculated as usual.
It’s called *memoization*.
We will save as the element in the *array*.

In this case, the is not calculated successively.
For example, to get , we only need .
We don’t need , so there is no value at the element in the *array*.
You can use *hash map* instead of *array* to avoid the waste of memory.
However, retrieving data from *array* is faster than *hash map*,
so we apply *array* in our sample code:

```
const unsigned int SIZE = 1000;
// 4 is not a fibonacci number, so using it as initialized value.
const uint64_t INIT = 4;
// In this case, F is not calculated successively. For example,
// To get F(6), we only need F(4), F(3), F(2), F(1), F(0) (no F(5)),
// so the other elements in F is still INIT.
// Another way is to use hash map(std::unordered_map), however,
// it will be slower.
uint64_t MEM[SIZE] = { [0 ... SIZE-1] = INIT };
uint64_t fib(unsigned int n)
{
if (MEM[n] != INIT) {
return MEM[n];
}
if (n == 0) {
return (MEM[n] = 0); // F(0) = 0.
} else if (n <= 2) {
return (MEM[n] = 1); // F(1) = F(2) = 1.
}
unsigned int k = n / 2; // k = n/2 if n is even. k = (n-1)/2 if n is odd.
uint64_t a = fib(k);
uint64_t b = fib(k + 1);
// By F(n) = F(2k+1) = F(k+1)^2 + F(k)^2, if n is odd.
// F(n) = F(2k) = F(k) * [ 2 * F(k+1) – F(k) ], if n is even.
return (MEM[n] = (n % 2) ? a * a + b * b : a * (2 * b - a));
}
```

#### State vector

Although we can speed up the calculating by applying *memoization* above,
the memory consumption with this approach grows with .
Is it possible to use a fixed memory no matter how big is?
The answer is yes. Actually, we could just use a two-elements array to do it.

From the formula, we can calculate from . For example, to calculate , we need . To calculate , we need . To calculate , we need . To calculate , we need (so we need to stop here since is the dead end).

However, how do we get when we only have to calculate ? Or how to get when we only have to calculate …?

By applying to formula, we can use to get . Then we can get .

Thus, we are able to get by the following procedure:

Thus, we could keep using two-elements array for to compute what we want and update it step by step.

But how to determine the state we should update from , or ?

It’s simple. If is even, we need to find , where since . Then we can use to calculate .

Otherwise, if is odd, we need to find , where since . Then we can use to calculate and then get by .

In summary, the procedure can be organized as follows:

10 | 5 | 2 | 1 | 0 | |

is odd | v | v | |||

The last two rows, , are the state vector that contains our answer.

The first row , is the index of the first element of the state vector . The second row indicates that whether is odd or not. If is odd(recall what we discuss above), then we need to update state from from to since . The third row is used to record if we need get from . Otherwise, if is even, updating state from to directly.

From the top-down perspective, to get , we need . To get , we need . To get , we need . To get , we need . We will demonstrate how we do it recursively below.

From the bottom-up perspective, we can use to get , then to get , to get , to get . We will demonstrate how we do it in iterative section.

The recursive approach is easier to understand. By what we summarized above, the simplest implementation will be:

```
// Set f[0], f[1] to F(n), F(n+1).
void fib_helper(unsigned int n, uint64_t f[]);
// 4 is not a fibonacci number, so using it as initialized value.
const uint64_t INIT = 4;
uint64_t fib(unsigned int n)
{
uint64_t f[2] = { INIT, INIT };
fib_helper(n, f);
return f[0];
}
void fib_helper(unsigned int n, uint64_t f[])
{
if (n == 0) {
f[0] = 0; f[1] = 1;
return;
}
unsigned int k = 0;
if (n % 2) {
k = (n - 1) / 2;
fib_helper(k, f);
uint64_t a = f[0]; // F(k) = F((n-1)/2)
uint64_t b = f[1]; // F(k + 1) = F((n- )/2 + 1)
uint64_t c = a * (2 * b - a); // F(n-1) = F(2k) = F(k) * [2 * F(k + 1) - F(k)]
uint64_t d = a * a + b * b; // F(n) = F(2k + 1) = F(k)^2 + F(k+1)^2
f[0] = d; // F(n)
f[1] = c + d; // F(n+1) = F(n-1) + F(n)
} else {
k = n / 2;
fib_helper(k, f);
uint64_t a = f[0]; // F(k) = F(n/2)
uint64_t b = f[1]; // F(k + 1) = F(n/2 + 1)
f[0] = a * (2 * b - a); // F(n) = F(2k) = F(k) * [2 * F(k + 1) - F(k)]
f[1] = a * a + b * b; // F(n + 1) = F(2k + 1) = F(k)^2 + F(k+1)^2
}
}
```

The above `fib_helper`

is quite tedious,
we can be simplify it into:

```
// Set f[0], f[1] to F(n), F(n+1).
void fib_helper(unsigned int n, uint64_t f[])
{
if (!n) {
f[0] = 0;
f[1] = 1;
return;
}
fib_helper(n / 2, f);
// k = floor(n/2), so k = n / 2 if n is even, k = (n - 1) / 2 if n is odd.
uint64_t a = f[0]; // F(k)
uint64_t b = f[1]; // F(k+1)
uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
uint64_t d = a * a + b * b; // F(2k+1) = F(k+1)^2 + F(k)^2
if (n % 2) { // k = (n - 1) / 2, so F(2k) = F(n-1), F(2k+1) = F(n).
f[0] = d; // F(n) = F(2k+1).
f[1] = c + d; // F(n+1) = F(n-1) + F(n) = F(2k) + F(2k+1).
} else { // k = n / 2, so F(2k) = F(n), F(2k+1) = F(n+1).
f[0] = c; // F(n) = F(2k).
f[1] = d; // F(n+1) = F(2k).
}
}
```

You could also replace *array* with *std::vector*,
so the code will looks more elegant.
However, it will be slower than using *array* directly.

```
// Return vector [ F(n), F(n+1) ].
std::vector<uint64_t> fib_helper(unsigned int n);
uint64_t fib(unsigned int n)
{
return fib_helper(n)[0];
}
std::vector<uint64_t> fib_helper(unsigned int n)
{
if (!n) {
// [F(0), F(1)] = [0 , 1]
return { 0 , 1 };
}
std::vector<uint64_t> f(fib_helper(n / 2));
// k = floor(n/2), so k = n / 2 if n is even, k = (n - 1) / 2 if n is odd.
uint64_t a = f[0]; // F(k)
uint64_t b = f[1]; // F(k+1)
uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
uint64_t d = a * a + b * b; // F(2k+1) = F(k+1)^2 + F(k)^2
if (n % 2) { // k = (n - 1) / 2, so F(2k) = F(n-1), F(2k+1) = F(n).
// [F(n), F(n+1)] = [F(2k+1), F(2k+2)] = [F(2k+1), F(2k) + F(2k+1)]
return { d, c + d };
} else { // k = n / 2, so F(2k) = F(n), F(2k+1) = F(n+1).
// [F(n), F(n+1)] = [F(2k), F(2k+1)].
return { c, d };
}
}
```

### Iterative (Bottom-up) Approach

The recursive approach is implemented from the top-down perspective. We could also do it in bottom-up way.

To convert the recursive steps into an iterative loop,
we need to find the *initialized state* and the *stop condition*.
In the recursive approach, no matter what is, the final state vector
(when the recursive steps stops) is always ,
, and it must be called from calculating the state [F_1, F_2].
Recall how we calculate :

- We recursively calculate from ,
- then ,
- then ,
- then ,
- then stop recursive steps when .

- Next, we get the state vector for ,
- then return on the same track with opposite direction to calculate the state vector for ,
- then for
- then for ,
- and finally get the answer for .

The recursive steps are used to get the track from to , then calculate for each .

To remove the recursive steps, we need to have a way to compute the track.
We can use a *stack* to track the change for , starting push
from , then , , , ,
then the track can be get from popping them from to .

Thus, the *initialized state* is
and the *stop condition* is to check whether the stack is empty.

(Using *stack* is one common approach to
convert recursive code into the iterative one.)

```
uint64_t fib(unsigned int n)
{
// To compute the track from n, n/2, ..., 1, 0.
std::stack<unsigned int> s;
while(n) {
s.push(n);
n /= 2; // n = floor(n/2)
}
s.push(n); // n = 0 now.
uint64_t a; // F(n)
uint64_t b; // F(n+1)
while (!s.empty()) {
unsigned int m = s.top(); s.pop();
if (m == 0) { // Initializing a, b.
a = 0; // F(0) = 0
b = 1; // F(1) = 1
continue;
}
// Let k = floor(m/2), so `a` is F(k) and `b` is F(k+1) now.
// k = m/2, if m is even. k = (m-1)/2, if m is odd.
uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
uint64_t d = a * a + b * b; // F(2k+1) = F(k)^2 + F(k+1)^2
if (m % 2) { // m = 2k+1:
a = d; // F(m) = F(2k+1)
b = c + d; // F(m+1) = F(m) + F(m-1) = F(2k+1) + F(2k)
} else { // m = 2k:
a = c; // F(m) = F(2k)
b = d; // F(m+1) = F(2k+1)
}
}
return a;
}
```

The above code is a bit ugly for simulating the recursive steps like:

The *initialized state* is usually set outside of the loop directly like below:

```
...
uint64_t a = 0; // F(0) = 0
uint64_t b = 1; // F(1) = 1
while (!s.empty()) {
...
}
...
```

Since *initialized state* is set before the loop,
we should start the track from to :

```
std::stack<unsigned int> s;
while (n) {
s.push(n);
n /= 2;
}
// No `s.push(n); // n = 0 now.` here!
```

Therefore, the code will be:

```
uint64_t fib(unsigned int n)
{
std::stack<unsigned int> s;
while (n) {
s.push(n);
/*n /= 2*/n >>= 1;
}
uint64_t a = 0; // F(0) = 0
uint64_t b = 1; // F(1) = 1
while (!s.empty()) {
unsigned int m = s.top(); s.pop();
// Let k = floor(m/2), so `a` is F(k) and `b` is F(k+1) now.
// k = m/2, if m is even. k = (m-1)/2, if m is odd.
uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
uint64_t d = a * a + b * b; // F(2k+1) = F(k)^2 + F(k+1)^2
if (/*m % 2*/m & 1) { // m = 2k+1:
a = d; // F(m) = F(2k+1)
b = c + d; // F(m+1) = F(m) + F(m-1) = F(2k+1) + F(2k)
} else { // m = 2k:
a = c; // F(m) = F(2k)
b = d; // F(m+1) = F(2k+1)
}
}
return a;
}
```

Another trick above is to replace `n /= 2`

by `n >>= 1`

and `m % 2`

by `m & 1`

.
It will be faster a little bit.

#### Non-stack approach

Since applying `std::stack`

will pay for memory allocation,
so we should try not using it for better performance.

The reason we need the *stack* is to get the **track for each **,
where until .
And the track is used to determine what state we should update
from , to or ,
by the given is **even or odd**.

In the above implementation,
we put the ,
where denotes
is right shifted by bits(`n_j = n >> j`

)
and is an integer,
to the *stack*, and then iteratively check
is odd or even.
We could do it without *stack*!
Assume the **highest** 1-bit in is the th bit from right side,
then the loop will execute times.
(so the time complexity is )
Therefore, we could loop times to calculate
from to .
As the result, the code will be:

```
uint64_t fib(unsigned int n)
{
// The position of the highest bit of n.
// So we need to loop `h` times to get the answer.
// Example: n = (Dec)50 = (Bin)00110010, then h = 6.
// ^ 6th bit from right side
unsigned int h = 0;
for (unsigned int i = n ; i ; ++h, i >>= 1);
uint64_t a = 0; // F(0) = 0
uint64_t b = 1; // F(1) = 1
for (int j = h - 1 ; j >= 0 ; --j) {
// n_j = floor(n / 2^j) = n >> j, k = floor(n_j / 2), (n_j = n when j = 0)
// then a = F(k), b = F(k+1) now.
uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
uint64_t d = a * a + b * b; // F(2k+1) = F(k)^2 + F(k+1)^2
if ((n >> j) & 1) { // n_j is odd: k = (n_j-1)/2 => n_j = 2k + 1
a = d; // F(n_j) = F(2k+1)
b = c + d; // F(n_j + 1) = F(2k + 2) = F(2k) + F(2k+1)
} else { // n_j is even: k = n_j/2 => n_j = 2k
a = c; // F(n_j) = F(2k)
b = d; // F(n_j + 1) = F(2k + 1)
}
}
return a;
}
```

##### By Bit-mask

Doing *AND* operation(`&`

) to the last bit of above is same as
doing *AND* operation(`&`

) **from the highest bit to the lowest bit**
of the . Thus, we could also rewrite the code into:

```
uint64_t fib(unsigned int n)
{
// The position of the highest bit of n.
// So we need to loop `h` times to get the answer.
// Example: n = (Dec)50 = (Bin)00110010, then h = 6.
// ^ 6th bit from right side
unsigned int h = 0;
for (unsigned int i = n ; i ; ++h, i >>= 1);
uint64_t a = 0; // F(0) = 0
uint64_t b = 1; // F(1) = 1
// There is only one `1` in the bits of `mask`. The `1`'s position is same as
// the highest bit of n(mask = 2^(h-1) at first), and it will be shifted right
// iteratively to do `AND` operation with `n` to check `n_j` is odd or even,
// where n_j is defined below.
for (unsigned int mask = 1 << (h - 1) ; mask ; mask >>= 1) { // Run h times!
// Let j = h-i (looping from i = 1 to i = h), n_j = floor(n / 2^j) = n >> j
// (n_j = n when j = 0), k = floor(n_j / 2), then a = F(k), b = F(k+1) now.
uint64_t c = a * (2 * b - a); // F(2k) = F(k) * [ 2 * F(k+1) – F(k) ]
uint64_t d = a * a + b * b; // F(2k+1) = F(k)^2 + F(k+1)^2
if (mask & n) { // n_j is odd: k = (n_j-1)/2 => n_j = 2k + 1
a = d; // F(n_j) = F(2k + 1)
b = c + d; // F(n_j + 1) = F(2k + 2) = F(2k) + F(2k + 1)
} else { // n_j is even: k = n_j/2 => n_j = 2k
a = c; // F(n_j) = F(2k)
b = d; // F(n_j + 1) = F(2k + 1)
}
}
return a;
}
```

All the above code are on gist here.