Counting the 1s in a Binary Number | 边际效应

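This problem comes from the book 《编程之美－微软技术面试心得》 (The Beauty of Programming: Microsoft Technical Interview Insights). It is stated as follows:

For a one-byte (8-bit) variable, find the number of 1s in its binary representation. The algorithm should run as efficiently as possible.

《编程之美》 gives five solutions, but better algorithms can in fact be found on Wikipedia.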

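At its core, the problem asks for the Hamming weight of a binary number, or equivalently its Hamming distance from 0. Both concepts are well known in information theory and coding theory. For binary values the problem is also commonly called population count, or popcount. gcc, for example, provides a built-in function: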

int __builtin_popcount (unsigned int x)

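This function returns the number of 1s in the binary representation of an integer. However, when no hardware popcount instruction is used, GCC's implementation of __builtin_popcount is mainly based on table lookup, the same as solution 5 in 《编程之美》. The solution on Wikipedia is based on divide and conquer. The construction is very clever: a fixed, small number of simple arithmetic operations yields the result, which makes it especially suitable for algorithms with tight memory constraints: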

/* ===========================================================================
* Problem:
*   The fastest way to count the 1s in a 32-bit integer.
*
* Algorithm:
*   The problem is equivalent to computing the Hamming weight of a 32-bit
*   integer, or the Hamming distance between a 32-bit integer and 0. In the
*   binary case it is also called the population count, or popcount.[1]
*
*   The best known solutions are based on adding counts in a tree pattern
*   (divide and conquer). To save space, here is an example for an
*   8-bit binary number A=01101100:[1]
* | Expression            | Binary   | Decimal | Comment                    |
* | A                     | 01101100 |         | the original number        |
* | B = A & 01010101      | 01000100 | 1,0,1,0 | every other bit from A     |
* | C = (A>>1) & 01010101 | 00010100 | 0,1,1,0 | remaining bits from A      |
* | D = B + C             | 01011000 | 1,1,2,0 | # of 1s in each 2-bit of A |
* | E = D & 00110011      | 00010000 | 1,0     | every other count from D   |
* | F = (D>>2) & 00110011 | 00010010 | 1,2     | remaining counts from D    |
* | G = E + F             | 00100010 | 2,2     | # of 1s in each 4-bit of A |
* | H = G & 00001111      | 00000010 | 2       | every other count from G   |
* | I = (G>>4) & 00001111 | 00000010 | 2       | remaining counts from G    |
* | J = H + I             | 00000100 | 4       | No. of 1s in A             |
* Hence A has four 1s.
*
* [1] http://en.wikipedia.org/wiki/Hamming_weight
*
* ===========================================================================
*/
#include <stdio.h>

typedef unsigned int UINT32;
const UINT32 m1 = 0x55555555; // 01010101010101010101010101010101
const UINT32 m2 = 0x33333333; // 00110011001100110011001100110011
const UINT32 m4 = 0x0f0f0f0f; // 00001111000011110000111100001111
const UINT32 m8 = 0x00ff00ff; // 00000000111111110000000011111111
const UINT32 m16 = 0x0000ffff; // 00000000000000001111111111111111
const UINT32 h01 = 0x01010101; // the sum of 256 to the power of 0, 1, 2, 3

/* This is a naive implementation, shown for comparison, and to help in
* understanding the better functions. It uses 20 arithmetic operations
* (shift, add, and). */
int popcount_1(UINT32 x)
{
    x = (x & m1) + ((x >> 1) & m1);
    x = (x & m2) + ((x >> 2) & m2);
    x = (x & m4) + ((x >> 4) & m4);
    x = (x & m8) + ((x >> 8) & m8);
    x = (x & m16) + ((x >> 16) & m16);
    return x;
}

/* This uses fewer arithmetic operations than any other known implementation
* on machines with slow multiplication. It uses 15 arithmetic operations. */
int popcount_2(UINT32 x)
{
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits
    x += x >> 8;           //put count of each 16 bits into their lowest 8 bits
    x += x >> 16;          //put count of each 32 bits into their lowest 8 bits
    return x & 0x3f;       //keep 6 bits: the count can be 32 (0x20), which 0x1f would cut to 0
}

/* This uses fewer arithmetic operations than any other known implementation
* on machines with fast multiplication. It uses 12 arithmetic operations,
* one of which is a multiply. */
int popcount_3(UINT32 x)
{
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits
    return (x * h01) >> 24; // left 8 bits of x + (x<<8) + (x<<16) + (x<<24)
}

int main(void)
{
    UINT32 i = 0x1ff12ee2;
    printf("i = %u = 0x%x\n", i, i);
    printf("popcount_1(%u) = %d\n", i, popcount_1(i));
    printf("popcount_2(%u) = %d\n", i, popcount_2(i));
    printf("popcount_3(%u) = %d\n", i, popcount_3(i));
    /* If compiled with a compiler other than gcc, comment out the line below. */
    printf("GCC's __builtin_popcount(%u) = %d\n", i, __builtin_popcount(i));
    return 0;
}

8 comments on "Counting the 1s in a Binary Number"

Damocles said:

2008-11-27 23:35

Last time my idea was to implement it with bsfl, similar to Linux's O(1) scheduler.

James said:

2008-11-28 09:13

Solex,

Does gcc have a builtin for reversing all the bits of an integer?

Iron_Feet said:

2008-11-29 13:52

I bought that book too, and it really is great.

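xin said:

2008-12-28 13:38

Thank you for the correction. The new algorithm was already mentioned briefly in the second printing in August. We will cover it more fully in the second edition.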

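pcghost said:

2009-02-08 15:48

Heh, I only borrowed that book and flipped through it. This problem looked fairly simple, so I had a quick go at it, though I wrote my program in Mathematica. Of course, I did not consider efficiency.
I have not written programs for a long time, and I am thinking of writing two or three a year from now on to stay in practice.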

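来_给我看看 said:

2012-07-08 13:04

Thank you very much for writing such a good post. However, after comparing it with http://en.wikipedia.org/wiki/Hamming_weight and looking at the corresponding implementation in Java's Integer class, I found a problem in the last step of popcount_2 in your program. It can give a wrong result when all 32 bits are 1. The 64-bit version on the Hamming_weight page ends with return x & 0x7f; for 32 bits the mask should be one bit narrower, so it should be return x & 0x3f. return x & 0x1f looks wrong.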

jaxenix said:

2012-12-07 22:21

In practice these methods cost about as much time as table lookup. The advantage is that they need no extra space.

