小池有话说

Data Analysis on Renren

2014-01-24

I have a dataset from Renren (人人网) on hand, with a sample size of roughly 20,000.

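-- Top 20 surnames, taking the first character of name as the surname
-- (aliases: 姓 = surname, 数量 = count, 百分比 = percentage)
SELECT
    LEFT(name, 1) AS `姓`,
    COUNT(*) AS `数量`,
    CONCAT(TRUNCATE(COUNT(*) / (SELECT COUNT(*) FROM renren_data.student) * 100, 1), '%') AS `百分比`
FROM renren_data.student
GROUP BY LEFT(name, 1)
ORDER BY COUNT(*) DESC
LIMIT 20;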

Rank  Count  Percentage
1     1955   8.4%
2     1662   7.1%
3     1659   7.1%
4     1219   5.2%
5     838    3.6%
6     636    2.7%
7     511    2.2%
8     451    1.9%
9     418    1.8%
10    398    1.7%
11    391    1.6%
12    373    1.6%
13    294    1.2%
14    275    1.1%
15    272    1.1%
16    270    1.1%
17    209    0.9%
18    208    0.9%
19    196    0.8%
20    189    0.8%

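-- Distribution of the first digit of uid
-- (aliases: uid的第一位 = first digit of uid, 数量 = count, 百分比 = percentage)
SELECT
    LEFT(uid, 1) AS `uid的第一位`,
    COUNT(*) AS `数量`,
    CONCAT(TRUNCATE(COUNT(*) / (SELECT COUNT(*) FROM renren_data.student) * 100, 1), '%') AS `百分比`
FROM renren_data.student
GROUP BY LEFT(uid, 1)
ORDER BY COUNT(*) DESC
LIMIT 10;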

First digit of uid  Count  Percentage
2                   16203  65.8%
3                   2948   11.9%
1                   2452   9.9%
4                   1270   5.1%
5                   781    3.1%
7                   733    2.9%
6                   674    2.7%
8                   512    2.0%
9                   450    1.8%

So it looks like the uid first digits don't follow that law about leading digits (presumably Benford's law).
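
For reference, assuming the law meant here is Benford's law, the expected share of leading digit d is log10(1 + 1/d), so a leading 1 should show up about 30% of the time and a leading 2 about 18%. Below is a minimal Python sketch that puts the counts from the uid table next to those expected shares. The names observed_counts, benford_share and compare are made up for illustration. The observed shares are computed against the sum of the nine listed counts, so they can differ a little from the percentages in the table, which are relative to the whole student table.

```python
import math

# Counts copied from the uid first-digit table above (digit -> count).
observed_counts = {
    1: 2452, 2: 16203, 3: 2948, 4: 1270, 5: 781,
    6: 674, 7: 733, 8: 512, 9: 450,
}


def benford_share(digit: int) -> float:
    """Expected share of a leading digit under Benford's law."""
    return math.log10(1 + 1 / digit)


def compare(counts: dict[int, int]) -> list[tuple[int, float, float]]:
    """Return (digit, observed share, Benford share) for digits 1 to 9."""
    # Assumption: shares are relative to the listed counts only.
    total = sum(counts.values())
    return [(d, counts[d] / total, benford_share(d)) for d in range(1, 10)]


if __name__ == "__main__":
    for digit, observed, expected in compare(observed_counts):
        print(f"{digit}  observed {observed:6.1%}  Benford {expected:6.1%}")
```

Under Benford's law the digit 1 would be the most common, but here 2 dominates by a wide margin, which matches the impression that the uids don't fit. That is not too surprising if uids are assigned roughly sequentially rather than spread over several orders of magnitude, though that is a guess on my part about how they were issued.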