6.3 Web APIs交互
许多网站都有一些通过JSON或其他格式提供数据的公共API。通过Python访问这些API的办法有不少。一个简单易用的办法(推荐)是requests包(http://docs.python-requests.org)。
为了搜索最新的30个GitHub上的pandas主题,我们可以发一个HTTP GET请求,使用requests扩展库:
In [113]: import requests
In [114]: url = 'https://api.github.com/repos/pandas-dev/pandas/issues'
In [115]: resp = requests.get(url)
In [116]: resp
Out[116]: <Response [200]>
响应对象的json方法会返回一个包含被解析过的JSON字典,加载到一个Python对象中:
In [117]: data = resp.json()
In [118]: data[0]['title']
Out[118]: 'Period does not round down for frequencies less that 1 hour'
data中的每个元素都是一个包含所有GitHub主题页数据(不包含评论)的字典。我们可以直接传递数据到DataFrame,并提取感兴趣的字段:
In [119]: issues = pd.DataFrame(data, columns=['number', 'title',
.....: 'labels', 'state'])
In [120]: issues
Out[120]:
number title \
0 17666 Period does not round down for frequencies les...
1 17665 DOC: improve docstring of function where
2 17664 COMPAT: skip 32-bit test on int repr
3 17662 implement Delegator class
4 17654 BUG: Fix series rename called with str alterin...
.. ... ...
25 17603 BUG: Correctly localize naive datetime strings...
26 17599 core.dtypes.generic --> cython
27 17596 Merge cdate_range functionality into bdate_range
28 17587 Time Grouper bug fix when applied for list gro...
29 17583 BUG: fix tz-aware DatetimeIndex + TimedeltaInd...
labels state
0 [] open
1 [{'id': 134699, 'url': 'https://api.github.com... open
2 [{'id': 563047854, 'url': 'https://api.github.... open
3 [] open
4 [{'id': 76811, 'url': 'https://api.github.com/... open
.. ... ...
25 [{'id': 76811, 'url': 'https://api.github.com/... open
26 [{'id': 49094459, 'url': 'https://api.github.c... open
27 [{'id': 35818298, 'url': 'https://api.github.c... open
28 [{'id': 233160, 'url': 'https://api.github.com... open
29 [{'id': 76811, 'url': 'https://api.github.com/... open
[30 rows x 4 columns]
花费一些精力,你就可以创建一些更高级的常见的Web API的接口,返回DataFrame对象,方便进行分析。