Reading open-source code: PyDictionary, a module for looking up the meanings, synonyms, and antonyms of English words

1. PyDictionary

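PyDictionary is a very small dictionary module. You can use it to look up the meaning, synonyms, and antonyms of an English word, and to translate English words. It may look like magic, but the implementation is simple: behind it is a small web scraper that builds a specific URL for the target site, fetches the page, and parses the HTML to extract the information it needs.

Here is a short example: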

from PyDictionary import PyDictionary
dictionary = PyDictionary()

print(dictionary.meaning("indentation"))
print(dictionary.synonym("Life"))
print(dictionary.antonym("Life"))

print(dictionary.translate("Range", 'zh'))

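The translate method relies on Google Translate, which is not accessible from mainland China. The other three methods (meaning, synonym, and antonym) all work.

Git repository: https://github.com/geekpradd/PyDictionary

2. Getting a word's meaning and definitions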

    def meaning(term, disable_errors=False):
        if len(term.split()) > 1:
            print("Error: A Term must be only a single word")
        else:
            try:
                html = _get_soup_object("http://wordnetweb.princeton.edu/perl/webwn?s={0}".format(
                    term))
                types = html.findAll("h3")
                length = len(types)
                lists = html.findAll("ul")
                out = {}
                for a in types:
                    reg = str(lists[types.index(a)])
                    meanings = []
                    for x in re.findall(r'\((.*?)\)', reg):
                        if 'often followed by' in x:
                            pass
                        elif len(x) > 5 or ' ' in str(x):
                            meanings.append(x)
                    name = a.text
                    out[name] = meanings
                return out
            except Exception as e:
                if disable_errors == False:
                    print("Error: The Following Error occured: %s" % e)
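The filtering step inside meaning can be tried offline. The following is a minimal sketch, not the library's code: extract_meanings is a hypothetical helper name, and the sample markup is made up to resemble a WordNet result list rather than copied from the real site. It applies the same regular expression and the same two filters as the loop above.

```python
import re


def extract_meanings(list_html):
    """Hypothetical helper: pull definitions out of one <ul> block of markup."""
    meanings = []
    # Every parenthesised fragment is a candidate definition.
    for candidate in re.findall(r'\((.*?)\)', list_html):
        # Usage notes such as "(often followed by ...)" are not definitions.
        if 'often followed by' in candidate:
            continue
        # Short tags without spaces, such as "(n)" or "(adj)", are dropped.
        if len(candidate) > 5 or ' ' in candidate:
            meanings.append(candidate)
    return meanings


# Made-up sample markup, assumed to be similar in shape to the real page.
sample = (
    '<ul>'
    '<li>S: (n) life (a characteristic state or mode of living) "social life"</li>'
    '<li>S: (adj) full (often followed by of) (containing as much as possible)</li>'
    '</ul>'
)

print(extract_meanings(sample))
```

Only the two definition strings survive the filters; the part-of-speech tags and the usage note are discarded.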

The synonym and antonym methods also call _get_soup_object, so each of the three methods amounts to a small scraper. Here is the _get_soup_object function:

import requests
from bs4 import BeautifulSoup

def _get_soup_object(url, parser="html.parser"):
    return BeautifulSoup(requests.get(url).text, parser)

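The implementation is extremely simple. Scraping is not hard to begin with: once you know the URL, BeautifulSoup together with requests gets you the page content, and all that remains is to parse it.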
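Since everything depends on the URL, it is worth noting that the library formats the term straight into the query string. The sketch below shows one way the URL could be built with the term percent-encoded first. It is an illustration only: build_lookup_url and WORDNET_URL are hypothetical names that do not exist in PyDictionary, and urllib.parse is the Python 3 location of quote.

```python
from urllib.parse import quote

# The same URL pattern that the meaning method formats the term into.
WORDNET_URL = "http://wordnetweb.princeton.edu/perl/webwn?s={0}"


def build_lookup_url(term, template=WORDNET_URL):
    """Hypothetical helper: percent-encode the term, then fill in the template."""
    # safe="" makes quote() also encode characters such as "/" and "&".
    return template.format(quote(term.strip(), safe=""))


print(build_lookup_url("life"))
print(build_lookup_url("ice cream"))
```

The resulting string can then be passed to _get_soup_object as before.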

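I won't go into how the page is parsed; interested readers can study that themselves. What this library offers is a way of approaching a problem: when a website already provides what you need, you can write a scraper and put it to your own use.

3. Picking some nits

The library's core code lives in core.py, which is only about 140 lines long. In my opinion, the code is not among the better-written examples, and there is plenty of room for improvement.

3.1 Checking the Python version

To support both Python 2 and Python 3, the code has to determine which version it is running on and then import different libraries, or define different functions and classes, accordingly.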

python2 = False
if list(sys.version_info)[0] == 2:
    python2 = True

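There is nothing wrong with the version check itself. Oddly, though, the source never uses the python2 variable, so this code has no reason to exist.

3.2 The code is not concise enough

The code for fetching synonyms and antonyms is almost identical and could easily be merged into a single function: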
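For comparison, here is a minimal sketch of how such a flag is usually put to use. It is not taken from PyDictionary; PY2 and text_type are illustrative names. Because sys.version_info is already a tuple, it can be indexed directly without the list() conversion.

```python
import sys

# sys.version_info is a named tuple, so it can be indexed directly.
PY2 = sys.version_info[0] == 2

if PY2:
    # The name `unicode` only exists on Python 2; this branch never runs on Python 3.
    text_type = unicode  # noqa: F821
else:
    text_type = str

print(PY2, text_type.__name__)
```

A flag that is computed but never consulted, as in core.py, can simply be deleted.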

    @staticmethod
    def _synonym_or_antonym(term, formatted=False, _type='synonym'):
        if len(term.split()) > 1:
            print("Error: A Term must be only a single word")
        else:
            try:
                data = _get_soup_object("https://www.synonym.com/synonyms/{0}".format(term))
                section = data.find('div', {'class': 'type-{_type}'.format(_type=_type)})
                spans = section.findAll('a')
                synonyms = [span.text.strip() for span in spans]
                if formatted:
                    return {term: synonyms}
                return synonyms
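            except Exception:
                # {0} is positional, so term must be passed positionally;
                # passing it only as term=... would raise IndexError here.
                print("{0} has no {_type} in the API".format(term, _type=_type))

    @staticmethod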
    def synonym(term, formatted=False):
        return PyDictionary._synonym_or_antonym(term, formatted=formatted, _type='synonym')

    @staticmethod
    def antonym(term, formatted=False):
        return PyDictionary._synonym_or_antonym(term, formatted=formatted, _type='antonym')