Batch-querying website PR with Python


A while back I needed to screen backlink resources in bulk, and a backlink site's PR is an important metric, especially for anyone doing Google SEO: we want to filter out backlink sources that are both effective and high-PR. With so many sites to check, a program was the only practical way. The code is posted below; run it if you're curious. I keep the sites to query in a file, but you could just as well store them in a database and read them from there. The results are likewise written to a file; with a small change to the code you could write them to a database instead. Here is how it works:

info.txt lists the sites to check, one per line, e.g.:

www.example.com
www.example.org

The output then looks like:

www.example.com,1
www.example.org,3

The first field is the URL and the second is its PR; if the lookup for a site fails, the PR is written as -1.
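Because each output line is just `url,pr`, post-processing the result file is straightforward. A minimal sketch (Python 3 syntax, with made-up sample lines) that keeps only the sites whose lookup succeeded:

```python
# Filter pr.txt-style "url,pr" lines, dropping entries whose lookup failed (pr == -1).
def successful_lookups(lines):
    results = []
    for line in lines:
        # rpartition splits on the last comma, so a comma inside the URL would not break parsing
        url, _, pr = line.strip().rpartition(',')
        if pr != '-1':
            results.append((url, int(pr)))
    return results

sample = ["www.example.com,1", "www.example.org,3", "www.example.net,-1"]
print(successful_lookups(sample))  # [('www.example.com', 1), ('www.example.org', 3)]
```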

I originally wanted to use the API that Google provides, but that interface no longer seems to be reachable, so I call chinaz's lookup tool instead and scrape the relevant information with a Python program.

The script mainly uses httplib, urllib, and Python's regular expressions; if you're interested, have a look at their documentation. Disclaimer: this program is for learning purposes only; any commercial use of it has nothing to do with me.
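As a warm-up for the regex part, here is how the host can be pulled out of a URL with `re` — the same pattern the `get_url` helper in the script below uses, shown as a sketch in Python 3 syntax (using `group(1)` rather than the script's string slicing, which is more robust):

```python
import re

# Match the scheme, then capture everything up to the first '/' (or end of string).
host_re = re.compile(r'^https?://(.*?)($|/)', re.IGNORECASE)

def get_host(url):
    m = host_re.search(url)
    return m.group(1) if m else None  # group(1) holds just the host part

print(get_host('http://www.example.com/path'))  # www.example.com
print(get_host('HTTPS://example.org'))          # example.org
```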

# -*- coding: utf-8 -*-
# Python 2 script: batch-query website PR through chinaz's lookup tool.
import re, urllib, httplib, time

def get_url(url):
    '''Extract the normalized host from a URL.'''
    host_re = re.compile(r'^https?://(.*?)($|/)', re.IGNORECASE)
    return host_re.search(url).group(0)[7:-1]

def get_pr(url):
    '''Fetch the PR for the given site.'''
    params = urllib.urlencode({'PRAddress': url})
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain",
               "User-agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
               "Referer": "http://pr.chinaz.com/?PRAddress=www.chinaz.com"}
    # chinaz's PR lookup host; the exact hostname may have changed since this was written
    conn = httplib.HTTPConnection("pr.chinaz.com")
    conn.request("GET", "", params, headers)
    response = conn.getresponse()
    data = response.read()
    datautf8 = data.decode('utf-8')
    # the page embeds a one-off token after the string 'enkey'; grab the 32 chars that follow
    posin = datautf8.find('enkey')
    keyinfo = datautf8[posin + 6:posin + 38]
    opener = urllib.FancyURLopener()
    opener.addheaders = [
        ('User-agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)')
    ]
    hosturl = "http://pr.chinaz.com/ajaxsync.aspx?at=pr&enkey=%s&url=%s" % (keyinfo, url)
    info = opener.open(hosturl).read()
    cinfo = info.decode('utf-8').encode('gbk')
    # the first digit in the reply is the PR value
    num_re = re.compile(r'[0-9]')
    pr_num = num_re.search(cinfo).group(0)
    print pr_num
    return pr_num

f = file('pr.txt', 'w')
for m in file('info.txt', 'r'):
    murl = m.strip()
    # checkurl = get_url(murl)
    try:
        prnum = get_pr(murl)
    except Exception, e:
        prnum = -1
        content = "%s,%s\n" % (murl, prnum)
        f.write(content)
        continue
    else:
        content = "%s,%s\n" % (murl, prnum)
        f.write(content)
    time.sleep(5)
f.close()
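The two scraping steps inside `get_pr` — cutting the token out of the HTML after the marker string `enkey`, then pulling the first digit out of the AJAX reply — can be exercised offline. Both sample strings below are fabricated for illustration (the real chinaz pages are formatted differently, so the slice offsets there differ too):

```python
import re

# Fabricated HTML fragment containing an 'enkey'-style token.
page = 'var enkey="0123456789abcdef0123456789abcdef";'
posin = page.find('enkey')
keyinfo = page[posin + 7:posin + 39]   # skip 'enkey="' (7 chars), take the 32-char token
print(keyinfo)                          # 0123456789abcdef0123456789abcdef

# Fabricated AJAX reply; the PR is read as the first digit that appears in it.
reply = '({"status":done,"pr":4})'
pr_num = re.compile(r'[0-9]').search(reply).group(0)
print(pr_num)                           # 4
```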

For more Python tutorials, visit 老王python: https://www.360docs.net/doc/9612404909.html
