Category: urllib

I have the following code:

    resp, page = httplib2.Http().request("https://blog.coinbase.com")
    tree = html.fromstring(page)
    headers_texts_local = tree.xpath('//h3/div/text()')

It parses all the headers from the website, and I need it to run as fast as possible. The problem is that it takes 1.3-1.4 seconds on average, which is far too slow; I would like to get it down to around 0.2 s. ..
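Before optimizing, it may help to check whether the 1.3-1.4 seconds are spent on the network request or on parsing. The following is a minimal sketch, not the author's code: `timed` and `profile_headers` are hypothetical helper names, and `profile_headers` assumes that httplib2 and lxml are installed.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def profile_headers(url):
    """Time the fetch, the parse and the XPath query separately."""
    # Imported here so that timed() above works without these packages.
    import httplib2
    from lxml import html

    (resp, page), fetch_s = timed(httplib2.Http().request, url)
    tree, parse_s = timed(html.fromstring, page)
    headers, xpath_s = timed(tree.xpath, "//h3/div/text()")
    return headers, {"fetch": fetch_s, "parse": parse_s, "xpath": xpath_s}
```

If the "fetch" figure accounts for most of the total, the time is likely going to the connection and transfer rather than to lxml, and changes to the parsing code would probably not help much.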

Here's an extremely simple script (5 lines!) that I wrote. I'd like to fetch the HTML data, specifically the subject_text and the price class.

    import re
    from urllib import request
    url = 'https://section.cafe.naver.com/ca-fe/home/search/c-articles?q=%EB%A1%A4%EB%9E%9C%EB%93%9C&ss=ON_SALE'
    contents = str(request.urlopen(url).read().decode("utf8"))
    print(contents)

But when I print the contents, there seems to be a noscript error, because it says this in ..
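The excerpt is cut off, so the actual cause is not known. One common explanation, assumed here, is that the page is rendered by JavaScript and the raw HTML returned by `urlopen` contains only a `<noscript>` fallback. A small sketch using the standard library's `html.parser` to surface such fallback text; `NoscriptFinder` and `noscript_messages` are hypothetical names chosen for illustration:

```python
from html.parser import HTMLParser


class NoscriptFinder(HTMLParser):
    """Collect the text found inside <noscript> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # how many <noscript> tags are currently open
        self.messages = []   # non-empty text chunks seen inside them

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.messages.append(data.strip())


def noscript_messages(html_text):
    """Return the list of text chunks inside <noscript> elements."""
    finder = NoscriptFinder()
    finder.feed(html_text)
    finder.close()
    return finder.messages
```

Calling `noscript_messages(contents)` on the fetched page would list any fallback messages. If one appears and the wanted classes are missing, the data is probably loaded by a script after the initial response, which `urllib` alone does not execute.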

Today I ran into strange behavior on a headless print server (Raspberry Pi / Python 3). I need to download either PDFs or rendered Python scripts from a web server. Until now I used:

    src = "https://ssl.server.tld/path/to/file.pdf"
    target = "/path/to/saved.pdf"
    os.system("wget -O "+target+" "+src)

From now on I use:

    with urllib.request.urlopen(src) as response, open(target, ..
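The original snippet is cut off after `open(target,`, so what follows is not the author's code. It is one common way to write such a download with the standard library, streaming the response to disk in binary mode. The function name `download` and the `timeout` default are assumptions made for this sketch.

```python
import shutil
import urllib.request


def download(src, target, timeout=30):
    """Stream the resource at URL src into the local file target."""
    with urllib.request.urlopen(src, timeout=timeout) as response, \
            open(target, "wb") as out_file:
        # Copy in chunks so large PDFs are not held in memory at once.
        shutil.copyfileobj(response, out_file)
    return target
```

Unlike the `os.system("wget ...")` call, this avoids building a shell command from strings, and a failed request raises an exception (for example `urllib.error.URLError`) instead of passing silently.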

Background: Running this snippet of code in Python's interpreter, we get an IP address for gov.uk.

    >>> import socket
    >>> socket.gethostbyname('gov.uk')
    '151.101.64.144'

gov.uk is a TLD according to Wikipedia and the Public Suffix List. Similar TLDs that are also domains include gov.au, gov.br, and s3.amazonaws.com. In trying to answer this question with Python, I tried ..
