Category: web-scraping

I have written a Python program for web scraping in a Jupyter notebook:

    from bs4 import BeautifulSoup
    import requests

    page = requests.get(url)
    # Store the contents of the website under doc
    doc = lh.fromstring(page.content)
    # Parse data that are stored between <tr>..</tr> of HTML
    tr_elements = doc.xpath('//tr')
    r = requests.get(url)
    # Create empty list
    col = []
    i = 0
    # For each ..

I am trying to run the Python script below in a Google Cloud Function.

Runtime: Python 3.8
Entry point: hello_world

    from requests_html import HTMLSession

    def hello_world(request):
        session = HTMLSession()
        r = session.get('https://translate.google.com')
        r.html.render()
        app = r.html.find('#yDmH0d')
        for value in app:
            return value.text[:10]

Triggering event: {}

When I test this function in Google Cloud Platform, ..

I am using Scrapy and I want to extract each topic that has at least 4 posts. I have two separate selectors: real_url_list, to get the href for each topic, and nbpostsintopic_resp, to get the number of posts.

    real_url_list = response.css("td.col-xs-8 a::attr(href)").getall()
    for topic in real_url_list:
        nbpostsintopic_resp = response.css("td.center ::text").get()
        nbpostsintopic = nbpostsintopic_resp[0] ..
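
One thing visible in the excerpt above: `response.css(...).get()` returns only the first match on the whole page, and indexing that string with `[0]` keeps a single character, so a count such as "12" would be read as "1". A minimal sketch of the filtering step follows, assuming each topic link and its post count can be collected together as one pair per table row. The function name `topics_with_min_posts` and the sample data are hypothetical, and the per-row selectors mentioned in the docstring are an assumption about the page layout, not something the truncated question confirms.

```python
def topics_with_min_posts(rows, min_posts=4):
    """Return the hrefs whose post count is at least `min_posts`.

    `rows` is an iterable of (href, count_text) pairs, one pair per
    table row. In a Scrapy callback such pairs could, for example, be
    built by looping over response.css("tr") and calling
    row.css("td.col-xs-8 a::attr(href)").get() and
    row.css("td.center ::text").get() on each row (assumed layout).
    """
    kept = []
    for href, count_text in rows:
        if href is None or count_text is None:
            continue  # row without a link or without a count cell
        text = count_text.strip()
        # Parse the whole cell text, not just its first character.
        if text.isdecimal() and int(text) >= min_posts:
            kept.append(href)
    return kept


# Hypothetical sample data standing in for scraped rows.
sample_rows = [("/topic/a", "3"), ("/topic/b", " 12 "), ("/topic/c", "4")]
selected = topics_with_min_posts(sample_rows)
```

Keeping the link and the count in the same pair avoids matching every topic against the first count cell on the page.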

I’m a Python noob trying to write Scrapy results to Firebase Firestore using Python 3. Spider results are logged correctly to the console, but I can’t seem to write to my Firestore DB. Any help is greatly appreciated.

Error message:

    db = firestore.client()
    AttributeError: module 'google.cloud.firestore' has no attribute 'client'

Pipeline file:

    import firebase_admin
    from ..
