Category : web-scraping

Context: I am currently going through a course on web scraping. In the module on scraping JavaScript, the call set_1.difference(set_2) was used to distinguish the old variables from the newly created ones. But when I tried it, I got this error: AttributeError: 'list' object has no attribute 'difference'. I searched online and stumbled ..
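
The error message in this excerpt says that difference() was called on a list, while the method belongs to set. A minimal sketch of one way around it, assuming the two collections are plain lists of hashable values (the list names and contents below are hypothetical, not taken from the course):

```python
# Hypothetical snapshots of variable names, stored as lists.
names_before = ["a", "b", "c"]
names_after = ["a", "b", "c", "d", "e"]


def new_items(before, after):
    """Return the items present in `after` but not in `before`.

    Converting `after` to a set makes .difference() available;
    the argument to .difference() may be any iterable, including a list.
    """
    return set(after).difference(before)


added = new_items(names_before, names_after)
print(sorted(added))  # sets are unordered, so sort before printing
```

Sets drop duplicates and ordering, so this only fits when neither matters.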

How do I select options from the dropdown menus and download the resulting CSV generated from these inputs? The website in question is https://sisweb.tesouro.gov.br/apex/f?p=2691:2&minimal=full&font=opensans (the website is not in English, but it can easily be translated by right-clicking anywhere on the page and selecting Translate to English). I know this might be a lot ..

I’m trying to scrape the network traffic of http://live.jokerswidget.com/freelivematch/8534782740463890.html (or any of the links here, which follow the same structure: http://live.jokerswidget.com/). I can see the m3u in the network console, but I can’t get it with Selenium. Here is my code: option = Options() option.add_argument("--autoplay-policy=no-user-gesture-required") option.add_argument("--auto-open-devtools-for-tabs") driver = webdriver.Chrome(chrome_options=option) driver.get(url) sleep(3) res=np.nan Resources = driver.execute_script("return ..

I’m trying to get Selenium WebDriver to select an option from a dropdown menu on the page referenced below. The dropdown menu looks like this: <select name="_year" id="select1" onchange="this.form.submit();" style="width: 10%; font: 12px Arial;"> <option selected="">All Years </option><option>2018-19 </option><option>2017-18 </option><option>2016-17 </option><option>2015-16 </option><option>2014-15 </option><option>2013-14 </option><option>2012-13 </option><option>2011-12 </option><option>2010-11 </option><option>2009-10 </option><option>2008-09 </option><option>2007-08 </option><option>2006-07 </option><option>2005-06 </option><option>2004-05 </option><option>2003-04 </option><option>2002-03 ..

I’m trying to scrape data from the URL and print the items out one by one. Below is my code: import requests from bs4 import BeautifulSoup from pandas import DataFrame header = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) ' 'Chrome/87.0.4280.88 Safari/537.36 ' } def scrapping(url): response = requests.get(str(url), headers=header) ..

My code is: import requests from bs4 import BeautifulSoup hd = {'Accept-Language': 'en,en-US'} res = requests.get('https://www.udemy.com/courses/search/?q=python%20web%20scraping&src=sac&kw=python%20web%20sc', headers = hd) soup = BeautifulSoup(res.content, 'lxml') courses = soup.find('div', class_='popper--popper--19faV popper--popper-hover--4YJ5J') print(courses) I am trying to get the course name from the div with class name 'popper--popper--19faV popper--popper-hover--4YJ5J', but I am getting 'None'. Any suggestions on how to get the course name ..

I’m trying to make a spider which scrapes memes and their descriptions. I want my spider to scrape no more than MAX_MEMES_FOR_TEMPLATE (7042) memes per template and no more than MAX_TEMPLATES (42) templates. What is the best way to do it? Right now my spider looks like this: class MemesSpider(scrapy.Spider): name = 'imgflip' start_urls = ..
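
The two limits described in this excerpt amount to simple bookkeeping: count the distinct templates seen and the memes accepted for each. A framework-independent sketch follows, not the asker's spider and not a Scrapy API. The class ScrapeLimiter, its accept() method, and the template identifiers are hypothetical names introduced for illustration; only the two constants come from the question.

```python
MAX_TEMPLATES = 42             # limit from the question
MAX_MEMES_FOR_TEMPLATE = 7042  # limit from the question


class ScrapeLimiter:
    """Hypothetical helper that tracks how many memes were kept per template."""

    def __init__(self, max_templates, max_per_template):
        self.max_templates = max_templates
        self.max_per_template = max_per_template
        self.counts = {}  # template id -> number of memes accepted so far

    def accept(self, template_id):
        """Return True if one more meme for this template is within both limits."""
        if template_id not in self.counts:
            # A new template is admitted only while the template cap has room.
            if len(self.counts) >= self.max_templates:
                return False
            self.counts[template_id] = 0
        if self.counts[template_id] >= self.max_per_template:
            return False
        self.counts[template_id] += 1
        return True


limiter = ScrapeLimiter(MAX_TEMPLATES, MAX_MEMES_FOR_TEMPLATE)
print(limiter.accept("template-1"))  # first meme of the first template
```

In a spider, a parsing callback could call accept() before yielding each item and skip the item (or stop following links for that template) when it returns False.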

I was trying to get some images from this site via JSON data. The code to get the images works fine, but I get an error when the requested page number exceeds the number of available pages. The URL to get the images looks something like this: URL = 'https://danbooru.donmai.us/posts.json?tags=' + tag + '&page=' + str(randint(1, 1000)) Here in '&page=' + ..
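
One way to avoid requesting a page that does not exist is to walk the pages in order and stop at the first empty result, instead of picking a random page up to 1000. The sketch below rests on an unverified assumption: that the endpoint answers with an empty JSON list once the page number runs past the last page. The functions build_url, collect_posts and fake_fetch are hypothetical, and the fetcher is injected so the loop can be tried without network access.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://danbooru.donmai.us/posts.json"  # endpoint from the question


def build_url(tag, page):
    """Build the query URL; urlencode also escapes special characters in the tag."""
    return BASE_URL + "?" + urlencode({"tags": tag, "page": page})


def collect_posts(tag, fetch_json, max_pages=1000):
    """Fetch pages 1, 2, 3, ... until one comes back empty or max_pages is hit.

    fetch_json is any callable taking a URL and returning a list of posts,
    for example a thin wrapper around an HTTP client (assumption).
    """
    posts = []
    for page in range(1, max_pages + 1):
        batch = fetch_json(build_url(tag, page))
        if not batch:  # assumed signal that the last page has been passed
            break
        posts.extend(batch)
    return posts


# Stand-in data and fetcher so the sketch runs offline.
FAKE_PAGES = {1: [{"id": 1}], 2: [{"id": 2}, {"id": 3}]}


def fake_fetch(url):
    page = int(parse_qs(urlparse(url).query)["page"][0])
    return FAKE_PAGES.get(page, [])


print(len(collect_posts("cat", fake_fetch)))
```

If the real endpoint signals the end of the results differently, for instance with an HTTP error status, the stop condition would need to catch that instead.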
