Using Beautiful Soup in Python to Log In to a Webpage and Download Multiple Zip Files

  beautifulsoup, download, python, unzip, web-scraping

I am a first-time user of web scraping and Beautiful Soup.

I have two questions: first, how to pass my login information along when requesting the files I want to download, and second, how to download multiple zip files. I am pasting my code below without the curl/login information.

First, I have a web page that requires a login before files can be downloaded. I am able to log in and parse the resulting page with Beautiful Soup, but after that I cannot go any further, because I do not know how to pass the login information in Python to the particular file I want to download. So basically, how can I make Python use the login credentials when requesting the file at baseurl + href_link?
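One possible approach, sketched under the assumption that the site tracks the login through cookies set in the login response: send every request through a single session object, so that whatever cookies the login sets are sent again with each download. The function name download_all and its parameters are hypothetical, and session is assumed to be a requests.Session that has already posted the login form.

```python
from urllib.parse import urljoin


def download_all(session, base_url, hrefs, timeout=60):
    """Fetch every href through the same logged-in session.

    `session` is assumed to be a requests.Session that has already
    posted the login form, so its cookie jar holds the login cookies.
    Returns a dict mapping each absolute URL to the raw response bytes.
    """
    downloads = {}
    for href in hrefs:
        # Turn the relative href into an absolute URL on the same site.
        url = urljoin(base_url, href)
        response = session.get(url, timeout=timeout)
        # Stop early on HTTP errors such as 403 or 404.
        response.raise_for_status()
        downloads[url] = response.content
    return downloads
```

With requests, this would be used roughly as follows: create the session with requests.Session(), log in with session.post(...) using the same headers, params and data as in my code, then call download_all(session, baseurl, links). Cookies passed through the cookies= argument of a single call are not kept by the session, so if the login depends on them they would probably need to be stored first with session.cookies.update(cookies).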

Second, my link points to a zip file but does not have .zip at the end. For example, my baseurl = 'https://consumerpyramidsdx.cmie.com' and the href_link is /kommon/bin/sr.php? kall=wsubsdl&fn=consumption_pyramids_20140131_MS_rev&fmt=csv&rrurl=consumptionpyramidsdx So how can I use it to download all the zip files and unzip them? Most forum questions on this rely on '.zip' explicitly because their hrefs end in .zip, but in my case they do not.
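Since the URL does not end in .zip, one option is to ignore the name and inspect the downloaded bytes instead. The sketch below uses only the standard library; the function name extract_if_zip and the destination directory are hypothetical, and it assumes the whole download fits in memory.

```python
import io
import zipfile
from pathlib import Path


def extract_if_zip(payload, dest_dir):
    """Extract `payload` (raw bytes) into `dest_dir` if it is a zip archive.

    Returns the list of member names, or an empty list when the bytes
    are not a zip file (for example an HTML login page).
    """
    buffer = io.BytesIO(payload)
    # is_zipfile inspects the content, so the URL's extension is irrelevant.
    if not zipfile.is_zipfile(buffer):
        return []
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(buffer) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

An empty result for a file that should be a zip is a hint that the server returned something else, such as the login page, which would suggest the credentials were not sent with that request.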

My code is as follows:

import requests
from bs4 import BeautifulSoup

# headers, params, cookies and data hold the login details (omitted here)
response = requests.post('https://consumerpyramidsdx.cmie.com/kommon/bin/sr.php', headers=headers, params=params, cookies=cookies, data=data)
soup = BeautifulSoup(response.content, "lxml")
baseurl = 'https://consumerpyramidsdx.cmie.com'
print(soup)

# Collect the href of every link whose visible text is "CSV"
for x in soup.find_all("a"):
    if x.text.strip() == 'CSV':
        file_link = x.get('href')  # the href_link of the file I want to download
        print(file_link)
        # After this I want to download all the baseurl + file_link files
        

Source: Python Questions
