How to read a link from BeautifulSoup output in Python

  django, python

I am trying to reuse a link that I extracted with BeautifulSoup.

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://data.ed.gov/dataset/college-scorecard-all-data-files-through-6-2020/resources')
soup = bs(r.content, 'lxml')
# collect every absolute href/src value found on the page
links = [item['href'] if item.get('href') is not None else item['src'] for item in soup.select('[href^="http"], [src^="http"]')]
print(links[1])

This is the link I want.

Output: https://ed-public-download.app.cloud.gov/downloads/CollegeScorecard_Raw_Data_07202021.zip

Now I want to pass this link to the download step so I can fetch its contents.


import io
import os
import zipfile

import requests

# make a folder if it doesn't already exist
folder_name = '../data'  # assumed: the same folder the files are extracted into below
if not os.path.exists(folder_name):
    os.makedirs(folder_name)

# pass the url
url = r'link from beautifulsoup result needs to go here'
response = requests.get(url, stream=True)

# extract contents
with zipfile.ZipFile(io.BytesIO(response.content)) as zf:
    for elem in zf.namelist():
        zf.extract(elem, folder_name)

My overall goal is to take the link that I scraped and place it in the url variable, because the link on this website keeps changing. I want this to be dynamic so that I don’t have to search for the link manually and update it every time it changes. I hope this makes sense, and I appreciate any help I can get.

If I hard-code the URL as follows, I know it works:

url = r'https://ed-public-download.app.cloud.gov/downloads/CollegeScorecard_Raw_Data_07202021.zip'

If I can get my code to pass exactly that value, I know it will work. I’m just stuck on how to accomplish this.
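
One possible way to do this, as a minimal sketch under a few assumptions: each entry in links is already a plain string, so url = links[1] would be enough on its own. Because the position in the list may shift when the page changes, the sketch below instead picks the first link whose path ends in .zip. The helper names first_zip_link, extract_zip_bytes and download_and_extract are hypothetical (they are not part of requests or BeautifulSoup), and the sketch assumes the first .zip link on the page is the one wanted.

```python
import io
import os
import zipfile
from urllib.parse import urlparse


def first_zip_link(links):
    """Return the first URL whose path ends in .zip, or None if there is none."""
    for link in links:
        if urlparse(link).path.lower().endswith(".zip"):
            return link
    return None


def extract_zip_bytes(data, folder_name):
    """Extract an in-memory zip archive into folder_name; return the member names."""
    os.makedirs(folder_name, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(folder_name)
        return zf.namelist()


def download_and_extract(links, folder_name):
    """Download the first .zip link in links and extract it into folder_name."""
    # Third-party dependency, imported here so the helpers above work without it.
    import requests

    url = first_zip_link(links)  # the scraped link is passed along as a variable
    if url is None:
        raise ValueError("no .zip link found in the scraped links")
    response = requests.get(url)
    response.raise_for_status()  # stop early on an HTTP error
    return extract_zip_bytes(response.content, folder_name)
```

With the links list built earlier, calling download_and_extract(links, '../data') would take the place of the hard-coded url.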

Source: Python Questions
