I’m using a Python module that scrapes a site, and I noticed that the code below handles different tables differently:
```python
def player_stats(request, stat, numeric=False, s_index=False):
    """ """
    supported_tables = ["totals", "per_minute", "per_poss", "advanced",
                        "playoffs_per_game", "playoffs_totals",
                        "playoffs_per_minute", "playoffs_per_poss",
                        "playoffs_advanced"]
    if stat == "per_game":
        soup = BeautifulSoup(request.text, "html.parser")
        table = soup.find("table", id="per_game")
    elif stat in supported_tables:
        soup = BeautifulSoup(request.text, "html.parser")
        comment_table = soup.find(text=lambda x: isinstance(x, NavigableString) and stat in x)
        soup = BeautifulSoup(comment_table, "html.parser")
        table = soup.find("table", id=stat)
    else:
        raise TableNonExistent
```
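For context, here is a minimal, self-contained sketch of the pattern this code relies on. The markup and ids below are invented (not taken from the real site): one table sits in plain HTML, and a second one is hidden inside an HTML comment, which is how pages like this typically defer rendering of extra tables.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page: one table in plain HTML, a second one wrapped
# in an HTML comment so it is not a real <table> tag in the parse tree.
html = """
<table id="per_game"><tr><td>visible</td></tr></table>
<!-- <table id="totals"><tr><td>hidden</td></tr></table> -->
"""

soup = BeautifulSoup(html, "html.parser")

# find_all only matches real tags; the commented-out table is parsed
# as a Comment node (a subclass of NavigableString), not a tag.
print([t["id"] for t in soup.find_all("table")])  # ['per_game']

# Locate the comment whose text mentions the table we want
# (string= is the modern spelling of the text= argument above)...
comment = soup.find(string=lambda s: isinstance(s, Comment) and "totals" in s)

# ...and parse that comment's text as its own little HTML document.
inner = BeautifulSoup(comment, "html.parser")
print(inner.find("table", id="totals") is not None)  # True
```

Note that the lambda here tests for `Comment` specifically, which is slightly stricter than the `NavigableString` check in the quoted module, but it matches the same node in this example.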
An example of a page this would be used on:
If one were to do `soup.find_all("table")`, only the first table would be found. The code above seems to check for "comments" in the HTML and then applies BeautifulSoup to that comment text again. I have a few questions:
1. Why aren’t the other tables found? They are also HTML tags (not commented out, as far as I can tell), so I’m struggling to understand the difference.
2. What is the `comment_table` line of code really doing? To me, it looks like it’s checking for `text` attributes that are `NavigableString`s and that contain an element of `supported_tables`.
3. If I’m right about the above, how does BeautifulSoup simply parse that block of text? Is it "magic", or does that text have to be of a specific form, meaning we’re just lucky in this case?
Let me know if you need more information to answer the questions. Thanks!