Issue
I am trying to scrape table data from this site. Most of the data is hidden behind a button press, and once the button is pressed, the table data does not seem to be in the HTML I get back, even though I can see it in the web inspector. This is confusing me. Here is my code so far:
def getData(state, year):
    url = f"https://www.countyhealthrankings.org/explore-health-rankings/county-health-rankings-model/health-outcomes/length-of-life/infant-mortality?year={year}&state={state}&tab=1"
    driver = webdriver.Chrome()
    driver.get(url)
    try:
        WebDriverWait(driver, 10).until(lambda x: x.execute_script('return document.readyState') == 'complete')
        button = driver.find_element(By.XPATH, '//span[contains(text(), "Show")]')
        driver.execute_script("arguments[0].click();", button)
        table = driver.find_element('#state-snapshot-data-table')
    except Exception as e:
        return pd.DataFrame()
    finally:
        driver.quit()
    rows = table.find('tbody').find_all('tr')
    data = []
    for row in rows:
        cells = row.find_all('td')
        name = cells[0].text.strip()
        numerator = cells[1].text.strip()
        raw_value = cells[2].text.strip()
        ci_range = cells[3].text.strip()
        data.append([name, numerator, raw_value, ci_range])
    return pd.DataFrame(data, columns=['name', 'numerator', 'raw_value', 'ci_range'])

print( getData('01', '2023') )
Any clarification or pointers on how to approach this would be greatly appreciated.
Solution
There are a few issues with your function.
First:
You are hiding any error raised while searching for the table.
Catching 'except Exception' and returning an empty DataFrame is bad practice here, because the real cause of the failure never surfaces.
Second:
You are quitting the driver in the 'finally' block. This means the driver is no longer available by the time you look for the rows.
The 'finally' block always runs, whether or not an exception was raised.
Third:
You are using Selenium methods and locators incorrectly. For example, find_element is called without a 'By' strategy, and find/find_all are BeautifulSoup methods that do not exist on Selenium elements. Also, your XPath targets the //span tag instead of the //button, so the button cannot be clicked.
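To illustrate the second point, here is a minimal sketch of how 'finally' interacts with the rest of the function. FakeDriver is a hypothetical stand-in and not part of Selenium; it only mimics a driver that stops answering once quit() has been called.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium driver (illustration only)."""

    def __init__(self):
        self.alive = True

    def quit(self):
        self.alive = False

    def find_rows(self):
        # A real driver also refuses to work after quit().
        if not self.alive:
            raise RuntimeError("driver has already quit")
        return ["row 1", "row 2"]


def rows_read_after_finally(driver):
    """Same shape as the original function: rows are read after 'finally'."""
    try:
        pass  # the table lookup would happen here
    finally:
        driver.quit()
    return driver.find_rows()  # too late, the driver is gone


def rows_read_inside_try(driver):
    """Rows are read inside 'try', so the driver is still alive."""
    try:
        return driver.find_rows()
    finally:
        driver.quit()  # still runs, but only after the rows were read
```

Under these assumptions, rows_read_inside_try returns the rows and still quits the driver, while rows_read_after_finally raises because the driver was closed first.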
The following code works ok for me:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


def getData(state, year):
    url = f"https://www.countyhealthrankings.org/explore-health-rankings/county-health-rankings-model/health-outcomes/" \
          f"length-of-life/infant-mortality?year={year}&state={state}&tab=1"
    driver = webdriver.Chrome()
    driver.get(url)
    try:
        # Wait until the initial page load has finished.
        WebDriverWait(driver, 10).until(lambda x: x.execute_script('return document.readyState') == 'complete')
        # Wait for the "Show" button itself (a <button>, not a <span>) to be visible.
        button = WebDriverWait(driver, 10).until(
            EC.visibility_of_element_located((By.XPATH, "//button[contains(text(), 'Show')]"))
        )
        # Click through JavaScript so an overlapping element cannot intercept the click.
        driver.execute_script('arguments[0].click();', button)
        # Read the rows while the driver is still alive.
        rows = driver.find_element(By.ID, 'state-snapshot-data-table') \
            .find_element(By.TAG_NAME, 'tbody') \
            .find_elements(By.TAG_NAME, 'tr')
        data = []
        for row in rows:
            cells = row.find_elements(By.TAG_NAME, 'td')
            name = cells[0].text.strip()
            numerator = cells[1].text.strip()
            raw_value = cells[2].text.strip()
            ci_range = cells[3].text.strip()
            data.append([name, numerator, raw_value, ci_range])
        return pd.DataFrame(data, columns=['name', 'numerator', 'raw_value', 'ci_range'])
    finally:
        # Always close the browser, even if something above raised.
        driver.quit()


print(getData('01', '2023'))
If anything goes wrong, the function will raise an exception, and that is fine: check the stack trace to see exactly what went wrong.
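As a rough sketch of why letting the error propagate helps, compare a broad handler with a narrow one. All names below are hypothetical and none of this is Selenium API; the two "finders" only simulate an expected failure (table not found) and a programming mistake (calling a method that does not exist).

```python
def lookup_broad(find):
    """Mimics the original pattern: every failure becomes an empty result."""
    try:
        return find()
    except Exception:
        return []


def lookup_narrow(find):
    """Handles only the expected failure; anything else propagates."""
    try:
        return find()
    except LookupError:
        return []


def missing_table():
    # Simulates an expected, recoverable condition.
    raise LookupError("table not found")


def misused_locator():
    # Simulates a bug in the calling code, such as using a non-existent method.
    raise AttributeError("'FakeElement' object has no attribute 'find'")
```

With the broad handler, lookup_broad(misused_locator) quietly returns an empty list, so the bug looks like "no data". With the narrow handler, the same call raises AttributeError and the stack trace points at the actual mistake.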
Answered By - sashkins