Issue
I apply find_all on a BeautifulSoup object and get back a bs4.element.ResultSet object (or a plain list, if I slice it).
I want to run find_all again on that result, but that is not allowed on a bs4.element.ResultSet object. I can loop through each element of the ResultSet and call find_all on it. But can I avoid the loop and just convert the result back to a BeautifulSoup object?
Here is my code:
from bs4 import BeautifulSoup

html_1 = """
<table>
<thead>
<tr class="myClass">
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
</tr>
</thead>
</table>
"""
soup = BeautifulSoup(html_1, 'html.parser')
type(soup)  # bs4.BeautifulSoup
# do find_all on the BeautifulSoup object
th_all = soup.find_all('th')
# the result is a bs4.element.ResultSet; slicing it gives a plain list
type(th_all)  # bs4.element.ResultSet
type(th_all[0:1])  # list
# now I want to do a further find_all on the result
th_all.find_all(text='A')  # does not work
# can I avoid this loop?
for th in th_all:
    th.find_all(text='A')  # works
Solution
ResultSet is a subclass of list, not of Tag, which is the class that defines the find* methods. Looping through the results of find_all() is the most common approach:
th_all = soup.find_all('th')
result = []
for th in th_all:
    result.extend(th.find_all(text='A'))
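As a minimal sketch of the same idea, the loop can be folded into a nested list comprehension. The block below also shows one way to get a searchable object back without a visible loop: re-parse the joined tags into a new soup. This re-parsing step is an illustration and not part of the original answer. It uses string=, which is the newer name for the text= argument, and the variable names resoup and rematches are hypothetical.

```python
from bs4 import BeautifulSoup

html_1 = """
<table><thead><tr class="myClass">
<th>A</th><th>B</th><th>C</th><th>D</th>
</tr></thead></table>
"""
soup = BeautifulSoup(html_1, "html.parser")
th_all = soup.find_all("th")

# ResultSet subclasses list, so list tools work on it, but Tag methods do not.
is_list = isinstance(th_all, list)

# Same result as the explicit loop: search each tag and flatten the hits.
matches = [hit for th in th_all for hit in th.find_all(string="A")]

# Alternative (assumption, not from the answer): serialize the tags and
# re-parse them, which yields a new BeautifulSoup object that has find_all.
resoup = BeautifulSoup("".join(str(th) for th in th_all), "html.parser")
rematches = resoup.find_all(string="A")
```

Re-parsing copies the markup, so the new tags are detached from the original tree. If you need the original elements, the comprehension is the safer choice.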
CSS selectors can often solve this in one go, although not everything you can do with find_all() is possible with the select() method. For instance, bs4 CSS selectors have no direct equivalent of the "text" search argument (newer versions offer the non-standard :-soup-contains() pseudo-class, which matches substrings rather than exact text). But if, for example, you had to find all b elements inside th elements, you could do:
soup.select("th b")
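To make the selector approach concrete, here is a small sketch on a hypothetical document, html_2, that is not from the question and puts b elements inside some th cells. It assumes a bs4 release that delegates select() to soupsieve, and the :-soup-contains() line additionally assumes soupsieve 2.1 or later.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration: two header cells wrap their label in <b>.
html_2 = """
<table><tr>
<th><b>A</b></th><th><b>B</b></th><th>C</th>
</tr></table>
"""
soup = BeautifulSoup(html_2, "html.parser")

# Descendant combinator: every <b> nested anywhere inside a <th>.
bold_in_th = soup.select("th b")
labels = [b.get_text() for b in bold_in_th]

# Non-standard soupsieve pseudo-class: <th> elements whose text contains "A".
# This is a substring match, not the exact match that find_all(text='A') does.
th_with_a = soup.select('th:-soup-contains("A")')
```

select() returns a list of Tag objects in a single call, so no second find_all pass is needed.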
Answered By - alecxe