Issue
I am trying to use Beautiful Soup to parse HTML and find every href whose anchor tag contains specific text:
<a href="http://example.com">TEXT</a>
<a href="http://example.com/link">TEXT</a>
<a href="http://example.com/page">TEXT</a>
All the links I am looking for have exactly the same anchor text, in this case TEXT. I am NOT searching for the word TEXT itself; I want to use the word TEXT to find all the different href values.
To clarify, I am looking for something similar to using the class to find the links:
<a href="http://example.com" class="visible">TEXT</a>
<a href="http://example.com/link" class="visible">TEXT</a>
<a href="http://example.com/page" class="visible">TEXT</a>
and then using
findAll('a', 'visible')
except that the HTML I am parsing doesn't have a class, only the same anchor text every time.
Solution
Would something like this work?
In [39]: from bs4 import BeautifulSoup
In [40]: s = """\
....: <a href="http://example.com">TEXT</a>
....: <a href="http://example.com/link">TEXT</a>
....: <a href="http://example.com/page">TEXT</a>
....: <a href="http://dontmatchme.com/page">WRONGTEXT</a>"""
In [41]: soup = BeautifulSoup(s, 'html.parser')
In [42]: for link in soup.find_all('a', href=True, string='TEXT'):
....:     print(link['href'])
....:
http://example.com
http://example.com/link
http://example.com/page
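An exact string match can miss anchors whose text is padded with whitespace or sits next to another child tag. A minimal sketch of one way to handle that, assuming Python 3 and bs4, compares each anchor's stripped visible text instead. The helper name hrefs_by_anchor_text and the sample markup are made up for illustration and are not part of the original question:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the first anchor pads its text with spaces and
# the second has an extra child tag, so an exact string match would skip both.
HTML = """
<a href="http://example.com"> TEXT </a>
<a href="http://example.com/link"><span class="icon"></span>TEXT</a>
<a href="http://example.com/page">TEXT</a>
<a href="http://dontmatchme.com/page">WRONGTEXT</a>
<a>TEXT</a>
"""


def hrefs_by_anchor_text(html, anchor_text):
    """Return the href of every <a> whose visible text equals anchor_text."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        link["href"]
        # href=True skips anchors that have no href attribute at all
        for link in soup.find_all("a", href=True)
        # get_text(strip=True) joins all nested strings and trims whitespace
        if link.get_text(strip=True) == anchor_text
    ]


if __name__ == "__main__":
    for href in hrefs_by_anchor_text(HTML, "TEXT"):
        print(href)
```

The comparison is still an exact match after stripping, so WRONGTEXT is not picked up; a looser match (for example a substring test) would need a different condition.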
Answered By - RocketDonkey