Issue
I have the following HTML:
<div class='article'>
<p>Lorem <strong>ipsum</strong> si ammet</p>
</div>
I want to get the text as: Lorem ipsum si ammet
So I tried:
response.css('div.article >p::text ').extract()
But I only receive: Lorem si ammet
How can I get the text of both <p> and <strong> using CSS selectors?
Solution
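One-liner solution:
" ".join(a.strip() for a in response.css("div.article *::text").extract())
div.article * matches every element inside div.article, so *::text returns the text of the <p> and of the nested <strong>. Joining the stripped pieces with a space gives Lorem ipsum si ammet; joining with an empty string would run the words together.
Or, written as an explicit loop:
parts = []
for a in response.css("div.article *::text").extract():
    parts.append(a.strip())
text = " ".join(parts)
Both approaches are equivalent.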
Answered By - Umair Ayub