Issue
I want to remove all div elements, including the ones with attributes like class, from my string.
I already checked here, so regex is apparently not the answer: RegEx match open tags except XHTML self-contained tags
I already have a regex that strips every tag from a string while preserving the content (note: I'm never parsing a full HTML document, if that matters): Regex.Replace(s, "<[^>]*(>|$)", String.Empty). However, I only want the div tags removed, with their content preserved.
So I have:
<div class=""fade-content""><div><span>some content</span></div></div>
<div>some content</div>
Desired output:
<span>some content</span>
some content
I was still going down the regex path and tried something like <div>.*<\/div>, but that does not match divs with attributes.
How can I remove div elements only, using VB.NET?
Solution
There are several ways to do this. One, short and simple, is the following one:
Regex.Replace(s, "</?div.*?>", String.Empty)
Here is an example:
' s simulates your HTML input
Dim s As String = "<div class=""fade-content""><div><span>some content</span></div></div>" & Environment.NewLine & "<div>some content</div>"
' Store the result in s1
Dim s1 As String = Text.RegularExpressions.Regex.Replace(s, "</?div.*?>", String.Empty)
' Show the output
MessageBox.Show(s1)
Output:
<span>some content</span>
some content
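One caveat with the pattern </?div.*?> is that it also matches any tag whose name merely starts with "div", and that . does not match line breaks by default, so a div tag whose attributes span several lines would be missed. A slightly stricter variant, offered here as a sketch and not as part of the original answer, adds a word boundary after the tag name and uses [^>]* in place of .*?. The helper name StripDivTags and the extra <divider> test input are made up for illustration, and the pattern still assumes that no attribute value contains a literal ">".

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StripDivDemo
    ' Hypothetical helper (not from the original answer): removes opening and
    ' closing div tags, with or without attributes, and keeps their content.
    Function StripDivTags(html As String) As String
        ' \b requires the tag name to end after "div", so <divider> is left alone.
        ' [^>]* consumes attributes up to the closing ">", line breaks included.
        ' IgnoreCase also covers <DIV> and </Div>.
        Return Regex.Replace(html, "</?div\b[^>]*>", String.Empty, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim s As String = "<div class=""fade-content""><div><span>some content</span></div></div>" &
                          Environment.NewLine &
                          "<DIV>some content</DIV><divider>kept</divider>"
        Console.WriteLine(StripDivTags(s))
    End Sub
End Module
```

For this input the first line becomes <span>some content</span> and the second becomes some content<divider>kept</divider>. For anything beyond small, predictable fragments like these, an HTML parser is the more robust choice, as the question linked above argues.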
Answered By - Calaf
