Issue
I have a link to a web page. When I open it in Chrome, right-click the page, and choose "Save as", I can save the page as an HTML file (c:\temp\cu2.html).
After it is saved, I can open cu2.html in an HTML editor (say, VS2015) and see that the file contains a <table> tag.
However, if I open the link in IE11 (instead of Chrome) and save the same page as an HTML file, I cannot find this tag at all. In fact, the HTML file saved from IE11 has the same content as what I get with the PowerShell script below.
#Requires -version 4.0
$url = 'https://support.microsoft.com/en-us/help/4052574/cumulative-update-2-for-sql-server-2017';
$wr = Invoke-WebRequest $url;
$wr.RawContent.Contains('<table') # returns False
$wr.RawContent | Out-File -FilePath c:\temp\cu2_ps.html -Force; # same content as the HTML file saved from the page in IE
So my questions are:
Why is a web page saved (as an HTML file) in Chrome different from the one saved in IE?
How can I use PowerShell (or C#) to save such a web page into an HTML file (the same as the file saved in Chrome)?
Solution
The page uses AngularJS and jQuery, which means some of its content is loaded by script after the document is ready. So when you send the request using Invoke-WebRequest, you only receive the original content of the page. The rest is loaded a little later, once the scripts have run.
To solve the problem, you can automate IE to get the expected result. It's enough to wait for the page to become ready, wait a bit longer for the AngularJS logic to run and download the required content, and then get the content of the document element:
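# Start an Internet Explorer instance through COM
$ie = New-Object -ComObject "InternetExplorer.Application"
$url = "https://support.microsoft.com/en-us/help/4052574/cumulative-update-2-for-sql-server-2017"
$ie.Silent = $true
$ie.Navigate($url)
# Wait until the initial page load has finished
while ($ie.Busy) { Start-Sleep -Milliseconds 100 }
# Give the AngularJS code time to run and download the remaining content
Start-Sleep 10
# Save the rendered DOM rather than the original response
$ie.Document.documentElement.innerHTML > "C:\Tempfiles\output.html"
$ie.Stop()
$ie.Quit()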
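The fixed 10-second sleep above is a guess at how long the scripts need. A possible variation, sketched below, polls the live DOM until the expected markup appears or a timeout expires. The function names Wait-Condition and Save-RenderedPage, their parameters, and the default timeout are hypothetical and not part of the original answer. The '<table' marker is taken from the question. The sketch also assumes the InternetExplorer.Application COM object is still available on the machine.

```powershell
# Hypothetical helper: polls a condition until it is true or the timeout expires.
function Wait-Condition {
    param(
        [Parameter(Mandatory = $true)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 30,
        [int] $PollMilliseconds = 250
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    return $false
}

# Hypothetical wrapper around the IE automation used in the answer.
function Save-RenderedPage {
    param(
        [Parameter(Mandatory = $true)] [string] $Url,
        [Parameter(Mandatory = $true)] [string] $Path,
        [string] $Marker = '<table',   # text expected only in the rendered DOM
        [int] $TimeoutSeconds = 30
    )
    $ie = New-Object -ComObject 'InternetExplorer.Application'
    try {
        $ie.Silent = $true
        $ie.Navigate($Url)
        # ReadyState 4 means the initial document has finished loading.
        $null = Wait-Condition -TimeoutSeconds $TimeoutSeconds -Condition {
            -not $ie.Busy -and $ie.ReadyState -eq 4
        }
        # Poll the live DOM until the script-generated marker shows up.
        $found = Wait-Condition -TimeoutSeconds $TimeoutSeconds -Condition {
            $html = $ie.Document.documentElement.innerHTML
            $html.IndexOf($Marker, [StringComparison]::OrdinalIgnoreCase) -ge 0
        }
        if (-not $found) {
            Write-Warning "Marker '$Marker' was not found before the timeout."
        }
        # Save whatever the DOM contains at this point.
        $ie.Document.documentElement.innerHTML |
            Set-Content -Path $Path -Encoding UTF8
    }
    finally {
        # Always close the browser, even if something above failed.
        $ie.Quit()
    }
}
```

With these definitions loaded, a call such as Save-RenderedPage -Url $url -Path "C:\Tempfiles\output.html" would replace the fixed sleep. Polling returns as soon as the content is there and emits a warning instead of silently saving an incomplete page.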
Answered By - Reza Aghaei