Issue
Thanks in advance for looking into this query.
I am trying to extract data from the angular
response which is not visible in the HTML code using the inspect function of Chrome browser.
I researched and looked for solutions and have been able to find the data in the Network (tab)>Fetch/XHR>Response
(screenshots) and also wrote code based on the knowledge I gained researching this topic.
Response
In order to read the response I am trying the below code by passing the parameters and cookies grabbed from the main URL
and passing them into the request via the below code segment from the main code shared further below. The parameters were created based on information I found under tab Network (tab)>Fetch/XHR>Header
http = urllib3.PoolManager()
r = http.request('GET',
'https://www.barchart.com/proxies/core-api/v1/quotes/get?' + urlencode(params),
headers=headers
)
QUESTIONS
- Please help confirm what am I missing or doing wrong? I want to read and store the json response what should I be doing? JSON to be extracted
- Also is there a way to read the params using a function?, instead of assigning them as I have done below. What I mean is similar to what I have done for cookies (headers = x.cookies.get_dict()) is there a way to read and assign parameters?
Below is the full code I am using.
import requests
import urllib3
from urllib.parse import urlencode
url = 'https://www.barchart.com/etfs-funds/performance/percent-change/advances?viewName=main&timeFrame=5d&orderBy=percentChange5d&orderDir=desc'
header = {'accept': 'application/json', 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
s = requests.Session()
x = s.get(url, headers=header)
headers = x.cookies.get_dict()
params = { 'lists': 'etfs.us.percent.advances.unleveraged.5d',
'orderDir': 'desc',
'fields': 'symbol,symbolName,lastPrice,weightedAlpha,percentChangeYtd,percentChange1m,percentChange3m,percentChange1y,symbolCode,symbolType,hasOptions',
'orderBy': 'percentChange',
'meta': 'field.shortName,field.type,field.description,lists.lastUpdate',
'hasOptions': 'true',
'page': '1',
'limit': '100',
'in(leverage%2C(1x))':'',
'raw': '1'}
http = urllib3.PoolManager()
r = http.request('GET',
'https://www.barchart.com/proxies/core-api/v1/quotes/get?' + urlencode(params),
headers=headers
)
r.data
r.data
response is below, returning an error.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META
HTTP-EQUIV="Content-Type" CONTENT="text/html;
charset=iso-8859-1">\n<TITLE>ERROR: The request could not be
satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request
could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest
blocked.\nWe can\'t connect to the server for this app or website at
this time. There might be too much traffic or a configuration error.
Try again later, or contact the app or website owner.\n<BR
clear="all">\nIf you provide content to customers through CloudFront,
you can find steps to troubleshoot and help prevent this error by
reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade
size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID:
vcjzkFEpvdtf6ihDpy4dVkYx1_lI8SUu3go8mLqJ8MQXR-KRpCvkng==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>
Solution
You can get reponse by name, on your screenshot name get?lists=etfs.us
is what you need, you also need to install playwright
There is a guide here: https://www.zenrows.com/blog/web-scraping-intercepting-xhr-requests#use-case-nseindiacom
from playwright.sync_api import sync_playwright
url = "https://www.barchart.com/etfs-funds/performance/percent-change/advances?viewName=main&timeFrame=5d&orderBy=percentChange5d&orderDir=desc"
with sync_playwright() as p:
def handle_response(response):
# the endpoint we are insterested in
if ("get?lists=etfs.us" in response.url):
print(response.json())
browser = p.chromium.launch()
page = browser.new_page()
page.on("response", handle_response)
page.goto(url, wait_until="networkidle")
page.context.close()
browser.close()
Answered By - oruchkin
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.