Issue
I'm trying to get the title of an HTML document in Python, but I'm getting weird symbols instead of the text. I guess it's an encoding problem, but the HTML document is encoded in UTF-8. Is there any way to get the normal letters?
Here is the code and the output I get:
from bs4 import BeautifulSoup
with open("index.html") as file:
    src = file.read()
soup = BeautifulSoup(src, "lxml")
title = soup.title.text
print(title)
Главная страница
Solution
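You need to specify the encoding when opening the file. Without the encoding argument, open() falls back to the platform's default encoding, which is often not UTF-8 (on Windows, for example), so the UTF-8 bytes get decoded with the wrong codec: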
with open("index.html", encoding='utf-8') as file:
    src = file.read()
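To see why the encoding argument matters, here is a minimal standard-library sketch. It writes a small sample document as UTF-8 and reads it back twice. The sample HTML, the temporary path, and the choice of cp1251 as the "wrong" codec are assumptions made for illustration; the actual default on your machine may differ.

```python
import os
import tempfile

# Hypothetical sample document; the title matches the one in the question.
html = "<html><head><title>Главная страница</title></head></html>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")

    # Save the document as UTF-8, like the file in the question.
    with open(path, "w", encoding="utf-8") as file:
        file.write(html)

    # Decode the UTF-8 bytes with a different codec. cp1251 is only an
    # assumed example of a non-UTF-8 platform default.
    with open(path, encoding="cp1251", errors="replace") as file:
        garbled = file.read()

    # Decode with the codec the file was actually written in.
    with open(path, encoding="utf-8") as file:
        correct = file.read()

print(garbled == html)  # the wrong codec does not reproduce the text
print(correct == html)  # the matching codec does
```

The bytes on disk are identical in both reads; only the codec used to decode them changes, which is why passing encoding='utf-8' is enough to fix the title.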
Answered By - Xiddoc