Issue
I have some Calibre-created EPUBs that I want to convert to Markdown for use in Obsidian. I found Pandoc, but my simple attempts at conversion are, among other things, losing the italics and passing the Calibre span tags through, which don't show as italics in Obsidian.
If I turn off the raw_html extension, the span tags are no longer passed through, but I don't get any italics either. What I want to do is convert this HTML:
<span class="italic">Some Words</span>
into italic text in my final markdown file. If Pandoc can do this, that would be great. Otherwise I'll take a swipe at converting the html before passing it into Pandoc, but a lot of the span tags that Calibre generated are stacked a few layers deep, so a really simple solution would be great.
Does Pandoc handle this directly, or do I need to deal with the HTML first? I'm not only concerned with italics: there are a bunch of other formatting issues involving various Calibre span tags that could be simpler, like bold and some headings. So I'm trying to work out a way to deal with them all.
Solution
Pandoc does not parse CSS and hence has no way to know that this should be put into italics. A good solution is to modify pandoc's internal document representation using a Lua filter.
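-- Called by pandoc for every Span element in the document.
function Span (span)
  if span.classes:includes 'italic' then
    -- Replace the span with emphasized text, keeping its content.
    return pandoc.Emph(span.content)
  end
  -- Returning nothing leaves all other spans unchanged.
end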
This filter checks whether the span has the class italic and, if it does, converts it into emphasized text, which will usually be output in italics. Use the filter by saving it to a file and passing that file to pandoc via the --lua-filter command line option.
You'll likely want to handle more classes; other pandoc constructors you might want to use include pandoc.Strong and pandoc.Underline. Run pandoc with --to=native to see how pandoc represents the document internally.
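As a rough sketch of how the same idea might extend to several classes, the filter below maps class names to pandoc constructors through a lookup table. The class names bold and underline are assumptions for illustration only; the classes Calibre actually generated in a given book have to be read from the --to=native output or the EPUB's stylesheet. It covers inline spans only, not headings.

```lua
-- Map span classes to the name of a pandoc inline constructor.
-- 'italic' comes from the question; 'bold' and 'underline' are
-- hypothetical class names, so adjust them to match your EPUB.
local inline_wrappers = {
  italic = 'Emph',
  bold = 'Strong',
  underline = 'Underline',  -- pandoc.Underline needs a reasonably recent pandoc
}

function Span (span)
  local content = span.content
  local changed = false
  -- A span may carry several classes; wrap the content once per match.
  for _, class in ipairs(span.classes) do
    local constructor = inline_wrappers[class]
    if constructor then
      content = { pandoc[constructor](content) }
      changed = true
    end
  end
  if changed then
    -- Replace the span with the wrapped content.
    return content
  end
  -- Spans without a known class are left untouched.
end
```

Because pandoc calls the function for every Span element, spans stacked a few layers deep are each handled in turn without any extra code. Save the filter to a file and pass it with --lua-filter, as with the single-class version.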
Answered By - tarleb