Issue
I have a very big HTML file (talking about 20MB) and I need to remove from the file a large amount of nodes of the form:
<tr><td>SPECIFIC-STRING</td><td>RANDOM-STRING</td><td>RANDOM-STRING</td></tr><tr><td style="padding-top:0" colspan="3">RANDOM-STRING</td></tr>
The file I need to work on is basically made of thousands of these strings, and I only need to remove those that have a specific first string, for instance, all those with the first string being "banana":
<tr><td>banana</td><td>RANDOM-STRING</td><td>RANDOM-STRING</td></tr><tr><td style="padding-top:0" colspan="3">RANDOM-STRING</td></tr>
I tried achieving this opening the file in Geany and using the replace feature with this regex:
<tr><td>banana<\/td><td>(.*)<\/td><td>(.*)<\/td><\/tr><tr><td(.*)<\/td><\/tr>
but the console output was that it removed X amount of occurrences, when I know there are way more occurrences than that in the file. Firefox, Chrome and Brackets fail even to view the html code of the file due to it's size. I can't think of another way to do this due to my large unexperience with HTML.
Solution
You could be using a stream editor which as the name suggest streams the file content, thus never loads the whole file into the main memory.
A popular editor is sed. It does support RegEx.
Your command would have the following structure.
sed -i -E 's/SEARCH_REGEX/REPLACEMENT/g' INPUTFILE
-Efor support of extended RegEx-ifor in-place editing modesdenotes that you want to replace valuesgis forglobal. By defaultsedwould only replace the first occurrence so to replace all occrrences you must providegSEARCH_REGEXis the RegEx you need to find the substrings you want to replaceREPLACEMENTis the value you want to replace all matches withINPUTFILEis the filesedis gonna read line-by line and do the replacement for you.
Answered By - Mushroomator
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.