Remove Wikipedia references from HTML
From OPOSSEM
Revision as of 09:49, 12 July 2011
The Problem Being Solved
Text that was originally sourced from Wikipedia will have a large number of references that detract from the flow of the text when it is printed. This Perl one-liner removes the Wikipedia references from an HTML file saved from Firefox while leaving other references intact. It was tested on a Macintosh, but the Perl code is general and should also work on Windows, either as is or with some minor modification; it would be helpful for someone to test this and contribute the result.
The Code
The following steps remove all of the Wikipedia references from a page saved in Firefox:
1. Use "File/Save Page As..." to save the page to a file, myfile.html
2. In the Terminal, run
perl -pe 's/<a href=\"http:\/\/en\.wikipedia\.org\/wiki.+?>(.+?)<\/a>/$1/g' myfile.html > mynicefile.html
3. Use "File/Open File" to open the new file, which you can then print.
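As a quick check of what the one-liner does, the sketch below runs the same substitution on a small sample file. The file names sample.html and sample.clean.html, the example.org link, and the sample markup are made up for illustration, and the sketch assumes a Unix-like shell with perl on the path.

```shell
# Hypothetical sample input: one Wikipedia link and one non-Wikipedia link.
cat > sample.html <<'EOF'
<p>The <a href="http://en.wikipedia.org/wiki/Opossum" title="Opossum">opossum</a> is described <a href="http://example.org/notes">here</a>.</p>
EOF

# Same substitution as in step 2: each Wikipedia anchor is replaced
# by its link text ($1); anchors pointing elsewhere do not match.
perl -pe 's/<a href=\"http:\/\/en\.wikipedia\.org\/wiki.+?>(.+?)<\/a>/$1/g' sample.html > sample.clean.html

# Show the result: the Wikipedia link is gone, the other link remains.
cat sample.clean.html
```

Note that the pattern as written matches only links that begin with http://en.wikipedia.org/wiki, so links that use https or point to another language edition of Wikipedia would be left in place.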
Example
These are PDF files printed from Firefox showing the before and after versions of a file:
File:Dewikify.OriginalFile.pdf Original file
File:Dewikify.NewFile.pdf Modified file
Original code: Philip Schrodt 10:49, 12 July 2011 (PDT)