Remove Wikipedia references from HTML
From OPOSSEM
Latest revision as of 08:27, 13 July 2011
The Problem Being Solved
Text that was originally sourced from Wikipedia will have a large number of references (links to Wikipedia articles) that detract from the flow of the text when it is printed. This Perl one-liner removes the Wikipedia references from an HTML file saved from Firefox while leaving other references intact. It was tested on a Macintosh, but the Perl code is general and should also work on Windows, either as is or with some minor modification that it would be helpful for someone to test and contribute.
The Code
The following steps remove all of the Wikipedia references:
1. Use "File/Save Page As..." in Firefox to save the page to a file, myfile.html. Note that Firefox will also create a separate folder with all of the ancillary files for the page (images and the like), which you can leave as-is.
2. In the Terminal, run
perl -pe 's/<a href=\"http:\/\/en\.wikipedia\.org\/wiki.+?>(.+?)<\/a>/$1/g' myfile.html > mynicefile.html
where myfile.html is the name of the file you just saved and mynicefile.html is the name of the new file.
3. Use "File/Open File" to open the new file, which you can then print.
Example
The following links lead to PDF files printed from Firefox that show the before and after versions of a file:
Original file: File:Dewikify.OriginalFile.pdf
Modified file: File:Dewikify.NewFile.pdf
Original code: Philip Schrodt 10:49, 12 July 2011 (PDT)