Remove Wikipedia references from HTML

From OPOSSEM

Revision as of 09:49, 12 July 2011


The Problem Being Solved

Text that was originally sourced from Wikipedia will have a large number of references that detract from the flow of the text when it is printed. This Perl one-liner strips the Wikipedia links from an HTML file saved from Firefox, keeping their visible text and leaving other links intact. It was tested on a Macintosh, but the Perl code is general and should also work on Windows, either as is or with minor modifications; test reports and contributions from Windows users are welcome.

The Code

The following steps remove all of the Wikipedia references:

1. In Firefox, use "File/Save Page As..." to save the page to a file, myfile.html

2. In the Terminal, run

perl -pe 's/<a href=\"http:\/\/en\.wikipedia\.org\/wiki.+?>(.+?)<\/a>/$1/g' myfile.html > mynicefile.html

3. Use "File/Open File" to open the new file, which you can then print.

Example

These PDF files, printed from Firefox, show a page before and after the Wikipedia references were removed:

File:Dewikify.OriginalFile.pdf Original file


File:Dewikify.NewFile.pdf Modified file

Original code: Philip Schrodt 10:49, 12 July 2011 (PDT)