Remove Wikipedia references from HTML

From OPOSSEM

Latest revision as of 09:27, 13 July 2011


The Problem Being Solved

Text that was originally sourced from Wikipedia will have a large number of references (links to Wikipedia articles) that detract from the flow of the text when it is printed. This Perl one-liner removes the Wikipedia references from an HTML file saved from Firefox while leaving other references intact. It was tested on a Macintosh, but the Perl code is general and should also work on Windows, either as is or with some minor modification that it would be helpful for someone to test and contribute.

The Code

The following steps remove all of the Wikipedia references:

1. Use "File/Save Page As..." to save the page to a file, myfile.html. Note that Firefox will also create a separate folder containing the ancillary files for the page (images and the like), which you can leave as-is.

2. In the Terminal, run

perl -pe 's/<a href="https?:\/\/en\.wikipedia\.org\/wiki.+?>(.+?)<\/a>/$1/g' myfile.html > mynicefile.html

where myfile.html is the name of the file you just saved and mynicefile.html is the name of the new file.

3. Use "File/Open File" to open the new file, which you can then print.

Example

These links lead to PDF files, printed from Firefox, that show the before and after versions of a file:

Original file: File:Dewikify.OriginalFile.pdf

Modified file: File:Dewikify.NewFile.pdf

Original code: Philip Schrodt 10:49, 12 July 2011 (PDT)