Wikipedia link converter

From OPOSSEM

Revision as of 10:04, 8 July 2011

Wikipedia reference converter

This Perl program converts the internal [[...]] references in a Wikipedia page to external references [http://en.wikipedia.org/wiki/...] in the MediaWiki format.
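The conversion rule can be illustrated on a single line of text. The minimal Perl sketch below assumes, like the program at the end of this page, that each link opens and closes on the same line; the subroutine name convert_links is hypothetical. The text before any '|' becomes the page name (with spaces turned into underscores), and the text after it, or the page name itself, becomes the displayed text.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite [[Page]] and [[Page|text]] as external MediaWiki links.
sub convert_links {
    my ($line) = @_;
    $line =~ s{\[\[([^\]|]*)(?:\|([^\]]*))?\]\]}{
        my $page = $1;
        my $text = defined $2 ? $2 : $1;   # display text defaults to the page name
        $page =~ tr/ /_/;                  # Wikipedia URLs use underscores for spaces
        "[http://en.wikipedia.org/wiki/$page $text]";
    }ge;
    return $line;
}

print convert_links("See [[Game theory]] and [[Nash equilibrium|equilibria]].\n");
```

Running it prints: See [http://en.wikipedia.org/wiki/Game_theory Game theory] and [http://en.wikipedia.org/wiki/Nash_equilibrium equilibria].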

Getting the program

1. Find a computer that will run Perl. Perl is installed by default on Macintosh and most Linux computers, where it is run from the 'Terminal' program; on Windows it must be installed separately (for example, Strawberry Perl or ActivePerl).

2. Display this page in Edit mode, then copy the code at the end of the page into a file named 'OPSM.wikipedia.referencer.pl' (or whatever else you wish to name it).

How to use the program

1. Copy the text from the edit (!) version of the Wikipedia page (click on the 'Edit' tab on the upper right margin of the page; this will give a view that is similar to what you see editing an OPOSSEM page)

2. Paste into a text editor and save as a .txt file, e.g. wikicopy.txt

3. Run the program using perl by typing 'perl OPSM.wikipedia.referencer.pl wikicopy.txt' in the Terminal (locating and learning the use of perl is left as an exercise: consult a local computer geek).

4. Open the new output file, which is the original file name plus the suffix ".convert"

5. Copy and paste that code into the OPOSSEM page.
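Because the conversion is not guaranteed to be perfect, it can help to review the converted file before pasting it in. The sketch below is a separate, hypothetical checking script (for example saved as check_links.pl; neither that name nor the check_line subroutine is part of the original program). It assumes the converted file contains links in the form the program produces and lists each converted Wikipedia URL plus any '[[' that was left unconverted, with line numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: return the converted Wikipedia URLs on a line and
# the number of "[[" openings that were left unconverted.
sub check_line {
    my ($line) = @_;
    my @urls     = $line =~ m{\[(http://en\.wikipedia\.org/wiki/\S+)}g;
    my $leftover = () = $line =~ /\[\[/g;   # count matches in list context
    return (\@urls, $leftover);
}

# Report on every file named on the command line, e.g. wikicopy.txt.convert
for my $file (@ARGV) {
    open(my $in, '<', $file) or die "Can't open $file: $!";
    while (my $line = <$in>) {
        my ($urls, $leftover) = check_line($line);
        printf "%s line %d: %s\n", $file, $., $_ for @$urls;
        printf "%s line %d: %d unconverted [[\n", $file, $., $leftover if $leftover;
    }
    close($in);
}
```

For example, 'perl check_links.pl wikicopy.txt.convert' would print one line per converted link, which can then be spot-checked in a browser.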

Other Notes

1. This seems to get almost everything correct, but it would be prudent to check the links and make any necessary manual corrections.

2. There appear to be some MediaWiki extensions that also do this (for example, Extension:ReplaceRedLinks), though I couldn't get them to work on the OPOSSEM site.

3. It also appears that the interwiki construct [[w:...]] will do the same thing and might be easier; modification is left as an exercise.
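A minimal sketch of that modification, assuming the wiki's interwiki table maps the 'w:' prefix to the English Wikipedia (this depends on the site's configuration and has not been tested on OPOSSEM). The subroutine name to_interwiki is hypothetical. It keeps the original display text, so [[Game theory]] becomes [[w:Game theory|Game theory]], and it skips links that already start with 'w:' so that running it twice does no harm.

```perl
use strict;
use warnings;

# Hypothetical variant: rewrite [[Page]] and [[Page|text]] as interwiki links [[w:Page|text]].
sub to_interwiki {
    my ($line) = @_;
    $line =~ s{\[\[(?!w:)([^\]|]*)(?:\|([^\]]*))?\]\]}{
        my $text = defined $2 ? $2 : $1;   # keep the visible text unchanged
        "[[w:$1|$text]]";
    }ge;
    return $line;
}

print to_interwiki("See [[Game theory]] and [[Nash equilibrium|equilibria]].\n");
```

This prints: See [[w:Game theory|Game theory]] and [[w:Nash equilibrium|equilibria]]. Unlike the external-link version, spaces can stay in the page name, since MediaWiki resolves them itself.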

The Program

If the code below appears as an unformatted mess, view the page in 'Edit' mode to see the nicely formatted and commented Perl program.

	##  OPSM.wikipedia.referencer.pl
	##
	##  This Perl program converts the internal [[...]] references in a Wikipedia page to external
	##  references [http://en.wikipedia.org/wiki/...] in the MediaWiki format. This program was 
	##  created for the OPOSSEM project, http://opossem.org
	##
	##  USAGE NOTES
	##
	##  1. Copy the text from the *edit* version of the Wikipedia page (click on the 'Edit' tab
	##     on the upper right margin of the page; this will give a view that is similar to what
	##     you see editing an OPOSSEM page)
	##
	##  2. Paste into a text editor and save as a .txt file, e.g. wikicopy.txt
	##
	##  3. Run the program using perl (locating and learning the use of perl is left as an exercise:
	##     consult a local computer geek)
	##
	##  4. Open the new output file, which is the original file name plus the suffix ".convert"
	##
	##  5. Copy and paste that code into the OPOSSEM page.
	##
	##  ALSO NOTE
	##
	##  1. This seems to get almost everything correct but it would be prudent to check the links and
	##     make any appropriate manual corrections.
	##
	##  2. There appear to be some extensions in MediaWiki that will also do this, though I couldn't
	##     get them to work on the OPOSSEM site.
	
	##   
	##  TO RUN PROGRAM:
	##
	##  perl OPSM.wikipedia.referencer.pl filename
	##
	##  where filename is the name of file to be converted. Output will be in the file
	##  filename.convert 
	##
	##  PROGRAMMING NOTES:
	##
	##  None
	##
	##  SYSTEM REQUIREMENTS
	##  This program has been successfully run under Mac OS 10.5; it is standard perl
	##  so it should also run in Unix or Windows. 
	##
	##  PROVENANCE:
	##  Programmer: Philip A. Schrodt
	##              Dept of Political Science
	##              Pennsylvania State University
	##              227 Pond Laboratory
	##	            University Park, PA, 16802 U.S.A.
	##	            http://eventdata.psu.edu
	##
	## 	Redistribution and use in source and binary forms, with or without modification,
	## 	are permitted under the terms of the GNU General Public License:
	## 	http://www.opensource.org/licenses/gpl-license.html
	##
	##	Report bugs to: schrodt@psu.edu
	##
	##	For plausible indenting of this source code, set the tab size in your editor to "2"
	##
	##  REVISION HISTORY:
	##  07-Jul-11:  Initial version
	##
	##  ----------------------------------------------------------------------------------
	
	use strict;
	use warnings;
	
	# ======== main program =========== #
	
	if (@ARGV < 1) {  # read the file name from the command line
		print "file name is required to run the program\n";
		exit 1;
	}
	my $filename = $ARGV[0];
	my $outfile  = $filename . ".convert";  # output goes to filename.convert
	
	open(my $fin,  '<', $filename) or die "Can't open input file $filename; error $!";
	open(my $fout, '>', $outfile)  or die "Can't open output file $outfile; error $!";
	
	while (my $line = <$fin>) { # read through the file
		# replace each [[link]] or [[link|text]] with [http://en.wikipedia.org/wiki/link text];
		# a [[ with no closing ]] on the same line is left unchanged
		$line =~ s{\[\[([^\]|]*)(?:\|([^\]]*))?\]\]}{
			my $tlink  = $1;                  # the Wikipedia page name
			my $piped  = defined $2;          # explicit alternative text?
			my $target = $piped ? $2 : $1;    # otherwise use the reference as the text
			printf "%s%s  %s\n", ($piped ? "| " : "- "), $tlink, $target;
			$tlink =~ tr/ /_/;                # replace spaces with underscores in the link
			"[http://en.wikipedia.org/wiki/$tlink $target]";
		}ge;
		print $fout $line;
	}
	close($fout) or die "Can't close output file; error $!";
	close($fin)  or die "Can't close input file; error $!";
	print "Program has finished!\n";