Wikipedia link converter

From OPOSSEM

 
Latest revision as of 11:27, 8 July 2011

Wikipedia reference converter

This Perl program converts the internal [[...]] references in a Wikipedia page to external references [http://en.wikipedia.org/wiki/...] in the MediaWiki format.
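To make the conversion concrete, here is a minimal sketch that applies the same idea to a single hard-coded line. It is an illustration only, not the program listed at the end of this page; the sub names external_link and convert_line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the contents of one [[...]] tag into an
# external link of the form [http://en.wikipedia.org/wiki/Target text].
sub external_link {
	my ($tag) = @_;
	my ($link, $text) = split /\|/, $tag, 2;  # "link|text" or just "link"
	$link = '' unless defined $link;          # guard against an empty tag
	$text = $link unless defined $text;       # no '|': reuse the link as the text
	(my $url = $link) =~ tr/ /_/;             # spaces become underscores in the URL
	return "[http://en.wikipedia.org/wiki/$url $text]";
}

# Hypothetical helper: convert every [[...]] tag found in one line.
sub convert_line {
	my ($line) = @_;
	$line =~ s/\[\[(.*?)\]\]/external_link($1)/ge;
	return $line;
}

print convert_line("See [[Event data]] and [[Political science|the discipline]].\n");
# prints: See [http://en.wikipedia.org/wiki/Event_data Event data] and [http://en.wikipedia.org/wiki/Political_science the discipline].
```

A tag without a '|' uses the page name as the link text; a tag with a '|' keeps the text that follows it.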

Getting the program

1. Find a computer that will run Perl. It is usually installed by default on Macintosh and Linux computers and is run from the 'Terminal' program; Perl is also available for Windows.

2. Cut and paste the code at the end of the page into a file named 'OPSM.wikipedia.referencer.pl' (or whatever else you wish to name it).

How to use the program

1. Copy the text from the edit (!) version of the Wikipedia page (click on the 'Edit' tab on the upper right margin of the page; this will give a view that is similar to what you see editing an OPOSSEM page)

2. Paste into a text editor and save as a .txt file, e.g. wikicopy.txt

3. Run the program using perl, for example: perl OPSM.wikipedia.referencer.pl wikicopy.txt (locating and learning the use of perl is left as an exercise: consult a local computer geek)

4. Open the new output file, which has the original file name plus the suffix ".convert", e.g. wikicopy.txt.convert

5. Copy and paste that code into the OPOSSEM page.
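Before pasting in step 5, it may help to see which links were generated. The following is a rough sketch, not part of the original program: the sub name list_links and the default file name wikicopy.txt.convert are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every [http://en.wikipedia.org/wiki/... text]
# link found in a string of converted wiki text, as "URL => text" strings.
sub list_links {
	my ($text) = @_;
	my @found;
	while ($text =~ m{\[(http://en\.wikipedia\.org/wiki/\S*) ([^\]]*)\]}g) {
		push @found, "$1 => $2";
	}
	return @found;
}

# Assumed file name: the output of running the program on wikicopy.txt.
my $file = @ARGV ? $ARGV[0] : "wikicopy.txt.convert";
if (open(my $fh, '<', $file)) {
	local $/;                        # read the whole file in one go
	my $content = <$fh>;
	$content = '' unless defined $content;
	print "$_\n" for list_links($content);
	close($fh);
} else {
	print "Can't open $file: $!\n";
}
```

Each printed line pairs a URL with its display text, which makes it easier to spot links that need manual correction.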

Other Notes

1. This seems to get almost everything correct but it would be prudent to check the links and make any appropriate manual corrections.

2. There appear to be some extensions in MediaWiki that will also do this, for example Extension:ReplaceRedLinks (http://www.mediawiki.org/wiki/Extension:ReplaceRedLinks), though I couldn't get them to work on the OPOSSEM site.

3. It also appears that the construct [[w:...]] will do the same thing and might be easier; modifying the program to use it is left as an exercise.
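One possible shape for the modification mentioned in note 3 is sketched below. It assumes that the target wiki defines a w: interwiki prefix pointing at Wikipedia, which has not been verified for the OPOSSEM site; the sub names with_text and interwiki_line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: make sure a tag carries explicit display text, so the
# rendered link does not show the "w:" prefix.
sub with_text {
	my ($tag) = @_;
	return $tag =~ /\|/ ? $tag : "$tag|$tag";
}

# Hypothetical helper: rewrite [[Target]] as [[w:Target|Target]] and
# [[Target|text]] as [[w:Target|text]] throughout one line.
# Assumes the wiki has a "w:" interwiki prefix for Wikipedia.
sub interwiki_line {
	my ($line) = @_;
	$line =~ s/\[\[(.*?)\]\]/"[[w:" . with_text($1) . "]]"/ge;
	return $line;
}

print interwiki_line("See [[Event data]] and [[Political science|the discipline]].\n");
# prints: See [[w:Event data|Event data]] and [[w:Political science|the discipline]].
```

With this approach the spaces in the page name do not need to be replaced by underscores, since the link stays in internal-link syntax.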

The Program

##  OPSM.wikipedia.referencer.pl
##
##  This Perl program converts the internal [[...]] references in a Wikipedia page to external
##  references [http://en.wikipedia.org/wiki/...] in the MediaWiki format. This program was 
##  created for the OPOSSEM project, http://opossem.org
##
##  USAGE NOTES
##
##  1. Copy the text from the *edit* version of the Wikipedia page (click on the 'Edit' tab
##     on the upper right margin of the page; this will give a view that is similar to what
##     you see editing an OPOSSEM page)
##
##  2. Paste into a text editor and save as a .txt file, e.g. wikicopy.txt
##
##  3. Run the program using perl (locating and learning the use of perl is left as an exercise:
##    consult a local computer geek)
##
##  4. Open the new output file, which is the original file name plus the suffix ".convert"
##
##  5. Copy and paste that code into the OPOSSEM page.
##
##  ALSO NOTE
##
##  1. This seems to get almost everything correct but it would be prudent to check the links and
##     make any appropriate manual corrections.
##
##  2. There appear to be some extensions in MediaWiki that will also do this, though I couldn't
##     get them to work on the OPOSSEM site.

##   
##  TO RUN PROGRAM:
##
##  perl OPSM.wikipedia.referencer.pl filename
##
##  where filename is the name of file to be converted. Output will be in the file
##  filename.convert 
##
##  PROGRAMMING NOTES:
##
##  None
##
##  SYSTEM REQUIREMENTS
##  This program has been successfully run under Mac OS 10.5; it is standard perl
##  so it should also run in Unix or Windows. 
##
##  PROVENANCE:
##  Programmer: Philip A. Schrodt
##              Dept of Political Science
##              Pennsylvania State University
##              227 Pond Laboratory
##              University Park, PA, 16802 U.S.A.
##              http://eventdata.psu.edu
##
##  Redistribution and use in source and binary forms, with or without modification,
##  are permitted under the terms of the GNU General Public License:
##  http://www.opensource.org/licenses/gpl-license.html
##
##  Report bugs to: schrodt@psu.edu
##
##  For plausible indenting of this source code, set the tab size in your editor to "2"
##
##  REVISION HISTORY:
##  07-Jul-11:  Initial version
##
##  ----------------------------------------------------------------------------------

#!/usr/local/bin/perl
# Note: the line above only takes effect if it is the first line of the file.

use strict;
use warnings;

# ======== main program =========== #

if (@ARGV < 1) {  # the file name is read from the command line
	print "file name is required to run the program\n";
	exit 1;
}
my $filename = $ARGV[0];
my $outfile  = $filename . ".convert";

open(my $fin,  '<', $filename) or die "Can't open input file; error $!";
open(my $fout, '>', $outfile)  or die "Can't open output file; error $!";

while (my $line = <$fin>) { # read through the file
	my $restline = $line;
	my $newline  = "";
	# find each '[[...]]' tag; a '[[' with no closing ']]' is left unchanged
	while ($restline =~ m/\A(.*?)\[\[(.*?)\]\](.*)\z/s) {
		my ($before, $target, $rest) = ($1, $2, $3);  # text before, tag contents, remainder
		my $tlink;
		if ($target =~ m/\A(.*?)\|(.*)\z/s) {  # explicit alternative text
			($tlink, $target) = ($1, $2);
			print "| ", $tlink, "  ", $target, "\n";
		} else { # use reference as alternative text
			$tlink = $target;
			print "- ", $tlink, "  ", $target, "\n";
		}
		$tlink =~ tr/ /_/; # replace spaces with underscores in the link
		$newline .= $before . "[http://en.wikipedia.org/wiki/" . $tlink . " " . $target . "]";
		$restline = $rest;  # continue with the remainder of the line
	}
	print $fout $newline, $restline;
}
close($fout) or die "Can't close output file; error $!";
close($fin)  or die "Can't close input file; error $!";
print "Program has finished!\n";