ABOUT

This object is used to access content of the \s-1SQL\s0 based category dump files by providing an iterative interface for extracting the indidivual article links to the same. Objects returned are an instance of Parse::MediaWikiDump::link.

SYNOPSIS

  $pmwd = Parse::MediaWikiDump->new;
  $links = $pmwd->links('pagelinks.sql');
  $links = $pmwd->links(\*FILEHANDLE);

  #print the links between articles
  while(defined($link = $links->next)) {
    print 'from ', $link->from, ' to ', $link->namespace, ':', $link->to, "\n";
  }

STATUS

This software is being \s-1RETIRED\s0 - MediaWiki::DumpFile is the official successor to Parse::MediaWikiDump and includes a compatibility library called MediaWiki::DumpFile::Compat that is 100% \s-1API\s0 compatible and is a near perfect standin for this module. It is faster in all instances where it counts and is actively maintained. Any undocumented deviation of MediaWiki::DumpFile::Compat from Parse::MediaWikiDump is considered a bug and will be fixed.

METHODS

Parse::MediaWikiDump::Links->new

Create a new instance of a page links dump file parser Return the next available Parse::MediaWikiDump::link object or undef if there is no more data left

EXAMPLE

List all links between articles in a friendly way

#!/usr/bin/perl

use strict; use warnings;

use Parse::MediaWikiDump;

my $pmwd = Parse::MediaWikiDump->new; my $links = $pmwd->links(shift) or die "must specify a pagelinks dump file"; my $dump = $pmwd->pages(shift) or die "must specify an article dump file"; my %id_to_namespace; my %id_to_pagename;

binmode(STDOUT, ':utf8');

#build a map between namespace ids to namespace names foreach (@{$dump->namespaces}) { my $id = $_->[0]; my $name = $_->[1];

$id_to_namespace{$id} = $name; }

#build a map between article ids and article titles while(my $page = $dump->next) { my $id = $page->id; my $title = $page->title;

$id_to_pagename{$id} = $title; }

$dump = undef; #cleanup since we don't need it anymore

while(my $link = $links->next) { my $namespace = $link->namespace; my $from = $link->from; my $to = $link->to; my $namespace_name = $id_to_namespace{$namespace}; my $fully_qualified; my $from_name = $id_to_pagename{$from};

if ($namespace_name eq '') { #default namespace $fully_qualified = $to; } else { $fully_qualified = "$namespace_name:$to"; }

print "Article \"$from_name\" links to \"$fully_qualified\"\n"; }