Weekend at the Museum (of Brittany)

On the occasion of the European Heritage Days, the Museum of Brittany, located in Rennes, launched a new portal presenting its entire collection. The museum has a long history of openness and partnerships with local Wikimedians. For instance, images from the museum, some uploaded to Wikimedia Commons by the museum itself, are viewed more than 400,000 times each month on Wikipedia. A large part of the new website is under free licenses.

Captain Dreyfus leaves after a hearing during his trial in Rennes (photo by unknown author, public domain).

On the portal, each record can be located with an Archival Resource Key (ARK). Put simply, an ARK is a persistent identifier, usually embedded in a Uniform Resource Locator (URL), of the form ark:/<organization>/<object>. The organization part is a five-digit number, called the Name Assigning Authority Number (NAAN), that identifies the organization naming the object. Wikidata already has a property for this identifier, P1870, but it was rarely used.
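To make the structure concrete, here is a minimal sketch that splits an ARK into its NAAN and object part. The function name parse_ark and the example URL are hypothetical, chosen only for illustration; the pattern assumes the five-digit NAAN described above.

```php
<?php
// Hypothetical helper: split an ARK into its NAAN and object part.
// Returns null when the string does not contain a recognizable ARK.
function parse_ark(string $ark): ?array
{
    // Expect "ark:/", a five-digit NAAN, a slash, then the object name.
    if (preg_match('#ark:/([0-9]{5})/(.+)$#', $ark, $m) !== 1) {
        return null;
    }
    return ['naan' => $m[1], 'object' => $m[2]];
}

// Made-up example value, not a real record from the portal.
$parts = parse_ark('https://example.org/ark:/12345/abc123');
echo $parts['naan']."\t".$parts['object']."\n";
```

The same function accepts a bare ARK (ark:/12345/abc123) as well as one embedded in a URL, since the pattern only looks for the ark:/ part.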

The California Digital Library maintains a text file listing all NAANs (http://www.cdlib.org/uc3/naan_registry.txt), so I wrote a PHP script to parse it and import it into Mix’n’Match, a tool that helps match catalog entries to Wikidata items.

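<?php
// Download the NAAN registry published by the California Digital Library.
$data = file_get_contents('http://www.cdlib.org/uc3/naan_registry.txt');
if ($data === false) {
    exit("Could not download the NAAN registry\n");
}
// Capture the "who" line (organization name) and the five-digit NAAN on the "what" line.
preg_match_all('/naa:'."\n".'who:[ \t]*(.*?)'."\n".'what:[ \t]*([0-9]{5})/', $data, $matches);
for ($i = 0; $i < count($matches[0]); $i++) {
    // The "who" field may contain several names separated by "(=)".
    $split = preg_split('/\(=\)/', $matches[1][$i]);
    // Default to the whole field so $label is never undefined or left over from the previous entry.
    $label = $matches[1][$i];
    if (count($split) == 2) {
        $label = $split[0];
    } elseif (count($split) == 3) {
        $label = $split[1];
    }
    // One tab-separated line per entry: NAAN, label, full "who" field.
    echo $matches[2][$i]."\t".trim($label)."\t".trim($matches[1][$i])."\n";
}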
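To see what the script produces without downloading the registry, the same parsing can be tried on a small inline sample. This is only a sketch: the two entries below are made up, their layout is an assumption inferred from the regular expression above, and the function name naan_rows is hypothetical. Entries with an unexpected number of "(=)" separators fall back to the first name here.

```php
<?php
// Made-up sample mimicking the assumed registry layout; not real entries.
$sample = "naa:\nwho:    Example Library (=) EL\nwhat:   12345\n\n"
        . "naa:\nwho:    Sample Museum\nwhat:   67890\n";

// Hypothetical helper: return one [NAAN, label, full "who" field] row per entry.
function naan_rows(string $data): array
{
    preg_match_all('/naa:\nwho:[ \t]*(.*?)\nwhat:[ \t]*([0-9]{5})/', $data, $matches, PREG_SET_ORDER);
    $rows = [];
    foreach ($matches as $m) {
        $names = preg_split('/\(=\)/', $m[1]);
        // Three names: keep the second one; otherwise keep the first.
        $label = count($names) == 3 ? $names[1] : $names[0];
        $rows[] = [$m[2], trim($label), trim($m[1])];
    }
    return $rows;
}

// Print tab-separated lines, as the script above does.
foreach (naan_rows($sample) as $row) {
    echo implode("\t", $row)."\n";
}
```

Each printed line has three tab-separated columns: the NAAN, the chosen label, and the full "who" field.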

The output of the script can be used to create a Mix’n’Match catalog, following this guide. I had to trick the import tool, as it requires a URL for each entry but, in our case, no entry has a specific one. The Mix’n’Match catalog is now available as catalog #555. So far, 237 of the 544 numbers have been matched (about 44%). The Museum of Brittany now also has its NAAN on Wikidata.

Suitcase and its content (photo by unknown author, CC0).

A second task will be to link Wikidata items to the new portal of the Museum of Brittany. A dedicated property has already been proposed; it will contain the object part of an ARK. This raises the question of whether to create a generic property for all ARKs in Wikidata instead of a specific one for each organization. Last but not least, the Museum of Brittany should also provide a public API in a few weeks, which will ease matching and importing into Wikidata, as well as linking to other databases.

Many thanks to the Museum of Brittany, to Nicolas Vigneron for his help this weekend, and to Magnus Manske for his Wikidata tools.