2020 personal plans around Wikidata and Wikibase

After a look back at 2019, I will present my two main projects around Wikidata and Wikibase in 2020: merging the tools that provide statistics about the gender gap in the content of Wikimedia projects, and creating a tool, based on Wikibase, about video game reviews.

2019

Here are the main events related to Wikidata and Wikibase that I attended in 2019:

Gender gap and Denelezh

Denelezh is a tool that provides statistics about the gender gap in the content of Wikimedia projects. Thanks to Sylvain Boissel, it is now hosted by Wikimédia France.

At Wikimania, I met Maximilian Klein, the author of WHGI. We decided to merge our tools, as some of their features overlap. The idea is to create a new tool that keeps the existing features and has a new architecture that makes it easy to implement new ones, relying on our experience with WHGI and Denelezh.

You can track the progress in T230184.

Video game reviews and Wikibase

In my first years of contributing to Wikidata, I wanted to put every piece of data I found into it, as if Wikidata were a black hole. However, this is not a good idea, for several reasons. As Wikidata is a general knowledge base, its scalability is a concern, both technically (for instance, its Query Service struggles to stay up to date) and on a human level (the more Wikidata grows, the more volunteers it needs to maintain its data consistently across all the topics it covers). A solution is to rely on other knowledge bases, with Wikidata acting as a hub between them. In brief, as I have already stated, Wikidata is not the database to rule them all, but the database to link them all.
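To make the "hub" idea concrete, here is a minimal sketch of how a hub can link to another knowledge base without copying its data. It relies on the formatter URL convention of Wikidata's external identifier properties, where `$1` stands for the identifier value. The item ID, the URL pattern and the identifier below are hypothetical placeholders, not real Wikidata data.

```python
def external_url(formatter_url: str, external_id: str) -> str:
    """Build a link to an external database from a formatter URL.

    The "$1" placeholder is replaced by the identifier value.
    """
    return formatter_url.replace("$1", external_id)


# Hypothetical hub entry: the hub stores only identifiers, while the
# detailed data (here, reviews) lives in the external knowledge base.
hub_entry = {
    "item": "Q_EXAMPLE",  # placeholder, not a real item ID
    "external_ids": {
        # formatter URL (hypothetical) -> identifier (hypothetical)
        "https://reviews.example.org/game/$1": "some-game-1993",
    },
}


def resolve_links(entry: dict) -> list[str]:
    """Return the external URLs reachable from a hub entry."""
    return [
        external_url(pattern, value)
        for pattern, value in entry["external_ids"].items()
    ]


if __name__ == "__main__":
    for url in resolve_links(hub_entry):
        print(url)
```

The point of the design is that the hub only needs one short statement per external database, whatever the amount of data stored on the other side.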

In 2019, I started to work on video game reviews. At the moment, I’m not convinced by the data model used in Wikidata and would like to try some new ones. That’s not really possible in Wikidata: you can’t change the data model every fortnight just for testing purposes, as this would impact every consumer of the data (whether they are Wikimedia projects or projects outside the Wikimedia movement). Another problem is that, as Wikidata is a generalist database, you have to comply with the rules of every sub-project: you can’t just focus on one project (like video games) but have to cope with every related project, which can quickly become cumbersome (for instance, creating someone’s family name is not straightforward).

For all of these reasons, I decided to create my own database. To this end, Wikibase, the MediaWiki extension that powers Wikidata, seemed to be the ideal candidate. Using your own Wikibase allows you to build your own data model, with the convenience that it is technically compatible with Wikidata. The use of Wikibase by Wikidata proves that it’s a robust tool, and it is very efficient if your data set is not too big. Also, thanks to Wikidata, Wikibase has a large ecosystem, even if some tools still need to be generalized to work with any Wikibase instance (examples: the merge feature, constraints reports, etc.). The one-day Wikibase workshop at WikidataCon finally convinced me: it was amazing how easy it was, using Docker, to install a Wikibase instance with a complete ecosystem running (including the Query Service and QuickStatements). My project is now ongoing.
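As a small illustration of this technical compatibility, the sketch below builds the same `wbgetentities` API request for Wikidata and for a self-hosted Wikibase. Only the endpoint changes. The second endpoint and its `/w/api.php` path are assumptions: a real installation may expose its API under a different host and path, and its entity IDs follow its own numbering.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
# Hypothetical endpoint of a self-hosted Wikibase instance.
MY_WIKIBASE_API = "https://wikibase.example.org/w/api.php"


def entity_request_url(api_endpoint: str, entity_id: str) -> str:
    """Build the URL of a wbgetentities request for one entity."""
    params = {
        "action": "wbgetentities",  # Wikibase API module returning entity data
        "ids": entity_id,
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Same code, two Wikibase instances.
    print(entity_request_url(WIKIDATA_API, "Q42"))
    print(entity_request_url(MY_WIKIBASE_API, "Q1"))
```

Because both instances run the same software, a client written for one can usually be pointed at the other by changing a single constant, which is what makes reusing the Wikidata ecosystem attractive.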

On the community side of Wikibase, I wrote the largest part of the 2019 report of the Wikibase Community User Group and was designated to represent the user group at the Wikimedia Summit 2020.


Photo by Anthere, CC BY-SA 4.0.