Solving Project Euler problems with SPARQL and Wikidata
Founded in 2001, Project Euler is a website offering a series of more than 700 mathematical problems that you can solve with the programming language of your choice. Unsurprisingly, Python, C/C++, and Java are the most popular languages used by visitors of the website. I wanted to know if it was possible to use Wikidata and its SPARQL endpoint to solve these problems.
Spoiler: this post gives the solution of the first problem and discusses how to solve the second. Don’t hesitate to first go to Project Euler and to find the solutions by yourself.
Problem 1
The objective of the first problem is to find the sum of all the multiples of 3 or 5 below 1,000.
Let’s find all natural numbers in Wikidata, with their numerical values:
SELECT ?item ?value { ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value . } ORDER BY ?value
As you can see, Wikidata has more than 10,000 items about natural numbers. Some values are duplicated, like Q199 (the number 1) and Q713113 (the number 0.999… which is equal to 1).
Let’s remove the duplicates and filter the numbers to keep only the ones below 1,000:
SELECT DISTINCT ?value { ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value . FILTER (?value < 1000) . } ORDER BY ?value
There are exactly 999 numbers between 1 and 999, so none is missing.
Let’s keep only the numbers that are the multiples of 3 or 5, by checking that their division by 3 or 5 has no remainder:
SELECT DISTINCT ?value { ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value . FILTER (?value < 1000 && (floor(?value / 3) = (?value / 3) || floor(?value / 5) = (?value / 5))) . } ORDER BY ?value
Here are the results.
Now, we just have to sum the numbers:
SELECT (SUM(DISTINCT ?value) AS ?sum) { ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value . FILTER (?value < 1000 && (floor(?value / 3) = (?value / 3) || floor(?value / 5) = (?value / 5))) . }
And Wikidata gives us the solution of the problem!
Problem 2
The objective of the second problem is to find the sum of even Fibonacci numbers below 4,000,000.
Let’s find Fibonacci numbers available in Wikidata:
SELECT ?value { ?item wdt:P31 wd:Q47577 ; wdt:P1181 ?value . } ORDER BY ?value
Only the twenty first ones, below 10,000, are known in Wikidata. As a general-purpose database, Wikidata is certainly not made to store every existing numbers under 4,000,000. It makes the method previously used in the first problem not applicable here.
At the moment, I don’t see any simple solution to solve this second problem using Wikidata’s SPARQL endpoint, and probably neither to the following ones. Maybe you can do it?
For your information, more complex problems, not related to Project Euler, have been solved using SPARQL, outside of Wikidata’s endpoint, like Sudoku (for instance here or there).
A short history of Denelezh
A few weeks ago, Gnom was wondering what is the actual size of the gender gap:
Do we have an estimate of the size of the gender gap on Wikipedia given the current notability criteria? In other words, if every notable living or dead person had an article, what percentage would be about women? For example, would 30% women biographies be a good ballpark for a ‘closed’ Wikipedia gender gap?
This was my answer, giving some insights about the history of Denelezh, a tool that provides statistics about the gender gap in the content of Wikimedia projects:
Your question is the one I had in mind when I built the first version of Denelezh. My idea was to split the problem into smaller ones and to work by subsets to facilitate the study. There are subsets where the percentage of women is perfectly known. For instance, here are the percentages of women in the lower house of the French Parliament since 1945. In the Wikipedia in French language, members of parliaments automatically comply with notability criteria. Other subsets have to be studied. The tool partially allows to do so by providing statistics about humans depicted in Wikidata along various dimensions that you can combine (for instance country of citizenship + occupation = French politicians). But merging these subsets is not easy, as they are overlapping: you can’t simply add the percentages from two sets when some people belong to both. An athlete can also be a politician, someone can have several countries of citizenship, and so long…
The first version of the tool provided statistics about humans in Wikidata with a gender, a year of birth, and a country of citizenship. The assumption (unverified) was that Wikidata items with all these properties were of better quality than the ones with one or more missing properties. The problem is that statistics were only about 50% of humans depicted in Wikidata, and thus were misleading for people studying the gender gap in Wikimedia projects.
The second version, which development was rushed, solved this problem by no longer filtering Wikidata items on the number of properties they have and providing statistics as long as the data was available. With this change (and the addition of Wikimedia projects as an available dimension to filter / combine), it became closer to WHGI. One current problem is that the tool lack statistics on Wikidata quality (for instance, how many Wikidata items depicting humans have the property gender?).
The third version will be a merge of Denelezh and WHGI. Some ideas are already in the pipeline (adding external IDs as a dimension, producing lists of notable people to help Wikimedia editors to find subjects to work on, etc.), some others are on Phabricator. Feedback welcome 🙂
Wikibase Yearly Summary 2020
This timeline tracks what happened around Wikibase in 2020 and is frequently updated. Items are listed using the following categories:
Articles & News
Documentation
Talks
Meetings
Staffing
Technical
Please contact me if something is missing!
January
- 2020-01-22
Wikidata & Wikibase office hour by Wikimedia Deutschland [notes].
- 2020-01-22
Samantha Alipio becomes Product Manager for Wikibase at Wikimedia Deutschland.
- 2020-01-22
2020 development roadmap for Wikidata and Wikibase by Wikimedia Deutschland.
- 2020-01-23
Change: Wikibase installations: new term store enabled.
- 2020-01-25
WBStack Infrastructure, by Addshore.
February
- 2020-02-01
Wikibase Ecosystem, taking Wikidata further, by Lydia Pintscher (WMDE) at FOSDEM [slides, video].
- 2020-02-05
Change: Action API parameter validation.
- 2020-02-06
ADR-7: Use vuex-smart-modules in Data Bridge.
- 2020-02-17
Wikidata is just a matter of facts, by Andra Waagmeester.
- 2020-02-17
ADR-8: Format Bridge references via API.
- 2020-02-20
First online meeting of the Wikibase Community User Group [notes].
March
- 2020-03-04
Could you wikify an authority file? Wikibase has been evaluated for the Integrated Authority File (GND), by Barbara Fischer (DNB) and Jens Ohlig (WMDE).
- 2020-03-04
ADR-9: Refactor hooks for testability.
April
- 2020-04-01 → 2020-04-03 — EMWCon Spring 2020
Wikibase for Smart Cities, by Houcemeddine Turki, Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha [slides, video].
Wikibase, Docker & WBStack, by Adam Shorland [video].
- 2020-04-07
Wikidata & Wikibase office hour by Wikimedia Deutschland [notes].
- 2020-04-07
ADR-10: Do not add source information to Property IDs for the Federated Properties MVP.
- 2020-04-07
[es] First steps with Wikibase, by BVMC.
- 2020-04-16
Change: Blank node deprecation in WDQS & Wikibase RDF model.
- 2020-04-19
WBStack 2020 Update 1, by Addshore.
- 2020-04-21
Mohammed Sadat becomes Community Communications Manager for Wikidata and Wikibase at Wikimedia Deutschland.
- 2020-04-23
ADR-11: declare strict_types Rollout.
May
- 2020-05-05
Wikidata Course at TAU, Part 2/4: Wikibase & Lexemes, by Shani Evenstein with Lydia Pintscher (WMDE).
- 2020-05-17
WBStack 2020 Update 2, by Addshore.
- 2020-05-19
Dan Shick becomes Technical Writer for Wikibase/Wikidata at Wikimedia Deutschland.
- 2020-05-20
Building an OpenRefine Reconciliation Endpoint for a Wikibase project: Lessons Learned, by Jeff Mixter and Bruce Washburn (OCLC).
- 2020-05-30
[fr] Wikibase exploration, huge technical experience feedback on Wikibase, related to FNE proof of concept.
June
- 2020-06-10
Cataloguing Practices in the Age of Linked Open Data: Wikidata and Wikibase for Film Archives, by Adelheid Heftberger and Paul Duchesne (FIAF).
- 2020-06-15
ADR-12: PSR-4 and maintenance scripts [T172368].
- 2020-06-18
Experimentations with Wikidata/Wikibase, by Karen Smith-Yoshimura (OCLC).
- 2020-06-26
ADR-13: Register shared features in Repo and Client.
July
- 2020-07-02
Second online meeting of the Wikibase Community User Group [notes].
- 2020-07-02
Version 1.1 of wikiba.se is released.
- 2020-07-09
Linked data for libraries with Wikibase, by Jens Ohlig (WMDE) at LD4 [video].
- 2020-07-13
[fr] FNE proof of concept: synthesis and decisions [it’s a go!], by BnF and ABES [report].
- 2020-07-15
Change: removing special pageterms behavior on repo wikis, use entityterms instead.
- 2020-07-16
[fr] FNE project: Wikibase proof of concept completed, by ABES.
- 2020-07-20
Change: WikibaseClient example config will no longer define an example repo.
- 2020-07-21
Wikidata & Wikibase office hour by Wikimedia Deutschland [notes].
August
- 2020-08-05
ADR-14: Make Wikibase.git a monorepo.
- 2020-08-06
Third online meeting of the Wikibase Community User Group [notes, video].
- 2020-08-17
Change: New datatype fields in Lexeme JSON output.
- 2020-08-19
Semantic MediaWiki vs Wikibase vs Cargo, by Jeroen De Dauw (Professional.Wiki).
September
- 2020-09-03 — SSHOC Online Information Sessions: Open-Source Vocabulary Hosting and Management Platforms — Introduction to Wikibase Vocabulary Platform
Introduction to Wikibase, by Georgina Burnett and Jens Ohlig (WMDE) [slides].
Use case: The German National Library’s work on the Integrated Authority File, by Barbara Fischer and Sarah Hartmann (DNB) [slides].
- 2020-09-03
Fourth online meeting of the Wikibase Community User Group [notes].
- 2020-09-03
Semantic Wikibase Released, by Jeroen De Dauw (Professional.Wiki) [repository, video].
- 2020-09-16
Change: lib/WikibaseLib.php is no longer an entry point to the Lib extension.
- 2020-09-17
Wikidata, Wikibase and the library linked data ecosystem: an OCLC Research Library Partnership discussion, by Merrilee Proffitt (OCLC).
- 2020-09-24
An Introduction to Wikibase and Wikidata with Barbara Fischer and Sarah Hartmann, part of the Wikicite Discussion Series made by IFLA Wikidata Working Group.
October
- 2020-10-01
Fifth online meeting of the Wikibase Community User Group [notes].
- 2020-10-05
Local Media Support for Wikibase, by Jeroen De Dauw (Professional.Wiki) [video, repository, T258932].
- 2020-10-06
Jens Ohlig, Software Communication Strategist, announces leaving Wikimedia Deutschland.
- 2020-10-07
Wikibase Consultants and Support Providers (see also MediaWiki Professional development and consulting).
- 2020-10-08
IFLA Wikidata Working Group kick-off meeting.
- 2020-10-15 — SEMIC 2020 (full programme) — Providing sustainable data services through Wikibase and Wikidata [video]
Encoding, sharing and promoting Cohesion Policy projects as Linked Data with Wikibase, by Francesco Foglia (UNIRC), Max De Wilde (European Commission DG CNECT).
Decentralising the management of national authority files through Wikidata, by Anila Angjeli (BnF), Benjamin Bober (ABES), Barbara Fischer (DNB), Joel Thill (National Archives of Luxembourg), Christel Kayser (BnL).
- 2020-10-25
[it] Wikibase, starting from scratch, by Luca Mauri at itWikiCon 2020 [slides, notes].
- 2020-10-26
ADR-15: Use PSR-11-compatible MediaWiki service container as extension basis.
- 2020-10-27
Wikidata & Wikibase office hour by Wikimedia Deutschland [notes].
- 2020-10-28
An informal conversation about the emerging Wikibase community during 24-hours meetup of Wikidata’s Eighth Birthday, by Mohammed Sadat and Jens Ohlig (WMDE).
November
- 2020-11-02
Automatic synchronization of RDF graphs representing ontologies and Wikibase instances at Wikidata Workshop, by Alejandro González Hevia, Guillermo Facundo Colunga, Emilio Rubiera Azcona and Jose Emilio Labra Gayo (University of Oviedo) [paper].
- 2020-11-12
Sixth online meeting of the Wikibase Community User Group.
- 2020-11-17
The WikiLibrary Manifesto.
- 2020-11-17
Deutsche Nationalbibliothek und Wikimedia Deutschland veröffentlichen WikiLibrary-Manifest.
- 2020-11-23 → 2020-11-27 — Semantic Web in Libraries (SWIB20)
Using IIIF and Wikibase to syndicate and share cultural heritage material on the Web, by Jeff Keith Mixter (OCLC) [video].
- 2020-11-24 → 2020-11-26 — Semantic MediaWiki Conference Fall 2020 (SMWCon Fall 2020)
What’s new in the world of Wikibase and where is it going?, by Lydia Pintscher and Jens Ohlig (WMDE) [video].
SMW and WikiBase: blasphemy or feasible?, by Luca Mauri [slides, video].
Connecting Wikibase and Semantic MediaWiki, by Jeroen De Dauw (Professional.Wiki) [video].
December
- 2020-12-13
Open Sourcing WBStack, by Addshore.
- 2020-12-17
Seventh online meeting of the Wikibase Community User Group.
Icons by FamFamFam, CC BY 2.5.