Solving Project Euler problems with SPARQL and Wikidata
Founded in 2001, Project Euler is a website offering a series of more than 700 mathematical problems that you can solve with the programming language of your choice. Unsurprisingly, Python, C/C++, and Java are the most popular languages used by visitors of the website. I wanted to know if it was possible to use Wikidata and its SPARQL endpoint to solve these problems.
Spoiler: this post gives the solution of the first problem and discusses how to solve the second. Don’t hesitate to first go to Project Euler and to find the solutions by yourself.
Problem 1
The objective of the first problem is to find the sum of all the multiples of 3 or 5 below 1,000.
Let’s find all natural numbers in Wikidata, with their numerical values:
SELECT ?item ?value { ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value . } ORDER BY ?value
As you can see, Wikidata has more than 10,000 items about natural numbers. Some values are duplicated, like Q199 (the number 1) and Q713113 (the number 0.999… which is equal to 1).
Let’s remove the duplicates and filter the numbers to keep only the ones below 1,000:
SELECT DISTINCT ?value { ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value . FILTER (?value < 1000) . } ORDER BY ?value
There are exactly 999 numbers between 1 and 999, so none is missing.
Let’s keep only the numbers that are the multiples of 3 or 5, by checking that their division by 3 or 5 has no remainder:
SELECT DISTINCT ?value { ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value . FILTER (?value < 1000 && (floor(?value / 3) = (?value / 3) || floor(?value / 5) = (?value / 5))) . } ORDER BY ?value
Here are the results.
Now, we just have to sum the numbers:
SELECT (SUM(DISTINCT ?value) AS ?sum) { ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value . FILTER (?value < 1000 && (floor(?value / 3) = (?value / 3) || floor(?value / 5) = (?value / 5))) . }
And Wikidata gives us the solution of the problem!
Problem 2
The objective of the second problem is to find the sum of even Fibonacci numbers below 4,000,000.
Let’s find Fibonacci numbers available in Wikidata:
SELECT ?value { ?item wdt:P31 wd:Q47577 ; wdt:P1181 ?value . } ORDER BY ?value
Only the twenty first ones, below 10,000, are known in Wikidata. As a general-purpose database, Wikidata is certainly not made to store every existing numbers under 4,000,000. It makes the method previously used in the first problem not applicable here.
At the moment, I don’t see any simple solution to solve this second problem using Wikidata’s SPARQL endpoint, and probably neither to the following ones. Maybe you can do it?
For your information, more complex problems, not related to Project Euler, have been solved using SPARQL, outside of Wikidata’s endpoint, like Sudoku (for instance here or there).
A short history of Denelezh
A few weeks ago, Gnom was wondering what is the actual size of the gender gap:
Do we have an estimate of the size of the gender gap on Wikipedia given the current notability criteria? In other words, if every notable living or dead person had an article, what percentage would be about women? For example, would 30% women biographies be a good ballpark for a ‘closed’ Wikipedia gender gap?
This was my answer, giving some insights about the history of Denelezh, a tool that provides statistics about the gender gap in the content of Wikimedia projects:
Your question is the one I had in mind when I built the first version of Denelezh. My idea was to split the problem into smaller ones and to work by subsets to facilitate the study. There are subsets where the percentage of women is perfectly known. For instance, here are the percentages of women in the lower house of the French Parliament since 1945. In the Wikipedia in French language, members of parliaments automatically comply with notability criteria. Other subsets have to be studied. The tool partially allows to do so by providing statistics about humans depicted in Wikidata along various dimensions that you can combine (for instance country of citizenship + occupation = French politicians). But merging these subsets is not easy, as they are overlapping: you can’t simply add the percentages from two sets when some people belong to both. An athlete can also be a politician, someone can have several countries of citizenship, and so long…
The first version of the tool provided statistics about humans in Wikidata with a gender, a year of birth, and a country of citizenship. The assumption (unverified) was that Wikidata items with all these properties were of better quality than the ones with one or more missing properties. The problem is that statistics were only about 50% of humans depicted in Wikidata, and thus were misleading for people studying the gender gap in Wikimedia projects.
The second version, which development was rushed, solved this problem by no longer filtering Wikidata items on the number of properties they have and providing statistics as long as the data was available. With this change (and the addition of Wikimedia projects as an available dimension to filter / combine), it became closer to WHGI. One current problem is that the tool lack statistics on Wikidata quality (for instance, how many Wikidata items depicting humans have the property gender?).
The third version will be a merge of Denelezh and WHGI. Some ideas are already in the pipeline (adding external IDs as a dimension, producing lists of notable people to help Wikimedia editors to find subjects to work on, etc.), some others are on Phabricator. Feedback welcome 🙂
Wikibase Yearly Summary 2020
This timeline tracks what happened around Wikibase in 2020 and is frequently updated. Items are listed using the following categories:
- Article
- Documentation
- Talk
- Meeting
- Staffing
- Technical
Please contact me if something is missing!
January
- 2020-01-22 Wikidata & Wikibase office hour by Wikimedia Deutschland (notes).
- 2020-01-22 Samantha Alipio becomes Product Manager for Wikibase at Wikimedia Deutschland.
- 2020-01-22 2020 development roadmap for Wikidata and Wikibase by Wikimedia Deutschland.
- 2020-01-23 Change: Wikibase installations: new term store enabled.
- 2020-01-25 WBStack Infrastructure, by Addshore.
February
- 2020-02-01 Wikibase Ecosystem, taking Wikidata further, by Lydia Pintscher at FOSDEM (video).
- 2020-02-05 Change: Action API parameter validation.
- 2020-02-06 ADR-7: Use vuex-smart-modules in Data Bridge.
- 2020-02-17 Wikidata is just a matter of facts, by Andra Waagmeester.
- 2020-02-17 ADR-8: Format Bridge references via API.
- 2020-02-20 First online meeting of the Wikibase Community User Group (notes).
March
- 2020-03-04 Could you wikify an authority file? Wikibase has been evaluated for the Integrated Authority File (GND), by Barbara Fischer (DNB), Jens Ohlig.
- 2020-03-04 ADR-9: Refactor hooks for testability.
April
- 2020-04-01 → 2020-04-03 — EMWCon Spring 2020:
- Wikibase for Smart Cities, by Houcemeddine Turki, Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha (slides, video).
- Wikibase, Docker & WBStack, by Adam Shorland (video).
- 2020-04-07 Wikidata & Wikibase office hour by Wikimedia Deutschland (notes).
- 2020-04-07 ADR-10: Do not add source information to Property IDs for the Federated Properties MVP.
- 2020-04-07 [es] Primeros pasos con Wikibase, by BVMC.
- 2020-04-16 Change: Blank node deprecation in WDQS & Wikibase RDF model.
- 2020-04-19 WBStack 2020 Update 1, by Addshore.
- 2020-04-21 Mohammed Sadat becomes Community Communications Manager for Wikidata and Wikibase at Wikimedia Deutschland.
- 2020-04-23 ADR-11: declare strict_types Rollout.
May
- 2020-05-05 Wikidata Course at TAU, Part 2/4: Wikibase & Lexemes, by Shani Evenstein with Lydia Pintscher.
- 2020-05-17 WBStack 2020 Update 2, by Addshore.
- 2020-05-19 Dan Shick becomes Technical Writer for Wikibase/Wikidata at Wikimedia Deutschland.
- 2020-05-20 Building an OpenRefine Reconciliation Endpoint for a Wikibase project: Lessons Learned, by Jeff Mixter, Bruce Washburn, OCLC.
- 2020-05-30 [fr] Wikibase exploration, huge technical experience feedback on Wikibase, related to FNE proof of concept.
June
- 2020-06-10 Cataloguing Practices in the Age of Linked Open Data: Wikidata and Wikibase for Film Archives, by Adelheid Heftberger, Paul Duchesne, FIAF.
- 2020-06-15 ADR-12: PSR-4 and maintenance scripts (T172368).
- 2020-06-18 Experimentations with Wikidata/Wikibase, by Karen Smith-Yoshimura, OCLC.
- 2020-06-26 ADR-13: Register shared features in Repo and Client.
July
- 2020-07-02 Second online meeting of the Wikibase Community User Group (notes).
- 2020-07-02 Version 1.1 of wikiba.se is released. Wikibase documentation is spread over several websites: mediawiki.org, doc.wikimedia.org, wikiba.se, learningwikibase.com, github.com (mediawiki-extensions-Wikibase, wikibase-docker), …
- 2020-07-09 Linked data for libraries with Wikibase, by Jens Ohlig at LD4 (video).
- 2020-07-13 [fr] FNE proof of concept: synthesis and decisions [it’s a go!], by BnF and ABES (report).
- 2020-07-15 Change: removing special pageterms behavior on repo wikis, use entityterms instead.
- 2020-07-16 [fr] FNE project: Wikibase proof of concept completed, by ABES.
- 2020-07-20 Change: WikibaseClient example config will no longer define an example repo.
- 2020-07-21 Wikidata & Wikibase office hour by Wikimedia Deutschland (notes).
August
- 2020-08-05 ADR-14: Make Wikibase.git a monorepo.
- 2020-08-06 Third online meeting of the Wikibase Community User Group (notes, video).
- 2020-08-17 Change: New datatype fields in Lexeme JSON output.
Icons by FamFamFam, CC BY 2.5.