Solving Project Euler problems with SPARQL and Wikidata

Founded in 2001, Project Euler is a website offering a series of more than 700 mathematical problems that you can solve with the programming language of your choice. Unsurprisingly, Python, C/C++, and Java are the most popular languages used by visitors of the website. I wanted to know if it was possible to use Wikidata and its SPARQL endpoint to solve these problems.

Spoiler: this post gives the solution of the first problem and discusses how to solve the second. Don’t hesitate to first go to Project Euler and to find the solutions by yourself.

Problem 1

The objective of the first problem is to find the sum of all the multiples of 3 or 5 below 1,000.

Let’s find all natural numbers in Wikidata, with their numerical values:

SELECT ?item ?value {
  ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value .
}
ORDER BY ?value

As you can see, Wikidata has more than 10,000 items about natural numbers. Some values are duplicated, like Q199 (the number 1) and Q713113 (the number 0.999… which is equal to 1).

Let’s remove the duplicates and filter the numbers to keep only the ones below 1,000:

SELECT DISTINCT ?value {
  ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value .
  FILTER (?value < 1000) .
}
ORDER BY ?value

There are exactly 999 numbers between 1 and 999, so none is missing.

Let’s keep only the numbers that are the multiples of 3 or 5, by checking that their division by 3 or 5 has no remainder:

SELECT DISTINCT ?value {
  ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value .
  FILTER (?value < 1000 && (floor(?value / 3) = (?value / 3) || floor(?value / 5) = (?value / 5))) .
}
ORDER BY ?value

Here are the results.

Now, we just have to sum the numbers:

SELECT (SUM(DISTINCT ?value) AS ?sum) {
  ?item wdt:P31 wd:Q21199 ; wdt:P1181 ?value .
  FILTER (?value < 1000 && (floor(?value / 3) = (?value / 3) || floor(?value / 5) = (?value / 5))) .
}

And Wikidata gives us the solution of the problem!

Problem 2

The objective of the second problem is to find the sum of even Fibonacci numbers below 4,000,000.

Let’s find Fibonacci numbers available in Wikidata:

SELECT ?value {
  ?item wdt:P31 wd:Q47577 ; wdt:P1181 ?value .
}
ORDER BY ?value

Only the twenty first ones, below 10,000, are known in Wikidata. As a general-purpose database, Wikidata is certainly not made to store every existing numbers under 4,000,000. It makes the method previously used in the first problem not applicable here.

At the moment, I don’t see any simple solution to solve this second problem using Wikidata’s SPARQL endpoint, and probably neither to the following ones. Maybe you can do it?

For your information, more complex problems, not related to Project Euler, have been solved using SPARQL, outside of Wikidata’s endpoint, like Sudoku (for instance here or there).

A short history of Denelezh

A few weeks ago, Gnom was wondering what is the actual size of the gender gap:

Do we have an estimate of the size of the gender gap on Wikipedia given the current notability criteria? In other words, if every notable living or dead person had an article, what percentage would be about women? For example, would 30% women biographies be a good ballpark for a ‘closed’ Wikipedia gender gap?

This was my answer, giving some insights about the history of Denelezh, a tool that provides statistics about the gender gap in the content of Wikimedia projects:

Your question is the one I had in mind when I built the first version of Denelezh. My idea was to split the problem into smaller ones and to work by subsets to facilitate the study. There are subsets where the percentage of women is perfectly known. For instance, here are the percentages of women in the lower house of the French Parliament since 1945. In the Wikipedia in French language, members of parliaments automatically comply with notability criteria. Other subsets have to be studied. The tool partially allows to do so by providing statistics about humans depicted in Wikidata along various dimensions that you can combine (for instance country of citizenship + occupation = French politicians). But merging these subsets is not easy, as they are overlapping: you can’t simply add the percentages from two sets when some people belong to both. An athlete can also be a politician, someone can have several countries of citizenship, and so long…

The first version of the tool provided statistics about humans in Wikidata with a gender, a year of birth, and a country of citizenship. The assumption (unverified) was that Wikidata items with all these properties were of better quality than the ones with one or more missing properties. The problem is that statistics were only about 50% of humans depicted in Wikidata, and thus were misleading for people studying the gender gap in Wikimedia projects.

The second version, which development was rushed, solved this problem by no longer filtering Wikidata items on the number of properties they have and providing statistics as long as the data was available. With this change (and the addition of Wikimedia projects as an available dimension to filter / combine), it became closer to WHGI. One current problem is that the tool lack statistics on Wikidata quality (for instance, how many Wikidata items depicting humans have the property gender?).

The third version will be a merge of Denelezh and WHGI. Some ideas are already in the pipeline (adding external IDs as a dimension, producing lists of notable people to help Wikimedia editors to find subjects to work on, etc.), some others are on Phabricator. Feedback welcome 🙂

Wikibase Yearly Summary 2020

This timeline tracks what happened around Wikibase in 2020 and is frequently updated. Items are listed using the following categories:

Please contact me if something is missing!

January

February

March

April

May

June

July

August

September

October

November


Icons by FamFamFam, CC BY 2.5.