Solving Wordle, Sutom, and Gerdle with SPARQL queries and Wikidata

Wordle is a web game where the player has to find an English word. After each guess, the player is given clues (correctly placed letters, misplaced letters, unused letters). Variants of the game exist in several languages, like Sutom (French) or Gerdle (Breton). Of course, there were several discussions on Twitter on how to solve these puzzles with SPARQL queries on the lexicographical data from Wikidata.

Here, I present a general solution, inspired by previous discussions, to reduce the number of guesses needed to find the correct word, using SPARQL queries on Wikidata. It is followed by specific rules for the French and Breton languages. SPARQL queries can be run on the Wikidata Query Service.

General solution (English)

First, we want to gather all available forms for a specific language in Wikidata. Having all forms is important in order to have every possible word, and not just the singular forms for nouns or the infinitives for verbs.

Here is an example for English (Q1860):

SELECT DISTINCT ?form {
  [] dct:language wd:Q1860 ; ontolex:lexicalForm/ontolex:representation ?form .
}
ORDER BY ?form

Length of the word

We want only words of a specific length. For instance 5 letters:

FILTER(STRLEN(?form) = 5)

Correctly placed letter

When we know the positions of some letters, we can apply a new filter with a regular expression.

Here, we are looking for a five-letter word with h in the second position and e in the last:

FILTER(REGEX(?form, "^.h..e$"))

The character ^ represents the start of the word and $ its end. The dot matches any single character.

Misplaced letter

When we know that a letter is present, but not at the correct position, we can apply two filters.

The first filter states that the letter is not at the specific position. Here, the letter i is not at the fourth position:

FILTER(REGEX(?form, "^...[^i].$"))

The second filter states that the letter is present at least once in the word. Here for the letter i:

FILTER(CONTAINS(?form, "i"))

Letter present at most once

Sometimes, we know that a letter is present only once. We can write the following rule to check that the letter r is not present several times:

FILTER(!REGEX(?form, "r.*r"))

Don’t forget to add the following rule to check that the letter is present at least once:

FILTER(CONTAINS(?form, "r"))
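
The two rules can also be folded into a single regular expression. The following is only a sketch of an alternative, equivalent to the pair of filters above for the letter r; it is not needed when the query is built step by step:

```sparql
# Exactly one r: any number of non-r characters, one r,
# then only non-r characters up to the end of the form.
FILTER(REGEX(?form, "^[^r]*r[^r]*$"))
```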

Letter not present

When we know that a letter is not present, we can filter out forms which contain it. Example for the letter a:

FILTER(!CONTAINS(?form, "a"))
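
When several letters have been ruled out, one filter per letter works, but a character class can cover them all at once. A minimal sketch, assuming that a, o and u are known to be absent (these three letters are only an illustration):

```sparql
# Keep only forms that contain none of the letters a, o, u.
FILTER(!REGEX(?form, "[aou]"))
```

This is equivalent to three separate !CONTAINS filters, one for each letter.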

Full example

Here is a full (but not final) example:

SELECT DISTINCT ?form {
  [] dct:language wd:Q1860 ; ontolex:lexicalForm/ontolex:representation ?form .
  FILTER(STRLEN(?form) = 5)
  FILTER(REGEX(?form, "^....e$"))
  FILTER(REGEX(?form, "^..[^e][^i].$"))
  FILTER(CONTAINS(?form, "c"))
  FILTER(CONTAINS(?form, "r"))
  FILTER(!REGEX(?form, "r.*r"))
  FILTER(!CONTAINS(?form, "o"))
}
ORDER BY ?form

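In detail: the word has five letters and ends with e; e is not in the third position and i is not in the fourth; c and r are present, r only once; o is not present.

You can note that several rules overlap. This is because the query is built step by step after each guess. Merging rules to optimize the query is of little interest here.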

Ideas for improvement

Instead of sorting forms alphabetically, forms should be sorted by the number of distinct letters they have, in order to have a better chance of finding new rules at the next guess. Even better, the count should be made only with letters for which we don’t have rules yet.
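
This idea can be sketched in SPARQL itself. The query below is only one possible approach, not a tested optimization: the candidate forms are computed in a subquery, each form is paired with every letter from a list, and the pairs where the letter occurs in the form are counted. The variable name ?distinctLetters and the letter list are assumptions made for the illustration. To count only letters without rules, remove the other letters from the VALUES list. On a large language, the query may be slow on the Wikidata Query Service.

```sparql
SELECT ?form (COUNT(?letter) AS ?distinctLetters) {
  # Candidate forms first (add here the rules built from the guesses).
  {
    SELECT DISTINCT ?form {
      [] dct:language wd:Q1860 ; ontolex:lexicalForm/ontolex:representation ?form .
      FILTER(STRLEN(?form) = 5)
    }
  }
  # Letters to count; drop those for which a rule already exists.
  VALUES ?letter {
    "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
    "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
  }
  # Keep one row per (form, letter) pair where the letter occurs in the form.
  FILTER(CONTAINS(LCASE(STR(?form)), ?letter))
}
GROUP BY ?form
ORDER BY DESC(?distinctLetters) ?form
```

Forms that contain none of the listed letters produce no row and therefore do not appear in the results.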


Sutom (French)

The same rules can be used for French. However, French has many diacritics, like é or è, which make these rules hard to write, as you would have to list all possible combinations. A solution is to remove the diacritics before applying the rules to the forms:

[] dct:language wd:Q150 ; ontolex:lexicalForm/ontolex:representation ?f .
BIND(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(?f, "[àâä]", "a"), "[éèêë]", "e"), "[îï]", "i"), "[ôö]", "o"), "[ùûü]", "u") AS ?form) .
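
For illustration, here is how this fragment could fit into a complete query. The clues (seven letters, s in first position, no a) are hypothetical and only show where the rules go:

```sparql
SELECT DISTINCT ?form {
  [] dct:language wd:Q150 ; ontolex:lexicalForm/ontolex:representation ?f .
  # Strip the diacritics first, so that the rules only deal with plain letters.
  BIND(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(?f, "[àâä]", "a"), "[éèêë]", "e"), "[îï]", "i"), "[ôö]", "o"), "[ùûü]", "u") AS ?form) .
  # Hypothetical clues: seven letters, starting with s, without a.
  FILTER(STRLEN(?form) = 7)
  FILTER(REGEX(?form, "^s......$"))
  FILTER(!CONTAINS(?form, "a"))
}
ORDER BY ?form
```

The replacements only cover the vowels listed above; other characters, such as ç, would need their own REPLACE.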


Gerdle (Breton)

In Breton, the difficulty is that some letters, like ch and c’h, are made of several characters. An idea is to replace these letters with placeholder characters (jokers), like ch = 0 and c’h = 1.

[] dct:language wd:Q12107 ; ontolex:lexicalForm/ontolex:representation ?f .
BIND(REPLACE(REPLACE(?f, "c'h", "1"), "ch", "0") AS ?form) .

You then have to use these placeholders in the rules. For instance, a word that doesn’t contain the letter c’h:

FILTER(!CONTAINS(?form, "1"))
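
As a sketch of how this could look in a complete query, the original form ?f can be returned next to the encoded form ?form, so that the results stay readable. The clues (five letters, c’h present, ch absent) are hypothetical:

```sparql
SELECT DISTINCT ?f ?form {
  [] dct:language wd:Q12107 ; ontolex:lexicalForm/ontolex:representation ?f .
  BIND(REPLACE(REPLACE(?f, "c'h", "1"), "ch", "0") AS ?form) .
  # Hypothetical clues: five letters (c'h and ch each count as one),
  # c'h present, ch absent.
  FILTER(STRLEN(?form) = 5)
  FILTER(CONTAINS(?form, "1"))
  FILTER(!CONTAINS(?form, "0"))
}
ORDER BY ?f
```

Since STRLEN is applied to the encoded form, the length rule counts Breton letters rather than characters.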

Candidacy for the Wikibase Community User Group

Here is my candidacy for group contact of the Wikibase Community User Group.


I’m Envel Le Hir, a data architect, working part time for a major IT company in France (unrelated to any work about Wikibase and the Wikimedia movement). I have been an active Wikimedian since 2015, with nearly 400K edits on Wikidata.

In the Wikidata community:

I’ve also been involved in the “meta” of the Wikimedia Movement. For instance, I helped solve the crisis that shook Wikimédia France in 2017, by starting the legal process to hold an early general assembly, by representing the chapter at Wikimania 2017 and by being a member of the electoral committee during the most complicated assembly in the history of the chapter.

I have no conflict of interest (to be completely transparent, I worked a few months in a wiki-related startup more than three years ago). I am a member of Wikimédia France, the French Wikimedia chapter, and of April, a French non-profit organization promoting free software.

Involvement in Wikibase and in the Wikibase Community User Group

I work (0.2 FTE) on a personal project using Wikibase at its core.

I write the Wikibase Yearly Summary series (2020, 2021), which gives an overview of what happens around Wikibase.

Specifically on the Wikibase Community User Group:

Plan as a group contact of the Wikibase Community User Group

Here are the topics I would like to work on as a group contact of the Wikibase Community User Group:

Some of these actions can be done without being a group contact and I hope to work on them, regardless of the outcome of the election.


This could have been my candidacy. However, I don’t think I’m a suitable candidate for this position and I will not run for it. I hope the chosen representatives will adopt some of my ideas.

Wikibase logo by H. Snater, CC BY-SA 3.0.

A short history of the Wikibase Community User Group

This post is only a short summary, which is not meant to be perfect, and is part of a series that is yet to be written. See also Another history of Denelezh.

The User Group

The Wikibase Community User Group is an organization, founded by Laura Hale and Miguel Paraz, to support Wikibase outside of Wikidata. For instance, its members created the Wikibase mailing list and the first Wikibase group on Telegram. The Wikibase Community User Group is independent from Wikimedia Deutschland, the organization that develops Wikibase, and was formed when Wikimedia Deutschland was not really promoting Wikibase outside of Wikidata.

The Wikibase Community User Group is a recognized affiliate of the Wikimedia Foundation, which gives several rights, like using the logos of the Wikimedia movement, applying for specific grants, taking part in strategic discussions of the Wikimedia Foundation with other affiliates, or voting for the affiliate-selected board seats of the Wikimedia Foundation.

In 2019, the Wikibase Community User Group was inactive. The founders of the user group seemed no longer active in the Wikimedia movement (one had not edited Wikimedia projects for years, and the other had deleted their Wikimedia account). As I believe it is useful to have an active Wikibase community user group that can complement Wikimedia Deutschland, I tried to revive it. For instance, I wrote a large part of the 2019 report, invited the community to improve it, and submitted it. I also applied to represent the user group at the Wikimedia Summit.

Wikimedia Summit 2020

The Wikimedia Summit is the annual conference of the Wikimedia Foundation affiliates. While it is organized by Wikimedia Deutschland, it is fully funded by the Wikimedia Foundation. The conference has strict eligibility criteria for participants, for instance at most one representative per user group.

Designation of a representative

There was immediately an issue with my application: two people applied as the representative of the Wikibase Community User Group. After discussing with the other one, it appeared that, while they had no interest in Wikibase, they had been given the slot by the founders of the user group. They stated that they had no intention of giving it up. I notified the community, which promptly reacted by starting a public vote to “untie the knot”.

Simultaneously, I also notified Lydia Pintscher and Léa Lacroix, as they were, at that time, the people I knew at Wikimedia Deutschland interacting with the Wikibase community, and shared with them the discussions I had with the other candidate. Léa Lacroix immediately came to chat with me, explaining that Wikimedia Deutschland “had tensions” with the founders of the Wikibase Community User Group, detailing some of the issues, and clearly showing that they were supporting my application.

The other candidate quickly withdrew their participation after the public vote started. However, this led to a harsh comment from María Sefidari (at that time chair of the Wikimedia Foundation Board of Trustees), who is personally linked to one of the founders of the Wikibase Community User Group.

The Wikibase community started to discuss the organization of the user group and what it could achieve. I proposed to organize a meeting on the topic. The idea was publicly endorsed by Lydia Pintscher, who also thanked the community for its actions on this case and for the fact that we had “proper representation” at the Wikimedia Summit.

Even with a satisfactory outcome, this episode was intense and very stressful for me.

I took the time to read a lot of documentation and to contact several people, like experienced Wikimedians, to better understand how the issue could be solved permanently and what the community could achieve with the user group.

Preparation of the Summit

A month later, I contacted Samantha Alípio, Lydia Pintscher and Léa Lacroix (Jens Ohlig was immediately added to the conversation) to continue the discussions and to prepare the first meeting of the user group and the Wikimedia Summit. I wanted to discuss several things, including:

The only person who replied (the one who explained that Wikimedia Deutschland “had tensions” with the founders of the user group) was this time hostile and undermined the conversation.

As we were going nowhere, I tried to have a direct chat with them to solve the issue. At that point, I was told that the Wikibase Community User Group had no legitimacy, and I was accused of wanting to control the actions of Wikimedia Deutschland.

As a volunteer, this was starting to be really upsetting. However, I sent another email to everyone to ease the tensions, reminding them of the previous discussions, explaining again my intentions, and making clear that I wanted us to work together on these events. I had no reply.

In a separate conversation, while she at first recognized that her behavior was inappropriate (and this was later publicly retracted), Léa Lacroix added more fuel to the fire and also made it clear that they would not help.

Given the situation, I realized that I would not have any constructive collaboration with Wikimedia Deutschland about the meeting and asked Andra Waagmeester to finish its preparation. During the meeting, Wikimedia Deutschland only stated that they did not want to be involved in the Wikibase Community User Group.

Disgusted by what happened (at the same time, they were personally asking for my help for one of their projects, as they had done several times in the past years) and unable to see how I could attend the Wikimedia Summit amid such hostility, I withdrew from the event. I notified the community, stating publicly that it was “for personal reasons” to avoid drama.

Franziska Heine from Wikimedia Deutschland contacted me a few days later, proposing a call. I thanked her, but declined, as I no longer saw how to work with them after all that had happened (there was this episode, but also this one and others…).

The Wikimedia Summit was later cancelled because of the pandemic.


Andra tried to continue the efforts to organize the Wikibase Community User Group. Sadly, he received no real help and eventually gave up.

A few months later, Wikimedia Deutschland effectively took over the Wikibase Community User Group. They now organize its monthly meetings, reusing the format that I put in place. Unlike the meeting organized by volunteers, they fully promote theirs, using all their network.

Their 2020 development plan specifies the organization of “Wikibase community calls”, unrelated to the user group. While Wikimedia Deutschland could have organized separate meetings, consistent with their position of being only a “bystander” of the user group, they jumped at the chance to take it over. At the same time, they publicly state that the Wikibase Community User Group is a “community initiative” that is “self-organized and that is independent of the structures within WMDE”. If Wikimedia Deutschland were consistent with their statement, they wouldn’t organize the meetings of the user group and would have set up the Wikibase Live Sessions outside the user group.

In my opinion, the strategy of Wikimedia Deutschland is to maintain the confusion between the Wikibase community and the Wikibase Community User Group (like the confusion that can exist between a project like the French Wikipédia and a chapter like Wikimédia France). I also think that they hoped that nobody would notice, or be interested in addressing, the risk of derecognition of the user group (they waited for the point to be raised by a volunteer before discussing it, when they could have immediately encouraged the community to work on the issue). Thus, they would no longer be bothered by an official structure that they consider to be a competitor. If I’m wrong, I would be happy to read a consistent clarification from Wikimedia Deutschland about their position towards the Wikibase Community User Group.

To be clear, I’m happy that Wikimedia Deutschland finally got involved in the Wikibase Community User Group and I would be happy to work with them. However, this can only be done in a safe environment, where everyone treats others with respect and plainly takes responsibility for their actions and positions.