Converting Manuel de Codage to Unicode

In our last blog post, we promised to present texts encoded according to our recommendations. We are now making good on that promise. However, we do not enter Egyptian texts from scratch. Instead, we draw on the rich tradition of how Egyptian texts have been encoded so far and convert these texts to Unicode.

The gold standard in Egyptology is the so-called Manuel de Codage. It codes hieroglyphs either by their number in Gardiner's sign list or by their phonetic value. For example, Manuel de Codage encodes 𓅓 either as G17 or as m. Different operators combine the codes: the colon (:), for instance, places one hieroglyph above another.
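To make the operators concrete, here is a minimal sketch of how a flat Manuel de Codage string could be turned into Unicode. It assumes only three operators: "-" separates groups, ":" stacks signs vertically and "*" places them side by side. The first two map to the Unicode format controls U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER and U+13431 EGYPTIAN HIEROGLYPH HORIZONTAL JOINER (available since Unicode 12.0). The three-entry sign table is purely illustrative, and parentheses, nesting, and quadrats that mix ":" with "*" are deliberately not handled.

```python
import re

# Illustrative sign table: a few MdC codes and the Unicode signs they stand for.
SIGNS = {
    "G17": "𓅓",
    "m": "𓅓",    # phonetic value of G17
    "W17": "𓏃",
}

# Unicode format controls for quadrat layout (Unicode 12.0 and later).
VERTICAL_JOINER = "\U00013430"    # MdC ":"
HORIZONTAL_JOINER = "\U00013431"  # MdC "*"

# "-" only separates groups; in Unicode, consecutive groups are simply adjacent.
OPERATORS = {":": VERTICAL_JOINER, "*": HORIZONTAL_JOINER, "-": ""}


def mdc_to_unicode(mdc: str) -> str:
    """Convert a flat MdC string (no parentheses or nested groups) to Unicode."""
    out = []
    # The capturing group makes re.split keep the operators as tokens.
    for token in re.split(r"([-:*])", mdc.strip()):
        if token in OPERATORS:
            out.append(OPERATORS[token])
        elif token:
            out.append(SIGNS[token])  # raises KeyError for unknown codes
    return "".join(out)


print(mdc_to_unicode("G17:W17"))  # 𓅓 above 𓏃, joined by U+13430
```

Note that G17 and its phonetic value m resolve to the same sign, which is exactly why a converter needs one lookup table covering both kinds of code.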

This may look a bit strange to people outside Egyptology, but we said gold standard, and rightly so! Most digital Egyptian texts are coded according to the Manuel de Codage (and most of these texts are created with JSesh). So it makes sense to convert these texts into Unicode, so that Egyptology no longer relies solely on its own isolated solution, the Manuel de Codage, but encodes its hieroglyphs in Unicode, like all other scripts of the world.

Of course, we are not the first to attempt this conversion. For example, there is already a repository on GitHub that performs it. We are writing this post and building our own version because we do not want to use all 1071 hieroglyphs in Unicode. We explained why in our last post: 𓏃 and 𓏅, for instance, are merely two variants of one character, namely 𓏃. That's why we recommend using only 𓏃. Both 𓏃 and 𓏅 have equivalents in Gardiner's sign list mentioned above: 𓏃 goes back to W17 and 𓏅 to W18. This means that, according to our recommendations, both W17 and W18 are encoded in Unicode as 𓏃.
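The variant reduction described above can be sketched as a small folding step: start from a table that keeps every Unicode sign, then replace each variant by its recommended base character. The tables below are a hypothetical three-entry excerpt, and the sketch assumes variants are recorded only one level deep (a variant points directly to its base character).

```python
# Hypothetical excerpt of a "full" table that keeps every Unicode sign.
FULL = {"W17": "𓏃", "W18": "𓏅", "G17": "𓅓"}

# Recommended folding: 𓏅 is treated as a variant of 𓏃.
VARIANT_OF = {"𓏅": "𓏃"}


def reduced_mapping(full, variant_of):
    """Map each Gardiner code to its recommended base character."""
    return {code: variant_of.get(sign, sign) for code, sign in full.items()}


mapping = reduced_mapping(FULL, VARIANT_OF)
print(mapping["W17"], mapping["W18"])  # both are 𓏃
```

Keeping the folding in a separate table makes the recommendation easy to audit: every merged pair is listed explicitly instead of being hidden in the main mapping.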

So what do we need? Right: a mapping between Manuel de Codage and Unicode according to our recommendations. We have created such a mapping, currently with 1807 correspondences: https://github.com/oraec/formerly-mdc-now_unicode/blob/main/complete_mapping.csv Check it out! GitHub's rendering of CSV files is so user-friendly!
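If you want to use the mapping in your own scripts, a loader along the following lines might do. The post does not describe the file's column layout, so the column positions and the header row are assumptions here (hence the `code_col`, `unicode_col` and `has_header` parameters), and the sample data is made up in that assumed layout.

```python
import csv
import io


def load_mapping(rows, code_col=0, unicode_col=1, has_header=True):
    """Build {MdC code: Unicode string} from CSV rows.

    The column positions are assumptions; check the header of
    complete_mapping.csv and adjust code_col / unicode_col.
    """
    reader = csv.reader(rows)
    if has_header:
        next(reader, None)  # skip the header row
    mapping = {}
    for row in reader:
        if len(row) <= max(code_col, unicode_col):
            continue  # skip blank or short lines
        mapping[row[code_col].strip()] = row[unicode_col].strip()
    return mapping


# Made-up sample in the assumed layout. For the real file, pass
# open("complete_mapping.csv", encoding="utf-8", newline="") instead.
sample = io.StringIO("mdc,unicode\nG17,𓅓\nm,𓅓\nW18,𓏃\n")
mapping = load_mapping(sample)
print(mapping["m"])  # 𓅓
```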

The mapping contains the correspondences for the characters, variants, ligatures and substitutes of Gardiner's sign list. Furthermore, it contains the correspondences for the phonetic values from the Manuel de Codage and for special codes from the implementation of Manuel de Codage in Wikipedia.
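Because Gardiner numbers, phonetic values and special codes all end up as keys of one table, it is handy to check a text against the mapping before converting it. The sketch below assumes the three kinds of code never collide in a single flat dictionary, and it only splits on the operators "-", ":" and "*" plus whitespace; other Manuel de Codage syntax is ignored. The small table is illustrative only.

```python
import re


def unmapped_codes(mdc, mapping):
    """Return the sign codes in an MdC string that the mapping does not cover."""
    tokens = re.split(r"[-:*\s]+", mdc)
    return sorted({t for t in tokens if t and t not in mapping})


# Illustrative table mixing Gardiner numbers and a phonetic value.
mapping = {"G17": "𓅓", "m": "𓅓", "W17": "𓏃", "W18": "𓏃"}
print(unmapped_codes("m-G17:W18-X1", mapping))  # ['X1']
```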

Why Wikipedia? Well, its sister project Wikisource encodes some Egyptian texts with this implementation of Manuel de Codage. Yes, yes, the Boomer Egyptologists are now grumbling because the quality of the encoding in Wikisource is sometimes not so good. But that is not the point at all! We just want to show that we can transform such texts. This proof of concept works because these texts are under a free license, so we can reuse them for our conversion. So thank you, Wikisource enthusiasts! We could make good use of your work. By the way, we did not take all the texts, only those whose hieroglyphic input is complete: 12 texts in total.
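To get at the hieroglyphic input of such a page, the encoded passages first have to be pulled out of the wikitext. The sketch below assumes the pages use the `<hiero>...</hiero>` tags of MediaWiki's WikiHiero extension, which is how Wikipedia embeds Manuel de Codage; the sample page is made up. Each extracted block can then be passed through the sign mapping.

```python
import re

# Assumption: hieroglyphic passages are wrapped in WikiHiero's <hiero> tags.
HIERO = re.compile(r"<hiero>(.*?)</hiero>", re.DOTALL | re.IGNORECASE)


def hiero_blocks(wikitext):
    """Return the stripped contents of every <hiero>...</hiero> block."""
    return [block.strip() for block in HIERO.findall(wikitext)]


page = "Intro text <hiero>G17-W17</hiero> more text <hiero>\nm:W18\n</hiero>"
print(hiero_blocks(page))  # ['G17-W17', 'm:W18']
```

The non-greedy `.*?` keeps two blocks on the same page from being merged into one match.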

The converted texts can be found in the folder https://github.com/oraec/formerly-mdc-now_unicode/tree/main/wikisource. Have fun browsing and stay tuned! We will convert even more!