Review of the new TLA - Part one

In our last blog post, we reviewed keyboards for Egyptological transcription. Having now tasted blood, we want to write another review, this time of the new TLA.

We’ve been taking a close look at the new TLA in the two weeks since it went live. We have a lot of comments and therefore have to split our review. In today’s part, we give an introduction and discuss the search. The second part covers the hit list and the lemma detail page, and the third part covers the occurrences. The fourth (and finally the last!) part discusses the other detail pages and the data in general, and draws a conclusion. So the tl;dr only comes at the end ;-)

Introduction

What is the TLA? The Thesaurus Linguae Aegyptiae (that is what the acronym stands for) is one of the lighthouse projects of digital Egyptology. On the one hand, it offers lexicographic access to the Egyptian language through its Demotic word list and its Egyptian word list. On the other hand, it has the largest corpus of transcribed, translated and lemmatized Egyptian texts. Lemmatization links the lexicographic and the corpus-linguistic domains of the TLA. This makes the TLA the central tool in Egyptology for determining word meanings and finding occurrences, harmoniously linking lexicography and corpus linguistics.

The TLA is not run as a free, open project like Wikipedia. Instead, a single institution is responsible for it, namely the Berlin-Brandenburg Academy of Sciences and Humanities (cf. https://aaew.bbaw.de/tla/impressum.html). The data itself is not published under any license, so it cannot be reused.

The first version of the TLA went online in 2004, so it is older than the iPhone! At that time, the TLA was highly innovative; it was virtually the innovation driver for other digital projects, inside and outside Egyptology, which could take their cue from it. There were several updates, at least once a year, until 2014. Since then, the TLA has remained on the web unchanged, with no changes to either the data or the functionality. There was a fear that the project had fallen asleep and that no more updates would come. That was one of the reasons why we created ORAEC. Without the TLA, there would be a very big gap in digital Egyptology.

But now there is a sign of life: https://thesaurus-linguae-aegyptiae.de/ [Screenshot: start page of the new TLA] The new TLA looks very fresh. Many thanks to Angenehme Gestaltung, who are responsible for the design. There is finally a responsive design, so the new TLA is also usable on smartphones. Thanks a lot! The technical architecture of the TLA has also been fundamentally renewed. The sophisticated modularization and the use of Elasticsearch as the search engine can be seen in the project’s GitHub repositories tla-web, tla-es and tla-common. We can only pay our respects to the original architect, Jakob Höper, for the visionary design and foundation of the infrastructure. Angenehme Gestaltung and Jakob Höper are clearly responsible for making the new TLA a modern, contemporary product. Many, many thanks!

The new TLA is here, but the old TLA is still online at https://aaew.bbaw.de/tla/index.html [Screenshot: start page of the old TLA] The new TLA is nowhere near the functionality of the old one, so it makes a lot of sense not to shut the old one down. There is no combined search, no texts are displayed, and the DZA is not integrated. So at the moment, the corpus-linguistic component is missing completely. We can only hope that it will be added soon.

But first, let’s look at what is available! You can search for words, and for each word there is a detail view that also includes its occurrences.

The new TLA is bilingual: the interface is available in German and English, and the language can be switched on every page. In the desktop view, the new TLA has the classic three-part layout, with the actual content between header and footer. The header has six items: Search, TLA, Lemma Lists, Text Corpus, Listings and Help. TLA, Listings and Help have further sub-items. All items except Search lead to static information pages. Unlike in the old TLA, the help is not context-sensitive, so unfortunately you have to find the appropriate help page on your own. The central entry point is Search, which is discussed in detail below. [Screenshot: header of the new TLA]

Search

The old TLA has a search form with five fields: lemma, translation, short reference, restrict search to word class, and list of lemmata. [Screenshot: search in the old TLA] The new TLA arranges these fields in a different order and adds a field for roots. [Screenshots: search in the new TLA - 1 and further search in the new TLA - 2]

Lemma list(s)

Lemma list(s) are selected via checkboxes. This means that you can search the Demotic and the pre-Demotic lists (the latter called Hieroglyphic/hieratic in the new TLA and Egyptian in the old TLA) at the same time. A really great innovation, appropriate to the Egyptian language in its entirety! Many thanks for this!

Transliteration

Next comes the most important field for searching an Egyptian word: here you enter the transcription of the word you are looking for. In the new TLA, the field is called Transliteration. Unfortunately, the layout in the desktop view is not quite right: the word “transliteration” wraps, which looks a bit odd. And why is it called transliteration and not transcription? We will see that the new TLA tends to use strange terms. We can’t list every case, because we want to limit ourselves to the essentials in this long review. But the terms in the new TLA are often inappropriate and rather misleading.

There are two input methods: Unicode and Manuel de Codage. Unfortunately, the Unicode input is not up to date. Instead of the Egyptological yod, which has been encoded since Unicode 12 (2019), a workaround is used. This is very unfortunate, as it means that the data of the new TLA is not completely future-proof. Moreover, data with this makeshift yod is now being published, which limits the searchability of Egyptian data: in the search engine of your choice you now have to search twice to find everything, once with the correct yod and once with the variant used by the new TLA. The same confusion already exists for the alef (see our last blog post). It is an immense disadvantage for digital Egyptology that the new TLA uses the wrong yod.
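To illustrate why a workaround hurts searchability: the encoded yod is a single code point (U+A7BD, LATIN SMALL LETTER GLOTTAL I), while a workaround is a base letter followed by a combining mark, and Unicode normalization does not unify the two. The Python sketch below assumes, purely for illustration, a workaround of i + U+0313 COMBINING COMMA ABOVE; we are not claiming that this is the exact sequence the new TLA uses. The helper `fold_yod` is hypothetical and shows the kind of extra mapping every search tool would need in order to find both spellings.

```python
import unicodedata

YOD = "\uA7BD"          # LATIN SMALL LETTER GLOTTAL I, the Egyptological yod
WORKAROUND = "i\u0313"  # assumed stand-in workaround: i + COMBINING COMMA ABOVE

def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)

def fold_yod(text: str, variants=(WORKAROUND,)) -> str:
    """Hypothetical search-side fix: rewrite known workaround sequences as the real yod."""
    text = nfc(text)
    for variant in variants:
        text = text.replace(nfc(variant), YOD)
    return text

# Normalization alone does not make the two spellings equal ...
print(nfc(YOD) == nfc(WORKAROUND))                          # False
# ... so a search tool has to fold them explicitly.
print(fold_yod(WORKAROUND + "mn") == fold_yod(YOD + "mn"))  # True
```

The point of the sketch is that the folding table has to be maintained by every consumer of the data, which is exactly the burden the correct code point would avoid.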

For Unicode input, the new TLA provides a virtual keyboard. This is a great idea in principle, but the implementation is not successful. Apparently the solution from the Coptic Dictionary Online was reused: even the Coptic layout is offered, although you can’t search for Coptic words at all. [Screenshot: coptic virtual keyboard] Back to the Egyptian keyboard: it offers an ï, but this character apparently does not occur in the word list at all. Why offer something for input that is never used? Moreover, the display does not use a uniform font: the ṱ comes from a sans-serif font, the t from a serif one. Some characters, such as i̯, u̯ and h̭, appear corrupted.
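As background to the display problems: i̯, u̯ and h̭ have no precomposed form in Unicode. Each is a base letter followed by a combining mark, so how it looks depends on whether the font positions that mark properly. ṱ, by contrast, exists as a single precomposed character; a character missing from the primary font is typically drawn from a fallback font, which would be consistent with the mixed fonts described above, although we cannot say for sure what happens in the new TLA. The short Python sketch below merely lists the code points behind these characters.

```python
import unicodedata

def code_points(text: str) -> list:
    """List the code points of a string after NFC normalization."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '?')}"
            for c in unicodedata.normalize("NFC", text)]

# Written with escapes so that the decomposition is visible in the source.
for label, text in [("ṱ", "t\u032D"), ("i̯", "i\u032F"), ("u̯", "u\u032F"), ("h̭", "h\u032D")]:
    print(label, code_points(text))
```

After normalization, ṱ collapses into one code point, while the other three remain two code points each.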

The second input method is called Manuel de Codage, but it is not Manuel de Codage! For example, in the new TLA, i does not stand for j, i.e. the character for 𓇋, but for i̯. You have to consult the help page to enter the correct character. This is yet another example of misleading designations in the new TLA.
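The difference comes down to the mapping table behind the input field. The following Python sketch converts Manuel de Codage input using a small table of common MdC codes (a subset, not the complete convention) and then swaps in the one deviation described above, i → i̯. Both tables are our own illustration; the full mapping used by the new TLA has to be taken from its help page.

```python
import unicodedata

# A subset of common MdC transliteration codes; "i" is read as j (the reed leaf, 𓇋).
MDC_COMMON = {
    "A": "\uA723",  # ꜣ
    "a": "\uA725",  # ꜥ
    "i": "j",
    "H": "\u1E25",  # ḥ
    "x": "\u1E2B",  # ḫ
    "X": "\u1E96",  # ẖ
    "S": "\u0161",  # š
    "T": "\u1E6F",  # ṯ
    "D": "\u1E0F",  # ḏ
}

# The deviation described for the new TLA: "i" yields i̯ (i + COMBINING INVERTED BREVE BELOW).
MDC_NEW_TLA = {**MDC_COMMON, "i": "i\u032F"}

def mdc_to_unicode(code: str, table: dict = MDC_COMMON) -> str:
    """Replace each MdC code character; all other characters (n, f, r, '.', ...) pass through."""
    return unicodedata.normalize("NFC", "".join(table.get(ch, ch) for ch in code))

print(mdc_to_unicode("Hr.w"))              # ḥr.w
print(mdc_to_unicode("iri"))               # jrj
print(mdc_to_unicode("iri", MDC_NEW_TLA))  # i̯ri̯
```

The same keystrokes thus produce different words depending on the table, which is why calling the new TLA’s input “Manuel de Codage” is misleading.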

Regular expressions (regex) can be used in this field. Regex are very powerful because they allow sophisticated searches. The old TLA also supports regex, assigning special meaning to the following characters: §, $, ^, [ccc], [^ccc], ?, + and *. [Screenshot: regex old TLA] Here the old TLA deviates from common usage by defining § instead of . as the placeholder for any character. This makes a lot of sense from an Egyptological point of view, because the . is already needed as an ending marker. The + is also adapted for Egyptological purposes: the character preceding the + can appear twice, which often occurs in Egyptian and is called gemination.

In the new TLA, there are eight different characters instead: _, §, *, [ccc], ?, (ccc), $ and >. [Screenshot: regex new TLA] The characters that exist in both the old and the new TLA have the same meaning, except for *, which in the new TLA means any number of characters, including none at all. Unfortunately, ^, [^ccc] and + and their functionality do not exist. Instead, there is _, which does the same as §, namely any character; (ccc), which does the same as ?, namely an optional character; and >, which allows left truncation, something that can also be achieved by prefixing a *. The new characters are thus completely superfluous. Why was valuable programming time spent on this? Why weren’t the other valuable regex characters from the old TLA integrated instead? Was this never evaluated?

This is bad for users because these characters can no longer be searched for literally. If you type _nn to find the demotic bird name _nn, you get 28 results, and the desired word is at the very end. The superfluous use of _ as a regex character leads to poor precision. Even common words with parentheses, such as jri̯ (ḫft.j), cannot be found directly. Why annoy users by introducing nonsensical regex? The regex in the new TLA are also buggy: > is supposed to allow left truncation, but >-jmn returns no result. You have to type >jmn instead.
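To make the comparison concrete, here is a Python sketch that translates both wildcard dialects into ordinary regular expressions. The semantics are our reading of the description above, with assumptions we cannot verify: that + means the preceding character once or twice, that ? makes the preceding character optional, that ^ and $ in the old TLA are anchors and its search otherwise matches anywhere in a word, and that the new TLA matches whole transliterations (which is why it needs > for left truncation). The old TLA’s * and the new TLA’s $ are not modelled and are treated as literal characters. The word list is a toy sample, not TLA data.

```python
import re

def _char_class(pattern, i):
    """Translate [ccc] or [^ccc] starting at pattern[i]; return (regex, next index)."""
    end = pattern.index("]", i)
    body = pattern[i + 1:end]
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    return "[" + ("^" if negated else "") + re.escape(body) + "]", end + 1

def old_tla_regex(pattern):
    """Old TLA dialect (assumed): § any char, X+ gemination, X? optional, [..], ^, $."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            piece, i = _char_class(pattern, i)
            out.append(piece)
            continue
        if ch == "§":
            out.append(".")            # any single character
        elif ch == "+":
            out.append("{1,2}")        # gemination: preceding character once or twice
        elif ch in "?^$":
            out.append(ch)             # optional character and anchors, as usual
        else:
            out.append(re.escape(ch))  # literal, including the ending marker "."
        i += 1
    return re.compile("".join(out))

def new_tla_regex(pattern):
    """New TLA dialect (assumed): _ and § any char, * any run (even empty),
    X? optional, (ccc) optional group, [ccc] class, leading > left truncation."""
    out, i = [], 0
    if pattern.startswith(">"):
        out.append(".*")               # left truncation
        i = 1
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            piece, i = _char_class(pattern, i)
            out.append(piece)
            continue
        if ch == "(":
            end = pattern.index(")", i)
            out.append("(?:" + re.escape(pattern[i + 1:end]) + ")?")  # optional group
            i = end + 1
            continue
        if ch in "_§":
            out.append(".")
        elif ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append("?")
        else:
            out.append(re.escape(ch))  # literal; "$" is not modelled here
        i += 1
    return re.compile("".join(out))

WORDS = ["wnn", "mnn", "snn", "jmn", "hm-ntr-jmn", "nfr", "snfr", "nfr.t"]  # toy list

def find(pattern, dialect):
    """Old TLA: match anywhere (assumed); new TLA: whole-word match (assumed)."""
    if dialect == "old":
        rx = old_tla_regex(pattern)
        return [w for w in WORDS if rx.search(w)]
    rx = new_tla_regex(pattern)
    return [w for w in WORDS if rx.fullmatch(w)]

print(find("wn+$", "old"))     # ['wnn']
print(find("_nn", "new"))      # ['wnn', 'mnn', 'snn']
print(find(">-jmn", "new"))    # ['hm-ntr-jmn']
print(find("nfr(.t)", "new"))  # ['nfr', 'nfr.t']
```

In this sketch, _nn matches every three-letter word ending in nn, which is the precision problem described above, while >-jmn finds the compound ending in -jmn, which is what one would expect from left truncation.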

Moreover, regex are uncommon in general-purpose search engines. Do you type such characters into your Google or Amazon search? Instead, these engines use autocomplete: completions are suggested as soon as you start typing. The Demotic Palaeographical Database Project (DPDP) implements this concept successfully. [Screenshot: auto complete in DPDP] A fuzzy search would also be useful. Searches in the TLA (the old one and now the new one as well) very often return zero results. The TLA distinguishes between s and z, so a search for sꜣ, “son”, returns zero results because the entry in the TLA is zꜣ. The sun god is Rꜥw, not Rꜥ. Horus is Ḥr.w, not Ḥrw or Ḥr. The house is pr, not pr.w or prw. How would anyone know? How is a casual user supposed to find anything here? Even regex won’t help if you don’t expect in the first place that sꜣ will not find the word. The transcription search is thus worse in the new TLA than in the old TLA.
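Both suggestions are cheap to prototype. The Python sketch below shows prefix autocompletion over a sorted list (using `bisect`) and a fuzzy lookup that first applies a deliberately lossy folding key (lowercase, z treated as s, dots removed; our own hypothetical choice) and then ranks candidates with `difflib.get_close_matches` from the standard library. This is not how DPDP or the TLA work internally, and the lemma list is a toy sample built from the examples above.

```python
import bisect
import difflib

# Toy sample built from the examples in the text; must be sorted for bisect.
LEMMAS = sorted(["zꜣ", "zꜣ.t", "Rꜥw", "Ḥr.w", "pr", "nfr", "snfr", "jmn"])

def autocomplete(prefix, lemmas=LEMMAS, limit=5):
    """Return up to `limit` lemmas that start with `prefix` (lemmas must be sorted)."""
    hits = []
    for word in lemmas[bisect.bisect_left(lemmas, prefix):]:
        if not word.startswith(prefix) or len(hits) == limit:
            break
        hits.append(word)
    return hits

def fold(word):
    """Hypothetical, deliberately lossy key: lowercase, z -> s, drop dots."""
    return word.lower().replace("z", "s").replace(".", "")

def fuzzy(query, lemmas=LEMMAS, cutoff=0.6):
    """Return lemmas whose folded form is similar to the folded query, best first."""
    by_key = {}
    for word in lemmas:
        by_key.setdefault(fold(word), []).append(word)
    keys = difflib.get_close_matches(fold(query), list(by_key), n=5, cutoff=cutoff)
    return [word for key in keys for word in by_key[key]]

print(autocomplete("zꜣ"))  # ['zꜣ', 'zꜣ.t']
print(fuzzy("sꜣ"))         # ['zꜣ', 'zꜣ.t']
print(fuzzy("Rꜥ"))         # ['Rꜥw']
```

A casual user typing sꜣ or Rꜥ would thus be pointed to zꜣ and Rꜥw instead of getting zero results.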

Part of Speech

Next, Part of Speech: as in the old TLA, you can restrict your search by part of speech. The new TLA now offers two selection menus, one for Part of Speech and one for Subclass. [Screenshot: part of speech - selection menu] The default is “(Any but personal/royal names)”; the old TLA’s default covered all words except personal names. It is understandable to hide the huge number of personal names by default, since otherwise users would not be able to find the results they want among them. But why are royal names also hidden by default in the new TLA? There are only 763 of them, so the frequency argument does not apply. And why not the names of gods, then? With 3,804 entries, they are far more numerous than the 763 royal names.

Again, a misleading designation must be pointed out. While the English interface correctly speaks of “royal names”, the German one says “Königs-/Königinnennamen” (kings’/queens’ names). This designation is owed solely to political correctness and contradicts what is actually stored in the data. Ranke states in the second volume of his Personennamen:

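Die Namen von Königinnen sind, da sie sich im Wesentlichen nicht von den übrigen Frauennamen ihrer Zeit unterscheiden, grundsätzlich aufgenommen worden. (Ranke, Hermann: Die ägyptischen Personennamen. II. 1952, V)

(Translation: “Since the names of queens do not differ essentially from the other women’s names of their time, they have, as a matter of principle, been included.”)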

Accordingly, the names of queens are found among the personal names in both the old and the new TLA. Here is an overview of some of these names: [Screenshot: names of queens] Whoever chose this designation does not know the material of the old and new TLA well.

Root

The next field, Root, is new compared to the old TLA, but it is completely unintuitive. What are roots? The words of the Egyptian language can be assigned to roots from which they are derived; in this respect, Egyptian behaves like the Semitic languages. An example may clarify this: nfr, “beautiful”, snfr, “to make beautiful”, and the mnfr.t amulet are all derived from a root nfr. Unlike the normal words, this root is not attested in the texts. It is rather an abstract category that results from the relationships between the words belonging to it. In other words, roots and the words belonging to them are categorically different. The concept of the root is currently in vogue in Egyptology. The Ancient Egyptian Dictionary (hereafter AED) notes for each entry which other words share its root. Satzinger & Stefanović have recently published an index of the roots of the Egyptian language. Their concept of the root is a bit more complex, but that is not relevant to our question.

So what does the new TLA do? Roots are treated on the same level as normal words. If you search for nfr, you get regular words with the transcription nfr as well as the root nfr (for the hit list, see below). [Screenshot: results with roots] This is completely misleading. If you want to search specifically for roots, you enter the transcription of the root and select Root as Part of Speech, as if roots were a part of speech. One would expect to be able to search for roots in the Root field, but this is not the case, which makes for a bad UI. If you enter nfr in the Root field, you don’t get the root nfr, but all lemmas that belong to a root nfr. A bit crazy, isn’t it? If you search for the transcription of lemmas, you also get roots; but if you search in the Root field, you get no roots.

Egyptian partly has homographic (or homophonic) roots, e.g. bgꜣ, “to cry out”, and bgꜣ, “to be shipwrecked”. So it would be nice to get only the words of the first root, but the search does not allow this: homographic roots are lumped together. Perhaps a selection list of all roots would be better than a search field. In general, it would certainly be more user-friendly to separate the search for roots from the search for normal words: a search in the Transliteration field would list only regular words, and a search in the Root field would list only roots. As it stands, the search in the new TLA is irritating. Finally, another oddity: in the Root field, you can only enter Unicode, not Manuel de Codage. Nevertheless, the Unicode radio button is shown, although you cannot change it. This is not well thought out.
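The separation we propose can be sketched as a small data model in which roots are a type of their own rather than a part of speech, and homographic roots get separate entries. The Python sketch below is a hypothetical design, not the TLA’s data model. The IDs (such as bgꜣ-I or L1) are invented for illustration, and the sample contains only the examples from this section.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Root:
    id: str        # hypothetical ID; homographic roots get distinct IDs
    translit: str
    gloss: str

@dataclass(frozen=True)
class Lemma:
    id: str        # hypothetical ID
    translit: str
    gloss: str
    root_id: Optional[str] = None

ROOTS = [
    Root("nfr-I", "nfr", "beautiful"),
    Root("bgꜣ-I", "bgꜣ", "to cry out"),
    Root("bgꜣ-II", "bgꜣ", "to be shipwrecked"),
]
LEMMAS = [
    Lemma("L1", "nfr", "beautiful", "nfr-I"),
    Lemma("L2", "snfr", "to make beautiful", "nfr-I"),
    Lemma("L3", "mnfr.t", "(an amulet)", "nfr-I"),
    Lemma("L4", "bgꜣ", "to cry out", "bgꜣ-I"),
    Lemma("L5", "bgꜣ", "to be shipwrecked", "bgꜣ-II"),
]

def search_words(translit: str) -> List[Lemma]:
    """Transliteration field: regular words only, never roots."""
    return [lemma for lemma in LEMMAS if lemma.translit == translit]

def search_roots(translit: str) -> List[Root]:
    """Root field: root entries only; homographic roots stay separate."""
    return [root for root in ROOTS if root.translit == translit]

def words_of_root(root_id: str) -> List[Lemma]:
    """All lemmas derived from one specific root."""
    return [lemma for lemma in LEMMAS if lemma.root_id == root_id]

for root in search_roots("bgꜣ"):
    print(root.id, root.gloss, [lemma.id for lemma in words_of_root(root.id)])
```

With roots as separate entries, a user can pick the first bgꜣ root and see only its words, which is what the current search cannot do.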

Translation

The next field, Translation, lets you search the translations of Egyptian words in English, French or German. Excellent! That is one more language than in the old TLA. Note, however, that very few words have a French translation and not all words have an English one: Demotic words have German translations only, and many Egyptian titles and epithets lack English translations. Unlike the old TLA, the new TLA does not support regex here: a search for Bauc?h? returns zero hits. The old TLA simply matches the search string against the lemma translations, so a search for bau also returns bauen, Baum or Bauteil. The new TLA uses the Elasticsearch search engine, and apparently search terms are reduced to a base form. A search for Matte (“mat”) in the new TLA therefore also returns matt werden (“to become weak”). [Screenshot: translation search: matte] So in these cases you get too many results.
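The difference between the two behaviours can be sketched in a few lines of Python: a case-insensitive substring comparison (our assumption of what the old TLA does) versus token matching after stemming. The stemmer here is a crude toy that strips a few German endings; it is not Elasticsearch’s German analyzer, and the translation list is a made-up sample.

```python
import re

TRANSLATIONS = ["bauen", "Baum", "Bauteil", "Bauch", "Matte", "matt werden"]  # toy sample

def substring_search(term, entries=TRANSLATIONS):
    """Old-TLA-style comparison (assumed): case-insensitive substring match."""
    t = term.lower()
    return [e for e in entries if t in e.lower()]

SUFFIXES = ("en", "em", "er", "es", "e", "n", "s")  # crude, hypothetical suffix list

def stem(token):
    """Toy stemmer: strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token

def tokens(text):
    return {stem(t) for t in re.findall(r"\w+", text.lower())}

def stemmed_search(term, entries=TRANSLATIONS):
    """Match if query and entry share at least one stemmed token."""
    query = tokens(term)
    return [e for e in entries if query & tokens(e)]

print(substring_search("bau"))  # ['bauen', 'Baum', 'Bauteil', 'Bauch']
print(stemmed_search("Matte"))  # ['Matte', 'matt werden']
```

Stemming maps both Matte and matt to the same form, which is how the unwanted hit arises; the substring approach avoids that particular false hit but has its own noise, as the bau example shows.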

Bibliography

Finally, there is the Bibliography field. The old TLA lets you find all lemmas listed on a particular page of the Wörterbuch: a search for Wb 2, 343 returns all lemmas on page 343 of the second volume. [Screenshot: bibliographical search - old TLA] The same search in the new TLA returns only one result, and it is not on this Wörterbuch page at all: Mri̯=s-gr. The bibliography field of this entry reads “Wb 2, 104.19 - Wb 5, 180.8 - LGG III, 343”, so the page number 343 comes from the LGG reference. Obviously, there is no phrase search, and without one this field is quite useless! [Screenshot: bibliographical search - new TLA]
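The result we observed is consistent with the query being split into terms that are then matched individually. The Python sketch below contrasts such a term search with a phrase search on two entries: the first quotes the example above, the second is a hypothetical entry on Wb 2, 343. How the new TLA actually tokenizes its bibliography field is an assumption here (in the sketch, “343.5” stays a single term); the point is only to show how term matching can produce the false hit and miss the real one.

```python
import re

# The first entry quotes the example above; the second is a hypothetical Wb reference.
BIBLIOGRAPHY = {
    "Mri̯=s-gr": "Wb 2, 104.19 - Wb 5, 180.8 - LGG III, 343",
    "hypothetical lemma": "Wb 2, 343.5",
}

def terms(text):
    """Split on spaces, commas and hyphens; '343.5' stays a single term."""
    return {t for t in re.split(r"[\s,\-]+", text) if t}

def term_search(query, bib=BIBLIOGRAPHY):
    """Every query term must occur somewhere in the entry, in any order."""
    wanted = terms(query)
    return [name for name, ref in bib.items() if wanted <= terms(ref)]

def phrase_search(query, bib=BIBLIOGRAPHY):
    """The query must occur as one contiguous reference, e.g. 'Wb 2, 343' in 'Wb 2, 343.5'."""
    pattern = re.compile(r"(?<!\w)" + re.escape(query) + r"(?!\d)")
    return [name for name, ref in bib.items() if pattern.search(ref)]

print(term_search("Wb 2, 343"))    # only the LGG-based false hit
print(phrase_search("Wb 2, 343"))  # ['hypothetical lemma']
```

The lookahead `(?!\d)` keeps a phrase search for Wb 2, 34 from matching Wb 2, 343.5, so page references stay exact.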

Look up

In addition to the word search, you can look up entries directly on this page by entering their ID. This applies to lemmas, sentences, texts, objects and thesaurus entries. The old TLA already has IDs for lemmas, in the form of integers. For the Egyptian lemmas, the IDs of the old TLA are identical to those of the new one. The IDs of the Demotic lemmas have the prefix “d”, or “dm” for negative values. These lemma IDs are cited in the Egyptological literature, so being able to look them up directly is a welcome feature. The other entries have no stable IDs in the old TLA; in the new TLA, their IDs are long strings. We have already complained elsewhere that these are impossible to remember. So far, these IDs have been known only from a partial excerpt of the TLA’s data and, building on that, from AED and AES. Accordingly, you can now look up the sentence IDs from AES and AED in the new TLA. [Screenshot: sentence id in AED] For the long text IDs used by AED and AES, we have created a mapping; these IDs can also be looked up in the new TLA. As is well known, the old TLA distinguishes between a text and the object on which the text is written. Text and object also have different metadata: script belongs to the text and not to the object, while location belongs to the object but not to the text. The objects have their own IDs, which can also be looked up here. These object IDs had not been prominently publicized until now; they can be found in the partial excerpt and in AED-TEI in the supportDesc element. The IDs of the thesaurus entries are also accessible in thesaurus.xml of AED-TEI. You have to take these detours because there is no direct search for texts, objects or thesaurus entries.

So the new TLA has two forms of ID, one for lemmas and one for everything else. The two forms are easy to tell apart, so you can tell whether an ID belongs to a lemma or not. But if you have an ID such as Q2D5QGAURRGJ5NQ3FBPYANUG3M, or find one in the literature, you don’t know whether it refers to a sentence, a text, an object or a thesaurus entry. Unfortunately, there is no single field that accepts all IDs.
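The ambiguity becomes obvious as soon as one tries to classify IDs by their shape alone. The Python sketch below uses patterns inferred from the description above: digits for Egyptian lemmas, a “d” or “dm” prefix for Demotic lemmas, and a long uppercase alphanumeric string for everything else. The exact length and alphabet of the long IDs are assumptions on our part. For the long form, the shape gives no answer, so a lookup tool would have to try each entry type in turn.

```python
import re

def classify_id(raw: str) -> str:
    """Guess the entry type of a TLA ID from its shape (the patterns are assumptions)."""
    s = raw.strip()
    if re.fullmatch(r"\d+", s):
        return "lemma (hieroglyphic/hieratic)"
    if re.fullmatch(r"dm?\d+", s):
        return "lemma (demotic)"
    if re.fullmatch(r"[A-Z0-9]{20,}", s):
        return "sentence, text, object or thesaurus entry (cannot tell which)"
    return "unknown"

# The shape alone cannot say which kind of entry this long ID belongs to.
print(classify_id("Q2D5QGAURRGJ5NQ3FBPYANUG3M"))
```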

If you mistype one of those long, unwieldy IDs and enter an ID that doesn’t exist, you get a remarkable page. [Screenshot: error page] Apparently there are performance problems with high-frequency lemmas, so static pages were created for them. But why does this message appear when you want to go to a text or a thesaurus entry?

There is a separate help page for the search, which can be accessed via the header. On the search page itself, there are also several icons with an i (for information?), which offer the same information as the help page. These could simply be removed.

As praised above, the new TLA has a responsive design. In the mobile view, the header and footer recede into the background. However, far too much space is still wasted in the mobile view, so the Transliteration field only appears at the bottom of the screen. [Screenshot: mobile search] There is still room for improvement here. Likewise, unfortunately, you cannot submit the search directly from the Transliteration field. This could also be optimized in the mobile view.
