New day, new part of our review! Today it’s about occurrences, occurrences and about occurrences

Occurrences

From the lemma details page you get to the occurrences. This page is designed much like the hit list: the search parameters are listed at the top, the number of occurrences is displayed, and there is pagination where necessary. On the right side there are buttons to show or hide information about the occurrences. As with the hit list, there is a maximum of 20 hits / occurrences on one page.

The biggest problem here is performance. It often takes a very, very long time to load the occurrences.

An occurrence is a sentence of a text in which the searched word is found. If the word is used more than once in the sentence, the new TLA counts this as only one occurrence. By default, the sentence in hieroglyphics, in transcription and in translation is given. This is followed by the incorporation of the occurrence in the hierarchical order of texts and objects, the dating and finally the information of the main author, the further editors and the time of the last change. After that there is a button to view (and reference) only this sentence.

Hieroglyphic writing

The hieroglyphs of the sentence form a continuous text, with a blank space left between word forms. The hieroglyphic word forms are clickable and lead to the detailed view of the word in question. The attested word is highlighted by a gray background. The hieroglyphs are shown here as images and not - as in the lemma list - in Unicode. On mouseover you get the code in Manuel de Codage. Occasionally there is the note “Glyphs artificially arranged” in the upper right corner. If we see it correctly, all sentences that already had hieroglyphs in the old TLA carry this note. But new texts can show it as well, cf. https://thesaurus-linguae-aegyptiae.de/sentence/IBcBMTWY5pKI2kRBmwwGBj2lOk4. Sometimes only some sentences of a text carry the note while others do not, cf. https://thesaurus-linguae-aegyptiae.de/search/sentence?tokens[0].lemma.id=708574. The three occurrences of the personal name Tꜣ-šd.t-Ḫns.w are all from Tb 162 of pBerlin 3031, but only the last one carries the note. Are the first two occurrences missing the note, or has the arrangement of the hieroglyphs already been done in the code?

An analysis of the partial extract suggests that in part the arrangement is hard coded. Thus, there are two coding systems side by side: one with arrangement and one where the arrangement is added afterwards. If one looks at the project history, this becomes explicable. Stephan Seidlmayer, the mastermind behind the old TLA, has published the input method of the hieroglyphs in a very readable article: Seidlmayer, Stephan: Bericht über eine Hieroglyphenschreibmaschine. In: GM 209, 2006, 81-90. In the input, the specification of the positioning, i.e. the arrangement, is omitted! From our Unicode perspective, this is perfectly understandable and highly recommended. This procedure was used for the input of the hieroglyphs for the old TLA, so that these texts are arranged artificially in the old and in the new TLA. Obviously, the statement of Mark-Jan Nederhof, which we like to quote so much, applies again:

Following the terminology of Unicode, a character is the smallest component of written language and a glyph is a shape that a character can have when it is rendered or displayed. In Egyptology however, there seem to be tendencies to remain true to the original manuscript while encoding a text, often to the extent of encoding glyphs rather than characters. (Nederhof, Mark-Jan. The Manuel de Codage encoding of hieroglyphs impedes development of corpora. In: Texts, Languages & Information Technology in Egyptology. Edited by Jean Winand & Stéphane Polis. 2013. 104.)

Since in the old TLA only texts of the Berlin-Brandenburg Academy of Sciences and Humanities are enriched with hieroglyphics, and since all texts of the Saxon Academy of Sciences and Humanities in Leipzig in the partial extract have the arrangement hard coded, it is reasonable to assume that the departure from the clean input procedure propagated by Seidlmayer is due to the Saxon Academy of Sciences and Humanities in Leipzig. It is a great pity that the data are based on different standards, especially because the arrangement is completely meaningless from a linguistic perspective. Moreover, identical cases are treated differently: in https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd1VqSFbrdEXXkmGOCucFn58, F1*Z3 is coded first and H1-Z3 immediately afterwards, although the coding should be the same in both cases, because the order animal head + plural strokes is present in both. Why does a corpus linguistic project invest so much working time in inserting the arrangement? Why deviate from a reasonable solution? Isn’t something like this evaluated by external reviewers?

As one can already guess with horror from the inserted arrangement, an attempt is being made to generate a kind of facsimile. The data itself, i.e. its quality, its longevity and its reusability, plays a subordinate role. The terrible ampersand is used, which means the data will not be long-lived. Nobody should use the ampersand for encoding Egyptian hieroglyphs, cf. the impressive lines in Nederhof, Mark-Jan. The Manuel de Codage encoding of hieroglyphs impedes development of corpora. In: Texts, Languages & Information Technology in Egyptology. Edited by Jean Winand & Stéphane Polis. 2013. 107. Read the article! In https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd4UwAExhxk1FjTomHEaKOwg, the code used to encode a 𓄓𓄹𓏤 is as follows: F20&&(F51B*Z1). How is something like this supposed to be properly understood in the long run? How is it supposed to be reused?
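To make the point about arrangement concrete, here is a minimal sketch that strips the positional operators from a Manuel de Codage string and keeps only the sign sequence. The function name `sign_sequence` is ours, and the operator set (`-`, `:`, `*`, `&`, parentheses) is an assumption that covers only the codes quoted in this post, not the full Manuel de Codage syntax.

```python
import re

# Positional operators seen in the codes quoted in this post
# (assumption: this is not the complete Manuel de Codage syntax).
_OPERATORS = re.compile(r"[-:*&()\s]+")


def sign_sequence(mdc: str) -> list[str]:
    """Return the bare sign codes of an MdC string, dropping all arrangement operators."""
    return [code for code in _OPERATORS.split(mdc) if code]


print(sign_sequence("F20&&(F51B*Z1)"))  # ['F20', 'F51B', 'Z1']
print(sign_sequence("F1*Z3"))           # ['F1', 'Z3']
print(sign_sequence("H1-Z3"))           # ['H1', 'Z3']
```

Once the arrangement is gone, F1*Z3 and H1-Z3 are both plain two-sign sequences of animal head plus plural strokes, which is all a corpus query needs.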

Also, rotations are now specified that have no linguistic relevance, e.g. 𓌸 in https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd0FRn8G67U4Sr0SM0re9RHY.

This example also shows the big mess with destruction specifications. A gap of 1.5 rectangular groups is represented in this sentence in three ways: as a shade mark, by square brackets with a 1.5Q between them, or by specifying about 1.5Q without square brackets. Some occurrences have a [...], some have a shade mark. However, in the data from the partial extract, only the // is documented as a shade mark. Why was this disambiguation introduced?

The hieroglyphs may be in text-critical characters: ⸢⸣ for partially destroyed characters, ⸮? for questionable characters, and «» for haplographs. Again, different standards are encountered: https://thesaurus-linguae-aegyptiae.de/sentence/ICECgs7eOzWZWUvNsqWie4d3L8E omits the encoding of hieroglyphs in haplography, and https://tla.bbaw.de/sentence/IBUBd9tf5Ak190sOtbY4xeJsNn8 annotates haplography. Regarding the different question marks see above!

The inventory of hieroglyphs used is very large. Unfortunately, variants and characters are used equally. In addition, signs are annotated which seem to exist only in an internal character set. https://thesaurus-linguae-aegyptiae.de/sentence/ICICh9L2FWXiPUjPq80eLndHxp0 has a US9I3VARA that could not be converted to hieroglyphics. Such undocumented annotations are poison for long-term use of the data. In this form, the treatment of hieroglyphs in the text corpus is a huge step backwards compared to the old TLA!

Latin text is occasionally placed between the hieroglyphs, the meaning of which is not documented anywhere. https://thesaurus-linguae-aegyptiae.de/sentence/IBUCkyan7yNEmUz9mIveSP9M0bs has var and ?var?. The above mentioned sentence https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd4UwAExhxk1FjTomHEaKOwg has an lb. https://thesaurus-linguae-aegyptiae.de/sentence/5FLJWXUSC5C4LHEZ3WNAYZTDPU has a mono. https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd0PdqTaKbkNsoTMaP9YJa1o has hierat. You can guess that lb stands for line break. This kind of markup is given more weight in the data than the correct encoding of the characters. Thus, in https://thesaurus-linguae-aegyptiae.de/sentence/IBUBdxOUYzUQYUVDjmZPjzsCXt8, the lb is written inside 𓈗. This splits the character 𓈗 into three 𓈖, incorrectly postulating a spelling of mw, “water” with three individual 𓈖!

Modern paragraph references are also placed between the hieroglyphs, but the line counts are missing, cf. https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd0LS7PQWpEfhutqMKzlSRPY. In our opinion, it would have made more sense the other way around.

Rubra are expressed by red hieroglyphs, cf. https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd0LS7PQWpEfhutqMKzlSRPY.

Quite different information of a more descriptive nature is also given along with the hieroglyphics: https://thesaurus-linguae-aegyptiae.de/sentence/ICICIitsRMVgWEKSqZhPNlvu3Hw. Actually, this should not be a sentence at all, but should be split into subtexts according to the guidelines on page https://thesaurus-linguae-aegyptiae.de/info/text-corpus. In https://thesaurus-linguae-aegyptiae.de/sentence/3UDGRP6HLJFCXLL6ZSE2LRFSFA the additional information takes up several times as much space as the hieroglyphics or even the transcription.

Verse points are also integrated into the hieroglyphic text, cf. https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd3Jqxbw1U0Q3kfziqhNFMsI, although the transposition as verse points is not always successful: https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd3bPaK7Fi0YLha3A2ingCos.

Finally, it should be noted that the concept of rendering the hieroglyphs as continuous text is problematic. The encoding was token oriented and not text oriented. We have stated elsewhere that rendering these token oriented hieroglyphs as continuous text is rather difficult. An example: https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd1rCR0bluES0sjKWMwCS1kU. One has the impression here that a verse point is directly followed by a restored verse point. In reality, the editor has restored some word forms between them which, of course, have no hieroglyphs.

Transcription

Next comes the transcription of the sentence. Compared to the old TLA, the important innovation is that the sentence is written in Unicode, so it is now easier to copy the sentence. For the difficulties see below! If a word form is lemmatized, the word is written in red and you can get to the lemma details page by clicking on it. Otherwise the word forms are black. Rubra are underlined, which is hard to read alongside the underscores and dots of the Egyptological transcription, cf. https://thesaurus-linguae-aegyptiae.de/sentence/HTDDBK3QINA57CQ7O5PB526CTM.

The word forms are written in the usual Egyptological transcription. The usual text-critical brackets are also used. Let’s compare the rendering of a sentence in the old TLA, in our ORAEC and in the new TLA. Let’s choose: https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd0rJqvnET0i9h7JmM5xPxHc. ORAEC and the new TLA agree in that they use Unicode. But still, the similarities between the old TLA and ORAEC are greater, because the new TLA intervenes in the text representation in two ways. First, the new TLA converts all commas into dots. Yet the commas are a most welcome source of information: commas separate lexically relevant endings, while dots indicate lexically irrelevant endings. In other words, a comma-delimited ending will appear in the dictionary, while a dot-delimited ending will not. This is a nice reading aid - popular among students - because you can distinguish directly between nb,t (the lady) and nb.t (feminine form of the adjective nb). Why is this added value abolished? Second, the new TLA capitalizes plural endings and dual endings, which - according to the partial extract - are encoded in the data as .pl and .du. This is uncommon and sometimes leads to strange results like a Ḏꜣꜣ-qr(r).tPL in https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd9eXbgxXA0zSsm6BUICKNrc. Why is time spent on this?
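The loss of information can be shown in a few lines. This is only a sketch: `flatten_separators` is a hypothetical stand-in for whatever display step the new TLA actually applies, and we assume nothing about it beyond the observed result that every comma becomes a dot.

```python
def flatten_separators(transcription: str) -> str:
    """Hypothetical stand-in for the display step: every comma becomes a dot."""
    return transcription.replace(",", ".")


lady = "nb,t"       # comma: lexically relevant ending, "the lady"
adjective = "nb.t"  # dot: feminine form of the adjective nb

# Before the conversion the two forms are distinguishable ...
print(lady == adjective)  # False
# ... afterwards they collapse into the same string.
print(flatten_separators(lady) == flatten_separators(adjective))  # True
```

The conversion is not reversible: from the displayed nb.t alone there is no way to tell which of the two forms was encoded.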

Line counts are printed in bold. Destruction details are gray and italicized. Paragraphs are also gray. Other indications appear in normal type. Verse points are also included in the transcription. Unfortunately, they are rendered with •, which is the bullet and has quite different semantics than the verse point. The · would be better.
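The difference between the two characters is easy to check against the Unicode character database. The sketch below assumes that the character rendered by the new TLA is U+2022 and that the one we suggest is U+00B7; we have not verified the code points in the page source.

```python
import unicodedata

# Assumed code points: U+2022 (rendered by the new TLA) and U+00B7 (our suggestion).
for char in ("\u2022", "\u00b7"):
    print(f"U+{ord(char):04X} {unicodedata.name(char)}")
# U+2022 BULLET
# U+00B7 MIDDLE DOT
```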

Translation

The translation of the sentence is in bold. Before the sentence begins, the language is marked. Usually it is German. Occasionally, English or French can also be found.

Path

The old TLA provides a hierarchical display of objects. This is missing in the new TLA. Instead, each occurrence has a hierarchical path that shows where the respective text is embedded. The individual hierarchical levels have a classifying icon and are clickable; a click takes you to the detailed view of the object. At the end of the path there is a line specification and the paragraph specification.

If no line specification is defined in the data, “null” is output as the value, cf. https://tla.bbaw.de/sentence/IBkDGeYa1wnFpUbXunJBGr3RVRo. Thus the empty line specifications behave differently than the other specifications in the new TLA: if there are otherwise no values for a field, it will not be displayed.
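How the other fields behave, and how the line specification could behave too, can be sketched as follows. The function `render_location` and its labels are hypothetical; we do not know how the new TLA builds this string.

```python
from typing import Optional


def render_location(line: Optional[str], paragraph: Optional[str]) -> str:
    """Join only the values that are actually present (hypothetical labels)."""
    parts = []
    if line:
        parts.append(f"line {line}")
    if paragraph:
        parts.append(f"paragraph {paragraph}")
    return ", ".join(parts)


# A missing line specification is simply skipped instead of being printed.
print(render_location(None, "3"))  # paragraph 3
print(render_location("x+2", None))  # line x+2
```

Formatting the missing value directly, e.g. `f"line {None}"`, is the kind of shortcut that produces output such as “null”.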

The fact that https://thesaurus-linguae-aegyptiae.de/sentence/3MP2R42KI5G4JBG26TZTVO7GAE has no path specification for the occurrence for nṯr is certainly a mistake.

Occasionally there is more than one path, cf. https://thesaurus-linguae-aegyptiae.de/sentence/3GP2YBKA3VBU5IMWE62P5WZN7A. What deeper meaning is at work here unfortunately remains hidden.

Reading variants

This is another new feature in the new TLA. In some places the editors have left open how certain elements are to be understood and have allowed equally valid reading variants side by side. Compare https://thesaurus-linguae-aegyptiae.de/sentence/3XCTJXEYGRBOJJ2ASZ3HZAVR2Y-01 and https://thesaurus-linguae-aegyptiae.de/sentence/3XCTJXEYGRBOJJ2ASZ3HZAVR2Y-00. That’s the same occurrence! The first word can be either a form of the verb ḥꜥjꜥi̯ or the verb ḥꜥi̯. Note that this multiplies the occurrences, because both reading variants are listed as independent occurrences for the word nṯr, cf. https://thesaurus-linguae-aegyptiae.de/search/sentence?tokens[0].lemma.id=90260.
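If you want to count sentences rather than reading variants, the duplicates have to be collapsed yourself. The sketch below assumes that a reading variant is marked by a two-digit suffix such as -00 or -01 on the sentence ID, as in the two URLs above; this is our inference from those URLs, not documented behaviour, and the function names are ours.

```python
import re

# Assumption: reading variants share a sentence ID and differ only in a
# trailing two-digit suffix ("-00", "-01", ...).
_VARIANT_SUFFIX = re.compile(r"-\d{2}$")


def base_sentence_id(sentence_id: str) -> str:
    """Strip the assumed reading-variant suffix from a sentence ID."""
    return _VARIANT_SUFFIX.sub("", sentence_id)


def count_distinct_sentences(sentence_ids: list[str]) -> int:
    """Count sentences, treating all reading variants of one sentence as one."""
    return len({base_sentence_id(sid) for sid in sentence_ids})


hits = [
    "3XCTJXEYGRBOJJ2ASZ3HZAVR2Y-00",
    "3XCTJXEYGRBOJJ2ASZ3HZAVR2Y-01",
    "IBUBd0rJqvnET0i9h7JmM5xPxHc",
]
print(len(hits), count_distinct_sentences(hits))  # 3 2
```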

Dating

Now the dating. Unlike in the hit list, no absolute date is given, but a ruler or a dynasty. The dating can either name a single point or cover an interval, e.g. 18th to 19th dynasty. The dates are linked and lead to a detail page of the thesaurus entry. This is a great innovation! Very good! Occasionally a dating is missing, for example https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd3jwxnz1YUpVkt5xuD8FgOg, although the dating is present in the old TLA: https://aaew.bbaw.de/tla/servlet/GetTextDetails?u=guest&f=0&l=0&tc=402&db=1.

Author

As with the lemma details page, the main author, the further editors and the date of the last change are given. Here, too, it is not clear what is meant by main author or how the further editors are sorted. See above!

Detail view

Finally, there is a button that leads to a detailed view of the sentence in question.

Sorting

The biggest shortcoming of the records is that they are not sorted in any way. Even occurrences from the same text can be found on completely different pages. This is a disaster for high-frequency words, especially since it takes a long time to scroll through the pages. The page does say “(Possibilities to sort the results list will be added in a future version of the TLA web application.)”, but that falls short. The point is that there must be a default sorting without any user action!
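What we mean by a default sorting can be sketched in a few lines. Everything here is hypothetical: the `Occurrence` fields, the placeholder IDs and the second text are made up for illustration, and we do not know what the new TLA stores internally.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    text_path: tuple[str, ...]  # hierarchical path of the text (hypothetical field)
    sentence_index: int         # position of the sentence within its text (hypothetical field)
    sentence_id: str            # placeholder IDs, not real TLA IDs


def default_order(occurrences: list[Occurrence]) -> list[Occurrence]:
    """Keep occurrences of one text together and in reading order."""
    return sorted(occurrences, key=lambda o: (o.text_path, o.sentence_index))


unsorted_hits = [
    Occurrence(("pBerlin 3031", "Tb 162"), 7, "c"),
    Occurrence(("Example text",), 1, "a"),
    Occurrence(("pBerlin 3031", "Tb 162"), 2, "b"),
]
print([o.sentence_id for o in default_order(unsorted_hits)])  # ['a', 'b', 'c']
```

Any stable key would do; the point is only that occurrences from the same text end up next to each other without the user having to ask for it.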

Magical multiplication of the occurrences

The variant readings discussed above increase the number of occurrences, indeed they increase the number of word forms processed. https://thesaurus-linguae-aegyptiae.de/info/introduction speaks of “1.44 mill. lemma tokens.” Besides this trick to increase the number, there are others:

Occasionally, completely destroyed word forms are lemmatized when context makes the restoration unambiguous. But https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd2p7DCqgEk5rsIsZgnG6Mt4 is too much of a good thing! Here the whole sentence is destroyed and has been restored in full - probably on the basis of parallels. Not even the translation marks the text as reconstructed. But these five lemmatized word forms are among the “1.44 mill. lemma tokens”.

In https://thesaurus-linguae-aegyptiae.de/sentence/IBUBd0Rw4b19yELUr3E123NyqEM the Egyptian scribe accidentally wrote 𓏏𓈖 for the feminine genitive. In the new TLA, however, two word forms are posited: {tn} and <n.t>. Not only the expected word form is lemmatized, but also the one to be erased! This also increases the number of occurrences. Moreover, one can no longer explore possible errors in the spellings of the feminine genitive, because the 𓏏𓈖 is recorded with the erased word form.

Modification

As with the hit list, you can modify the display of the occurrences on the right side. Only the transcription is mandatory, everything else can be hidden. A really excellent innovation, which deserves high praise, is the block view. Here each token gets its own column, in which various information about the word form can be found. Besides hieroglyphics and transcription there are word class, two kinds of grammatical information, the translation of this word form and a button to copy the token ID. Fantastic! We have had to criticize a lot so far, but here a treasure has been provided. Thanks a lot!

