Speaking at SBL 2017 on Linking Lexical Resources

I’m again speaking at the SBL Annual Meeting, this time in Boston. My topic is basically the “lemma lattice” work that Ulrik Sandborg-Petersen and I started back in 2006 but which I’ve never presented in this sort of setting before.

Here’s the official abstract:

Linking Lexical Resources for Biblical Greek

As more resources for Biblical Greek, both old and new, become openly available, the opportunities for integrating them become greater. At the level of the word, it might seem a trivial task to match based on lemma. But no two texts are lemmatised the same way and no two lexicons will make the same choices of headwords. Numerical solutions such as Strongs and Goodrick-Kohlenberger solve some problems but introduce new ones. After surveying the various issues and challenges, this talk will provide both a framework for moving forward and a report on practical ways that a variety of texts, lexicons, and other resources such as principal-part lists are being linked in the service of open, biblical digital humanities.

I’ll certainly post my slides after my talk but I’ll also try to record it on my iPhone like I did at BibleTech 2015.


A Tour of Greek Morphology: Part 17

Part seventeen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).

As mentioned in the last post in the series, we now have an inflectional class for all 5,314 present active infinitive or indicative forms in the MorphGNT SBLGNT in a file that looks like the following:

010120 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010123 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010202 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010206 εἶ 2SG PA-10 εἰμί PA-10
010213 μέλλει 3SG PA-1 μέλλω PA-1
010213 ζητεῖν INF PA-2 ζητέω PA-2
010218 εἰσί(ν) 3PL PA-10 εἰμί PA-10
010222 βασιλεύει 3SG PA-1 βασιλεύω PA-1
010303 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010309 λέγειν INF PA-1 λέγω PA-1
010309 ἔχομεν 1PL PA-1/PA-8 ἔχω PA-1

Where the columns are:

  • the book/chapter/verse reference
  • the normalized form
  • the morphosyntactic properties
  • the inflectional classes possible without disambiguation
  • the lemma
  • the disambiguated inflectional class
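A minimal sketch of how a line in this layout could be read into a record. The `Row` and `parse_row` names are hypothetical (they are not from the actual script), and it assumes that columns are whitespace-separated and that "/" separates alternative classes, as in the ἔχομεν line of the sample.

```python
from dataclasses import dataclass


@dataclass
class Row:
    ref: str                 # book/chapter/verse reference, e.g. "010213"
    form: str                # normalized form
    props: str               # morphosyntactic properties, e.g. "3SG" or "INF"
    candidates: list         # classes possible without disambiguation
    lemma: str
    inflectional_class: str  # the disambiguated class


def parse_row(line):
    """Split one whitespace-separated line into its six columns."""
    ref, form, props, candidates, lemma, inflectional_class = line.split()
    # Assumption: "/" separates alternatives, as in "PA-1/PA-8".
    return Row(ref, form, props, candidates.split("/"), lemma, inflectional_class)


row = parse_row("010309 ἔχομεν 1PL PA-1/PA-8 ἔχω PA-1")
print(row.candidates, row.inflectional_class)
```

Keeping the candidate classes as a list makes it easy to see which tokens actually needed disambiguation: any row with more than one candidate.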

Now it’s time to do some counts.

Let us first of all look at the number of distinct lemmas in each of our 13 classes.

The numbers for classes PA-5 and above are low enough that we should look at them individually:

class | description | lemmas
PA-1 | barytone omega verbs | 338
PA-2 | circumflex omega verbs with INF -εῖν / 3SG -εῖ | 145
PA-3 | circumflex omega verbs with INF -οῦν / 3SG -οῖ | 21
PA-4 | circumflex omega verbs with INF -ᾶν / 3SG -ᾷ | 31
PA-5 | ζάω + compound (συζάω) | 2
PA-6a | ὀμνύω; δείκνυμι + compound (ἀμφιέννυμι) | 3
PA-7 | τίθημι + compounds (ἐπιτίθημι παρατίθημι περιτίθημι); compounds of ἵημι (ἀφίημι συνίημι) | 6
PA-8 | δίδωμι + compounds (διαδίδωμι ἀποδίδωμι μεταδίδωμι παραδίδωμι) | 5
PA-9 | compounds of ἵστημι (καθίστημι μεθίστημι συνίστημι); compound of φημί (σύμφημι); that one weird case of συνίημι | 5
PA-9-ENC | φημί | 1
PA-10 | εἰμί | 1
PA-10-COMP | compounds of εἰμί (ἄπειμι ἔξεστι(ν) πάρειμι) | 3
PA-11-COMP | compounds of εἶμι (ἔξειμι εἴσειμι) | 2

Notice that even the small counts are elevated due to compound verbs. Folding compounds of the same base verb, the classes from PA-5 on have only one or two members.

This is just looking at the number of unique lemmas in each class but there are two other sets of numbers that are worth looking at: (1) the total number of tokens in the SBLGNT; (2) the distribution of classes amongst the hapax legomena.

class lemmas tokens hapax hapax details
PA-1 338 2563 151
PA-2 145 856 65
PA-3 21 35 15
PA-4 31 117 16
PA-5 2 41 1 συζάω
PA-6a 3 5 2 ὀμνύω ἀμφιέννυμι
PA-7 6 37 3 εἴσειμι παρίστημι παρατίθημι
PA-8 5 35 2 διαδίδωμι μεταδίδωμι
PA-9 5 9 3 συνίημι σύμφημι μεθίστημι
PA-9-ENC 1 22 0
PA-10 1 1551 0
PA-10-COMP 3 39 1 ἄπειμι
PA-11-COMP 2 4 1 εἴσειμι
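The three counts per class in the table above could be computed along the following lines. This is only a sketch: the `class_counts` name and the sample data are illustrative, and it assumes a hapax is a lemma with exactly one token in the list being counted, which may not be exactly how the table was produced.

```python
from collections import Counter, defaultdict

# (lemma, class) for each token; an illustrative sample, not the real data.
tokens = [
    ("λέγω", "PA-1"), ("λέγω", "PA-1"), ("ἔχω", "PA-1"),
    ("ζάω", "PA-5"), ("ζάω", "PA-5"), ("συζάω", "PA-5"),
]


def class_counts(tokens):
    """Return lemma count, token count and hapax list for each class."""
    per_lemma = Counter(tokens)  # (lemma, class) -> number of tokens
    summary = defaultdict(lambda: {"lemmas": 0, "tokens": 0, "hapax": []})
    for (lemma, cls), n in per_lemma.items():
        entry = summary[cls]
        entry["lemmas"] += 1
        entry["tokens"] += n
        if n == 1:  # assumption: hapax = exactly one token in this list
            entry["hapax"].append(lemma)
    return dict(summary)


for cls, entry in class_counts(tokens).items():
    print(cls, entry["lemmas"], entry["tokens"], len(entry["hapax"]), *entry["hapax"])
```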

Why do the hapax legomena matter? Well, they give an indication of which classes were still productive.

Note, however, that the hapax in PA-5 and above are VERY low in number and, with the exception of ὀμνύω in PA-6a, they are all compounds. This strongly suggests that only PA-1, PA-2, PA-3, and PA-4 were productive.

Notice that the token numbers for PA-6a, PA-9 and PA-11-COMP are particularly low too. Potentially relevant in the case of PA-6a and PA-9 is that these are the classes most likely to have developed thematic alternatives. This might be worthy of a future post in this series!

Let’s now look at counts for each paradigm cell for each class:

  PA-1 PA-2 PA-3 PA-4 PA-5 PA-6a PA-7 PA-8 PA-9 PA-9-ENC PA-10 PA-10-COMP PA-11-COMP
INF 394 171 5 21 13 1 11 10 1 - 124 3 3
1SG 460 116 3 21 6 1 7 10 2 4 138 1 -
2SG 164 46 - 5 2 - - 1 - - 92 1 -
3SG 923 295 16 35 13 3 11 13 5 17 896 31 -
1PL 141 52 2 19 5 - 1 - - - 52 1 -
2PL 218 99 4 8 1 - 4 - - - 93 1 -
3PL 263 77 5 8 1 - 3 1 1 1 156 1 1
  2563 856 35 117 41 5 37 35 9 22 1551 39 4

What is obvious from this is just how important, regardless of inflectional class, the 3SG form is. The INF is also very important. We’ve seen in a previous post that both cells are very good predictors of inflectional class (much better than 1SG) but they are also just both very common. The 1SG, despite being a bad predictor, is still important in terms of frequency.

The 3PL is a distant fourth with one apparent deviation: it is very common in PA-10 (i.e. the copula), more so than the INF or 1SG. In fact, the proportion of 3PL in this class is actually average; it’s the INF and 1SG that are unusually low (with much of the frequency drop taken up by the 3SG).

As well as εἰμί, φημί (PA-9-ENC) is also disproportionately 3SG.

Of course, given how common PA-1 is, even the plurals there outnumber the most common cells in the other classes.

If the goal is just to identify the person/number, not the class, (which is true in reception but not learning) then a lot of those numbers collapse because of shared endings. Here are the counts just focused on the common endings (without accents):

INF 604
-ναι 153
1SG 606
-μι 163
2SG -{ι}ς 217
(-)ει 93
3SG -{ι} 1282
-σι(ν) 49
(-)εστι(ν) 927
1PL -μεν 273
2PL -τε 448
3PL -σι(ν) 511
-ασι(ν) 7

This just emphasises even more (even though it was in the previous table) that there is only 1 2SG in -ς (without an iota, subscripted or otherwise): the παραδίδως in Luke 22.48.
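As a cross-check, the INF and 3SG figures in the endings list can be reproduced by summing columns of the per-class table. The sketch below assumes a particular grouping of classes under each ending (PA-1 to PA-5 together, PA-10 and PA-10-COMP together, everything else together); that grouping is inferred from the totals rather than stated in the post, and the names are hypothetical.

```python
# Counts for two paradigm cells, copied from the per-class table ("-" read as 0).
classes = ["PA-1", "PA-2", "PA-3", "PA-4", "PA-5", "PA-6a", "PA-7", "PA-8",
           "PA-9", "PA-9-ENC", "PA-10", "PA-10-COMP", "PA-11-COMP"]
cells = {
    "INF": [394, 171, 5, 21, 13, 1, 11, 10, 1, 0, 124, 3, 3],
    "3SG": [923, 295, 16, 35, 13, 3, 11, 13, 5, 17, 896, 31, 0],
}

# Assumed grouping of classes by shared ending (an inference, see above).
OMEGA = set(classes[:5])            # PA-1 to PA-5
COPULA = {"PA-10", "PA-10-COMP"}    # εἰμί and its compounds
OTHER = set(classes) - OMEGA - COPULA


def total(cell, group):
    """Sum one paradigm cell over the classes in a group."""
    return sum(n for cls, n in zip(classes, cells[cell]) if cls in group)


print("INF", total("INF", OMEGA), total("INF", OTHER | COPULA))
print("3SG", total("3SG", OMEGA), total("3SG", OTHER), total("3SG", COPULA))
```

Under that grouping the sums come out at 604 and 153 for the INF, and 1282, 49 and 927 for the 3SG, matching the list.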

The 7 3PLs in -ασι(ν) are:

  • τιθέασι(ν) in Matt 5.15
  • ἐπιτιθέασι(ν) in Matt 23.4
  • περιτιθέασι(ν) in Mark 15.17
  • φασί(ν) in Rom 3.8
  • συνιᾶσι(ν) in 2Co 10.12
  • εἰσίασι(ν) in Heb 9.6
  • διδόασι(ν) in Rev 17.13

One could argue that these are subsumed by saying 3PL ends in -σι(ν) but given that, in the very same lexemes, -σι(ν) can also indicate 3SG, it is useful to call out the α, even though the root vowel alternation is enough to distinguish singular and plural.

That’s it (for now) for counts of the present actives. In the next couple of posts, we’ll turn to the middle forms.


Four Types of But

In his talk on adversative conjunction in Gothic at the 29th UCLA Indo-European Conference, Jared Klein started with a wonderful example paragraph in English.

In order to finish the project, I don't need money but₂ time. I would like to be done by the end of this year, but₃ I don't think that is going to happen. Nobody is to blame for this but₁ me, because I've wasted a lot of time on things that have proved to be irrelevant. But₄ this is too depressing; let's talk about something else.

He went on to talk about the Gothic equivalents for each but I thought it was a great illustration of four distinct types of adversatives all using “but” in English.

Klein didn’t necessarily use the following terms but the four could be described as:

  1. prepositional
  2. phrasal
  3. clausal
  4. discourse

A Tour of Greek Morphology: Part 19

Part nineteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).

It’s now time to do for the middle forms what we did for the actives in part 16, namely come up with the rules to help disambiguate inflectional classes. These were sketched out in theory in part 14 but now it’s time to actually write the rules and test them in code against the SBLGNT.

This is what my Python script does:

INF:Xεσθαι or 3SG:Xεται or 2PL:Xεσθε is
  • PM-1 if lemma ends in ω or ομαι
  • PM-7 if lemma ends in ημι

1SG:Xομαι or 1PL:Xόμεθα or 3PL:Xονται is
  • PM-8 if lemma ends in δίδομαι
  • PM-1 if lemma ends in ω or otherwise ends in ομαι

1SG:Xοῦμαι or 3PL:Xοῦνται is
  • PM-2 if lemma ends in έω or έομαι
  • PM-3 if lemma ends in όω or όομαι

1SG:Xῶμαι or 1PL:Xώμεθα or 3PL:Xῶνται is
  • PM-5 if lemma ends in χράομαι
  • PM-4 if lemma otherwise ends in άομαι

2SG:Xῇ is
  • PM-2 if lemma ends in έω or έομαι
  • PM-5 if lemma ends in άομαι

1PL:Xύμεθα is
  • PM-2 if lemma ends in έω or έομαι
  • PM-3 if lemma ends in όω or όομαι (not needed in SBLGNT)
  • PM-5 otherwise (not needed in SBLGNT)

3SG:Xεῖται or 2PL:Xεῖσθε is
  • PM-2 if lemma ends in έω or έομαι
  • PM-11 if lemma ends in εῖμαι

1PL:Xείμεθα is
  • PM-11 if lemma is κεῖμαι
  • PM-11-COMPOUND otherwise (not needed in SBLGNT)

INF:Xεῖσθαι is
  • PM-2 if lemma ends in έω or έομαι
  • PM-11 if lemma is κεῖμαι (not needed in SBLGNT)
  • PM-11-COMPOUND otherwise

INF:Xῆσθαι is
  • PM-10-COMPOUND if lemma is κάθημαι
  • PM-5 otherwise (not needed in SBLGNT)
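Rules of this shape lend themselves to a small data-driven matcher. What follows is a sketch, not the actual script: `RULES`, `nfc` and `disambiguate` are hypothetical names, only three of the rules are encoded, and endings are matched literally (accents included) after Unicode NFC normalization.

```python
import unicodedata


def nfc(s):
    # Normalize so that differently encoded accented Greek compares equal.
    return unicodedata.normalize("NFC", s)


# Each rule: ({cell: form ending}, [(lemma endings, class), ...]).
# Only three of the rules are encoded here; the rest follow the same shape.
RULES = [
    ({"INF": "εσθαι", "3SG": "εται", "2PL": "εσθε"},
     [(("ω", "ομαι"), "PM-1"), (("ημι",), "PM-7")]),
    ({"1SG": "οῦμαι", "3PL": "οῦνται"},
     [(("έω", "έομαι"), "PM-2"), (("όω", "όομαι"), "PM-3")]),
    ({"3SG": "εῖται", "2PL": "εῖσθε"},
     [(("έω", "έομαι"), "PM-2"), (("εῖμαι",), "PM-11")]),
]


def disambiguate(cell, form, lemma):
    """Return the class for a form, or None if no encoded rule applies."""
    form, lemma = nfc(form), nfc(lemma)
    for endings, outcomes in RULES:
        ending = endings.get(cell)
        if ending is None or not form.endswith(nfc(ending)):
            continue
        # Outcomes are tried in order, so earlier lemma tests take priority.
        for lemma_endings, cls in outcomes:
            if lemma.endswith(tuple(nfc(e) for e in lemma_endings)):
                return cls
    return None


print(disambiguate("3SG", "ἔρχεται", "ἔρχομαι"))
print(disambiguate("3SG", "κεῖται", "κεῖμαι"))
```

Keeping the rules as data rather than as nested if-statements makes it straightforward to add the "not needed in SBLGNT" cases later without touching the matching logic.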

I decided to cover a bunch of ambiguities not specifically needed by the SBLGNT. That isn’t strictly necessary, but it will help when the script is extended to run on a larger corpus.

Note the special-casing of δίδομαι, κεῖμαι, κάθημαι, and χράομαι. χράομαι is an example, like ζάω in part 16, that is misleadingly lemmatized with an alpha. More on that later!

We now have an inflectional class for all 820 present middle infinitive or indicative forms in the MorphGNT SBLGNT.

You can download the entire output of my Python script here.

Are there multiple classes for a particular lexeme (like there were in the active)?

Two of the 167 lexemes show multiple classes:

  • δύναμαι: PM-9 normally but a 2SG:δύνῃ that comes up as a PM-1 (PM-9 would predict a Xασαι)
  • κάθημαι: PM-10-COMPOUND normally but a 2SG:κάθῃ that comes up as a PM-1 (PM-10-COMPOUND would predict a Xησαι)
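A check like this one can be run by grouping the assigned classes by lemma and keeping any lemma with more than one. The sketch below uses a hypothetical `lemmas_with_multiple_classes` helper and a handful of illustrative records rather than the real output file.

```python
from collections import defaultdict

# (lemma, cell, form, class) records; an illustrative handful only.
records = [
    ("δύναμαι", "3SG", "δύναται", "PM-9"),
    ("δύναμαι", "2SG", "δύνῃ", "PM-1"),
    ("ἔρχομαι", "3SG", "ἔρχεται", "PM-1"),
]


def lemmas_with_multiple_classes(records):
    """Map each lemma assigned to more than one class to its sorted classes."""
    classes = defaultdict(set)
    for lemma, _cell, _form, cls in records:
        classes[lemma].add(cls)
    return {lemma: sorted(found) for lemma, found in classes.items()
            if len(found) > 1}


print(lemmas_with_multiple_classes(records))
```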

If κάθῃ were καθῇ, we’d have the possibility of reanalysis as a PM-5 and it’s still possible that’s what’s going on and the accentuation just doesn’t reflect that.

δύνῃ for δύνασαι is somewhat less expected and it should be noted that both forms appear in the SBLGNT, sometimes within the same author. That the PM-4 2SG all show up with an un-contracted ᾶσαι adds slightly more mystery.

For now we’ll leave δύνῃ and κάθῃ as PM-1 but we’ll revisit them later.

In the next part, we’ll look at counts for the present middles across the SBLGNT.


Off to the UCLA Indo-European Conference

Tomorrow I’m off to Los Angeles for the Twenty-Ninth Annual UCLA Indo-European Conference.

Indo-European studies are notoriously impenetrable, even for linguists, but a couple of months ago, I finally decided now was the time to attend this major conference (to the extent an IE conference can be “major”).

I’m not great at conferences at the best of times, especially when I’m not a speaker and/or don’t know very many people, so this will be quite a stepping-out-of-the-comfort-zone for me.

But as an aspiring comparative philologist, I’m sure it’s going to be very rewarding for me.
