Toki Pona: Adventures in isolating compounding

Toki pona is an isolating language.  Its 125 morphemes do not fuse, so in this respect it is somewhat like Chinese.  This makes compounding and derivational morphology tricky to discuss, because the established linguistic jargon almost universally assumes that new words are created by solidly glomming words together.  In toki pona, most derivational morphology happens at the phrasal level.

Compound words that include prepositions as modifiers must split or mutate in the nominative, i.e. when the compound is the subject of the sentence.

jaki lon nena sinpin  => jaki pi nena sinpin li tawa anpa sinpin mi. (change preps to pi)

OR jaki li tawa anpa sinpin mi lon nena sinpin. (move prep to end of sentence)

kule lon palisa luka => kule pi palisa luka
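The "change preps to pi" mutation above can be sketched in a few lines of Python. This is only an illustration, not a real toki pona parser: the function name prep_to_pi and the PREPOSITIONS set are assumptions made for the example, and tawa is deliberately left out of the set because it also appears as a plain modifier (as in soweli tawa).

```python
# Assumption: a small, illustrative set of toki pona prepositions.
# "tawa" is omitted on purpose because it is also used as a plain modifier.
PREPOSITIONS = {"lon", "tan", "kepeken", "sama"}


def prep_to_pi(compound: str) -> str:
    """Replace the first preposition after the head word with 'pi'."""
    words = compound.split()
    for i, word in enumerate(words):
        # Position 0 is the head of the compound, never the preposition.
        if i > 0 and word in PREPOSITIONS:
            return " ".join(words[:i] + ["pi"] + words[i + 1:])
    return compound  # no preposition found, so nothing to mutate


print(prep_to_pi("jaki lon nena sinpin"))  # jaki pi nena sinpin
print(prep_to_pi("kule lon palisa luka"))  # kule pi palisa luka
```

The other option, moving the prepositional phrase to the end of the sentence, needs to know where the predicate ends, so it is not attempted here.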

Compound words must be internally grammatical phrases, as if they stood alone.  This is analogous to chemin de fer in French, which retains its preposition inside the compound.  Compound words in toki pona can contain modifiers, pi-phrases, and prepositional phrases.  Compound words can’t contain la, li or e phrases, i.e. no adverbials, verbs or direct objects.
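The negative half of this rule (no la, li or e inside a compound) is easy to check mechanically. A minimal sketch follows; the name is_well_formed_compound is hypothetical, and the check does not verify that the phrase is otherwise grammatical, only that it avoids the three excluded particles.

```python
# Particles that, per the rule above, may not appear inside a compound word.
FORBIDDEN_PARTICLES = {"la", "li", "e"}


def is_well_formed_compound(compound: str) -> bool:
    """Return True if the phrase is non-empty and has no la/li/e particle."""
    words = compound.split()
    return bool(words) and not any(w in FORBIDDEN_PARTICLES for w in words)


print(is_well_formed_compound("jaki pi nena sinpin"))  # True
print(is_well_formed_compound("jaki li tawa anpa"))    # False
```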

Compound words can include conjunctions, but these compound words can only be used with prepositions and adverbial clauses, not in the accusative!  Worse, there is no obvious transform to make them legal in the accusative.  More likely, the phrase grammar rules will be updated to disallow the accusative form of conjunctions between two head nouns, but allow conjunctions in a pi phrase.

? soweli tawa pimeja en walo – zebra

(possibly should be soweli tawa pi kule pimeja en kule walo)

mi lukin e soweli tawa pi kule pimeja en kule walo.

Compound words that are modified by verbal phrases are not part of the language yet. I’m waiting to see how long it will take before they are.

Compound words with multiple modifiers can list those modifiers in different orders and still be the same word.

palisa noka lili – toe

palisa lili noka  – toe
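If modifier order does not matter, two compounds can be compared by reducing each to its head plus a sorted list of modifiers. The sketch below assumes simple one-word modifiers only; inside pi-phrases and prepositional phrases order presumably does matter, so those are out of scope. The function names canonical and same_compound are made up for the example.

```python
def canonical(compound: str) -> tuple:
    """Reduce a compound to (head, sorted modifiers) so order is ignored."""
    # Assumption: every word after the head is a simple one-word modifier.
    head, *modifiers = compound.split()
    return head, tuple(sorted(modifiers))


def same_compound(a: str, b: str) -> bool:
    """True if both phrases have the same head and the same modifiers."""
    return canonical(a) == canonical(b)


print(same_compound("palisa noka lili", "palisa lili noka"))  # True
print(same_compound("palisa noka lili", "noka palisa lili"))  # False
```

The second call returns False because the head word differs, even though the same three words are used.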

Compound words have subset-anaphora: a later reference can repeat just part of the compound. One could use “ona”, but because “ona” can agree with anything, it is more prudent to repeat the head of the compound word instead.  The sloth of the writer is paid for by the reader.

palisa noka lili mi li jo e pilin ike.  palisa li jo e kule loje.

My toes are sick.  They have a red tinge to them.

Other imaginable variations include:

ona palisa  li jo e kule loje.  (pronoun with head word as modifier)

palisa noka li jo e kule loje.  (head plus one of the modifiers)
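The three variations (head alone, pronoun with the head as modifier, head plus some of the modifiers) can be described as one small test: does the shorter phrase pick out a subset of the earlier compound? This is a rough sketch, with the hypothetical name could_refer_back, and it assumes simple one-word modifiers.

```python
def could_refer_back(anaphor: str, compound: str) -> bool:
    """True if the anaphor could plausibly point back at the compound."""
    head, *modifiers = compound.split()
    words = anaphor.split()
    if not words:
        return False
    if words[0] == "ona":
        # Bare "ona" agrees with anything; "ona HEAD" names the head word.
        return words[1:] in ([], [head])
    # Otherwise: same head, and every modifier was in the original compound.
    return words[0] == head and set(words[1:]) <= set(modifiers)


print(could_refer_back("palisa", "palisa noka lili"))       # True
print(could_refer_back("ona palisa", "palisa noka lili"))   # True
print(could_refer_back("palisa noka", "palisa noka lili"))  # True
print(could_refer_back("palisa suli", "palisa noka lili"))  # False
```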

When you chart the frequency of words used in the corpus, you can see that the distribution is the combination of two curves, probably because function words are common and content words less so.  Some words play both roles.
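A chart like this can be rebuilt by counting words, ranking them, and comparing each count with an idealized Zipf curve in which the word at rank r occurs about 1/r as often as the top word. The sketch below makes two assumptions: the exponent of the curve is fixed at 1, and the sample text is just two sentences taken from this post, not the real corpus, so its numbers say nothing about the language.

```python
from collections import Counter


def rank_frequency(text: str) -> list:
    """Return (rank, word, observed count, idealized Zipf count) rows."""
    counts = Counter(text.lower().split())
    ranked = counts.most_common()  # most frequent words first
    if not ranked:
        return []
    top_count = ranked[0][1]
    # Idealized Zipf curve with exponent 1, anchored at the top word.
    return [(rank, word, count, top_count / rank)
            for rank, (word, count) in enumerate(ranked, start=1)]


# Toy sample built from two sentences in this post; NOT the real corpus.
sample = ("mi lukin e soweli tawa pi kule pimeja en kule walo "
          "palisa noka lili mi li jo e pilin ike palisa li jo e kule loje")

for rank, word, observed, predicted in rank_frequency(sample)[:5]:
    print(rank, word, observed, round(predicted, 2))
```

Words whose observed count sits above the predicted value are the ones "above the line" on the chart.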

Words 1-5 are the steep descent.  Words 6-36 are above the line for Zipf’s law.  I think this is because they are both content and function words.  Among them are the words we’d expect to find in a derivational morphology system:

suli- Augmentative (indivisible things), e.g. kasi suli, tree
mute- Augmentative (divisible things), e.g. tomo mute, city
sona- -ology, e.g. sona toki, linguistics (note that sona comes first, unlike -ology in English!)
ala- un-, e.g. kepeken ala, without (note this is a derivation on a preposition!)
lili- Diminutive (both divisible and indivisible things), e.g. jan lili, child
jan and ma are both important because they are the required head nouns for loan words for countries and geographic locations.

And soon I’ll discuss the rest of the ways words can be created in toki pona (change of POS, loan words and eponyms).

