Reading Toki Pona is Hard

Reading toki pona is far more difficult than writing it. Valid toki pona can be read many ways and has garden paths (places where you need to back-track and re-think what you read). So as you read along, you need a sense of the odds that a phrase parses one way or the other, and that sense mostly comes from experience.

Invalid toki pona is even harder to read because you have to parse it with common errors in mind. It boggles the mind that to read toki pona, you must have in mind a correct grammar and several alternate defective grammars, parse a sentence using all of them and then somehow pick out which one was meant.

Still, there is a class of errors that are effortless to read. I think that the reading gotchas and the effortless-to-read mistakes are signs of mistakes or limitations in the conlang’s fundamental design: underspecification and overspecification respectively. The grammatical errors that are likely to completely throw the reader off track are errors of the writer and not the language.

Parsing gotchas- Reading Valid Toki Pona

That first sentence might be a conditional.
The last word could be a modifier or someone/something that owns it.
The ni could refer forward or backwards and can refer to sentences, things, and things in the environment.
The ona could refer forwards or backwards and can refer to things and things in the environment.

post-verb, pre-e words can be a noun complement or an adverb.

The participle-like word might be first or last. Sometimes these have a verb/adverb reading as well.
waso tawa. running bird.
tawa pona. good running.
pana mani. monetary giving (payment)
pana sona
kama sona

The "x li lon y e z" construction feels backwards because place normally comes last (or at least late in the sentence).
The "noun li noun e noun" construction will probably be read with the noun as a verb 1st.
jan li pali e musi. The man turned the game into work. vs The man engaged in a game.
Predicate vs Intransitive readings
sina suli. You're fat. You are growing.
Modifiers to mi and sina (rare)
sina suli li tawa. You're fat/growing and are walking, vs You, sir, are going.

Some polysemy borders on homophony (most meanings are similar, some are getting kind of different)
pona means wash.
sona means understand.
Some meanings border on contronyms
awen - to keep doing something
awen - to hold, as in to stop.
physically near vs collaborate

Mind reading anaphora. You know what ona and ni refer to, I don't.
Short noun phrases. I read it with one of the basic meanings, you mean one of the obscure meanings.

L1 interference
mi, sina, ona mean us, y'all, and them, too.

Parsing invalid toki pona
These errors can be very difficult for the *reader* to recover from
A particle may have been dropped. pi's and e's most likely. Missing li is problematic for sentences with complex subjects.
An e phrase may have been dropped. Usually not problematic.
A 2nd li phrase uses the object of the last sentence as subject.
* mije li lukin e meli li tawa lon esun.
Means A man saw a girl and he was walking in the store.
Not "A man saw a girl walking in the store"
Minimal pair confusion. If a sentence doesn't make sense, try mentally swapping out the minimal pairs with its alter-ego.

a/i/e confusion is really bad for readability
missing period. In general, where phrase splits aren't explicit, reading is harder.
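
The minimal-pair swap described above can be tried mechanically. The following is a minimal sketch, assuming a "minimal pair" means two words of equal length that differ in exactly one letter; the word sample and the function names are made up for illustration, and a real run would load the full word list.

```python
from itertools import combinations

# A small hand-picked sample of toki pona words (not the full vocabulary).
WORDS = ["kala", "kama", "kasi", "pali", "pana", "pona", "sona",
         "suli", "suwi", "lili", "seli", "selo", "mi", "ni"]

def is_minimal_pair(a, b):
    """True when two words have the same length and differ in exactly one letter."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def minimal_pairs(words):
    """Map each word to the words it could be confused with."""
    pairs = {w: [] for w in words}
    for a, b in combinations(sorted(set(words)), 2):
        if is_minimal_pair(a, b):
            pairs[a].append(b)
            pairs[b].append(a)
    return pairs

def swap_candidates(sentence, words=WORDS):
    """Yield variants of the sentence with one word swapped for a minimal-pair partner."""
    table = minimal_pairs(words)
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        for alt in table.get(tok, []):
            yield " ".join(tokens[:i] + [alt] + tokens[i + 1:])

if __name__ == "__main__":
    for variant in swap_candidates("mi sona"):
        print(variant)
```

A reader would still have to judge which variant makes sense; the sketch only lists the candidates to consider.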

Some errors are (mostly) effortless to recover from *for the reader*:
period instead of colon after ni
extra li for mi/sina
missing noun for a proper modifier
lower case proper modifiers.
missing "e" sometimes, because adverbs are rare and you can parse the DO by position.
dropping lon from a prepositional phrase, e.g.
laso suli li lon sewi lawa mi
laso suli li sewi lawa mi.
pi missing before mi mute/sina mute. No one shuffles modifiers and mi mute/sina mute.
n/m confusion is a non issue.
ala could negate the previous single word, or the entire phrase.
mi wile lukin ala e sitelen tawa.
mi wile ala lukin e sitelen tawa.
repeating prep phrases vs using en (And I'm not sure if one is an error!)
suwi li tawa mi tawa sina. vs suwi li tawa mi en sina.
questions without seme
yes/no questions that dispense with the X ala X pattern
Conjoining sentences with taso, anu, en
Conjoining verb phrases with en instead of li. (less common)
Optative/Hortative with o in wrong place
* mi mute o musi! Let's play! vs. o mi mute li musi!
* jan o utala ala! let there be peace vs o jan li utala ala!
Modifying prep phrase in the subject
soweli lon tomo mi li wile e moku. The cat in my house wants some food.

My toki pona to do list

- Add jan Kipo’s spelling proposal to the speller.

- Merge in jan Kipo’s tp base word word list

- Implement a Hamurabi game localized for toki pona

- Translate common urban plants and animals into toki pona

- Genetic mutation algorithm for toki pona (something that would evaluate a relex based on something like how concise sample texts would become, the average distance between pairs of words, # of minimal pairs, etc), and the genetic part would come from “mating” words like kalama + lili = kali or anpa + pilin = anilin.  My current mutator isn’t as interesting as it could be.

- Add a dating, convention and skype event list to, something that works for a conlang community, because the meetup model just doesn’t fit.

- get the entire code base on to support alternate conlangs

- Compile all the republishable texts into a sorted and maybe a corrected toki pona graded reader

- Finish a reasonable reference grammar for toki pona *in style of a field linguist* (As opposed to the conlang style, which tends to be part ‘creative writing’ or ‘lessons’ which tend to avoid linguistics jargon or a standards document where every detail has to be sweated over and negotiated and so on)

- Add the short hand transliterator for toki pona, similar to the one that I did for pseudo-hieroglyphs.

- Fix the tp transliterator using an IPA based (or any pronunciation based) technique instead of techniques based on English orthography.

- Add a (primitive) tp web chat that only allows toki pona text.

- Add a tp census database and link to the corpus documents.

- Write a tp tweet lesson and word a day stream and set it up for publication over a year.

- Publish tp anki decks that have a realistic # of facts (particularly the large # of phrases, the large # of meanings that have to be memorized)

- Add pre-packaged regex searches to the corpus search tool.

- Implement a phrase dictionary for tp (I had one before, but it sucked and I never fixed it)

- Finish writing a tp to c# object parser

- Write C# code for testing noun phrase equivalence in tp,

e.g. soweli laso suli = soweli suli laso.  – equivalent but shuffled

soweli laso loje – not equivalent, word order matters

tomo suli pi telo nasa ~ tomo pi telo nasa – equivalent except for a modifier
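
The noun phrase equivalence test in the last item can be sketched roughly as follows. This is written in Python rather than the C# the item calls for, and it rests on assumptions that are mine, not settled rules: a phrase is split into groups on pi, modifiers inside a group may be shuffled freely, and the only order-sensitive case is two or more colour words stacked in one group. All names are hypothetical.

```python
from collections import Counter

# Assumption for this sketch: colour words are order-sensitive when stacked.
COLOURS = {"laso", "loje", "jelo", "pimeja", "walo"}
RANK = {"identical": 0, "shuffled": 1, "modifier": 2, "different": 3}

def parse_np(phrase):
    """Split a noun phrase on 'pi' into (head, modifiers) groups."""
    groups = []
    for chunk in phrase.split(" pi "):
        head, *modifiers = chunk.split()
        groups.append((head, modifiers))
    return groups

def compare_group(a, b):
    """Compare one (head, modifiers) group against another."""
    (head_a, mods_a), (head_b, mods_b) = a, b
    if head_a != head_b:
        return "different"
    if mods_a == mods_b:
        return "identical"
    count_a, count_b = Counter(mods_a), Counter(mods_b)
    if count_a == count_b:
        # Same modifiers in another order: fine unless several colours are stacked.
        stacked = sum(m in COLOURS for m in mods_a) > 1
        return "different" if stacked else "shuffled"
    if not (count_a - count_b) or not (count_b - count_a):
        return "modifier"  # one side only adds modifiers to the other
    return "different"

def compare_np(left, right):
    """Return the weakest relation found across all pi-groups."""
    groups_l, groups_r = parse_np(left), parse_np(right)
    if len(groups_l) != len(groups_r):
        return "different"
    results = [compare_group(a, b) for a, b in zip(groups_l, groups_r)]
    return max(results, key=RANK.get)

if __name__ == "__main__":
    print(compare_np("soweli laso suli", "soweli suli laso"))
    print(compare_np("soweli laso loje", "soweli loje laso"))
    print(compare_np("tomo suli pi telo nasa", "tomo pi telo nasa"))
```

The three calls mirror the three examples above; phrases with a different number of pi groups are simply reported as different, which is cruder than a real implementation would want.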

A machine parseable context free grammar for toki pona

This is entirely based on jan Kipo’s work.

The free command line parser.

This generates the parser:
agfl.exe tokipona.gra

This executes the parser:
agfl-run.exe tokipona.aob

Should you run the parser, it will create a graph of the sentence. If it thinks the sentence isn’t grammatical, you get no feedback at all.  The first thing one notices is that a simple sentence usually can be parsed many, many different ways.

PDF explaining the meta-syntax.

The text below is 2 files, tokipona.gra and tokipona.dat

#Starting tokipona.gra

#Version 0.01

GRAMMAR tokipona.


LEXICON tokipona

ROOT Utterance.

  Sentence,[ "." ];
  Answer,[ "." ];
  Vocative,[ "!" ];
  Interjection,[ "!" ].


  "a a";
  "a a a";



  "taso", [ Cond ], [ Vocative ], NP_PLUS, VP, "[", "li", VP, "]";
  [ Cond ], [ Vocative ], NP_PLUS, VP, "[", "li", VP, "]";
  [ Vocative ], NP_PLUS, VP, "[", "li", VP, "]";

  "li", VP;
  "li", VP, LI_PHRASE.

  NP, "la";
  Sentence, "la".

# roughly subject
  OWORD, "li";
  NP_MINUS, "li".

#defined above in lexicon section
#  WORD.
#any word in tp except: li. la. e, o, pi, mi, sina, [en, anu, a, mu] (probably but status to be decided)  



  NP, "pi", NP_MINUS;
  NP, Conj, NP;
  NP, Num;
  NP, "pi", PP.


  XPrep, NP.

  XPrep, NP.

#  P;

#kepeken (doesn't work in verb slot)  (surely there will be more)

  [ "nanpa" ], Dig;
  Dig, [ "nanpa" ].


#(probably more, like the last, frowned on)  

#WORDs here should be same
  VC, "[", "e", NP, "]", WORD, [ PP ], WORD;
  VC, "e", NP;
  VC, WORD, [ PP ], WORD;

#Verb complements
  XM, VP.

#some sort of modifier
  XM, NP.

#assuming this is modifiers, including mi, sina
  WORD, M.

#assuming this is modifiers, including mi, sina


"akesi" OWORD
"ala" OWORD
"ale" OWORD
"ali" OWORD
"anpa" OWORD
"ante" OWORD
"awen" OWORD
"esun" OWORD
"ijo" OWORD
"ike" OWORD
"ilo" OWORD
"insa" OWORD
"jaki" OWORD
"jan" OWORD
"jelo" OWORD
"jo" OWORD
"kama" OWORD
"kala" OWORD
"kalama" OWORD
"kama" OWORD
"kasi" OWORD
"kepeken" OWORD
"kili" OWORD
"kin" OWORD
"kiwen" OWORD
"ko" OWORD
"kon" OWORD
"kule" OWORD
"kute" OWORD
"kulupu" OWORD
"lape" OWORD
"laso" OWORD
"lawa" OWORD
"len" OWORD
"lete" OWORD
"lili" OWORD
"linja" OWORD
"lipu" OWORD
"loje" OWORD
"lon" OWORD
"luka" OWORD
"lukin" OWORD
"lupa" OWORD
"ma" OWORD
"mama" OWORD
"mani" OWORD
"meli" OWORD
"mi" OWORD
"mije" OWORD
"moku" OWORD
"moli" OWORD
"monsi" OWORD
"mu" OWORD
"mun" OWORD
"musi" OWORD
"mute" OWORD
"nanpa" OWORD
"nasa" OWORD
"nasin" OWORD
"nena" OWORD
"ni" OWORD
"nimi" OWORD
"noka" OWORD
"oko" OWORD
"olin" OWORD
"ona" OWORD
"open" OWORD
"pakala" OWORD
"pali" OWORD
"palisa" OWORD
"pan" OWORD
"pana" OWORD
"pilin" OWORD
"pimeja" OWORD
"pini" OWORD
"pipi" OWORD
"poka" OWORD
"poki" OWORD
"pona" OWORD
"sama" OWORD
"seli" OWORD
"selo" OWORD
"seme" OWORD
"sewi" OWORD
"sijelo" OWORD
"sike" OWORD
"sin" OWORD
"sina" OWORD
"sinpin" OWORD
"sitelen" OWORD
"sona" OWORD
"kama" OWORD
"soweli" OWORD
"suli" OWORD
"suno" OWORD
"supa" OWORD
"suwi" OWORD
"tan" OWORD
#"taso" OWORD
"tawa" OWORD
"telo" OWORD
"tenpo" OWORD
"tomo" OWORD
"tu" OWORD
"unpa" OWORD
"uta" OWORD
"utala" OWORD
"walo" OWORD
"wan" OWORD
"waso" OWORD
"wawa" OWORD
"weka" OWORD
"wile" OWORD

Dialects of toki pona

There are at least three dialects (or styles) of toki pona.  These dialects are driven by the interest of the user, not so much separate communities and some authors write with a little of two, three or all of these styles.

toki pona pi nasin tan
toki pona pi nasin tan is the original, simple toki pona.  nasin tan has a disdain for numbers and aggressively minimizes tense, gender and number in sentence and pronoun constructions.  Nasin tan would pick phrases that imply an ignorance of science, or better yet a phrase that is equally compatible with a naive or scientific view of the world.  Simple toki pona can’t really translate existing texts because so much information is lost that the result isn’t really a translation. An example would be jan Pije’s translation of a paragraph-long movie review as “ni li ike”.  The best and most representative writing in this style is written from scratch and is not a translation.

jan Pije, jan Wiko, jan Akidave, jan Ape are the best exemplars.

toki nasa
toki nasa sees toki pona as just another language. The goal is to express anything that one may need to express.  Well written toki nasa is capable of translating from typical languages without much loss of information.  There is no particular disdain for numbers.  Where conventions are absent, they are to be established on the spot, much as one would in real life when a speaker of one’s mother tongue comes across a new object or a new situation. For example, is a talking rock a he, she or it? An English speaker would have to solve the puzzle on the spot.

Sentences can be quite long and paragraphs interlock closely.  When writing toki nasa, one pushes the boundaries of grammatical possibilities by seeking out constructions that may not have any canonical support, but follow the general pattern of the language.  Toki nasa is also more likely to ignore canonical rules that don’t fit the general patterns of the language.

toki nasa is more willing to just use one’s own culture as a guide and not worry about what the philosophical implications are.  This shows up in terms of metaphysical obsessions (noting time, place, gender more often than strictly might be necessary), biases in choices about what is salient (is it the measure and colors or some other quality that is salient?), and of course in guiding editorializing phrasing (palisa moli = cigarettes).

toki nasa linguistically is a creole, because the fuzzy parts of the language definition are being filled in by a variety of speakers from different linguistic backgrounds.

toki nasa isn’t completely crazy.  This writing style has a goal of being read and understood, so there is a bound to how much can be done within toki nasa– see toki ante, below, for more.

jan Mato, and anyone that is translating hard texts is in this camp.

toki pona pi nasin nanpa
toki pona does have some Lojban influence. The rules have been written down in BNF form and most toki pona could be machine parsed; the most accurate rule set is on the wikipedia article.   A toki nasa writer would write toki nasa that doesn’t validate if it helped get around an edge case, as the wikipedia article doesn’t cover all cases, especially when reading the rules depends on POS and certain constructions too technical to list right now.

toki pona pi nasin tan would probably validate in almost all cases because in nasin tan, the sentences are short, conservative and don’t try to do much.  For example, a nasin tan verb would likely not have any modifiers, so it has no opportunity to go wrong, whereas verbs with modifiers and modifier phrases could be parsed in many ways, not all completely worked out.

In real human languages speech errors occur at a shocking rate. They occur at a shocking rate in toki pona as well, but often one can see what the other person means.  So in casual toki nasa, only errors that are so egregious that they render the sentence incomprehensible are wrong enough to warrant fixing.  For example, pi dropping is wrong, but often not a problem when the upcoming word is pragmatically obviously a noun.

The toki pona derivative ROILA probably would be a form of toki pona pi nasin nanpa.

Linguistically, toki nanpa is the most like a constructed language, like XML, C#, COBOL and other formally describable languages that humans happen to also be able to understand.

jan Kipo is, not surprisingly, the best symbol for this style, although in practice I’d say jan Kipo style is conversational and somewhere between toki tan and toki nasa (simple, but practical).

toki ike
Toki ike is just grammatically wrong, a complete calque of a foreign language, unintelligible not just to some readers but to most readers.  toki ike is also all the dialects and styles that you don’t prefer to write in at the moment.

toki ike is mostly restricted to people brand new to the language making their brave first attempts.  Linguistically speaking, toki ike is a pidgin– language spoken with the words of the target language but mostly the word order of one’s mother tongue.

toki ante
Some percent of the people who use toki pona are also constructed language hobbyists, intent either on extending or imitating toki pona.  Most derivatives of toki pona are so far from toki pona that they are identifiable as separate languages.

Anyone using more than 125 words has at least one foot in this camp.

rebracketing toki pona

Languages evolve through many mechanisms including rebracketing, where one word splits into two or they merge, often as a result of a misunderstanding or ignorance of the origin of the words.

kepeken – to realize an ability, to manifest  an ability.

ken- potential, ability

* kepemoku – to manifest food, to farm.

* kepejan – to manifest a person, to conceive or give birth

suli, lili, seli, great, small, hot– all various ends of a scale

*janli – man sized

*okoli – eye sized

Conlangs and Online Communities

Online communities, I have recently come to believe, are a mixed blessing for constructed languages.  On one hand, for the last few hundred years of constructed languages, they typically languished without any attention at all because the audience for constructed languages is so thinly spread out that it was nearly impossible to get critical mass for a new language using traditional media.

Klingon, Na’vi and toki pona probably would all have disappeared at birth without mailing lists.  At the moment only Lojban seems to have had a serious pre-internet community and even Lojban would be a shadow of what it is now without the internet. So what is not to love about using the internet as the primary place to find and build a community for your conlang or your favorite conlang?

In my roamings online since last November, I’ve decided there are some serious pitfalls.

The internet affects (or compromises) language design. Even English has gained new vocabulary and is probably on the verge of new grammatical constructions from its use online.  Emoticons, ALL-CAPS meaning shouting, /commands, the threaded discussion, replacing diacritic letters with a letter followed by x: all are changes to the language to adapt it to online needs.  A designed language has goals, such as being true to a fictional culture, a certain social goal (such as cross border communication), a certain therapeutic effect, and many other whimsical goals peripheral to the needs of facilitating keyboard mediated written communications amongst strangers widely dispersed across time and place.

Civility and Fight Club. “Academic politics are so vicious because the stakes are so low.”  All online communities run the risk of griefers, trolls, people who treat the internet like some sort of fight club.  Even discussing the internet’s level of civility is a losing battle, with camps of people imagining that it isn’t even a problem to begin with.  Those who do see it as a problem often have no recourse but to leave the community.  This starts a downward spiral until most online communities are fight clubs, inhabited only by those who are looking for a fight, enjoy fighting or can’t tell the difference between discourse and fighting anymore.

Someday, there might be a social or technical solution, such as human moderation, comment voting.  In the conlang world, prospects don’t look good.

In my years of living in the real world, I’ve never encountered the number of fights and the nastiness that just pop up all over the place in online communities. Some I’m just observing; in some I end up on one end or the other.  Obviously, for many this is a non-issue.  They either enjoy fight club or are oblivious to it.  For me, each fight is a colossal distraction. As they say, if you can’t take the heat, stay out of the kitchen.  I now choose to stay out of the online community kitchen.  I’ll use the internet for organizing in person meetups and for posting my letters in a bottle to no one on my blog, but I’ve pretty much had it with participating in online communities.

Stay tuned for my next article on how to build communities for conlangs using real world resources.

Toki Pona: Double subjects, verbs and objects and preps

If you double all the parts of a sentence, you get something like this:

C1 la C2 la S1 en S2 li V1 li  V2 e O1 e O2 Prep NP1 Prep NP2.

With great power comes great responsibility. Unfortunately, we lack  co-ordinating machinery to co-ordinate this interleaved sentence except in a few cases.  Worse, by the time you realize that your interleaved sentence looks wrong, it’s too late.

C1 la C2 la are roughly additive adverbs that apply to all verbs.

S1 en S2 are subjects that did both actions.

e O1 e O2 were the target of both actions.

Prep NP1 Prep NP2 modify the entire sentence.

If V1 is intransitive and V2 is transitive, both objects go with the transitive verb.

If both verbs are transitive, check to see if co-ordination can be done semantically.  If not, you’ll need to split the sentence.

mi en sina li jo li moku e ilo moku e moku linja pan.  You and I held a fork and ate spaghetti.

Like most toki pona sentences, the above has many nonsense readings, which can be safely ignored.
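
The slot structure of the doubled sentence can be pulled apart mechanically by splitting on the particles. This is a minimal sketch under loud assumptions: it only handles sentences that use li after the subject (so not bare mi/sina subjects), it pools all e phrases the way the description above does, and it leaves any prepositional phrase attached to whatever chunk it trails. The function name is made up for illustration.

```python
def chunk_sentence(sentence):
    """Split a sentence into la-contexts, subjects, verbs and objects by particle."""
    text = sentence.strip().rstrip(".!?")
    # Everything before the last 'la' is context; the rest is the main clause.
    *contexts, main = text.split(" la ")
    # The first li-delimited chunk is the subject; the rest are predicates.
    subject, *predicates = main.split(" li ")
    subjects = subject.split(" en ")
    verbs, objects = [], []
    for predicate in predicates:
        verb, *objs = predicate.split(" e ")
        verbs.append(verb)
        objects.extend(objs)  # pooled: the sketch does not decide which verb owns which object
    return {"contexts": contexts, "subjects": subjects,
            "verbs": verbs, "objects": objects}

if __name__ == "__main__":
    example = "mi en sina li jo li moku e ilo moku e moku linja pan."
    for slot, chunks in chunk_sentence(example).items():
        print(slot, chunks)
```

Deciding which verb each object belongs to is exactly the semantic co-ordination step the text says has to be done by the reader, so the sketch deliberately stops short of it.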

Some prepositions don’t make sense with certain verbs.  Preps resolve to the verbs they go with, and not the others.  If a prepositional phrase could go with either, consider using multiple sentences.

Sequential Interleaving.

mi li pali li moku li lape e pali mi e moku mi.  I did my work, ate my dinner and slept.

Again, this has a nonsense alternative of “I worked and ate and slept my work and food.”, which we can safely ignore.

Mixing predicates and SVO sentences

Predicative sentences and SVO sentences probably shouldn’t be mixed unless you double check for plausible ambiguities.

mi li suli.

mi li suli li pali e sijelo sama.  ? I am big and I work out./I increase the size of and work out my body.

? mi lon tomo mi li lape.
? I’m in my room, sleeping.

Mixing predicate and SVO sentences is outside of what wikipedia says we can do with toki pona, but I suspect people will do it anyhow.

Toki Pona: Adventures in isolating compounding

Toki pona is an isolating language.  The 125 morphemes do not fuse, so it is somewhat like Chinese.  This makes it tricky to discuss compounding and derivational morphology because the linguistic jargon that is established almost universally assumes that words are created by solidly glomming words together.  Most derivational morphology is going on at the phrasal level in toki pona.

Compound words that include prepositions as modifiers must split or mutate in the nominative.

jaki lon nena sinpin  => jaki pi nena sinpin li tawa anpa sinpin mi. (change preps to pi)

OR jaki li tawa anpa sinpin mi lon nena sinpin. (move prep to end of sentence)

kule lon palisa luka => kule pi palisa luka
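
Both repairs (swap the preposition for pi, or move the prepositional phrase to the end of the sentence) can be sketched as string transforms. The preposition set and the function names below are assumptions for illustration, and the sketch is naive: it treats the first preposition-shaped word after the head as a preposition, so a compound like soweli tawa, where tawa is a plain modifier, would be mangled.

```python
# Assumed preposition list for this sketch.
PREPOSITIONS = {"lon", "tawa", "tan", "kepeken", "sama"}

def split_at_preposition(compound):
    """Return (part before the first preposition, prepositional phrase)."""
    words = compound.split()
    for i, word in enumerate(words[1:], start=1):
        if word in PREPOSITIONS:
            return " ".join(words[:i]), " ".join(words[i:])
    return compound, ""

def to_pi_form(compound):
    """Replace the preposition inside a compound with 'pi'."""
    head, prep_phrase = split_at_preposition(compound)
    if not prep_phrase:
        return compound
    rest = " ".join(prep_phrase.split()[1:])
    return head + " pi " + rest

def move_prep_to_end(compound, predicate):
    """Build 'head li predicate prep-phrase' with the prep phrase moved last."""
    head, prep_phrase = split_at_preposition(compound)
    return " ".join(part for part in (head, "li", predicate, prep_phrase) if part)

if __name__ == "__main__":
    print(to_pi_form("jaki lon nena sinpin"))
    print(move_prep_to_end("jaki lon nena sinpin", "tawa anpa sinpin mi"))
```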

Compound words must be internally grammatical phrases, as if they were stand alone.  This is analogous to chemin-de-fer in French, which retains prepositions in compound words.  Compound words in toki pona can contain modifiers, pi-phrases, and prepositional phrases.  Compound words can’t contain la, li or e phrases, i.e. no adverbials, verbs or direct objects.

Compound words can include conjunctions, but these compound words can only be used with prepositions and adverbial clauses, not in the accusative!  Worse, there is no obvious transform to make it legal in the accusative.  More likely, the phrase grammar rules will be updated to disallow the accusative form of conjunctions between two head nouns, but allow conjunctions in a pi phrase.

? soweli tawa pimeja en walo- zebra

(possibly should be soweli tawa pi kule pimeja en kule walo)

mi lukin e soweli tawa pi kule pimeja en kule walo.

Compound words that are modified by verbal phrases are not part of the language yet. I’m waiting to see how long it will take before they are.

Compound words with multiple modifiers can express themselves in multiple orders and it is the same word.

palisa noka lili – toe

palisa lili noka  – toe

Compound words have subset-anaphora. One could use “ona”, but because “ona” can agree with anything, it is more prudent to repeat the head of the compound word instead.  The sloth of the writer is paid for by the reader.

palisa noka lili mi li jo e pilin ike.  palisa li jo e kule loje.

My toes are sick.  They have a red tinge to them.

Other imaginable variations include,

ona palisa  li jo e kule loje.  (pronoun with head word as modifier)

palisa noka li jo e kule loje.  (head plus one of the modifiers)

When you chart the frequency of words used in the corpus, you can see that the distribution is the combination of two curves, probably because function words are common and content words less so.  Some words play both roles.
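
A frequency chart of this kind takes only a few lines to produce. The sketch below is a minimal version, assuming the corpus is plain whitespace-separated text with punctuation already removed; it compares each word's count with the simplest form of Zipf's law (the top count divided by the rank), so a ratio above 1 means the word sits above the Zipf line. The function names and the toy text are made up for illustration and are not taken from the real corpus.

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(text.lower().split())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]

def zipf_ratio(table):
    """Observed count divided by the Zipf prediction (top count / rank)."""
    top = table[0][2]
    return [(rank, word, count * rank / top) for rank, word, count in table]

if __name__ == "__main__":
    toy_text = "li li li li e e mi"  # toy input, not real corpus data
    for rank, word, ratio in zipf_ratio(rank_frequency(toy_text)):
        print(rank, word, round(ratio, 2))
```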

Words 1-5 are the steep descent.  Words 6-36 are above the line for Zipf’s law.  I think this is because they are both content and function words.  Among them are the words we’d expect to find in a derivational morphology system:

suli- Augmentative (indivisible things),  e.g. kasi suli, tree
mute- Augmentative (divisible things) e.g. tomo mute, city.
sona- -ology, e.g. sona toki, i.e. linguistics (note that sona comes first, unlike -ology in English!)
ala- un-, e.g. kepeken ala, i.e. without (note this is a derivation on a preposition!)
lili- Diminutive (both divisible and indivisible things), e.g. jan lili, child
jan and ma are both important because they are the required head noun to loan words for countries and geographic locations.

And soon I’ll discuss the rest of the ways words can be created in toki pona (change of POS, loan words and eponyms).

Toki Pona: More Unofficial Number Systems

The old official numbering system was ala, wan, tu, mute, or a limited Roman-style system, e.g. W, T, TW, TT, TTW, TTT, TTTW, etc.  A ternary place value system would have been better, but just as verbose.

The official number system is a Roman style number system.  I sort of like it when written in Roman style, but only when written, and only for numbers up to about 159.   After 159, the length of the numbers starts to get unbearably long.
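
To see how quickly additive numbers grow, here is a minimal sketch of a greedy Roman-style writer. The old system (tu and wan only) follows the W, T, TW pattern described in this post; the values used for the official system (ale = 100, mute = 20, luka = 5, tu = 2, wan = 1) are my assumption about that system and are not spelled out in this post. The function and table names are made up for illustration.

```python
# Assumed word values for the official additive system (largest first).
OFFICIAL = [("ale", 100), ("mute", 20), ("luka", 5), ("tu", 2), ("wan", 1)]
# The older system: only tu and wan, giving W, T, TW, TT, TTW, ...
OLD = [("tu", 2), ("wan", 1)]

def additive(n, system=OFFICIAL):
    """Write a non-negative integer greedily as a sum of number words."""
    if n == 0:
        return "ala"
    words = []
    for word, value in system:
        count, n = divmod(n, value)
        words.extend([word] * count)
    return " ".join(words)

if __name__ == "__main__":
    for n in (3, 7, 59, 159):
        print(n, "|", additive(n, OLD).count(" ") + 1, "words old |", additive(n))
```

Printing the word counts side by side makes the verbosity of the tu/wan system obvious well before the official system starts to hurt.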

The don’t do math option.

This option is good up to the point where you decide to translate anything of substance; eventually years will come up.  It’s even possible famous equations will come up, like e=mc^2.  Doubly so for non-fiction.

I recommend using wan, tu, mute when you are not in the mood for numbers. Use roman-style for numbers up to 160 or so, but only when you aren’t doing math.  And for all other numbers, establish a system within your text and then use it.  If I was going to invent a new kind of algebra, I might need to create a new notation. But once described, I’d be able to use it, using English, and no one would accuse me of re-plumbing English.  I think the situation is the same for constructed languages where the designer didn’t bother to work out the full number system.  And why should they? Each language designer designs a language for their own goals, and that is a good thing.

Options for Digits

- Assign base words with a vague sense of quantity to specific numbers.

In fact, this is the current state of affairs.

- Colors.

A color system either reuses the electrical resistor number system, or  the colors from ROYGBIV scale are mapped to decimal digits.

- Body Parts

Papua New Guinean natives will count along an imaginary line along their body.  For example nose could be one, lips  two, chin three, etc.

- Loan words

I don’t really like loan words for small vocabulary languages.  Each loan word is a new word and increases the number of words a brand new user needs to memorize before getting started. (Finishing learning a language will still require memorizing 1,000s of lexemes, but that is another blog post)

- Calendar names.  Days can provide numbers from 1 to 7, months 1 to 12.  Months could support decimal if we ignore two months.

There is a perfectly good proposal to use Japanese style day names.

Options for reading off the digits and symbols

- Grammatical sentences

This encourages awkward spoken mathematical notation.  It would be better to just read off the symbols as they appear.  Trying to shoe-horn mathematical notation into a constructed language using the language’s internal grammar is like saying a phone number as “first there is a 9, then there is a 7, then there is a 9, etc.”  555-234-2344 is just fine without verbs, without trying to get it to follow any rules about agreement.

- Use Calques and descriptive words for symbols.

- Decimal places are circular things. But exponents are pretty abstract.   I’m thinking the odds of finding a short-transparent compound phrase are fairly low. Moreover, notation should be somewhat brief to read.  So this all favors choosing non-transparent words.  If there isn’t a good short word for factorial, might as well use “kala”.  At least “kala” won’t likely be misunderstood for “fish”.  Normally “tu” would be a good verb for divide, but because it also is a digit, it would be a lousy name for an operator.

Good properties of number systems for toki pona like languages

1) Don’t use loan words. If you’re going to use loan words, why not speak Esperanto?

2) Use decimal, unless you’re just doing a number system for show.

3) Don’t re-invent the wheel. Numbers are only kind of linguistic entities and mathematical notation is not linguistic at all. Don’t bother creating a syntax and grammar for reading off mathematical notation or invent a new mathematical notation.  Unless you are doing so for show.

Toki Pona’s Morphological Derivation Process

A free morpheme is one that can act as a stand alone word.

A compound word is hard to detect because it is a semantic, not a syntactic phenomenon.  You can observe someone utter blackboard, but you can’t easily tell if in their skulls they’re processing it as a board that is black, or a specific type of board that is black, namely one for writing on with chalk and might not even be black!  I suppose you could ask enough fluent speakers, but this isn’t really an available option with artificial languages.

Toki pona has only free morphemes, except for possibly the “la”, “li”, “e” and maybe “ni”. (If those were used as free morphemes, it would be peculiar cases not representative of typical speech)

Free morphemes will combine with other free morphemes to form new lexemes.

These multi-stem lexemes typically require memorization, but not always.  If they are somewhat obvious on first read then they are transparent. Multi-stem lexemes are interpreted by their narrow meaning first and by their generic meaning as an afterthought.  In a language like toki pona which has accent on first syllable, the second free morpheme may have weakened or lost accent.   Punctuation and spelling tend to be of no help in identifying compound words, for example in English they may have no space, hyphens or a space between the stems.  Real spoken language tends to have no pauses between words, so that won’t help either.

Productive word creation through free morphemes is common to many languages.  Some languages like to include preposition-like particles, e.g. French chemin-de-fer; some don’t, e.g. English iron lung.

The toki pona multistem lexeme has this official form

[head noun] – [any number of modifiers] pi [new head noun]- [any number of modifiers] pi [etc]

The resulting multi-stem lexeme can be plugged into anywhere that a bare head noun is permissible.  AFAIK, the “chemin-de-fer”/“man-of-war” style lexeme is illegal, that is, you can’t use kepeken, tawa, lon, or other prepositions to build up a multi-stem lexeme, but it appears that the “pi” particle can act as a neutral preposition that means “some relationship, including ‘of’, ‘in’, ‘by’, ‘for’, ‘with’”.  Collapsing all relationships into “pi” actually leads to a loss of information.  A possible improvement would be

[head noun] – [any number of modifiers] pi [preposition] [new head noun]- [any number of modifiers] pi [etc]


[head noun] – [any number of modifiers] [preposition] [new head noun]- [any number of modifiers] pi [etc]
The pi [preposition] pattern would have the advantage of letting the user know that the multi-stem lexeme is continuing.
Typically the modifiers and subsequent are salient characteristics, i.e. they’re endocentric.  The modifiers convey that the item is of a special sort as indicated by the modifier.  However!  There are unexplored edge cases which it seems speakers will inevitably encounter, namely headless (or exocentric) multistem lexemes  and copulative multistem lexemes.
The former is where there isn’t a good word for the noun in question, but there are good modifiers for it.  A headless lexeme will feel like it needs “ijo” as the head noun, but “ijo” doesn’t always work.  Examples would be abstract entities.
The copulative lexeme is where each morpheme is a head, e.g. bittersweet, sleepwalk.  Copulative lexemes can be recognized when you can’t decide which one goes first.

The verb has the same potentialities as the noun.

[head verb] – [any number of modifiers] pi [new head verb? noun?]- [any number of modifiers] pi [etc]

What these mean is considerably more difficult to say.  That all human verb patterns can be squished into categories of mood, tense, voice, etc. makes me think that either these are linguistic universals or these are just the salient factors of moving and being that make up reality.  You have to have the machinery to describe reality to say something interesting.

See Part 2… it’s somewhere, someday.