So English has “goodbye”, which most people would say is one word. But it used to be “god be with you”, and somewhere between then and now the phrase turned into a single word. Professional linguists use a variety of tests to check whether something is a single word or a phrase made of words. For example: is the meaning opaque and not recoverable from inspecting the parts? Do you have to memorize it? Does it resist or prohibit being rearranged? Does it take a plural, e.g. “Say your goodbyes” vs “? Say your god be with yous”?
(A similar process is going on with Icelandic “gerðu svo vel” and Swedish “varsågod/var så god”, which, AFAIK, used to be the same phrase. In Icelandic it is still sort of a phrase; in Swedish it is not even a compound word any more, just a word with no internal syntactic structure at all.)
So toki pona has a lot of compound words of various degrees of opaqueness. If you don’t memorize “jan pona” you won’t necessarily guess it means friend. It is also like English black bird vs blackbird: the compound word doesn’t refer to just any black bird, but to a specific species.
The phenomenon of “word building” using syntactically correct phrases exists, and linguists study it, but probably not as much as they should because it is marginal in English, e.g. will-o’-the-wisp, chemin-de-fer. Notice that these real-life compound words are written with hyphens.
But we tell ourselves a lie to keep the number of words at 120, rather than being a bit more honest and admitting that to competently read toki pona, you need to memorize one or two thousand compound words.
We have all heard some accidental racist say “That savage tribe doesn’t have words for numbers” (because they are so stoopid, unlike myself). Toki pona has a running joke that toki pona doesn’t have numbers. But this ignores the fact that anyone who understands math can think up a dozen ways to communicate exact numbers using just about any language, including one intentionally crippled to make it hard to do so. Toki pona was designed crippled, with only three root words for exact numbers (ala, wan, tu), plus some other root words suggestive of numeric, quantitative or logical things, like mute, suli, ale. Toki pona builds up new lexical units using compound words, e.g. tu wan, and by assigning new meanings to existing words, e.g. luka became 5; by analogy one can create a full body-part based counting system. So for a language to *really* lack numbers, it *really* needs stupid speakers. All languages have the means to innovate and create new ways to express things.
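To make the point about building exact numbers from a handful of roots concrete, here is a minimal sketch of an additive scheme in Python. The values for wan, tu and luka (5) come from the paragraph above. The values mute = 20 and ale = 100 are an assumption borrowed from one commonly described convention, and the function names are made up for this illustration. It is one possible system, not the system.

```python
# wan, tu and luka (5) are taken from the text above.
# mute = 20 and ale = 100 are an ASSUMPTION (one commonly described
# additive convention); leave them out if your usage differs.
BASIC = {"luka": 5, "tu": 2, "wan": 1}
EXTENDED = {"ale": 100, "mute": 20, **BASIC}


def to_toki_pona(n, values=BASIC):
    """Spell a non-negative integer additively, largest word first."""
    if n < 0:
        raise ValueError("only non-negative integers are handled")
    if n == 0:
        return "ala"
    words = []
    # Greedy: use as many copies of the biggest word as fit, then move down.
    for word, value in sorted(values.items(), key=lambda kv: -kv[1]):
        count, n = divmod(n, value)
        words.extend([word] * count)
    return " ".join(words)


def from_toki_pona(phrase, values=EXTENDED):
    """Add up the word values of a number phrase; 'ala' counts as zero."""
    return sum(values[word] for word in phrase.split() if word != "ala")


print(to_toki_pona(3))             # tu wan
print(to_toki_pona(8))             # luka tu wan
print(to_toki_pona(47, EXTENDED))  # mute mute luka tu
```

The greedy loop always terminates with an exact spelling because wan = 1 is in the table. Large numbers get long quickly with the basic table, which is the "intentionally crippled" part, but nothing stops a speaker from spelling them.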
What really is toki pona? Well, it doesn’t have a governance structure, like programming languages do. It doesn’t have a standard implementation because languages run on brains. Without a standards board, what toki pona is, is how individuals use it. Corpus linguistics is a weird sort of evidence: something might never be seen in the corpus of observed utterances, but people might feel that a never-seen-before plural is okay, or a certain error might be common but people are pretty sure it is an error. And then there is the great mass of data in between: words, phrases and constructions that appear in the corpus over and over. Those phrases really are the language, even if they violate the so-called official grammars. The official grammar is something of a small lie: it says “This grammar is the language”, but really the language is the corpus of all utterances ever heard. Our brain translates that input into a tangle of brain cells. That tangle of brain cells is different for each person, and is capable of creating new toki pona sentences with a strong family resemblance to the initial corpus. At the moment, we can’t observe or make sense of the specific configurations of the neural networks of each toki pona speaker to infer what the typical grammar *really* is. We do write formal grammars, and these formal grammars are a small lie because the real grammar is too unwieldy to commit to paper.
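As a rough illustration of “phrases that appear in the corpus over and over”, here is a small Python sketch that counts adjacent word pairs and keeps the ones that recur. The three sentences are an invented toy corpus, not real data, and the function name and the threshold of two occurrences are arbitrary choices for this example.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count every pair of adjacent words across all sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


# Invented toy corpus, for illustration only.
toy_corpus = [
    "jan pona mi li lon tomo",
    "mi toki tawa jan pona",
    "jan pona li kama",
]

counts = bigram_counts(toy_corpus)

# Pairs seen at least twice are candidates for "memorize this as a unit".
recurring = [(" ".join(pair), n) for pair, n in counts.most_common() if n >= 2]
print(recurring)  # [('jan pona', 3)]
```

Frequency alone doesn’t tell you whether a pair is opaque, so a count like this only suggests candidates. The linguists’ tests still have to be applied by a person.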
Examples: people make decisions about the order of adjectives. These orders are sometimes strict, sometimes loosey-goosey, and the rules are just about never written into a formal grammar. Ordering of phrases (time-manner-place) is another area where the corpus has strong opinions, but a formal grammar won’t necessarily have strong opinions. (Or it might have a strong opinion, since people are formalizing the hard-to-formalize parts of languages all the time, but it was written by a PhD in Linguistics and only he and his adviser really understand it. Turning a neural network into a formal language is no easy task for some corners of grammar.)
What to do? If you are making a new fake language, you should be thinking of making:
1) A corpus- preferably written by many people. It is rare to get many people to learn a fake language and write in it tho.
2) A formal grammar- it is handy for dealing with all the issues that a formal grammar deals well with.
3) Lots of examples showing the “correct”/“incorrect” ways of saying things. (Corpus linguistics, but with contrived texts with constructions explicitly marked as correct, as opposed to regular corpus texts that at best are “on the average probably correct for most people”.)
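One way to picture item 3 is as a small table of sentences with explicit marks, which any proposed formal grammar can then be checked against. The sketch below is in Python and reuses the “goodbyes” pair from the start of this post. The class name, the three marks (“ok”, “?”, “*”) and the idea of a grammar as a yes/no function are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgedExample:
    text: str
    judgment: str  # "ok" = accepted, "?" = doubtful, "*" = rejected


EXAMPLES = [
    JudgedExample("Say your goodbyes", "ok"),
    JudgedExample("Say your god be with yous", "?"),
]


def disagreements(accepts, examples):
    """List the examples where a candidate grammar contradicts the marks.

    `accepts` is any function from a sentence to True/False.
    Doubtful ("?") examples are skipped: they count neither for nor against.
    """
    wrong = []
    for example in examples:
        if example.judgment == "?":
            continue
        if accepts(example.text) != (example.judgment == "ok"):
            wrong.append(example)
    return wrong


# A hypothetical grammar that rejects everything fails on the "ok" example.
print(disagreements(lambda text: False, EXAMPLES))
```

The doubtful mark matters because it keeps the contrived examples honest about the cases where speakers themselves are unsure.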