Punctuating toki pona - Community Proposal

toki pona has a few constraints, without which the game of toki pona is rendered silly. Foremost is that there are only about 125 morphemes (mostly free rather than bound). This isn’t much of a problem, and I think tp community proposals can stick to it.

The next idea is that numbers, dates and so on are lacking, as if this were the language of an ancient tribe (even though it also lacks the fully formed systems for naming plants, animals and extended family relations that such a tribe would have). This is a problem for working with data on computers. Numbers and dates are basic data types; without them, certain computer experiments are harder than necessary.

I’m writing a parser, and I need a few modifications to make tp easy to parse:

1) Phrasal compounds are joined with hyphens, e.g. jan-pona, jan-pi-sijelo-pona.
2) Numbers are prefixed with #, e.g. #wan. A multi-word number is hyphenated, e.g. #wan-tu.
3) Direct quotes go inside << >>, e.g. jan li toki e << toki! >>
4) A prepositional phrase must be preceded by a comma, e.g. mi li, lon ma ni. jan li moku, kepeken ilo.
5) Non-toki pona text is escaped with double quotes, e.g. mi toki kepeken toki “English”.
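To make the five rules concrete, here is a minimal tokenizer sketch. It assumes a regular-expression approach, and the token labels (WORD, NUMBER, QUOTE, ESCAPE, COMMA, PERIOD) are my own hypothetical names, not part of the proposal.

```python
import re

# One named alternative per punctuation rule. Quotes and escapes are tried
# before plain words so that their contents stay a single token.
TOKEN_RE = re.compile(r"""
      (?P<QUOTE><<.*?>>|«.*?»)                 # rule 3: direct quote
    | (?P<ESCAPE>"[^"]*"|“[^”]*”)              # rule 5: non-toki pona text
    | (?P<NUMBER>\#[A-Za-z]+(?:-[A-Za-z]+)*)   # rule 2: #wan, #wan-tu
    | (?P<WORD>[A-Za-z]+(?:-[A-Za-z]+)*)       # rule 1: jan, jan-pona
    | (?P<COMMA>,)                             # rule 4: before a prep phrase
    | (?P<PERIOD>[.!?:])                       # sentence punctuation
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("jan li toki e << toki! >>"))
```

Each rule becomes one alternative in the pattern, which is the point: with the punctuation in place, the tokenizer needs no knowledge of the vocabulary at all.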

We have compound words. We pretend we don’t, but we do. These are lexemes: phrasal compound words. Compound words are joined by hyphens:

jan-pona = friend.
jan-pi-sijelo-pona = doctor.

Why? Because you can’t accurately machine-gloss jan pona as friend. Why should we pretend that jan-pona is anything but a phrasal compound, and gloss it as good person, healthy person, friend, etc.? Without hyphens, I have to gloss using a list of alternatives. With hyphens, I can dispense with the list and home in on a single gloss.

jan li ike li tawa jan pi sijelo pona li kama jan pona.

jan li ike li tawa jan-pi-sijelo-pona li kama jan pona.

We also have “rovers”, a kind of syntactic infix. I don’t know what these are really called.

jan-mute-pi-sijelo-pona = doctors.
jan-pi-sijelo-pona-mute = doctors.

We need numbers. They shall be words prefixed by #.


I will have to look up the words for 3, 4, 6, 7, 8 and 9 on the forum. I know there are many proposals; I’ll look for community ones and then implement those that are base 10, don’t introduce new words, are positional and are reasonably efficient, i.e. no worse than English at expressing large numbers.

Some numbers are legacy numbers with some degree of officialness and will have to be supported.

#tu-tu = 4
#luka-luka = 10
#MMLW = 20+20+5+1

But I don’t recommend using legacy numbers if you are trying to communicate.
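Legacy numbers are additive, so evaluating them is just a sum. Here is a sketch, assuming the commonly cited legacy values (wan 1, tu 2, luka 5, mute 20, ale or ali 100). The single-letter shorthand (W, T, L, M, A) is my guess, extrapolated from the #MMLW example, and legacy_value is a hypothetical helper.

```python
# Assumed legacy values; every word or letter simply adds its value.
LEGACY_WORDS = {"wan": 1, "tu": 2, "luka": 5, "mute": 20, "ale": 100, "ali": 100}
# Assumed letter shorthand: W=wan, T=tu, L=luka, M=mute, A=ale.
LEGACY_LETTERS = {"W": 1, "T": 2, "L": 5, "M": 20, "A": 100}


def legacy_value(token):
    """Evaluate a '#'-prefixed legacy number such as '#luka-luka' or '#MMLW'."""
    if not token.startswith("#"):
        raise ValueError("numbers must be prefixed with #")
    body = token[1:]
    if body.isupper():
        # Letter shorthand: one letter per number word.
        return sum(LEGACY_LETTERS[letter] for letter in body)
    # Hyphenated words: #luka-luka -> 5 + 5.
    return sum(LEGACY_WORDS[word] for word in body.split("-"))


print(legacy_value("#MMLW"))  # 46
```

The additive design is also why legacy numbers scale badly: every extra 20 costs another word.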

Watch this space!

We need direct quotes. They shall be wrapped in << >> (or « » if you can find those keys on your keyboard).

jan li toki e << mi jo e soweli! >>
He said, “I have a dog.”

I hope I don’t regret this choice, because < and > mean something in HTML and might cause problems in some content management systems. Oh well.

Anything inside direct-quote markers is syntactically a content word.

We need commas.
People currently add commas before or after la, but we don’t actually need them there. I have no opinion about what people do there, nor about commas in pi-phrases.

mi pali, kepeken ilo sona, lon tomo, pali tawa mani.
I work with computers in the office for money.

When there is nothing to distinguish a preposition from a content word, it is valid to parse every word after pali as a string of adverbs:

mi pali (kepeken ilo sona lon tomo pali tawa mani).

Humans can tell that this reading is unlikely, but a machine can’t. Humans can parse invalid toki pona, realize that someone is mixing Russian, English and toki pona rules, and, with some effort, recover the intended correct toki pona. That sort of parsing takes a huge effort to implement. Commas, on the other hand, make parsing mechanically effortless.
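Here is a sketch of why rule 4 makes this mechanical: splitting on commas separates the main predicate from the prepositional phrases without any semantic guessing. The preposition list is a partial, assumed one, and split_clause is a hypothetical helper.

```python
# A partial, assumed list of toki pona prepositions.
PREPOSITIONS = {"kepeken", "lon", "tawa", "tan", "sama"}


def split_clause(sentence):
    """Split a comma-punctuated sentence into (main predicate, phrases).

    Each phrase is (preposition, rest); a chunk that does not start with a
    preposition is kept whole with None as its preposition.
    """
    parts = [part.strip() for part in sentence.rstrip(".").split(",")]
    main, phrases = parts[0], []
    for part in parts[1:]:
        words = part.split()
        if words and words[0] in PREPOSITIONS:
            phrases.append((words[0], " ".join(words[1:])))
        else:
            phrases.append((None, part))
    return main, phrases


print(split_clause("mi pali, kepeken ilo sona, lon tomo, pali tawa mani."))
```

Without the commas there is nothing to split on, so the whole string comes back as one predicate, which is exactly the string-of-adverbs reading.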

We need an escape character
The corpus texts are full of mixed-language material, from accidents in transliteration to people just trying to communicate. After transliteration into toki pona, the original is normally unrecognizable; it might as well be a completely new word. So toki pona texts that interact with the real world will need to include foreign text, and that text should be in double quotes.

nimi mi li “Matthew Martin” li jan Mato.

Anything in double quotes is syntactically a content word.

The current date system is something like:

tenpo suno wan, mun wan, sike suno wan = 1/1/1

You can find some variant of this on the toki pona wikia. It uses legacy numbers and is too cumbersome for anyone to want to use.

I’m going to recommend the format y-m-d, because it is easier to sort.

Also, for this to work, numbers have to be reasonably efficient and able to express everything from 1 to 2015.
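A quick sketch of the sorting argument, assuming each part of the date has already been converted from a toki pona number into an integer (ymd_key is a hypothetical helper). Tuples compare element by element, so a year-month-day key sorts chronologically; a day-first key would not.

```python
def ymd_key(date_text):
    """Parse a 'y-m-d' date such as '2015-1-31' into a sortable tuple."""
    year, month, day = (int(part) for part in date_text.split("-"))
    return (year, month, day)


dates = ["2015-10-2", "2014-12-25", "2015-1-31"]
print(sorted(dates, key=ymd_key))  # ['2014-12-25', '2015-1-31', '2015-10-2']
```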

Watch this space!

Posted in machine assisted conlanging, toki pona

Robot Languages

By the way, it looks like Dothraki has a published spec. Now on to other topics.

As someone with the facial expressions of a robot, I’ve always been partial to robots and some of my earliest attempts at programming were to create chat bots and AI. I failed, of course. But now I have some ideas on how to make it work.

Our human brains have some sort of knowledge representation system: it turns our stage, the world around us, into facts represented by neurons linked by axons and dendrites, which chatter using neurochemicals. We lack the technology to represent reality accurately and usefully with a neurological model. But, hey, we have other ways to represent reality. For example, documents and relational databases keep track of the inventory and business activities of all large businesses and governments in the world.

Normally, when this needs to be communicated, we use protocols like HTTP to send (often technology-independent) serializations of database records across a wire. We then use UIs and data binding to turn them into human-consumable material.

But let’s get back to robots. Robots are machines that aspire to be like people, and so would use a natural language. That means they could possibly deal with people directly. But English is hard, so maybe a conlang or a restricted version of English would be better.

Representations of reality:
Name – Phone Number
Joe – 555-1234
Jane – 444-4678

If this were toki pona, we could serialize this as:
nanpa pi jan Joe li 555-1234.

By some complicated system of equalities, we could work out that this is the same as:
jan Joe li jo e nanpa ni: 555-1234
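A sketch of the serialization side, assuming just these two sentence templates (the template list and serialize_row are hypothetical). Producing every equivalent form is the easy half; recognizing them all as the same fact is the equality problem.

```python
# Two hypothetical templates that express the same (name, number) row.
TEMPLATES = [
    "nanpa pi jan {name} li {number}.",
    "jan {name} li jo e nanpa ni: {number}",
]


def serialize_row(name, number):
    """Return every equivalent toki pona serialization of one phone-book row."""
    return [template.format(name=name, number=number) for template in TEMPLATES]


for sentence in serialize_row("Joe", "555-1234"):
    print(sentence)
```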

If the robot heard a sentence, it would attempt to use deserialization & equality checks to transform the utterance into a known data type:

jan Mato li jo e nanpa ni: 111-8989 ==> Mato – 111-8989
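And a sketch of deserialization: one regular expression per known template, tried in turn. The patterns and the deserialize helper are assumptions for illustration, and they only accept numbers shaped like 111-8989.

```python
import re

# Hypothetical patterns, one per sentence template the robot understands.
PATTERNS = [
    re.compile(r"^nanpa pi jan (?P<name>[A-Z]\w*) li (?P<number>\d{3}-\d{4})\.?$"),
    re.compile(r"^jan (?P<name>[A-Z]\w*) li jo e nanpa ni: (?P<number>\d{3}-\d{4})\.?$"),
]


def deserialize(utterance):
    """Map an utterance onto a (name, number) row, or None if nothing fits."""
    for pattern in PATTERNS:
        match = pattern.match(utterance.strip())
        if match:
            return (match.group("name"), match.group("number"))
    return None


print(deserialize("jan Mato li jo e nanpa ni: 111-8989"))  # ('Mato', '111-8989')
```

Anything that matches no template comes back as None, which is where a real robot would have to ask a follow-up question.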

A Lojban-style processor could also answer utility questions like:

nanpa Jane li 444-4678 la ona li toki tawa mi.
If Jane’s number is 444-4678, then she’s talking to me.

And the robot would respond, after binding & processing pronouns:
jan Jane li toki tawa sina.
Indeed, Jane is talking to you.

Or utility questions might involve common computer tricks like “How many digits are in Jane’s phone number? What is the sum of the digits in Jane’s phone number?” A human actually excels at this sort of arbitrary discussion, whereas a robot has to be programmed for each exchange of that sort.
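For illustration, here is what one such hand-written handler might look like (digit_stats is a hypothetical name). It answers both questions, but only because someone anticipated them, which is exactly the limitation described above.

```python
def digit_stats(number):
    """Answer 'how many digits?' and 'what is their sum?' for a phone number."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    return len(digits), sum(digits)


count, total = digit_stats("555-1234")
print(count, total)  # 7 25
```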

Pronouns seem like something that would be really, really hard for a computer. If my computer only had a knowledge representation system for the phone book, it would need to know who is a person, who is capable of having a phone number, and so on. People excel at common sense; modern code doesn’t. Databases rely on nonce or unique names, and variables that might be bound to anything are used only in limited scopes, to make sure they bind to only one thing at a time.

Next is the chat bot problem.

Chat bots respond to whatever you ask. Usually the input is modeled as a command. But human languages only sometimes use commands.

If Jane’s number is X, then she’s talking to me. (Implied, asking for confirmation)
I know Jane. (Implied, asking for additional information about Jane, e.g. Oh, you do? I know her too, her number is X)

Another thing a chat bot should be able to do is serialize things into a form suitable for saying over the phone. Most code dumps text to the screen, often in a grid format. A good robot would be able to tell a story in a way that takes attention span into account. A bad robot would read all 5000 phone numbers. A smart robot would, after reading two, say “and so on” or “Do you want me to keep going, or are you looking for someone in particular?”
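A minimal sketch of the attention-span idea, where a hypothetical limit parameter (defaulting to two rows) stands in for attention span; real turn-taking would obviously need more than a fixed cutoff.

```python
def read_aloud(rows, limit=2):
    """Turn phone-book rows into spoken lines, stopping after `limit` rows."""
    lines = [f"{name}, {number}." for name, number in rows[:limit]]
    if len(rows) > limit:
        # Instead of reading everything, hand the turn back to the listener.
        lines.append("And so on. Do you want me to keep going, "
                     "or are you looking for someone in particular?")
    return lines


book = [("Joe", "555-1234"), ("Mato", "111-8989"), ("Jane", "444-4678")]
for line in read_aloud(book):
    print(line)
```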

State: some of the best chat bots are sadly stateless. They don’t incorporate anything you say into their base of knowledge. Some do, but it’s kind of wonky; they just remember that after someone says “Good day”, people usually repeat “Good day”.

A good robot takes all utterances and converts them into a system of knowledge.

If I said this to my phone book robot:

mi jo e soweli.

It would interpret that as a request to create a new table like so:

who – inventory
jan Mato – soweli

And if two minutes later I asked:

mi jo e seme?

The robot should be able to look it up, even though a few minutes earlier it only knew phone numbers.

This is the flip side of serialization: turning language back into the knowledge representation system.
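Here is a toy sketch of the persistence idea, assuming the robot only understands “mi jo e X.” and “mi jo e seme?”. The KnowledgeBase class, its tell method and the inventory table name are all hypothetical; the table springs into existence the first time a fact is stored.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy fact store: 'mi jo e X.' adds a row, 'mi jo e seme?' queries rows."""

    def __init__(self, speaker):
        self.speaker = speaker            # who "mi" refers to in this conversation
        self.tables = defaultdict(list)   # table name -> list of (who, what) rows

    def tell(self, utterance):
        words = utterance.rstrip(".?").split()
        if words[:3] != ["mi", "jo", "e"] or len(words) < 4:
            raise ValueError("this sketch only understands 'mi jo e ...'")
        thing = " ".join(words[3:])
        if thing == "seme":
            # Query: everything the speaker is recorded as having.
            return [what for who, what in self.tables["inventory"] if who == self.speaker]
        # Statement: defaultdict creates the 'inventory' table on first use.
        self.tables["inventory"].append((self.speaker, thing))
        return None


kb = KnowledgeBase("jan Mato")
kb.tell("mi jo e soweli.")
print(kb.tell("mi jo e seme?"))  # ['soweli']
```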

Anyhow, this has been done before. MS SQL had a natural English processor; it was probably similar to what I have described, although I bet it only dealt with turning English into SELECT statements and maybe turning tables of data into English sentences. Turning English into tables that can be queried again is probably hard.

A tp fact database would rely heavily on equality tests:

mi jo e soweli lon tomo mi.
Does this factually contain the following?
mi jo e soweli. Yes.
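A crude sketch of that containment test, assuming that dropping everything from the first preposition after the object yields a sentence the fact also supports (core and entails are hypothetical names, and the preposition list is partial). It is one-directional: the longer sentence entails the shorter one, not the reverse.

```python
# A partial, assumed list of toki pona prepositions.
PREPOSITIONS = {"lon", "kepeken", "tawa", "tan", "sama"}


def core(sentence):
    """Drop a prepositional phrase that follows the direct object."""
    words = sentence.rstrip(".").split()
    if "e" not in words:
        return words
    for i in range(words.index("e") + 1, len(words)):
        if words[i] in PREPOSITIONS:
            return words[:i]
    return words


def entails(fact, claim):
    """True if `claim` is the fact itself or the fact minus its prep phrase."""
    claim_words = claim.rstrip(".").split()
    return claim_words in (fact.rstrip(".").split(), core(fact))


print(entails("mi jo e soweli lon tomo mi.", "mi jo e soweli."))  # True
```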

Anyhow, hopefully my personal life will allow the free time to write such a thing. So, to recap:

Knowledge representation system: E.g. relational tables.
Serialization system: E.g. turns rows and tables into sentences
Deserialization system: Creates tables and binds utterances to a table, then inserts 1 or more rows.
Persistence: All commands, factual or otherwise, become part of the system of knowledge.
Query language: Questions, or statements that prompt retrieving information and serializing it back to the interlocutor.
Utility: Processing tasks that are not really related to retrieving and updating a representation of knowledge. For example, answering whether at least 3 people in the phone book have names starting with “G”.
Equality and Transformations: Natural languages can serialize into many equivalent forms.

Posted in machine assisted conlanging

The 5 Word Language

Finally, I sort of have the time to do a quick review of the 5 word language. I’m a big fan of small languages (as in small vocabulary, few rules, few morphemes).

This really is 5 morphemes. I think all small languages in practice have about 2000 lexemes (set phrases that behave as words that you just have to memorize). But there’s no need to quibble: this really does look like 5 morphemes. That is small.

What is totally awesome about this is that it is 2 morphemes short of what you can memorize in a single view: short-term memory holds about 7 (plus or minus 2) items.

The vocab is laid out in a grid, and the entire grid is used. This contrasts with toki pona, which ignores the diagonal, i.e. reduplication in tp doesn’t mean anything.

I think logotome is a real word (shoot me if it isn’t), and the logotome of a language is the set of all possible words (or lexemes) that its phonotactic system lets you create. Toki pona’s logotome is huge: even with the small alphabet and CV(CV)(N) structure, you have thousands of possible short words. A five word language has only 25 two-word phrases.
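The grid arithmetic is easy to check. This sketch uses placeholder names for the five morphemes rather than the actual vocabulary, and counts both the full 5×5 grid and the grid with its diagonal (reduplication) removed, as toki pona effectively does. two_word_phrases is a hypothetical helper.

```python
from itertools import product


def two_word_phrases(vocab, include_diagonal=True):
    """Every ordered pair of words: the cells of a len(vocab) x len(vocab) grid."""
    return [(first, second) for first, second in product(vocab, repeat=2)
            if include_diagonal or first != second]


# Placeholder names for the five morphemes.
vocab = ["A", "B", "C", "D", "E"]
print(len(two_word_phrases(vocab)))                          # 25
print(len(two_word_phrases(vocab, include_diagonal=False)))  # 20
```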

Good & Potential Applications
This might be a useful conlang creation technique. Create a dozen small 5 word conlangs, then turn the best of those into a large language. If the large language were compatible with the small one, you’d have a conlang with a core that someone could learn before losing interest, which I suspect happens after about 2, maybe 3 hours.

I like the idea of using numbers; it has an obvious application for text messaging on phones (there is a (failed) app for that! It’s an emoji-type conlang). If I were to learn or create a dozen of these languages, though, I’d worry about having to remember that 42 means one thing in the 5 word language and another thing in the 7 word language. I already constantly mix up telo, which means water in toki pona and body in Russian. I read somewhere that when you hear a word that is the same in 3 languages (like, say, chocolate or tea), the area of your brain in charge of that word lights up for each language. So cross-language interference may be a real thing to consider when making small languages that draw on the same phonotactics (or logotactics; I can’t tell whether the 5 word language has a spoken form).

Domain specific languages. A domain-specific language is an idea borrowed from software development, where you create a mini-language to deal with a specific topic. The language can then be optimized for talking about that topic. It’s like an extreme version of slang and technical jargon, which can feel like a mini-language, except that English jargon still follows English grammar and syntax. The lexicon has words for diabetes and God, so I figure this language makes it easier to talk about those topics. And if you want to talk about something that doesn’t suit the language’s lexicon, create another 5 word language!

Areas for Improvement
Like many combinatorial languages (i.e. a fixed set of morphemes combined in all possible combinations), this particular description doesn’t say much about grammar. Is the grammar isolating? Do we have bound morphology? (i.e. do any of the morphemes occur only in a fixed relationship to other morphemes, e.g. do we have a tense suffix?) What are the basic sentence patterns? Is it SVO, OVS? Do we have prepositions or postpositions? Do we branch left (like Japanese), right (like English) or both (mixed branching, like toki pona)? Do we have part-of-speech rules, or are all words content words? Are some words “semantically bleached”, meaning little on their own but something in a sentence? Examples from English: the, to, in, of, have, going (future), etc.

Posted in conlang

Undeveloped Public Domain Conlangs

Barsoomian is public domain, unless it is the most recent movie version.

But here is one that I just noticed: Parrot, from Doctor Dolittle:

“Ka-ka-oi-ee, fee-fee”
“Is the porridge hot yet?”

Oh, boy. How shall we do an interlinear gloss for that?

I’m going to guess that ka-ka-oi-ee is a compound word meaning porridge, that fee-fee means ‘now’, and that reduplication marks a tag question. Since reduplication indicates something you don’t know, ka-ka would be “mystery”, as in mystery-meat. So: “mystery-food, now-now?”

And the dog speaks a constructed sign language.

And I’m going to guess that after page two the author lost interest in actually describing the fake languages. A pity.

But hey! It’s public domain. You are allowed to fully develop the language and sell it. Go forth and do so!

Posted in Uncategorized

Esoteric Buddhism and Conlangs

I’m still reading about Esoteric Buddhism, so I’m no expert and may misspeak. However, while reading about it, I got a bunch of fake linguistics ideas.

Esoteric vs Exoteric Readings
The exoteric reading is the way you’d read a text and simply understand it. No secret messages. But if each thing and action is actually a symbol of something else, then you get an esoteric, secret reading. It’s post hoc magical thinking to believe there really was an encoded message, but people didn’t dismiss the idea of esoteric readings, so was there any benefit to the practice? I think so: it works as a creativity device. Take a text, imagine that there is a secret interpretation, and find it.

I think some languages and syntaxes are more given to an esoteric reading than others. For example, in toki pona, most sentences follow the same form. So, as a contrived example, “Let’s go shopping” and “We attack at dawn” would both follow a subject-verb-DO-prepositional-phrase pattern.

Polysemy and homonymy help create intentional esoteric readings. So if a language really did define “Let’s = We”, “attack = go shopping” and “missing time marker = dawn”, we’d have an intentional esoteric reading. Without such a huge amount of homonymy, esoteric readings would be accidental or occasional at best. But post-hoc esoteric readings are limited only by your imagination.

So an esoteric reading is an act of communication system construction, a new mapping of meaning onto an existing syntactic structure.

Mantras are magic formulas. Some are perfectly intelligible, usually invocations of powerful beings: Namo Amida Butsu! Hail the Amida Bodhisattva!

Some are nearly unintelligible, but when written in Chinese, the characters do mean something. It’s like the canard about the Chinese word for “crisis” having two characters that each mean something on their own, with a clever relationship to the compound word. In a dharani (a mantra without meaning), the Chinese is your substrate for an esoteric reading.

Again, this is an act of communication system creation, sort of like if you discovered an ancient text and by fiat decided it was a recipe for bread by matching up words with an English recipe. This of course would fall apart if you applied the trick to an additional undeciphered text.

Anyhow, what can you do with this? Maybe it’s good practice for language creation, which often stalls at picking a phonetic inventory. Take a nonsense phrase and do an esoteric reading. Your esoteric reading is a sort of mini-conlang.

Posted in conlang design

Language learning materials

So I’m working on improving my Russian. These things drive me crazy about learning materials:

The preposition pri means “during” (but also most of the other prepositions), and that counts as a definition. It is then followed by two, maybe three examples. And that is all. This works fine if you already know Russian and are looking to label and identify rules you already know. From a learner’s standpoint, prepositions and cases are all chaos and unpredictability. I’d rather have a lengthy set of examples than some suspicious, long list of rules. Instrumental case: it’s the case to use for your profession (What the f*k?). It is also the case you use for certain “x and y” constructions (What the f*k?). Dative case: it’s the case you use in a sentence where you like things (What the f*k?). Just give me a lengthy list of samples.

The genitive case is the case you answer with should someone ask “Kovo?” I asked my Mum, who speaks not a word of Russian, “Kovo?” over and over, and she never gave me the genitive of any word. This rule works only if you are already fluent in Russian and need to be able to label a rule you already know.

Single word repetitions. (You heard that word once, so you’ve memorized it, right?) Should I fault books for not being flash card decks? I will. Why can’t they be creative and put words in, say, an 8×8 grid, so that you can review them in pseudo-random order?

Single demonstrations. Okay, let’s take an example outside of Russian: the Algonquian obviative. It takes a few pages to explain the obviative, and at the end of that explanation you will be utterly confused about how the deep structure works, or the surface structure for that matter. You then get two example sentences. As a learner, I think I need maybe 100 or more example sentences to illustrate a rule that no author can explain very well. An example from English would be dangling or stranded prepositions (“There are some things I will not put up with!”). To me, as a fluent English speaker, it sounds right, and I don’t imagine even six pages of technical explanation would help a learner, but four or five hundred samples might.

Charts. Here is a 1000×1000 chart of all the two word phrases in English. Memorize them. For me, each cell in a chart generally feels like an entirely separate fact or skill, and its location in the grid is about as important as ordering the words by number of letters, graphing them by which have the most straight versus curved lines, or other pretty but irrelevant details.

Posted in Learning Any Language

Conlex- Here is what I hope it means

Conlex: a sort of activity for people creating new (or reviving dead or nearly dead) languages by actually speaking them into existence. It entails creating a language, especially the materials necessary for learners, and then doing what it takes to get a few other people to learn it. Maybe that means teaching it to the toddler or the girlfriend, or promoting it for its attractive features (who knows, maybe it’s pretty, maybe it’s useful for silent communication in the dark). It entails creating new culture only to the extent that one hopes and expects learners to adopt those new customs. This is conlanging where the conlanger is not a god-king to be worshiped (à la Zamenhof), but a peasant, and the learners are kings.

Despite the internet being a wide and vast place, for some topics, it seems rather small. In the area of non-natural languages, it is small. Relatively few participants and each non-natural language tends to carve off a new community as it is hard to gain competency in more than a few languages.

So let’s say you are reviving Beothuk. Let’s imagine that you did an exercise I promote, which is to write up your own manifesto about what a new language should be. (Yes, go ahead and send me nasty-grams about how I am forcing, yes forcing, you to write your own manifesto instead of slavishly copying mine, or Esperanto’s, or Lojban’s, or Tolkien’s. The people who make you face your freedom are the worst.) At the end of the exercise you say, hmm, this language should be alive. People should actually use it. Maybe even three people. (And mentally, you can redo my hypothetical with a diary conlang to be taught to toddlers at home, or a conlang that relies on touch boards for the profoundly disabled, etc.)

(Yes, your manifesto might be different from my hypothetical. For example, if your manifesto is cribbed from Tolkien, well, it isn’t a conlex, and that is a topic for someone else to write about.)

So you go to Conlang-L and watch people talk about the joys of writing Tolkien-style reference grammars. They have to be hard! Because that is the only way they will get any respect, I guess. But in my hypothetical, you want the language to be human-usable, and you end up with a lot of unhelpful and sometimes angry advice about how the language needs to be hard, huge, copyrighted, hostile to learners and not to be sullied by actual use. But the 2 people likely to be interested in Beothuk have no interest in fictional sound-change histories; they just want an expressive, learner-oriented language with lots of teaching materials.

And you visit the auxlang lists, where people are creating yet another Esperanto. Which is a fine project; people should learn from it. But the manifesto in this hypothetical calls for a language based on the remnant words of a lost people, so averaging the vocab of European languages and marking everything for part of speech is out! And 3 people is plenty; neo-Beothuk isn’t going to rule the world. So we keep moving.

And you visit more websites where people hawk languages that include a conculture requiring you to inflect pronouns and verbs for the social rank of your three parents, each of a different gender. Oh, joy, that is going to be fun to speak here on Earth. And again, you get unhelpful advice about how your language MUST have a conculture, because language and culture are inseparable, any language you write must be dripping with culture or you’re doing it wrong, and they will eventually get mad at you. Neo-Beothuk might include culture-light (say, a new form of honorifics or a Beothuk Day), but if we don’t want to tell those 3 Beothuk fans to bugger off, we’ll have to be constrained by what they want. If they don’t want body paint, insisting on it will get in the way of launching a new language.

Anyhow, no corner of the internet serves this sort of thing at the moment: not the LCS, not Conlang-L, and heavens to Betsy, not Zompist. The Klingon, Na’vi, and toki pona communities all seem to have burst into existence despite the lack of a place where people who create languages for learners and indiscriminate language learners can meet.

I hope the equivalent of the “learn any language forum” for non-natural languages comes into existence and that a new learner and “fan-centric” approach comes with it.

Posted in conlang community building, conlang learning

Fake Mantras and Fake Languages

In Hinduism, there was the idea that words said in a prestige language were magic. People at the time had some pre-Saussurean ideas about sound and meaning, namely that there was something doggy about the sounds d-o-g and something catty about c-a-t. This has since been shown to be nonsense; at best, words of similar meaning can cluster together in how they sound. I don’t have the examples handy, sorry. And of course, there is real morphology, where antidisestablishmentarian has a bunch of parts that mean something; carpet, on the other hand, doesn’t have parts, even though it looks like it does.

Back to India. They imagined seed syllables, syllables that mean something in themselves (as if car in carpet really meant car!). The sounds were typically vowels and liquids, and less typically any sort of consonant that completely blocks the passage of air. “kit cat” would be a lousy mantra. Try to chant it; it doesn’t roll off the tongue. But “lily” does. By this reasoning, you can have obstruent consonants at the beginning of a mantra, but not in the middle: nothing that blocks the breath.
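The rule can be written as a tiny checker. This sketch reads “blocks the breath” as the plain stops (p, b, t, d, k, g), which is my assumption; fricatives and other obstruents are ignored, and chantable is a hypothetical name.

```python
# Assumption: "blocks the breath" means the plain stops only.
STOPS = set("pbtdkg")


def chantable(mantra):
    """True if no stop consonant occurs after the mantra's first letter."""
    letters = [ch for ch in mantra.lower() if ch.isalpha()]
    return not any(ch in STOPS for ch in letters[1:])


print(chantable("kit cat"))  # False
print(chantable("lily"))     # True
```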

So fast forward to now. Meditation is popular. We often do secular meditation using numbers: 1, 2, 3, 4, 5, … 10, repeat. If you go over 10, you know your mind has wandered. Or in traditional chanting, aka noisy meditation, we chant something, usually 2000-year-old Sanskrit phrases that are untranslatable nonsense, agrammatical strings of themes (bija syllables with some sort of symbolic meaning), or possibly bad Sanskrit made up by someone who didn’t actually read or write it (Mantra of Light, I’m looking at you). Some of them are names of god-like Bodhisattvas. I find it endlessly distracting that I’m chanting the name of an imaginary superman.

Another thing that happens with mantras is massive streamlining. Namu Amida Butsu turns into Nembutsu because people are trying to say it 10 times in one breath, or to say it hundreds of times and still finish in time to go to work. Which brings up another point: mantras act as a sort of linguistic clock. If you want to meditate for 20 minutes but don’t have a clock, you can chant x times and on average hit 20 minutes (sort of like 1 Mississippi, 2 Mississippi, etc.).
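The clock arithmetic, sketched: divide the session length by the time one repetition takes and round up. The seconds-per-chant figure is something you would have to time for yourself; the values below are made up, and chants_for is a hypothetical helper.

```python
import math


def chants_for(minutes, seconds_per_chant):
    """Repetitions needed to fill a session, rounded up to a whole chant."""
    return math.ceil(minutes * 60 / seconds_per_chant)


print(chants_for(20, 4))  # 300
print(chants_for(20, 7))  # 172
```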

English Mantras
Nonsense – “Ya ba da ba do da!” “Hi ho, hi ho, it’s off to work I go!”
Traditional Translations- “Homage to the Amida Bodhisattva” (bleh, unchantable)
Modern innovations- “love and peace…love and peace” (or “love and peace and brownies…”)

The downside of a mantra you understand is that you might get distracted by its content.

Toki Pona Mantras
Assigning meanings to any share of the possible syllables risks creating new words. So if pon means good and lon means the universe, no new meanings arise. But if tila means “compassion”, oops, we’ve coined a new word, albeit one only for mantras. Grammar also poses a challenge. In toki pona, all utterances are supposed to be grammatical, or else you aren’t doing toki pona. But a Hindu-style mantra might be something like:

pon(a) lon pilin pon(a)…etc.

And that isn’t grammatical. So it’s a community innovation, which may or may not bother you.

Grammatical toki pona mantras would be something like

o jan Puta Amita o tawa e mi tawa ma pona sina!

Anyhow, toki pona mantras will sound better if you drop the final vowels and/or n; it adds more vibration. This is actually a legitimate toki pona maneuver. Toki pona phonetics were designed to make the language easy for anyone to say, so transformations of it are legal. For example, you could still speak toki pona with all the l’s pronounced as r’s, all the k’s pronounced as g’s, etc.
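A sketch of those transformations as a function (transform and its parameters are hypothetical). By default it drops a word-final vowel from words longer than one letter, which keeps particles like e and o intact; dropping a final n is optional, and the substitutions argument maps one letter to another, e.g. l to r.

```python
VOWELS = "aeiou"


def transform(text, drop_final_vowel=True, drop_final_n=False, substitutions=None):
    """Apply mantra-style sound changes to each word of toki pona text."""
    substitutions = substitutions or {}
    out = []
    for word in text.split():
        if len(word) > 1:
            if drop_final_vowel and word[-1] in VOWELS:
                word = word[:-1]
            elif drop_final_n and word[-1] == "n":
                word = word[:-1]
        # Letter-by-letter substitutions, e.g. {"l": "r", "k": "g"}.
        out.append("".join(substitutions.get(ch, ch) for ch in word))
    return " ".join(out)


print(transform("pona lon pilin pona"))  # pon lon pilin pon
```

The default output matches the pon(a) lon pilin pon(a) pattern from the example mantra.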

Other Conlang Mantras
One idea is to use articulation symbolism– assign symbolic meaning to each part of the tongue and mouth & construct magic words that have a nice mixture of symbols.

This post was written with feedback from the Facebook conlang group, the Facebook toki pona group, and the toki pona forum.

Posted in conlang design, toki pona

Comparatives and languages that “don’t have them”

Some American Indian languages “don’t have” comparatives. Most American Indian languages are (or were) small-community languages, and those are typically mind-bendingly complicated. There are so many mechanisms for expression that some mechanism familiar from European languages might be missing. A perfect example is a language with purportedly no marking for subject and object. Simplistic, right? Not really: the language marks words for so many things that by the end of the sentence, you have probably figured out who hit who. (See that? Who hit who: now which is the recipient of the action there?)

Okay, now what if a means for comparatives is missing, for example, a simple way to express where two things are on a scale?

You could say, “The cat is small. The dog is large.” If you are used to languages with comparatives, this doesn’t feel the same. Worse, you don’t have 101 other things going on in the sentence to nail down the details of the story of the cat whose size was on the lower end of the scale compared to the dog’s.

Now let’s take a context-free sentence, do a lousy translation into toki pona, and then speculate on what a real community would do if they had to express comparatives (without new words and without morphology):

The colorless green ideas slept furiously, but the mobility fountain ran even more to the blenderward hermeneutic. That is why your tools should be more blue than my cloth is red.

sona pi kule ala li lape pi pilin utala- taso tawa pi ilo telo musi li tawa noka mute sama wan sama pilin nasa pi lipu mute. tan ni la o jo e ilo laso kin sama ala len loje mi pi kin ala.

If our lives depended on comparatives being communicated rapidly and accurately, I imagine we would coin a syntactic innovation on the spot. It doesn’t matter whether it is elegant; say… X li quality kin li sama Y lili: X is more quality than Y (and many other possibilities). The key is that the structure is consistent. There is nothing about “X is more Y than Z” that requires it to mean comparative, except that someone in the dim history of English started it.

And there would be no way to see why something like X li quality kin… (etc.) should mean that, except that it was all the interlocutor could think of at the moment. With repetition, it would become part of the grammar.

Posted in toki pona

A Buddhist conlang idea

So a big idea in Buddhism is that you can analyze anything by breaking it down into parts, in such a way that you can never say, aha! This is the Toyota! Instead you just get a pile of car parts and a pile of metal particles, atoms, etc. Everything is just one great big conceptual liquid goo with convenient boundaries we project onto the universe. In Buddhism, the gist is that not understanding this about ourselves leads to unhappiness. So I got the idea for a Buddhist language.

First, imagine a verb template that looks like this:

time prefix – verb stem – person suffix – locative suffix (upward/downward/left or right)

Okay, so that is a verbal template. Now imagine that the template is allowed to repeat sections, have multiple verb stems and incorporate nouns.

subject – object – conjunction – subject – object – time prefix – verb stem – person suffix – locative suffix (upward/downward/left or right)

Okay, now imagine the word template has no start and no end. You always start every utterance mid-word, and eventually you just stop mid-word and stare into the distance.

If I were to follow through (somewhat unlikely), I’d probably spend more time on what linguists currently count as a word and what counts as a morpheme. You know ’em when you see ’em.

Posted in conlang design | Comments Off

So it appears the main barrier between me and Russian fluency is…

that I haven’t been told that Google Translate is a somewhat unreliable translator.

I’ve posted a few questions on the Russian StackExchange. I get two things: pretty good answers from experts and cheap potshots from the commenters. The experts sound like they might have experience teaching a foreign language. The commenters are worse than useless, and the moderators side with them.

Beginning language learners start writing by a variety of means; I know, I’ve tried them.

Initially, you use whatever form of a word you remember. “polu”: it means floor, but what gender, what case, what number? Not sure; you heard it in the phrase “it fell on the floor.” So you use the word as if it had only one form. A non-teacher will try to teach you that each word has a half dozen forms. I already know that. Thanks, the sole thing between me and fluency was that I didn’t already know that Russian has morphology.

Then you have words you have never heard, so you look them up. Then people criticize you for using the dictionary form too often. Thanks again, the sole thing between me and Russian fluency is that I didn’t know that Russian has morphology.

So I try Google Translate. I can gauge the quality of a translation by checking how well it does from Russian to English, and I can generally see what is certainly wrong and what is dodgy. But that doesn’t mean I know how to write it better. So I go ask a question and all I get is “Oh! Don’t use Google Translate, it is worthless.” (Instead they suggested I just use Google! Whee, now I have a word used in a completely different context, in the wrong case and the wrong number, and you need to know what word to search for to google it in the first place.) If you are a fluent bilingual who learned English from Mom and Russian from Dad and can decide the case of a noun by asking, “What word answers to Kovo?”, well, for you Google Translate is useless.

For you dictionary-, Google Translate- and learner-haters out there: I’m happy you are smart and bilingual, but I wish you would just bugger off. You have an idealized idea of how to learn a language, where everyone does it just the way you did: they memorized all 600 pages of Dr. Smith’s Grammar, can rattle off all 2000 slots in the tables of word endings (in both Cyrillic and English alphabetical order), have already memorized every entry in the dictionary and can recite any page on demand just by someone naming the page number.

Good for you. Leave me alone.

And as for the Russian StackExchange, I will leave you alone, the same as the rest of the internet seems to be doing. The Master Russian forum (the main competitor in this space) lets people post translation requests: people post what Google Translate came up with and they get… translations and help. (I’ll update later if I’m wrong about the general level of civility and acceptance of how language learners really are on the Master Russian forum.)

Posted in Learning Any Language | Comments Off


Someone said, “3 people are fluent in toki pona” What does that even mean? It means squat.

The gold standard of fluency is native fluency, which kids get for free. It isn’t so free that a language learned as a child is guaranteed to be excellent: for example, if you learn Spanish in the US from your grandmother, you will not speak it as well as someone who grew up in Mexico and had 12 years of schooling in Spanish.

There are about 5 cases of kids becoming fluent in a conlang, ghostlang or other similar language that previously was spoken by no one and is now spoken by them and their mom or dad: Living Latin (1), Hebrew (many), Esperanto (many), Klingon (1), Volapük (1) and a lady on FB who is teaching her diary language to her kids. In the cases where there was only one, obviously they stopped speaking it as adults and probably don’t speak it all that well now. Still, these are the cases linguists prefer when looking for fluent speakers, and, if warmed up, these speakers might speak it better than anyone who knows the language only as an L2.

Near-native fluency is fairly hard to gain as an adult. It happens, but usually there will be accents and grammatical peculiarities in the speech of someone who learned a language as a second language, even if they are crazy smart. Typically it takes 10 years of living in a country, speaking your second language all the time, before you reach near-native fluency.

Below that is a huge range of degrees of fluency. I can converse in Russian, but can’t write it to save my life. I can read Icelandic, often better than I can read Russian. I can pick out a few words here and there in French and Spanish, and my translation score would be better than a machine that chose words at random [secret: that's how google translate works;-)]. There are people who can write or read just fine, so long as they have a reference grammar and lots and lots of time. There are people who can spit out a steady stream of words as fast as you like and are generally intelligible, but they make grammatical errors and have a thick accent. And so on.

When I used to organize study groups, ideally everyone would be at a similar fluency level. In practice, it turned out that there are hundreds of distinct levels of fluency: people who know 100 words, or 1000 words, or 10000 words, and each case is an entirely different situation. This is why people who each speak English as a second language understand each other better than they understand a native English speaker. If everyone is drawing on the same 5000 words and the same few dozen grammatical constructions, communication is simple and fluid.

People who have never studied a foreign language, nor a conlang, have no clue about any of this. So they hear that toki pona has 3 fluent speakers and don’t realize that the factoid is bullshit. It is accurate to say that there are zero native speakers, zero near-native speakers, 50 to 100 people who have ever written a paragraph or more of toki pona and probably about 10 or 20 who could do it without warm-up *right now* (everyone else would probably have to review and re-remember it all).

Another important point is that conlangs are not fully defined. You can learn all there is to learn about toki pona (and, with more effort, about Klingon), and then you’ll hit a wall, beyond which there are statements whose grammaticality can’t really be judged for lack of native speakers, except maybe by getting a ruling from the creator or the relevant “language board”.

**NB technical note: there is at least one theory about Tok Pisin that it was a language of mostly adults, developed its unique grammar among adults, and then kids copied their parents. This contrasts with the other story, that adults speaking new languages (creoles and conlangs) are in fact mostly using their L1 grammar and L2 vocabulary, while the natively fluent children speak with the vocabulary and grammar of the new language.

Posted in conlang, conlang learning | 1 Comment

Syntax Coloring and Highlighting and Autocompletion

I wish there were syntax highlighting for English. When it is there, you see errors faster. I like autocompletion too, where you type a word and get a list of possible next words, sort of like what cell phone keyboards do.
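As a rough illustration of keyboard-style autocompletion, here is a minimal bigram sketch: count which words follow which in a corpus, then suggest the most frequent followers. The toy toki pona corpus, the `<s>` start-of-sentence marker and the function names are assumptions for illustration, not an existing tool.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split()   # <s> marks the start of a sentence
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def suggest(following, prev_word, k=3):
    """Return up to k of the most frequent words seen after prev_word."""
    return [w for w, _ in following.get(prev_word, Counter()).most_common(k)]

corpus = ["mi moku e kili", "mi moku e telo", "mi lape", "jan li moku e kili"]
model = train_bigrams(corpus)
print(suggest(model, "moku"))  # ['e']
print(suggest(model, "e"))     # ['kili', 'telo']
```

With only about 125 words, a table like this stays small, which is part of why toki pona looks like a good candidate for autocompletion.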

Branching Direction.
If I type “the”, the next word could come from a long list. If this were Icelandic and I started a noun, I would have a short list of ways to end it (some including “the”). So I think in general, right branching will lead to a decision tree where you pick a word from the infinite possibilities and the next word is somewhat predictable. Another example: if I say “very…”, the next word could be anything. But if the order were reversed and I said “hot”, the next word would come from a short list of possibilities, probably including “very”.

Take the example of toki pona: there are only 125 or so words, so we should be able to predict what comes next, but it has mixed right branching; purely left branching would be better.

noun + modifier => maybe 50 choices are most likely.
start of sentence + pronoun => it’s going to be only a few possible things.
Conditionals are backwards: they are tagged at the end of a phrase (with la), so a text editor wouldn’t know a conditional is starting. Vocatives are backwards too, being tagged at the end (jan o!). But imperatives are right branching (o moku!).
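To make the conditional problem concrete, here is a small sketch of a left-to-right highlighter. The function name, the label names and the simplification that only `la` matters are my assumptions. Because `la` arrives after the clause it marks, and `la` phrases can stack, tokens have to wait in a buffer until a later `la` or the end of the sentence says what they were.

```python
def label_stream(tokens):
    """Label toki pona tokens as 'cond' or 'main', reading left to right.

    A token's label is only known once a later 'la' (condition/context) or
    the end of the sentence (main clause) is reached, so tokens wait in a
    buffer. Returns the labels and the largest number of tokens that waited.
    """
    labels = []
    pending = []          # tokens seen but not yet classifiable
    max_pending = 0
    for tok in tokens:
        if tok == "la":
            # everything buffered since the previous 'la' was a condition
            labels.extend((t, "cond") for t in pending)
            labels.append((tok, "la"))
            pending = []
        else:
            pending.append(tok)
            max_pending = max(max_pending, len(pending))
    # no further 'la' arrived, so the remaining tokens are the main clause
    labels.extend((t, "main") for t in pending)
    return labels, max_pending

print(label_stream("tenpo ni la sina moku la mi pona".split()))
```

The `max_pending` count is a crude measure of how much lookahead the editor needs before it can colour anything.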

Part of Speech Systems vs Content/Function word systems
The distinction here is obvious to me. Think of Esperanto. There is a very strict system of words being adjectives, nouns, adverbs or verbs, and to convert one to another you must change the word ending. In English (and toki pona), the system is more like content words, which can be converted to any part of speech depending on their place in the sentence, and function words, which glue phrases together and generally resist changing into nouns, verbs or adjectives. Function words include things like “a”, “the”, “for”, “will” and so on.

When there is a strong part of speech system, a parser can see that a word is, say, a noun and infer that some sort of adjective is next. By contrast, with a content/function word system, you can’t tell the part of speech until you parse the sentence. In English and toki pona, there is often more than one way to parse a sentence, so a word can be one part of speech or another depending on how you want to understand it.
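A sketch of how a strong part-of-speech system helps a parser: in Esperanto a suffix check gets you most of the way. This is a heuristic only. The ending table covers the regular noun, adjective, adverb and verb endings. The short function-word set is a hypothetical, incomplete stand-in for Esperanto's closed word classes, and correlatives like *kiu* are not handled. No such check is possible for toki pona, where *pona* can be a noun, modifier or verb depending on position.

```python
# Regular Esperanto endings; within each group longer endings come first,
# so "-ojn" is tried before "-o" and "-is" before "-i".
ENDINGS = [
    ("ojn", "noun"), ("oj", "noun"), ("on", "noun"), ("o", "noun"),
    ("ajn", "adjective"), ("aj", "adjective"), ("an", "adjective"), ("a", "adjective"),
    ("en", "adverb"), ("e", "adverb"),
    ("as", "verb"), ("is", "verb"), ("os", "verb"), ("us", "verb"),
    ("i", "verb"), ("u", "verb"),
]

# Hypothetical, incomplete list of words whose endings would otherwise mislead.
FUNCTION_WORDS = {"la", "kaj", "sed", "de", "en", "al", "je", "por", "ne", "tre",
                  "ĉu", "mi", "vi", "li", "ŝi", "ĝi", "ni", "ili"}

def esperanto_pos(word):
    """Guess a part of speech from the ending alone (a heuristic, not a parser)."""
    word = word.lower()
    if word in FUNCTION_WORDS:
        return "function word"
    for ending, pos in ENDINGS:
        if word.endswith(ending):
            return pos
    return "unknown"

print([esperanto_pos(w) for w in "la rapida hundo manĝas la grandan oston".split()])
```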

Syntactic Ambiguity
I’m no expert, but I already suspect that Lojban doesn’t deliver on ambiguity-free sentences. I suspect that every sentence can have 2 parses: the one people mean and the one they said. And there can be more meanings semantically. Two people read a syntactically unambiguous sentence, they deserialize it to a data structure in their brain and the gety

Posted in conlang design | Comments Off

An unimplemented idea for conlang phonotactics

Phonotactics: the recipes for building new words.

So someone created a fake language, but they died, got a real job or otherwise abandoned it. How do you move it forward if they didn’t document the phonotactics?

There are many word generators, and they mostly use very small domain-specific languages (DSLs). For example:

Valid Patterns: CV, CVCV, CCVV, VCV

And usually there is additional notation to reduce the total number of rules, e.g. (C)VCV means the pattern can optionally start with a consonant, so it covers both CVCV and VCV.

Now the values of C, V and “Valid Patterns” are all fairly simple. So why not generate rule sets at random and then score how often they are able to account for the existing words? To further optimize the algorithm, mutate the best sets or cross them genetically (take half of the rules of each high-performing rule set and check how suitable the new merged rule set is).
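A minimal sketch of such a DSL. It assumes one letter per phoneme (true of toki pona's spelling) and uses toki pona's consonant and vowel letters as the C and V classes. The function names are hypothetical. `expand` turns optional parenthesised slots into plain patterns, and `fits` checks a word against a rule list.

```python
import itertools
import re

# toki pona's letters; each letter is one phoneme.
CLASSES = {"C": set("jklmnpstw"), "V": set("aeiou")}

def expand(pattern):
    """Expand optional slots: '(C)VCV' -> ['CVCV', 'VCV']."""
    choices = []
    for part in re.split(r"(\([^)]*\))", pattern):
        if part.startswith("("):
            choices.append([part[1:-1], ""])   # optional: present or absent
        elif part:
            choices.append([part])             # required
    return ["".join(combo) for combo in itertools.product(*choices)]

def matches(word, pattern):
    """True if each letter of word belongs to the class in the same slot."""
    return len(word) == len(pattern) and all(
        ch in CLASSES[slot] for ch, slot in zip(word, pattern)
    )

def fits(word, rules):
    """True if any expanded rule accounts for the word."""
    return any(matches(word, p) for rule in rules for p in expand(rule))

rules = ["CV", "(C)VCV", "CCVV"]
print(expand("(C)VCV"))  # ['CVCV', 'VCV']
print([w for w in ["pona", "ale", "mi", "kijetesantakalu"] if fits(w, rules)])
```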
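The random-search-plus-evolution idea could look roughly like the following sketch. Everything here is an assumption for illustration. Rule sets are lists of plain C/V patterns (no optional slots). Fitness is coverage of the known words minus a small penalty per rule, so bloated rule sets lose. The population size, mutation rates and generation count are arbitrary. Crossover takes about half of each parent's rules, and the best third of each generation survives unchanged.

```python
import random

# toki pona's letters, one phoneme each (assumption: no digraphs).
CLASSES = {"C": set("jklmnpstw"), "V": set("aeiou")}

def matches(word, pattern):
    return len(word) == len(pattern) and all(
        ch in CLASSES[slot] for ch, slot in zip(word, pattern)
    )

def coverage(rules, words):
    """Fraction of the known words that at least one rule accounts for."""
    return sum(any(matches(w, r) for r in rules) for w in words) / len(words)

def fitness(rules, words, size_penalty=0.02):
    """Coverage minus a small cost per rule, so short rule sets are preferred."""
    return coverage(rules, words) - size_penalty * len(rules)

def random_pattern(rng, max_len=6):
    return "".join(rng.choice("CV") for _ in range(rng.randint(1, max_len)))

def mutate(rules, rng):
    """Return a copy with one slot rewritten, one rule added, or one rule dropped."""
    rules = list(rules)
    i = rng.randrange(len(rules))
    roll = rng.random()
    if roll < 0.4:
        slots = list(rules[i])
        slots[rng.randrange(len(slots))] = rng.choice("CV")
        rules[i] = "".join(slots)
    elif roll < 0.7:
        rules.append(random_pattern(rng))
    elif len(rules) > 1:
        rules.pop(i)
    return rules

def crossover(a, b, rng):
    """Merge about half the rules of each parent, dropping duplicates."""
    child = rng.sample(a, (len(a) + 1) // 2) + rng.sample(b, (len(b) + 1) // 2)
    return sorted(set(child))

def evolve(words, generations=200, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [[random_pattern(rng) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda r: fitness(r, words), reverse=True)
        elite = population[: pop_size // 3]      # best third survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return max(population, key=lambda r: fitness(r, words))

lexicon = ["pona", "kala", "moku", "telo", "sina", "ale", "ike", "ilo", "mi", "ni", "jan", "pan"]
best = evolve(lexicon)
print(best, round(coverage(best, lexicon), 2))
```

Without the size penalty, the search would happily keep adding rules, since extra rules never lower coverage; the penalty is one simple way to push it toward a compact description.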

This would allow for providing a list of sample words, generating a rule set and then generating a list of potential new words.
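Once a rule set exists, generating candidate words is the easy half. Here is a minimal sketch, with hypothetical names, that draws random words from plain patterns and skips ones already in the lexicon.

```python
import random

def generate(rules, classes, n, existing=(), seed=None, max_attempts=10_000):
    """Draw up to n distinct new words that fit the rules.

    rules are plain patterns such as "CVCV"; classes maps each slot letter
    to a string of allowed letters. Words already in `existing` are skipped.
    """
    rng = random.Random(seed)
    existing = set(existing)
    new_words = set()
    for _ in range(max_attempts):
        if len(new_words) >= n:
            break
        pattern = rng.choice(rules)
        word = "".join(rng.choice(classes[slot]) for slot in pattern)
        if word not in existing:
            new_words.add(word)
    return sorted(new_words)

classes = {"C": "jklmnpstw", "V": "aeiou"}
print(generate(["CVCV", "VCV"], classes, 5, existing=["pona", "kala"], seed=42))
```

Like the search, this knows nothing about vowel similarity. It also knows nothing about toki pona's forbidden syllables (ji, ti, wo, wu), so its output would still need a human filter.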

What this won’t do: it won’t account for things like the two vowels in CVCV being similar to each other. People have lazy tongues, so vowels sometimes become similar or identical. But with enough computation, defects like these might become unimportant.

Posted in machine assisted conlanging | 1 Comment

Kickstarter and fake languages

Finishing a conlang is a lot of work. So can fans provide an incentive to new language creators to finish it, say by pledging $ in return for a variety of prizes to be given to the fans?

Possible rewards:
The foundational documents in paper or ebook format- dictionary, reference grammar, canonical corpus.
Educational materials. Graded reader, workbooks, flash cards.
Educational services, like tutoring, classes, in person or online.
Artwork. (Posters with script)
Symbolic gestures: all the things that are not *really* related to fake languages, such as “your name on the list of contributors” or a tote bag with the language name or logo.
Inclusion in the creation process, e.g. your name will become an eponym and it will mean “chunk style”.
… something else.

Possible challenges
A reference grammar isn’t too exciting for fans. Fans, for the most part, are there for the community.
Scale: if you get 2 sales, that’s better than zero, but it doesn’t cover your fixed costs of doing anything. If you get 2 million sales, you probably don’t have access to the loans, staff and whatnot you need to actually fulfill whatever you were promising. There is some sweet spot for sales; above or below it, this Kickstarter is just a headache.
Not actually providing any motivation: a successful Kickstarter might promise $9000, enough to motivate someone to ship a few hundred copies of an already written book. I’m not sure it is enough to motivate me (as a language creator) to write a reference grammar and dictionary, invest the time to become competent in a new language, etc. This would lead to fans being upset about unmet promises.

Oh. Intrinsic rewards. I’m reading a book right now that implies that if money gets involved in an activity you used to like intrinsically, then you become less motivated, especially when the money goes away. I.e. a lot of people do this conlang thing for free. If we paid them, they would probably get a burst of energy while being paid, but after the money goes away, they would be less likely to continue to work on it. I wonder if this has any application to movie languages: did Okrand, Peterson or Frommer work less on their languages between movies? The guy who did Loglan worked on it till he died. He hoped to make money on it, but AFAIK never did. Maybe that accounts for the lifetime, intrinsic motivation.

Posted in Uncategorized | Comments Off