The 5 Word Language

Finally, I sort of have the time to do a quick review of the 5 word language. I’m a big fan of small languages (as in small vocab, small number of rules, small number of morphemes).

This really is 5 morphemes. I think all small languages in practice have about 2000 lexemes (set phrases that behave as words and that you just have to memorize). But there’s no need to quibble about the number of morphemes: this looks like it really is 5. That is small.

What is totally awesome about this is that it is 2 morphemes short of what you can memorize at a single glance: short-term memory holds about 7 (plus or minus 2) items.

The vocab is laid out in a grid and the entire grid is used. Compare this to toki pona, which ignores the diagonal, i.e. reduplication in tp doesn’t mean anything.

I think logotome is a real word (shoot me if it isn’t), and the logotome of a language is the set of all possible words (or lexemes) that the phonotactic system lets you create. Toki pona’s logotome is huge: even with the small alphabet and CV(CV)(N) structure, you have something like tens of thousands of possible short words. A five word language has only 25 two word phrases (5 × 5).
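Here’s the grid arithmetic as a quick Python sketch. I’ve labelled the five words 1 to 5 purely as placeholders (an assumption; this is not the language’s actual vocabulary). A full n × n grid gives n² two-word phrases; skipping the diagonal, as toki pona effectively does, leaves n × (n − 1).

```python
from itertools import product

def two_word_phrases(words, include_diagonal=True):
    """Return every ordered two-word phrase from a vocabulary grid.

    include_diagonal=False drops reduplicated phrases like ("1", "1"),
    mimicking a language where reduplication carries no meaning.
    """
    return [
        (first, second)
        for first, second in product(words, repeat=2)
        if include_diagonal or first != second
    ]

# Hypothetical numbering of the five words; not the real vocabulary.
five_words = ["1", "2", "3", "4", "5"]

full_grid = two_word_phrases(five_words)
no_diagonal = two_word_phrases(five_words, include_diagonal=False)

print(len(full_grid))    # 25
print(len(no_diagonal))  # 20
print("".join(full_grid[0]), "".join(full_grid[-1]))  # 11 55
```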

Good & Potential Applications
This might be a useful conlang creation technique. Create a dozen small 5 word conlangs, then turn the best of those into a large language. If the large language were compatible with the small one, then you’d have a conlang with a core that someone could learn before they lose interest, which I suspect takes about 2, maybe 3 hours.

I like the idea of using numbers: it has an obvious application for text messaging on phones (there is a (failed) app for that! it’s an emoji type conlang). If I were to learn or create a dozen of these languages, though, I’d worry about having to remember that 42 means one thing in the 5 word language and another thing in the 7 word language. Already I constantly mix up telo, which means water in toki pona and body in Russian. I read somewhere that when you hear a word that is the same in 3 languages (like, say, chocolate or tea), the area of your brain in charge of that word lights up for each language. So cross-language interference may be a real thing to consider when making small languages that draw on the same phonotactics (or logotactics; I can’t tell whether the 5 word language has a spoken form).

Domain specific languages. A domain specific language is an idea borrowed from software development, where you create a mini-language to deal with a specific topic. Then the language can be optimized for talking about that specific topic. It’s like an extreme version of slang and technical jargon, which can feel like a mini language, except that English jargon still follows English grammar and syntax. The lexicon has words for diabetes and God, so I figure this language makes it easier to talk about those topics. And if you want to talk about something else that doesn’t suit the language’s lexicon, create another 5 word language!

Areas for Improvement
Like many combinatorial languages (i.e. a fixed set of morphemes that are combined in all possible combinations), this particular description doesn’t say much about grammar. Is the grammar isolating? Do we have bound morphology? (i.e. do any of the morphemes only occur in a fixed relationship to other morphemes, e.g. do we have a tense suffix?) What are the basic sentence patterns? Is it SVO, OVS? Do we have prepositions or postpositions? Do we branch left (like Japanese), right (like English), or mixed (like toki pona)? Do we have part of speech rules, or are all words content words? Are some words “semantically bleached”, meaning not much on their own but something when in a sentence? Examples from English: the, to, in, of, have, going (future), etc.

Posted in conlang | Leave a comment

Undeveloped Public Domain Conlangs

Barsoomian is public domain, unless it is the most recent movie version.

But here is one that I just noticed: Parrot from Doctor Dolittle:

“Ka-ka-oi-ee, fee-fee”
“Is the porridge hot yet?”

Oh, boy. How shall we do an interlinear gloss for that?

I’m going to guess ka-ka-oi-ee is a compound word meaning porridge, fee-fee means ‘now’, and reduplication marks a tag question. Since reduplication indicates something you don’t know, ka-ka would be “mystery” as in mystery-meat. So “mystery-food, now-now?”
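If you want to set that up as a proper interlinear gloss, the usual layout puts each source word directly above its gloss. A small sketch of that alignment follows; `interlinear` is a made-up helper, and the glosses are only my guesses from above (ka-ka-oi-ee = mystery-food, fee-fee = now-now with a question reading, written .Q in the Leipzig style).

```python
def interlinear(source, gloss, translation):
    """Align each source word above its gloss in fixed-width columns."""
    src_words = source.split()
    gloss_words = gloss.split()
    if len(src_words) != len(gloss_words):
        raise ValueError("source and gloss must have the same number of words")
    # Each column is as wide as the longer of the word and its gloss.
    widths = [max(len(s), len(g)) for s, g in zip(src_words, gloss_words)]
    line1 = "  ".join(s.ljust(w) for s, w in zip(src_words, widths)).rstrip()
    line2 = "  ".join(g.ljust(w) for g, w in zip(gloss_words, widths)).rstrip()
    return "\n".join([line1, line2, f"'{translation}'"])

print(interlinear(
    "ka-ka-oi-ee fee-fee",
    "mystery-food now-now.Q",
    "Is the porridge hot yet?",
))
```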

And the dog speaks a constructed sign language.

And I’m going to guess that after page two the author lost interest in actually describing the fake languages. A pity.

But hey! It’s public domain. You are allowed to fully develop the language and sell it. Go forth and do so!

Posted in Uncategorized | Comments Off

Esoteric Buddhism and Conlangs

I’m still reading about Esoteric Buddhism, so I’m no expert and may misspeak. However, while reading about it, I got a bunch of fake linguistics ideas.

Esoteric vs Exoteric Readings
The exoteric reading is the way you’d read a text and simply understand it. No secret messages. But if each thing and action is actually a symbol of something else, then you get an esoteric, secret reading. It’s post hoc magical thinking to believe that there really was an encoded message, but people didn’t dismiss the idea of esoteric readings, so was there any benefit to the practice? I think so: it works as a creativity device. Take a text, imagine that there is a secret interpretation, and find it.

I think some languages and syntaxes are going to lend themselves more to esoteric readings. For example, in toki pona, most sentences follow the same form. So, as a contrived example, “Let’s go shopping” and “We attack at dawn” would both follow a subject-verb-DO-prep phrases pattern.

Polysemy and homonymy help create intentional esoteric readings. So if a language really did define “Let’s” = “We”, “attack” = “go shopping” and “missing time marker” = “dawn”, we’d have an intentional esoteric reading. Without such a huge amount of homonymy, esoteric readings would be accidental or occasional at best. But post hoc esoteric readings are limited only by your imagination.

So an esoteric reading is an act of communication system construction, a new mapping of meaning onto an existing syntactic structure.
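A toy version of that remapping: keep the sentence’s structure and swap in secret meanings from a key. Both the key and the list of time markers are hypothetical, built only from the shopping/attack example above, and plain string replacement is a crude stand-in for real parsing.

```python
# Hypothetical secret key: each surface phrase gets a second meaning.
ESOTERIC_KEY = {
    "let's": "we",
    "go shopping": "attack",
}
# Hypothetical list of explicit time markers.
TIME_MARKERS = ("today", "tomorrow", "tonight")

def esoteric_reading(sentence, key=ESOTERIC_KEY):
    """Re-read an ordinary sentence through the secret key."""
    text = sentence.lower().rstrip(".!?")
    has_time_marker = any(marker in text for marker in TIME_MARKERS)
    # Longest phrases first, so multi-word entries are matched whole.
    for surface in sorted(key, key=len, reverse=True):
        text = text.replace(surface, key[surface])
    # In this key, a missing time marker secretly means "at dawn".
    if not has_time_marker:
        text += " at dawn"
    return text

print(esoteric_reading("Let's go shopping!"))  # we attack at dawn
```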

Mantras are magic formulas. Some are perfectly intelligible, usually invocations of powerful beings. Namo Amida Butsu! Hail Amida Buddha!

Some are nearly unintelligible, but when written in Chinese, the characters do mean something. It’s like the canard about the Chinese word for “crisis” having two characters that each mean something on their own, with a clever relationship to the compound word. In a dharani (mantra w/o meaning), the Chinese is your substrate for an esoteric reading.

Again, this is an act of communication system creation, sort of like if you discovered an ancient text and by fiat decided it was a recipe for bread by matching up words with an English recipe. This of course would fall apart if you applied the trick to an additional undeciphered text.

Anyhow, what can you do with this? Maybe it’s a good practice for language creation, which often stalls at picking a phonetic inventory. Take a nonsense phrase and do an esoteric reading. Your esoteric reading is a sort of mini-conlang.

Posted in conlang design | Comments Off

Language learning materials

So I’m working on improving my Russian. These are the things about learning materials that drive me crazy:

The preposition pri means “during” (but also seems to mean most of the other prepositions). And that counts as a definition. It is then followed by two, maybe three examples. And that is all. This works fine if you already know Russian and are looking to label and identify the rules you already know. From a learner’s standpoint, prepositions and cases are all chaos and unpredictability. I’d rather have a lengthy set of examples than some suspicious and long list of rules. Instrumental case: it’s the case to use for your profession. (What the f*k?) It is also the case you use for certain “x and y” constructions. (What the f*k?) Dative case: it’s the case you use in a sentence where you like things. (What the f*k?) Just give me a lengthy list of samples.

The genitive case is the case you answer with should someone ask “Kovo?” I asked my Mum, who speaks not a word of Russian, “Kovo?” over and over, and she never gave me the genitive of any word. This rule works only if you are already fluent in Russian and need to be able to label a rule that you already know.

Single word repetitions. (You heard that word once, so you’ve memorized it, right?) Should I fault books for not being flash card decks? I will. Why can’t they be creative and put words in, say, an 8×8 grid, so that you can review them in pseudo-random order?
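Here’s one guess at what the 8×8 grid trick could look like: review the same 64 words row by row on one pass and column by column on the next, so each word shows up with different neighbours. The word list is placeholder data, and rows-then-columns is just one possible scheme.

```python
def grid_orders(words, size=8):
    """Lay words out in a size x size grid and return two review orders."""
    if len(words) != size * size:
        raise ValueError(f"need exactly {size * size} words")
    grid = [words[r * size:(r + 1) * size] for r in range(size)]
    # Pass 1: read across each row.
    by_rows = [word for row in grid for word in row]
    # Pass 2: read down each column, so neighbours change.
    by_columns = [grid[r][c] for c in range(size) for r in range(size)]
    return by_rows, by_columns

# Placeholder vocabulary: word00 ... word63.
vocab = [f"word{i:02d}" for i in range(64)]
rows, cols = grid_orders(vocab)
print(rows[:3])  # ['word00', 'word01', 'word02']
print(cols[:3])  # ['word00', 'word08', 'word16']
```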

Single demonstrations. Okay, let’s take an example outside of Russian: the Algonquian obviative. It takes a few pages to explain the obviative, and at the end of that explanation you will be utterly confused about how the deep structure works. Or the surface structure, for that matter. You then get two example sentences. As a learner, I think I would need maybe 100 or more example sentences to illustrate a rule that no author can explain very well. An example from English would be dangling or stranded prepositions. (“There are some things I will not put up with!” As a fluent English speaker, it sounds right to me; I don’t imagine even six pages of technical explanation would help a learner, but four or five hundred samples might.)

Charts. Here is a 1000×1000 chart of all the two word phrases in English. Memorize them. For me, each cell in a chart generally feels like an entirely separate fact/skill, and its location in the grid is about as important as ordering them by number of letters, graphing them by which have the most straight lines vs curved lines, or other pretty but irrelevant details.

Posted in Learning Any Language | 3 Comments

Conlex- Here is what I hope it means

Conlex: a sort of activity for people creating new (or reviving dead or nearly dead) languages by actually speaking them into existence. It entails creating a language, especially the materials necessary for learners, and then doing what it takes to get a few other people to learn the language. Maybe it means teaching it to the toddler or the girlfriend, or promoting it for its attractive features (who knows, maybe it’s pretty, maybe it’s useful for silent communication in the dark). It entails creating new culture only to the extent that one hopes and expects learners to adopt those new customs. This is conlanging where the conlanger is not a god-king to be worshiped (à la Zamenhof), but a peasant, and the learners are kings.

Despite the internet being a wide and vast place, for some topics it seems rather small. In the area of non-natural languages, it is small. There are relatively few participants, and each non-natural language tends to carve off a new community, as it is hard to gain competency in more than a few languages.

So let’s say you are reviving Beothuk. Let’s imagine that you did an exercise I promote, which is to write up your own manifesto about what a new language should be. (Yes, go ahead and send me nasty-grams about how I am forcing, yes forcing, you to write your own manifesto instead of slavishly copying mine, or Esperanto’s or Lojban’s or Tolkien’s. The people that make you face your freedom are the worst.) At the end of the exercise you say, hmm, this language should be alive. People should actually use it. Maybe even three people. (And mentally, you can redo my hypothetical with a diary conlang to be taught to toddlers in the home, or a conlang that relies on touch boards for the profoundly disabled, etc.)

(Yes, your manifesto might be different than my hypothetical. For example, if your manifesto is cribbed from Tolkien, well, it isn’t a conlex, and that is a topic for someone else to write about.)

So you go to Conlang-L and watch people talk about the joys of writing Tolkien-style reference grammars. They have to be hard! Because that is the only way they will get any respect, I guess. But in my hypothetical, you want the language to be human-usable, and you end up with a lot of unhelpful and sometimes angry advice about how the language needs to be hard, huge, copyrighted, hostile to learners and not to be sullied by actual use. But the 2 people who are likely to be interested in Beothuk have no interest in fictional sound-change histories; they really just want an expressive, learner-oriented language with lots of teaching materials.

And you visit the auxlang lists, where people are creating yet another Esperanto. Which is a fine project; people should learn from it. But the manifesto in this hypothetical calls for a language based on the remnant words of a lost people, so averaging the vocab of European languages and marking everything for part of speech is out! And 3 people is plenty; neo-Beothuk isn’t going to rule the world. So we keep moving.

And you visit more websites where people hawk languages that include a conculture requiring you to inflect pronouns and verbs for the social rank of your three parents, each of a different gender. Oh, joy, that is going to be fun to speak here on Earth. And again, you get not-so-helpful advice about how your language MUST have a conculture, because language and culture are inseparable, and any language you write must be dripping with culture or you’re doing it wrong, and they will eventually get mad at you. Neo-Beothuk might include culture-light (say, a new form of honorifics or a Beothuk Day), but if we don’t want to tell those 3 Beothuk fans to bugger off, we’ll have to be constrained by what they want. If they don’t want body paint, asking for it will get in the way of launching a new language.

Anyhow, there is not a corner of the internet that serves this sort of thing at the moment: not LCS, not Conlang-L, heavens to Betsy, not Zompist. The Klingon, Na’vi, and toki pona communities all seemed to burst into existence despite the lack of a place where people who create languages for learners and indiscriminate language learners can meet.

I hope the equivalent of the “learn any language forum” for non-natural languages comes into existence and that a new learner and “fan-centric” approach comes with it.

Posted in conlang community building, conlang learning | 2 Comments

Fake Mantras and Fake Languages

In Hinduism, there was the idea that words said in a prestige language were magic. People at the time had some pre-Saussurean ideas about sound and meaning, namely that there was something doggy about the sounds d-o-g and something catty about c-a-t. Since then, this has been shown to be mostly nonsense; at best, words of similar meaning can cluster together in how they sound. I don’t have the examples handy, sorry. And of course, there is real morphology, where antidisestablishmentarian has a bunch of parts that mean something, but on the other hand, carpet doesn’t have parts, even though it looks like it does.

Back to India. They imagined there were seed syllables, syllables that mean something (as if car in carpet really meant car!). The sounds were typically vowels and liquids, and less typically any sort of consonant that completely blocks the passage of air. “kit cat” would be a lousy mantra. Try to chant it… it doesn’t roll off the tongue. But “lily” does. By this reasoning, you can have obstruent consonants at the beginning of a mantra, but not in the middle: nothing that blocks the breath.
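That rule of thumb is easy to check mechanically. The sketch below uses spelling as a rough stand-in for sound (an assumption; real phonetics would need a transcription), treats the whole chant as one breath, and allows an obstruent only as the very first letter.

```python
# Stops, fricatives and affricates, approximated by English spelling
# (an assumption: letters stand in for sounds).
OBSTRUENTS = set("bcdfghjkpqstvxz")

def is_breathy_mantra(mantra):
    """True if no obstruent appears after the first letter.

    Spaces and punctuation are ignored, so the chant is one breath.
    """
    letters = [ch for ch in mantra.lower() if ch.isalpha()]
    return all(ch not in OBSTRUENTS for ch in letters[1:])

for candidate in ["kit cat", "lily", "kala", "mala"]:
    print(candidate, is_breathy_mantra(candidate))
```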

So fast forward to now. Meditation is popular. We often do secular meditation using numbers: 1, 2, 3, 4, 5,… 10, repeat. If you go over 10, you know your mind has wandered. Or in traditional chanting, aka noisy meditation, we chant something, usually 2000-year-old Sanskrit phrases that are untranslatable nonsense, agrammatical strings of themes (bija syllables with some sort of symbolic meaning), or possibly bad Sanskrit made up by someone who didn’t actually read or write it (Mantra of Light, I’m looking at you). Some of them are names of god-like Bodhisattvas. I find it endlessly distracting that I’m chanting the name of an imaginary superman.

Another thing that happens with mantras is massive streamlining. Namu Amida Butsu turns into Nembutsu because people are trying to say it 10 times in one breath, or to say it 100s of times and still finish in time to go to work. Which brings up another point: mantras act as a sort of linguistic clock. If you want to meditate for 20 minutes but don’t have a clock, you can chant x times and on average hit 20 minutes. (Sort of like 1 Mississippi, 2 Mississippi, etc.)
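The clock arithmetic is just total seconds divided by seconds per repetition, rounded up. The 3-second pace is an assumed figure; time a few repetitions of your own mantra to calibrate it.

```python
import math

def repetitions_for(minutes, seconds_per_repetition):
    """Roughly how many chants fill the given number of minutes."""
    return math.ceil(minutes * 60 / seconds_per_repetition)

# Assumption: one repetition takes about 3 seconds at a relaxed pace.
print(repetitions_for(20, 3))  # 400
```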

English Mantras
Nonsense – “Ya ba da ba do da!” “Hi ho, hi ho, it’s off to work I go!”
Traditional Translations- “Homage to Amida Buddha” (bleh, unchantable)
Modern innovations- “love and peace…love and peace” (or “love and peace and brownies…”)

The downside of a mantra you understand is that you might get distracted by the content of the mantra.

Toki Pona Mantras
Assigning meanings to any percent of the possible syllables would risk creating new words. So if pon means good, and lon means the universe, no new meanings. If tila means “compassion”, oops, we’ve coined a new word, albeit one only for mantras. Grammar also poses a challenge. In toki pona, all utterances are supposed to be grammatical, else you aren’t doing toki pona. But a Hindu-style mantra might be something like:

pon(a) lon pilin pon(a)…etc.

And that isn’t grammatical. So it’s a community innovation, which may or may not bother you.

Grammatical toki pona mantras would be something like

o jan Puta Amita o tawa e mi tawa ma pona sina!

Anyhow, toki pona mantras will sound better if you drop the final vowels and/or n; it will add more vibrations. This actually is a legit toki pona maneuver. Toki pona phonetics were designed to make it easy for anyone to say, so transformations of the language are legal. For example, you could still express toki pona with all the l’s pronounced as r’s, all the k’s pronounced as g’s, etc.
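Both tricks are simple string transformations. A minimal sketch, assuming word-final vowel dropping (as in pon(a)) plus the l → r and k → g swaps mentioned above; the function and table names are made up.

```python
VOWELS = set("aeiou")

def drop_final_vowel(phrase):
    """pona -> pon, as in the pon(a) lon pilin pon(a) example.

    Single-letter words like "e" or "a" are left alone.
    """
    return " ".join(
        word[:-1] if len(word) > 1 and word[-1] in VOWELS else word
        for word in phrase.split()
    )

# The l -> r and k -> g pronunciation variants, as one translation table.
VARIANT = str.maketrans({"l": "r", "k": "g"})

print(drop_final_vowel("pona lon pilin pona"))  # pon lon pilin pon
print("toki pona li pona".translate(VARIANT))   # togi pona ri pona
```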

Other Conlang Mantras
One idea is to use articulation symbolism– assign symbolic meaning to each part of the tongue and mouth & construct magic words that have a nice mixture of symbols.
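A sketch of how that could work for toki pona’s nine consonants. The places of articulation are roughly standard (w is really lips plus velum); the symbolic meanings (speech, action, feeling, breath) and the “at least three places” test for a nice mixture are invented for illustration, and kamijo is a made-up word, not toki pona.

```python
# Rough place of articulation for toki pona's consonants.
PLACE = {
    "p": "lips", "m": "lips", "w": "lips",
    "t": "tongue tip", "n": "tongue tip", "s": "tongue tip", "l": "tongue tip",
    "j": "tongue body (palate)",
    "k": "tongue back (velum)",
}
# Hypothetical symbolic meanings, invented for this sketch.
SYMBOL = {
    "lips": "speech",
    "tongue tip": "action",
    "tongue body (palate)": "feeling",
    "tongue back (velum)": "breath",
}

def symbols_in(word):
    """Symbols of each distinct place the word touches, in order of use."""
    places = []
    for ch in word:
        place = PLACE.get(ch)  # vowels have no entry and are skipped
        if place and place not in places:
            places.append(place)
    return [SYMBOL[p] for p in places]

def is_balanced(word, minimum=3):
    """A 'nice mixture': the word touches at least `minimum` places."""
    return len(symbols_in(word)) >= minimum

print(symbols_in("kamijo"))   # ['breath', 'speech', 'feeling']
print(is_balanced("kamijo"))  # True
print(is_balanced("lili"))    # False
```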

Post written from feedback on the Facebook conlang group, the Facebook toki pona group, and the toki pona forum.

Posted in conlang design, toki pona | Comments Off

Comparatives and languages that “don’t have them”

Some American Indian languages “don’t have” comparatives. Most American Indian languages are (or were) small community languages, and those typically are mind-bendingly complicated. There are so many mechanisms for expression that some mechanism familiar from European languages might be missing. A perfect example is a language with purportedly no marking for subject and object. Simplistic, right? Not really: the language marks words for so many things that by the end of the sentence, you have probably figured out who hit who. (See that? Who hit who: now which is the recipient of the action there?)

Okay, now what if a means for comparatives is missing, for example, a simple way to express where two things are on a scale?

You could say, “The cat is small. The dog is large.” If you are used to languages with comparatives, this doesn’t feel the same. And worse, you don’t have 101 other things going on in the sentence to nail down the details of the story of the cat whose size was on the lower end of the scale compared to the dog’s size.

Now let’s take a context-free sentence, do a lousy translation into toki pona, and then speculate on what a real community would do if they had to do comparatives (without new words and without morphology):

The colorless green ideas slept furiously, but the mobility fountain ran even more to the blenderward hermeneutic. That is why your tools should be more blue than my cloth is red.

sona pi kule ala li lape pi pilin utala- taso tawa pi ilo telo musi li tawa noka mute sama wan sama pilin nasa pi lipu mute. tan ni la o jo e ilo laso kin sama ala len loje mi pi kin ala.

If our lives depended on comparatives being communicated rapidly and accurately, I imagine we would coin a syntactic innovation on the spot. It doesn’t matter if it is elegant or not, say… X li quality kin li sama Y lili. X is more quality than Y. (And many other possibilities.) The key is that the structure is consistent: there is nothing about “X is more Y than Z” that requires it to mean comparative, except that someone in the dim history of English started it.
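Since the point is that the structure is consistent rather than elegant, the ad hoc construction behaves like a fixed template: you can fill it in to say a comparative and match it to read one back. This sketch uses the hypothetical pattern from the paragraph above (X li quality kin li sama Y lili), which is not standard toki pona; tomo (house), suli (big) and kiwen (stone) are just example fillers.

```python
import re

def comparative(x, quality, y):
    """Fill the ad hoc template: X li <quality> kin li sama Y lili."""
    return f"{x} li {quality} kin li sama {y} lili."

# The same template, read backwards; X and Y are matched lazily.
PATTERN = re.compile(r"^(?P<x>.+?) li (?P<quality>\S+) kin li sama (?P<y>.+?) lili\.?$")

def parse_comparative(sentence):
    """Return the X / quality / Y slots, or None if it isn't this construction."""
    match = PATTERN.match(sentence)
    return match.groupdict() if match else None

sentence = comparative("tomo", "suli", "kiwen")
print(sentence)                     # tomo li suli kin li sama kiwen lili.
print(parse_comparative(sentence))  # {'x': 'tomo', 'quality': 'suli', 'y': 'kiwen'}
```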

And there would be no way to see why something like X li quality kin… (etc.) should mean that, except that it was all the interlocutor could think of at the moment. With repetition, it would become part of the grammar.

Posted in toki pona | Comments Off

A Buddhist conlang idea

So a big idea in Buddhism is that you can analyze anything by breaking it down into parts, in such a way that you cannot say, aha! This is the Toyota! Instead you just get a pile of car parts and a pile of metal particles, atoms, etc. Everything is just one great big conceptual liquid goo with convenient boundaries we project onto the universe. In Buddhism, the gist is that not understanding this about ourselves leads to unhappiness. So I got the idea for a Buddhist language.

First imagine a verb template. It looks like this:

time prefix – verb stem – person suffix – locative suffix (upward/downward/left or right)

Okay, so that is a verbal template. Now imagine that the template is allowed to repeat sections, have multiple verb stems and incorporate nouns.

subject – object – conjunction- subject- object – time prefix – verb stem – person suffix – locative suffix (upward/downward/left or right)

Okay, now imagine the word template has no start and no end. You always start all utterances mid word and eventually you just stop midword and stare into the distance.
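One way to model a word with no start and no end is as an endless cycle over the template slots; an utterance is whatever stretch you happen to cut out of the middle. The morphs filling each slot (ta, mo, lon, and so on) are invented placeholders, since the language itself has no vocabulary yet.

```python
from itertools import cycle, islice

# Slots from the extended template above; the morphs are hypothetical.
TEMPLATE = [
    ("subject", "ta"),
    ("object", "mo"),
    ("conjunction", "e"),
    ("subject", "ni"),
    ("object", "su"),
    ("time prefix", "ka"),
    ("verb stem", "lon"),
    ("person suffix", "mi"),
    ("locative suffix", "wa"),
]

def utterance(start, length):
    """Enter the endless word at slot `start` and stop after `length` morphs."""
    stream = cycle(morph for _, morph in TEMPLATE)
    return "".join(islice(stream, start, start + length))

print(utterance(start=4, length=7))  # sukalonmiwatamo
```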

If I were to follow through (somewhat unlikely), I’d probably spend more time on what linguists currently think counts as a word and what counts as a morpheme. You know ‘em when you see ‘em.

Posted in conlang design | Comments Off

So it appears the main barrier between me and Russian fluency is…

that I haven’t been told that google translate is a somewhat unreliable translator.

I’ve posted a few questions on the Russian StackExchange. I get two things: pretty good answers from experts and cheap potshots from the commenters. The experts sound like they might have experience teaching a foreign language. The commenters are worse than useless, and the moderators side with them.

Beginning language learners begin writing by a variety of means; I know, I’ve tried them.

Initially, you use whatever form of the word you remember. “polu”: it means floor, but what gender, what case, what number? Not sure; you heard it in the phrase “it fell on the floor”. So you use the word as if it has only one form. A non-teacher will try to teach you that each word has a half dozen forms. I already know that. Thanks, the sole thing between me and fluency was that I didn’t already know that Russian has morphology.

Then you have words you have never heard, so you look them up. Then people criticize you for using the dictionary form too often. Thanks again, the sole thing between me and Russian fluency is that I didn’t know that Russian has morphology.

So I try google translate. I can gauge the quality of the translation by looking at how well it does from Russian to English, and I can generally see what is certainly wrong and what is dodgy. But that doesn’t mean I know how to write it better. So I go ask a question and all I get is “Oh! Don’t use google translate, it is worthless.” (And instead they suggested I just use google! Whee, now I have a word used in a completely different context, wrong case, wrong number, and you need to know what word to search for to google it.) If you are fluently bilingual, learned English from Mom and Russian from Dad, and can decide the case of a noun by asking, “What word answers to Kovo?”, well, for you google translate is useless.

For you dictionary, google translate and learner-haters out there: I’m happy you are smart and bilingual, but I wish you would just bugger off. You have an idealized idea of how to learn a language where everyone does it just the same way you did: you memorized all 600 pages of Dr. Smith’s Grammar, you can rattle off all 2000 slots in the tables of word endings and do so in Cyrillic and English alphabetical order, and you’ve already memorized the entries of the entire dictionary and can recite any page on demand just by someone naming the page number.

Good for you. Leave me alone.

And as for the Russian StackExchange, I will leave you alone, the same as it seems the rest of the internet is doing. The Master Russian forum (the main competitor in this space) lets people post translation requests– people post what google translate thought and they get… translations & help. (I’ll update later if I’m wrong about the general level of civility and acceptance of how language learners really are on the Master Russian forum.)

Posted in Learning Any Language | Comments Off


Someone said, “3 people are fluent in toki pona” What does that even mean? It means squat.

The gold standard of fluency is native fluency, which kids get for free. It is not so free that a language learned as a child will automatically be excellent: for example, if you learn Spanish in the US from your grandmother, you will not speak it as well as someone who learned it in Mexico and had 12 years of schooling in Spanish.

Kids have become fluent in a conlang, ghostlang or other similar language that previously was spoken by no one (and is now spoken by them and their mom or dad) in only a handful of cases: Living Latin (1), Hebrew (many), Esperanto (many), Klingon (1), Volapük (1), and a lady on FB who is teaching her diary language to her kids. In the cases where there was only one, obviously they stopped speaking it as adults and probably don’t speak it all that well now. Still, these cases are what linguists prefer when looking for fluent speakers, and if warmed up, they might speak it better than anyone who knows that language only as an L2.

Near native fluency is fairly hard to gain as an adult– it happens, but usually there will be accents and grammatical peculiarities in the speech of someone who has learned a language as a 2nd language, even if they are crazy smart. Typically it takes 10 years of living in a country, speaking your 2nd language all the time before you hit near native fluency.

Below that is a huge range of degrees of fluency. I can converse in Russian, but can’t write it to save my life. I can read Icelandic, often better than I can read Russian. I can pick out a few words here and there in French and Spanish, and my translation score would be better than a machine that chose words at random [secret: that's how google translate works ;-)]. There are people who can write or read just fine, so long as they have a reference grammar and lots and lots of time. There are people who can spit out a steady stream of words as fast as you like and are generally intelligible, but they make grammatical errors and have a thick accent. And so on.

When I used to organize study groups, ideally everyone was at a similar fluency level. In practice, it turned out that there are 100s of distinct levels of fluency: people who know 100 words, or 1000 words, or 10000 words, and each case is an entirely different situation. This is why people who each speak English as a second language understand each other better than they understand a native English speaker. If everyone is drawing on the same 5000 words and the same few dozen grammatical constructions, communication is simple and fluid.

People who have never studied a foreign language, nor a conlang, have no clue about any of this. So they hear that toki pona has 3 fluent speakers and they don’t realize that that factoid is bullshit. It is accurate to say that there are zero native speakers, zero near-native speakers, there are 50 to 100 people who have ever written a paragraph or more of toki pona and probably about 10 or 20 who can do it without warm up *right now* (everyone else would probably have to review and re-remember it all).

Also, another important point is that conlangs are not fully defined. You can learn all there is to learn about toki pona (and, with more effort, about Klingon), and then you’ll hit a wall, after which there are statements whose grammaticality can’t really be judged for lack of native speakers, except maybe by getting a ruling from the creator or the relevant “language board”.

**NB technical note: there is at least one theory about Tok Pisin, that it was a language of mostly adults and developed its unique grammar among adults, and then kids copied their parents. This contrasts with the other story, that adults speaking new languages (creoles and conlangs) are in fact mostly using their L1 grammar and L2 vocabulary, while the natively fluent children speak with the vocab and grammar of the new language.

Posted in conlang, conlang learning | 1 Comment

Syntax Coloring and Highlighting and Autocompletion

I wish there were syntax highlighting for English. When it is there, you see errors faster. I like autocompletion too, where you type a word and get a list of possible next words, sort of like what cell phone keyboards do.

Branching Direction
If I type “the”, the next word could come from a long list. If this were Icelandic, where the article is a suffix, once I start a noun I have a short list of ways to end it (some including “the”). So I think in general, right branching leads to a decision tree where you keep picking words from near-infinite possibilities, and the next word is only somewhat predictable. Another example: if I say “very…”, the next word could be anything. But if the order were reversed and I said “hot”, the next word is a short list of possibilities, probably including “very”.

Take the example of toki pona: there are only 125 or so words, so we should be able to predict what is next, but it has mixed, mostly right branching; purely left branching would be better.

noun + modifier => maybe 50 choices are most likely.
start of sentence + pronoun => it’s going to be only a few possible things.
Conditionals are backwards: they are tagged at the end of a phrase (with la), so a text editor wouldn’t know a conditional is starting. Vocatives are backwards, being tagged at the end (jan o!). But imperatives are right branching (o moku!).
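One rough way to measure that predictability is a bigram count: for each word, tally which words follow it in a corpus, then suggest the most frequent ones. This is a standard language-model trick swapped in for illustration, not something the post specifies, and the five-sentence corpus is made up; on real text, the number of distinct followers per word would show how much each branching choice narrows things down.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count which word follows which across a small corpus."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = ["<start>"] + sentence.split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def suggest(following, prev_word, k=3):
    """Most frequent next words after prev_word, best first."""
    return [word for word, _ in following[prev_word].most_common(k)]

# Tiny made-up toki pona corpus, for illustration only.
corpus = [
    "mi moku e kili",
    "mi lape",
    "sina moku e telo",
    "jan li moku",
    "mi moku",
]
model = train_bigrams(corpus)
print(suggest(model, "<start>"))  # ['mi', 'sina', 'jan']
print(suggest(model, "moku"))     # ['e']
```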

Part of Speech Systems vs Content/Function word systems
The distinction here is obvious to me. Think of Esperanto. There is a very strict system of words being adjectives, nouns, adverbs and verbs, and to convert one to another you must change the word ending. In English (and toki pona), the system is more like content words, which can be converted into any part of speech depending on their place in the sentence, and function words, which glue phrases together and generally resist changing into nouns, verbs or adjectives. Function words would include things like “a”, “the”, “for”, “will” and so on.

When there is a strong part of speech system, a parser can see that a word is, say, a noun and infer that some sort of adjective is next. By contrast, with a content/function word system, you can’t tell the part of speech until you parse the sentence. In English and toki pona, there is often more than one way to parse a sentence, so a word can be one part of speech or another depending on how you want to understand it.
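For Esperanto, the regular endings let a program guess part of speech from the word alone, before any parsing: -o noun, -a adjective, -e adverb, -i infinitive, -as/-is/-os/-us/-u finite verb, with plural -j and accusative -n stacked on top. The sketch is deliberately partial: the hand-picked function-word list is an assumption and far from complete, and correlatives like kiu would be mis-tagged.

```python
# Hand-picked, incomplete list of words whose endings would mislead
# (pronouns end in -i but are not infinitives).
FUNCTION_WORDS = {"la", "kaj", "de", "en", "al", "kun", "ne", "sed",
                  "mi", "vi", "li", "ŝi", "ĝi", "ni", "ili"}

def tag(word):
    """Guess a part of speech from the Esperanto ending alone."""
    w = word.lower().strip(".,!?")
    if w in FUNCTION_WORDS:
        return "FUNC"
    if w.endswith(("as", "is", "os", "us", "u")):
        return "VERB"
    # Strip plural -j and/or accusative -n before reading the class ending.
    if w.endswith("jn"):
        w = w[:-2]
    elif w.endswith(("j", "n")):
        w = w[:-1]
    return {"o": "NOUN", "a": "ADJ", "e": "ADV", "i": "VERB(inf)"}.get(w[-1:], "?")

sentence = "La grandaj hundoj rapide manĝas la bonan panon."
print([(w, tag(w)) for w in sentence.split()])
```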

Syntactic Ambiguity
I’m no expert, but I already suspect that Lojban doesn’t deliver on ambiguity-free sentences. I suspect that any sentence can have 2 parses: the one people mean and the one they said. And there can be more meanings semantically. Two people read a syntactically unambiguous sentence, they deserialize it to a data structure in their brain and the gety

Posted in conlang design | Comments Off

An unimplemented idea for conlang phonotactics

Phonotactics- the recipes for building new words.

So someone created a fake language. But they died, got a real job or otherwise abandoned it. How do you move it forward if they didn’t document the phonotactics?

There are many word generators, and they mostly use very small domain specific languages (DSLs). For example:

Valid Patterns: CV, CVCV, CCVV, VCV

Usually there is also some shorthand to reduce the total number of rules, e.g. (C)VCV means the pattern can optionally start with a consonant, so it covers both CVCV and VCV.
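Here is a minimal sketch of such a DSL, assuming one letter per sound (no digraphs) and only parenthesized optional groups. `expand` turns "(C)VCV" into the plain patterns it stands for, and `is_valid` checks a word's consonant/vowel skeleton against them. Both function names are made up here, and the letter sets in the usage lines are toki pona's, picked only as an example.

```python
import itertools
import re


def expand(pattern):
    """Expand optional groups: "(C)VCV" -> {"CVCV", "VCV"}."""
    parts = re.split(r"(\([^()]*\))", pattern)
    choices = []
    for part in parts:
        if part.startswith("(") and part.endswith(")"):
            choices.append([part[1:-1], ""])  # optional: with or without
        else:
            choices.append([part])
    return {"".join(combo) for combo in itertools.product(*choices)}


def skeleton(word, consonants, vowels):
    """Map a word to its C/V skeleton, or None if it uses an unknown letter."""
    out = []
    for ch in word:
        if ch in consonants:
            out.append("C")
        elif ch in vowels:
            out.append("V")
        else:
            return None
    return "".join(out)


def is_valid(word, patterns, consonants, vowels):
    """True if the word's skeleton matches one of the expanded patterns."""
    allowed = set().union(*(expand(p) for p in patterns))
    return skeleton(word, consonants, vowels) in allowed


C, V = "ptkmnswlj", "aeiou"
print(sorted(expand("(C)VCV")))  # ['CVCV', 'VCV']
print([is_valid(w, ["(C)VCV", "CV"], C, V) for w in ["kala", "ala", "mi", "kalama"]])
# [True, True, True, False]
```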

Now the values of C, V and “Valid Patterns” are all fairly simple. So why not generate rule sets at random and then score how often they are able to account for the existing words? To further optimize the algorithm, mutate the best sets or genetically cross them (take half of the rules of each high-performing rule set and check how well the new merged rule set does).
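Below is a minimal sketch of that search, under some simplifying assumptions of mine: the consonant and vowel letters are given rather than evolved, a rule set is just a set of plain C/V patterns, and fitness is the share of sample words matched minus a small made-up penalty per rule (without the penalty, a rule set listing every possible shape would always win). The best third of each generation survives unchanged, and children take half of each parent's rules and are then mutated. All names and parameter values are hypothetical.

```python
import random


def skeleton(word, consonants, vowels):
    """C/V skeleton of a word; unknown letters become '?' and never match."""
    return "".join("C" if ch in consonants else "V" if ch in vowels else "?" for ch in word)


def fitness(rules, skeletons, penalty=0.02):
    """Share of sample words the rules account for, minus a cost per rule."""
    covered = sum(1 for s in skeletons if s in rules)
    return covered / len(skeletons) - penalty * len(rules)


def random_pattern(rng, max_len=4):
    return "".join(rng.choice("CV") for _ in range(rng.randint(1, max_len)))


def mutate(rules, rng):
    """Randomly drop one pattern or add a new random one."""
    rules = set(rules)
    if rules and rng.random() < 0.3:
        rules.discard(rng.choice(sorted(rules)))
    else:
        rules.add(random_pattern(rng))
    return frozenset(rules)


def crossover(a, b, rng):
    """Child takes about half of the rules from each parent."""
    half_a = rng.sample(sorted(a), (len(a) + 1) // 2)
    half_b = rng.sample(sorted(b), (len(b) + 1) // 2)
    return frozenset(half_a) | frozenset(half_b)


def evolve(words, consonants, vowels, pop_size=30, generations=80, seed=0):
    rng = random.Random(seed)
    skeletons = [skeleton(w, consonants, vowels) for w in words]

    def score(rules):
        return fitness(rules, skeletons)

    population = [frozenset(random_pattern(rng) for _ in range(3)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 3]  # keep the best third unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return max(population, key=score)


sample = ["kala", "moku", "toki", "pona", "ala", "ike", "mi", "li"]
best = evolve(sample, "ptkmnswlj", "aeiou")
print(sorted(best), round(fitness(best, [skeleton(w, "ptkmnswlj", "aeiou") for w in sample]), 2))
```

Because the survivors are carried over unchanged, the best score never gets worse from one generation to the next; whether it reaches the ideal rule set depends on the seed and the number of generations.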

This would allow for providing a list of sample words, generating a rule set and then generating a list of potential new words.
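Once a rule set exists, generating candidates is the easy half. A minimal sketch, assuming plain C/V patterns (expand any optional groups first) and single-letter sounds; `existing` is a hypothetical parameter for skipping words the language already has, and the seed just makes the output repeatable.

```python
import random


def generate(patterns, consonants, vowels, n, existing=(), seed=0):
    """Return up to n new words that fit the patterns and aren't already in use."""
    rng = random.Random(seed)
    seen = set(existing)
    words = []
    attempts = 0
    while len(words) < n and attempts < 1000 * n:  # give up if the space runs out
        attempts += 1
        pattern = rng.choice(patterns)
        word = "".join(rng.choice(consonants if slot == "C" else vowels) for slot in pattern)
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words


print(generate(["CV", "CVCV", "VCV"], "ptkmnswlj", "aeiou", 5, existing=["kala", "moku"]))
```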

What this won’t do: it won’t account for things like the two vowels in CVCV tending to be similar or identical, because people have lazy tongues. But with enough computation, defects like these might become unimportant.

Posted in machine assisted conlanging

Kickstarter and fake languages

Finishing a conlang is a lot of work. So can fans give new language creators an incentive to finish, say by pledging money in return for a variety of rewards?

Possible rewards:
The foundational documents in paper or ebook format- dictionary, reference grammar, canonical corpus.
Educational materials. Graded reader, workbooks, flash cards.
Educational services, like tutoring, classes, in person or online.
Artwork. (Posters with script)
Symbolic gestures: all the things that are not *really* related to fake languages, such as your name on the list of contributors, or a tote bag with the language name or logo.
Inclusion in the creation process: e.g. your name will become an eponym and it will mean “chunk style”.
… something else.

Possible challenges
A reference grammar isn’t too exciting for fans. Fans, for the most part, are there for the community.
Scale: if you get 2 sales, that’s better than zero, but it doesn’t cover your fixed costs of doing anything. If you get 2 million sales, you probably don’t have access to the loans, staff and whatnot you need to actually fulfill whatever you were promising. There is some sweet spot for sales; above or below it, the Kickstarter is just a headache.
Not actually providing any motivation. A successful Kickstarter might raise $9000: enough to motivate someone to ship a few hundred copies of an already written book, but I’m not sure it is enough to motivate me (as a language creator) to write a reference grammar and dictionary, invest the time to become competent in a new language, etc. This would lead to fans being upset about unmet promises.

Oh, and intrinsic rewards. I’m reading a book right now that implies that if money gets involved in an activity you used to like intrinsically, then you become less motivated, especially when the money goes away. I.e. a lot of people do this conlang thing for free. If we paid them, they would probably get a burst of energy while being paid, but after the money went away, they would be less likely to keep working on it. I wonder if this applies to movie languages: did Okrand, Peterson or Frommer work less on their languages between movies? The guy who created Loglan worked on it until he died. He hoped to make money on it, but AFAIK never did. Maybe that accounts for his lifetime of intrinsic motivation.

Posted in Uncategorized

Bresenish. It should work like a programming language

So I’m continuing to think about the syntax of Bresenish, but in the background I’m reading a lot about programming languages. I’m thinking that there should be a way to speak a programming language, except it would execute in your mind, not on a machine. And it wouldn’t be neurolinguistic programming, but hey, if it inspires someone to write an NLP sci-fi book with less handwaving, so much the better.

Okay, syntax.

Originally I thought I could get by with just elements, sets and relations (i.e. operators) because a simple sentence looks like that: “Jack and Jill ran up the hill” = “The set of the elements Jack, Jill is related to the set of the element hill via the relationship of running up.” But my first attempt to translate the Tower of Babel into gloss showed that I needed assignment and variables. I needed something that works like a pronoun but has set-theoretic typing (the set, the element): “The set of men working on building a tower, which I shall henceforth refer to as the A team.” And I needed to express equivalence and assignment distinctly: “There is a set of men who worked all night, we’ll call them the B team. The A team and the B team are the same.” So I assign that list of men to the B team, and then I declare that A and B are the same, i.e. I have two different ways to describe them. Another example: “There is a pretty girl. There is a young girl. There is a school girl. These elements are all the same element.”
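The difference between assignment and equivalence maps neatly onto a union-find structure. That is a programming technique I'm borrowing here, not something Bresenish requires. In this sketch, assignment gives a fresh name to a description, and declaring two names the same merges what is known about them, so either name now reaches every description. The class and method names are hypothetical, and the sketch assumes each name is assigned only once.

```python
class Discourse:
    """Names for referents; declaring two names equal merges their descriptions."""

    def __init__(self):
        self.parent = {}        # union-find links between names
        self.descriptions = {}  # root name -> everything said about the referent

    def _find(self, name):
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def assign(self, name, description):
        """'..., which I shall henceforth refer to as <name>.' (fresh names only)"""
        self.parent[name] = name
        self.descriptions[name] = {description}

    def declare_same(self, a, b):
        """'<a> and <b> are the same': two descriptions, one referent."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a
            self.descriptions[root_a] |= self.descriptions.pop(root_b)

    def describe(self, name):
        return self.descriptions[self._find(name)]


d = Discourse()
d.assign("x", "pretty girl")
d.assign("y", "young girl")
d.assign("z", "school girl")
d.declare_same("x", "y")
d.declare_same("y", "z")
print(sorted(d.describe("z")))  # ['pretty girl', 'school girl', 'young girl']
```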

Bresenish would be imperative. So it would work like this:

Imagine a stage. Put on that stage these elements: Jane, Jack, a baseball bat. The bat moves repeatedly to Jack’s head. The bat is in the hands of Jane.

Imagine a stage. Put on that stage these elements: Jack, a cake. Imagine another stage with these elements: Jack. The former stage became the latter stage.
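The second example is a state change: one stage becomes another, and the listener works out what happened. A minimal sketch, treating a stage as just a set of element names (an assumption; relations are left out): it reports what disappeared and what appeared, and leaves the "Jack ate the cake" inference to the listener. The function name is hypothetical.

```python
def transition(before, after):
    """Compare two stages (sets of element names) when one becomes the other."""
    before, after = set(before), set(after)
    return {
        "gone": sorted(before - after),
        "new": sorted(after - before),
        "unchanged": sorted(before & after),
    }


print(transition({"jack", "a cake"}, {"jack"}))
# {'gone': ['a cake'], 'new': [], 'unchanged': ['jack']}
```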

More on the consequences of variables. Variables work a bit like proper nouns and pronouns. They mean something specific, anyone can make one up, and they can last a long time or a short time. Like pronouns, they refer to something else. Natural pronouns tend to be few in number and rely on a “type”, e.g. person, number, social rank, animacy. In a set-inspired language, there would be a need for a lot of different variables, and having too many “its” would be a pain. Having too many ad hoc words would be a pain too. Maybe there could be some conventional variable names, the way programmers often use “foo”, “bar”, “baz” as common variables, or mathematicians use x, y, z.

Bresenish on a machine…
I don’t actually have the skills to write this, but having worked with programming languages I think I can imagine what is currently possible.

An executable Bresenish would start out with a fixed number of elements, relations and attributes (adjectives), and those would be the elements of possible discourse. You could create as many variables as you’d like, but all those variables would refer back to already defined elements. In human instructions, we elide the obvious, so a machine-oriented language would probably be very verbose and need loops. A human-oriented one would probably use an attribute on a relation to indicate an action was repeated.

So something like this:

Imagine a stage. (Computer draws a stage.) On the stage are these elements (list the elements and draw them on stage; if abstract, just list them in a sidebar). The elements are related this way (locative relations would be the easiest to illustrate; abstract ones would just be represented with arrows, i.e. A loves B). The elements X, Y, Z are a set, call ‘em foo. They have the attribute “red”. (Computer draws a circle around X, Y, Z and paints them red.)
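A text-only stand-in for the drawing step might look like the sketch below. The `Stage` class and its method names are hypothetical. Relations are stored as (subject, relation, object) triples and drawn as arrows, and naming a set refuses elements that were never put on stage, in line with the fixed-vocabulary idea above.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    elements: list = field(default_factory=list)
    relations: list = field(default_factory=list)   # (subject, relation, object)
    sets: dict = field(default_factory=dict)         # set name -> member elements
    attributes: dict = field(default_factory=dict)   # element or set name -> attributes

    def put(self, *elements):
        self.elements.extend(elements)

    def relate(self, subject, relation, obj):
        self.relations.append((subject, relation, obj))

    def name_set(self, name, *members):
        """'The elements X, Y, Z are a set, call 'em foo.'"""
        missing = [m for m in members if m not in self.elements]
        if missing:
            raise ValueError(f"not on stage: {missing}")
        self.sets[name] = list(members)

    def give(self, target, attribute):
        self.attributes.setdefault(target, []).append(attribute)

    def draw(self):
        """Render the stage as text instead of pictures."""
        lines = ["On stage: " + ", ".join(self.elements)]
        lines += [f"{s} --{r}--> {o}" for s, r, o in self.relations]
        for name, members in self.sets.items():
            attrs = ", ".join(self.attributes.get(name, [])) or "no attributes"
            lines.append(f"{name} = {{{', '.join(members)}}} ({attrs})")
        return "\n".join(lines)


stage = Stage()
stage.put("A", "B", "X", "Y", "Z")
stage.relate("A", "loves", "B")
stage.name_set("foo", "X", "Y", "Z")
stage.give("foo", "red")
print(stage.draw())
```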

And so on. I think with some effort, Harry Potter could be translated into such a thing and a computer could hypothetically draw the resulting movie.

So to re-cap:

Stage = A universe (that you hold in your mind) being described. This is what you are imperatively telling your listener to manipulate.
Elements = vocab, both concrete and abstract
Sets = Things that relate to each other, might only have 1 element
Relations = Roughly verbs. They relate sets.
Attributes = Roughly adverbs and adjectives, modifiers of relations and elements.
Variables = Temporary names for an element or set, either drawn from a conventional list or made up on the spot.
Assignment = Saying an unbound variable is bound to another one. And lo, so it is.
Equivalence = Declaring that two things refer to the same thing. A machine, which keeps track of all the elements and relations defined so far, would merge the two views of a stage after hearing that they are equivalent. “Stage one: the man is running.” “Stage two: the man is sweating.” “Stage one == Stage two.” (The listener merges these two views, and now we have a sweating man running on the stage.)
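Here is a minimal sketch of that merge, assuming each view is a dictionary from an element to the facts stated about it, and assuming the listener already knows that "the man" in both views is the same man (the genuinely hard part, which this skips). The function name is hypothetical.

```python
def merge_views(*views):
    """Merge views of one stage after they are declared equivalent.

    Each view maps an element to the set of facts stated about it; facts about
    the same element are pooled into a new set, leaving the inputs untouched.
    """
    merged = {}
    for view in views:
        for element, facts in view.items():
            merged.setdefault(element, set()).update(facts)
    return merged


stage_one = {"the man": {"is running"}}
stage_two = {"the man": {"is sweating"}}
merged = merge_views(stage_one, stage_two)
print(sorted(merged["the man"]))  # ['is running', 'is sweating']
```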

And I think that is enough moving parts for a workable grammar. Now I just have to work out a morphology and my list of elements, which I think will still be Icelandic-sounding words.

Posted in Bresenish

Bresenish. This is That.

So, another post on a language I’ve provisionally called Bresenish. The goal is to create something inspired by set theory and computer programming languages.

One book I read said language is kind of like describing a stage where a play is happening. There are actors and elements on the stage, and things happen across a timeline. We then use words to turn all of this into discrete models of reality. These models are serialized into a linear string of sounds. Someone hears it, frickin’ magic happens, and they deserialize it back into a representation of reality in their brain. Unlike Lojban, which tries to solve the problem of deserializing the sound back into an intermediate model, I’m content to leave that up to frickin’ magic.

So far I have the idea of the stage being described by a set of typed elements (the items on the stage, each of which can belong to a type, such as animate, inanimate, blue, not blue, whatever), a set of relationships (sees, is-sitting, is-tired), and a set of compound pronouns for disambiguation, such as it-him, they-her. To parse such a sentence, you’d imagine all the possible relationships (the Cartesian product, which is all possible links) and then filter down the list using the compound pronouns.

Seems like a good idea, but I have a hard time thinking of occasions with more than two elements on the stage or more than one relationship, so in practice there is only one compound pronoun. This maps down to just S-V-O: some subject is defined in relation to some object. An awkward English gloss might be “Cat, Dog”, “Sees”, “him-him”: the cat and the dog see each other (or maybe just the dog sees the cat, or vice versa).
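The parsing step described above (enumerate the Cartesian product, then filter with the compound pronoun) can be sketched in a few lines. This assumes each element carries exactly one pronoun class, that the compound pronoun is written subject-object with a hyphen, and that an element does not relate to itself; all three are my simplifications, and the function name is hypothetical.

```python
from itertools import product


def readings(elements, relation, compound_pronoun):
    """All (subject, relation, object) links allowed by a compound pronoun.

    `elements` maps each element on stage to its pronoun class.
    """
    subj_cls, obj_cls = compound_pronoun.split("-")
    return [
        (s, relation, o)
        for (s, s_cls), (o, o_cls) in product(elements.items(), repeat=2)
        if s != o and s_cls == subj_cls and o_cls == obj_cls
    ]


print(readings({"cat": "him", "dog": "him"}, "sees", "him-him"))
# [('cat', 'sees', 'dog'), ('dog', 'sees', 'cat')]
print(readings({"cat": "it", "man": "him"}, "sees", "it-him"))
# [('cat', 'sees', 'man')]
```

With two "him" elements both directions survive, which is exactly the ambiguity in the cat/dog gloss; giving the elements different pronoun classes cuts it down to one reading.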

Also, I needed another structure for dealing with equality, such as “The animal in the other room is a cat”. So I got the idea of adding a bunch of imperatives.

“Imagine there is a stage. On the stage are a barrel, some fish and a smoking gun.”

Now I’d like to say that this is the set up for a joke. Or it’s a recipe. Or it’s a crime scene.

“Imagine another stage. On the stage is a recipe. These stages are the same”

In pseudocode, it would look like this:

var stage1 = [barrel, fish, smoking gun]; // I’m telling you to imagine that these exist; they are on stage.
var stage2 = [recipe]; // I’m telling you to imagine that this exists. This time, it’s an abstraction you need to imagine.
assertEqual(stage1, stage2); // I’m asserting this is true, and you can challenge me if it isn’t.

I can’t think of any human language that uses an open class of arbitrary, proper-noun-like names as pronouns. But why not? It’s convenient. If I’d chosen the names “this” and “that”, it would be pretty readable.

Another problem I ran into was how to do set operations. Set operations need to be able to yield a new set, and then discourse needs to be able to refer to that new set without repeating the recipe.

var stage1 = [all the barbers who shave];
var set2 = stage1 - [barbers who shave themselves];
var stage2 = [all independent people] intersect set2;
var stage3 = [poor people];
assertEqual(stage2, stage3);
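A runnable version of that recipe, using Python sets and a made-up population (the names are hypothetical). Each intermediate result is bound to a name in a dictionary, so later discourse can refer to "set2" without repeating how it was built. `claim_equal` is a hypothetical stand-in for assertEqual in the sense used here: the speaker's claim, which the listener either accepts or challenges.

```python
# Hypothetical population, for illustration only.
barbers_who_shave = {"Ann", "Bob", "Cy", "Dee"}
shave_themselves = {"Ann", "Bob"}
independent_people = {"Cy", "Dee", "Eve"}
poor_people = {"Cy", "Dee"}

names = {}  # discourse memory: name -> set
names["stage1"] = barbers_who_shave
names["set2"] = names["stage1"] - shave_themselves     # set difference
names["stage2"] = independent_people & names["set2"]   # intersection
names["stage3"] = poor_people


def claim_equal(names, a, b):
    """The speaker claims two named sets are the same; the listener checks."""
    if names[a] == names[b]:
        return f"Agreed: {a} == {b}"
    return f"Challenge: {a} and {b} differ on {sorted(names[a] ^ names[b])}"


print(claim_equal(names, "stage2", "stage3"))  # Agreed: stage2 == stage3
```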

Posted in Bresenish