Math and logic subsystems in languages

So I can open a novel and, as a human raised in an English-speaking community, understand pretty much everything. I can open a textbook on calculus or logic, and while I can read the whole thing in English (there are even awkward but grammatically correct ways to read off the formulae), I’m not going to understand it just because I know English. I think this is pretty conservative evidence that math and logic are not really natural languages; they are more like a foreign language embedded in a natural language.

So I was trying to deal with conjunctions in toki pona. Sometimes they are made unnecessary by the “chain pattern”: one similar structure after another implies “and”. Sometimes they act as discourse connectors, tagging a sentence with “or” or “but”. Those two forms of logic are effortless to parse (except when people ignore the chain pattern and try to add explicit “and” words). Finally we get these monsters:

1) jan li suli. (simple, no “and”)
2) jan li suli li laso. (chain pattern, one right after another implies “and”)
3) jan li suli en laso. (different structures imply different meaning, maybe the qualities are mixed, like blue and red can be mixed)
4) jan li suli taso mute.
5) jan li suli anu mute.
6) jan li jan suli en mute.
7) jan li jan suli anu mute.
8) jan li jan pi suli en mute.
9) jan li suli en mute anu soweli. (mixed and, or, but)

3, 4, and 5 imply that you can “and”/“or”/“but” qualities without a head, so modifier phrases would be something like:

To parse all of the above, 1-8, you need a data structure that looks like this… and will lead to some monstrous maximal forms.

Head modifier (optional)
(Maybe a pi, depends on if you are predicate or modifier of a headed phrase)
Ands: en + modifiers — repeated
Ors: anu + modifiers — repeated
buts: taso + modifiers — repeated

And maximally something like:

jan li suli en mute taso lili taso laso anu soweli anu waso. (Grouped)
jan li taso lili en suli anu soweli en mute taso laso anu waso. (Jumbled up.)

How to parse this? I have no idea, it reads like a logic puzzle and you’d have to introduce a foreign logic system to do something with it. It looks syntactically valid. So I’m thinking my parser should represent a modifier chain as above, but make no claims about what it means. So it parses one way, and if someone (ha! unlikely) ever decided to implement a logic subsystem, they could take this parse and then transform it into all the possible meanings, truth tables and so on.
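Here is a minimal sketch of that "parse it, but make no claims" idea, in Python. The names ModifierChain, parse_chain and parse_predicate are made up for illustration, and it assumes whitespace-separated words, a single li, and no pi handling. It only sorts the modifier groups into the head/ands/ors/buts buckets listed above; it says nothing about what the combination means.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModifierChain:
    # Words before the first conjunction; empty if the chain starts with one.
    head: List[str] = field(default_factory=list)
    ands: List[List[str]] = field(default_factory=list)  # groups introduced by "en"
    ors: List[List[str]] = field(default_factory=list)   # groups introduced by "anu"
    buts: List[List[str]] = field(default_factory=list)  # groups introduced by "taso"


def parse_chain(words):
    """Sort modifier groups into buckets without assigning any logical meaning."""
    chain = ModifierChain()
    buckets = {"en": chain.ands, "anu": chain.ors, "taso": chain.buts}
    current = chain.head
    for word in words:
        if word in buckets:
            current = []                  # a conjunction starts a new group
            buckets[word].append(current)
        else:
            current.append(word)
    return chain


def parse_predicate(sentence):
    """Assumes exactly one predicate, introduced by the first li."""
    words = sentence.rstrip(".").split()
    return parse_chain(words[words.index("li") + 1:])
```

Note that the buckets throw away the relative order of en, anu and taso groups, so the grouped and jumbled sentences above come out nearly the same. A logic subsystem that cared about order would need a different structure.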

But for these applications, we don’t care:

grammar check– it’s valid syntax.
glossing– It glosses to English, and is equally ambiguous and unintelligible in English.
syntax highlighting: you only need to recognize an “and”/“or”/“but” sequence to color the text; you don’t need to know what it means or parse it as just one parse tree.
chat bot– A chat bot would never explore these corners of possible meaning in the universe of representable meanings that toki pona can represent.

Other Observations.

1) * jan li kepeken ilo en kepeken soweli. (Don’t use en to combine prep phrases)
2) */? jan li tawa en kama. (Don’t use en when you can use li, but if this was a modifier chain, and a predicate sentence, then it’s probably okay)
3) * jan li kepeken ilo anu kepeken soweli. (Don’t “or” prep phrases)
4) * jan li moku e ilo anu e soweli. (Don’t use both anu and e; don’t use both taso and e.) [Update: changed to moku because kepeken has had some recent POS confusion from toki pona version pu]
5) */? ante jan li kepeken e ilo. Don’t use anything but anu or taso as a tag-conjunction.
6) * en jan li kepeken ilo. Don’t start sentence with en. (En is implied, although it would have made for a nice audible sentence demarcation)
7) ? waso pi laso en pimeja li pona tawa mi. This is really hard to parse. “and”ing modifiers in the subject slot is only sometimes distinguishable from mistakes and “and”ing subjects.

Posted in machine assisted conlanging, toki pona

Observations While Editing Toki Pona

1) People create incorrect proper modifiers constantly.
2) People give up trying to use proper modifiers and just use their native language for all proper modifiers.
3) li gets dropped for 3rd person subjects.
4) People essentially invent new constructions all the time. (It’s easier to enforce a ban on new words than it is to enforce a ban on new grammatical constructions.)
5) The complexities of parsing punctuation, ellipsis, fragments, asides/parentheticals, diglossia, direct speech, numbers, and “theater script” text (e.g. person1: “…” person2: “…”) can be as great as the complexities of the core grammar (sentence, subject, object, modifier, etc.). When there is no guidance, people just make stuff up.
6) People will do diglossia and mix in dates, times, numbers from a foreign language when the language spec doesn’t include them (or discourages using them!)
7) No one wants to punctuate poetry, which makes finding sentence termination incredibly difficult.

Posted in machine assisted conlanging, toki pona

Abusing the word morphology as a pragmatic matter

So I have a syntax that relies on sentence templates like this:

[subject] li [verb phrase] li [verb phrase] [Prep Phrase] [Prep Phrase].

Phrases are made of words, and phrases have their slots. Then along come negation, emphasis, and uncertainty (with a desire for clarification, also known as questions). These can be slotted in in many places, which makes them feel like modifiers (adjectives and adverbs). But they can be several words (e.g. anu seme, x ala x) and they resist taking modifiers themselves, e.g. the following:

? jan li wile kin pi suli mute
? jan li wile ala pi suli mute
? jan li wile ala mute wile

I’m going to treat these as morphology. They have a different scope (or different type of scoping rules)– usually one word. (They can have a larger scope). And they can be inserted (almost) anywhere, more places than adjectives. Negation has an uncertain scope, but usually one word.

jan ala pi pali mute. Scope of 1 word
jan pi pali ala mute. Scope of 1 word.
jan pi pali mute ala. Ambiguous scope, but probably read as scoped to mute.

? jan ala pi pali ala mute ala. Slots in everywhere and means something in each slot.
jan laso pi pali laso mute laso. laso doesn’t actually imply anything different in the 2nd half. The heavy work load is still blue.

The ability to put ala into many places in a modifier chain implies that ala behaves like morphology: it works on a single word, not on a phrase.

jan pona kin.
jan ala pona.
jan pona ala.
jan pali ala
jan pali kin. Emphasis on pali. (Or on jan pali)
jan kin pali. Emphasis on jan. (And not on pali.)

But

jan suli walo == jan walo suli. Order didn’t matter (assuming that jan suli and jan walo don’t have special meanings the way jan pona does).

Anyhow, not sure how to state it, but these feel like they should be modeled as a feature of a single word. (And maybe also as a feature of phrases).

I think this differs from, say, a phrasal modifier, e.g. pi telo nasa, in that pi telo nasa modifies phrases of a certain sort and individual words, while ala can modify just about any word, even prepositions, verbs, modals. kin likewise can. And X anu X can be applied to any word in the verb phrase, maybe to prepositions.

They also mess up the template, because now the template has a lot of slots.

[subject(ala/kin/anu seme)] li [verb phrase(ala/kin/anu seme)] li [verb phrase(ala/kin/anu seme)] [Prep Phrase(ala/kin/anu seme)] [Prep Phrase(ala/kin/anu seme)].

It fits into so many slots that it is better to think of it as a way of transforming a word. So we get something that looks like morphology in a largely analytic language. And this should make the code for the parser easier to write (rather than adding numerous ala/kin/anu seme flags to each phrase).
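A rough sketch of what "feature of a single word" could look like in code. The Word class and fold_particles function are hypothetical names, and this only handles a bare ala or kin directly after a word; it does not attempt x ala x questions, anu seme, or the wider-scope readings mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    negated: bool = False     # set when the word is followed by ala
    emphasized: bool = False  # set when the word is followed by kin


def fold_particles(tokens):
    """Fold ala/kin into the preceding word instead of keeping them as slots."""
    words = []
    for token in tokens:
        if token == "ala" and words:
            words[-1].negated = True
        elif token == "kin" and words:
            words[-1].emphasized = True
        else:
            words.append(Word(token))
    return words
```

With this, jan pi pali ala mute comes out as four words with pali marked negated, so the phrase classes never need their own ala/kin flags.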

Posted in toki pona

toki pona minutia

These are all unsettled, controversial ideas in toki pona.

1) kule lon palisa li pona tawa mi.

2) kule pi lon palisa li pona tawa mi.

Prepositional phrases as modifiers. These are most obvious when they appear in the subject. When they appear at the end of a sentence, then you could always argue that the phrase modifies the whole sentence or what ever the reader thinks makes sense. Almost no one does 2). I plan to write the parser to treat prepositional phrases as exchangeable with any tail, single word modifier (i.e. they are modifiers that come last)

1) ni li lon insa poka.

2) ni li lon insa poka suli.

3) ni li lon insa pi poka suli.

4) jan li tawa en tan ma suli.

Complex prepositions. People act like they can be joined, and’ed and or’ed. Some argue that the tail of a complex preposition is actually a noun, but then no one adds the pi for its modifiers, like in 3). And in 4) it is impossible not to look at it as a complex preposition. I plan to parse preps as potentially complex as in 2) and 3).

1) 543 = 5 (hundred) 4 (tens) 3 (ones)

A decimal system is ten digits and some place names, for example, a place name every three places. The choice of words isn’t so important (though words for commonly measured things, e.g. tenpo, or for approximate values, e.g. suli, mute, lili, are bad choices). I plan to implement numbers as parameterizable templates.
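A sketch of what a parameterizable number template might look like. The function name render_number is hypothetical, and the digit and place words are parameters on purpose, since the post does not pick any; the only assumption baked in is the one above, a place name every three places.

```python
def render_number(n, digit_words, place_words):
    """Spell out n digit by digit, inserting a place name every three places.

    digit_words: ten words for the digits 0-9 (supplied by the caller).
    place_words: names for each successive group of three above the units.
    """
    if n < 0:
        raise ValueError("only non-negative integers are handled")
    digits = str(n)
    groups = []
    while digits:                      # split into groups of three, from the right
        groups.append(digits[-3:])
        digits = digits[:-3]
    if len(groups) - 1 > len(place_words):
        raise ValueError("not enough place names for this number")
    parts = []
    for i in range(len(groups) - 1, -1, -1):
        parts.extend(digit_words[int(d)] for d in groups[i])
        if i > 0:
            parts.append(place_words[i - 1])
    return " ".join(parts)
```

Swapping in a different word list is then just a different pair of arguments, which is the point of treating it as a template.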

1) jan li toki e kalama Wakawaka.

2) jan li toki e Wakawaka.

3) mi kute e Tinton li pini lape.

4) soweli li Mijawu li Mijawu.

5) Mijawu!

Onomatopoeia is so rare that the particular answer should not matter much. Forcing these to be proper modifiers sort of works, except that in 5) it is pedantic to make such an exclamation a modifier, and in 2) it is probably redundant to mention again that this is a sound, human sound, or animal sound (kalama, nimi, mu). These should be allowed to stand bare, but possibly punctuated to distinguish them from accidental unheaded proper modifiers.

1) jan li kepeken ilo kepeken soweli.

2) jan li kepeken ilo en soweli.

toki pona has these things that look like chains linked by particles. Sometimes the pattern seems to fight against recursive patterns; sometimes it seems to give up and just do recursion. For example, x en y en z can be read as an unordered list of elements. This is a very simple data structure, and it fits the design goals of a simple language. a pi b c en a2 pi b2 c2 is something of a tree structure at least: there are chains joined by chains. The prepositional phrases are oddballs because if you string them together, the particle is … blank, e.g. jan li sama soweli kepeken ilo. The chain of prep phrases can be analyzed as having a particle with six forms, or as having a blank particle with each prepositional phrase headed, or possibly conjunctions are ordinary chains. People seem to use 2) above, i.e. prepositional phrases are not chains like en or pi. Anyhow, 1) and 2) are semantically equal and there isn’t a compelling reason to make 2) illegal: for one, banning it makes texts clunkier and implies that a string of prep phrases should group together, e.g. sama A sama B kepeken X kepeken Y. I sort of want to make sure you can treat prepositional phrases as chains (it’s sort of elegant to treat all the phrases as the same data structure), but I won’t write my parser to expect them to be treated as chains.

Posted in toki pona

Object Oriented Conlanging

If you don’t write code, you may feel like the wrong audience.

So recently I’ve been on a kick of learning about object oriented library writing, especially for the most basic types. I notice that MSDN’s advice for writing basic types and the exercise of writing a library acts sort of as a model exploration exercise that leads to surprising discoveries.

Why a Conlang?
Surely natural language processing libraries exist for real languages like French and English? Yeah, they do, but French and English are so mind-bogglingly complex that writing code to parse them is a task that has befuddled developers for decades. Even established conlangs are much smaller, even those that weren’t initially designed to be small or limited in their complexity.

Also, during the creation of a conlang, out of necessity, it is small, if only because the vast majority hasn’t been imagined yet.

One more reason: because you may overly restrict yourself by considering datastructures that are peculiar to English and French when one possible goal of a conlang is to create a human communication system that goes beyond what can be done in well known natural languages (this compares to the naturalistic-conlang goal of creating a reference grammar that fools professional linguists– a goal that will put some limits on breaking new ground into new techniques for communication strategies)

Basic Types in a Conlang
Words. The basic type should be a token, not a string. Strings are not atomic enough.
Morphology. The language I am using (toki pona) as my model doesn’t have any morphology to speak of.
Phrases. Here is where I discovered the most. It turned out to be helpful to have many specialized classes that can exploit information that goes beyond what a formal grammar might know. For example, some words are likely agents, and those agents can own things, they can be the agent of a transitive verb, and so on.
Forgotten Structures. We almost never formally deal with punctuation, quoted text, diglossia, numbers, dates, animal sounds and so on. When you work with real sample texts, you encounter these issues immediately and realize that they can’t be ignored and deserve to be treated in terms of importance as concepts on par with subject, object and prepositional phrase.
Alternative parsing. A machine parser will yield only one parse, automatically, for free because you don’t have any other choice. However, the formal grammar will suggest that for any given text many parsings are legal. I think that letting the machine pretend there is only one parsing is pragmatic– it forces the language creator to decide how much ambiguity is tolerable.

Basic Applications of a Conlang Library
Word Processing. If you can parse it, it is potentially valid syntactically. A prerequisite to parsing is dictionary lookup, so if you can parse, you can also support spell-check.

Lorem Ipsum Generation. If you can turn text into a data structure, you can turn data structures into text and you can do it randomly. This will illustrate which sort of structures are possible, both validating the language design and the

Knowledge Representation. I wrote about this recently, but in short, if you can generate random sentences, you can take a sentence template and generate complete sentences using a data table, say a phone book. And with a parser, you can turn natural language questions into queries and commands to get data back from a phone book.

Concordance. I would almost suggest that existing concordance tools are good enough and don’t need to be language specific.

Posted in machine assisted conlanging

Corpus linguistics and little sins

Soooo, this toki pona project of mine. I’m parsing community generated texts. I’ve got incompatible goals. On one hand, I want a parser that will work. So if you type English, it should blow up. But if you type toki pona, it should parse it if it is beautiful, conservative toki pona. But if it is sloppy toki pona, I don’t really want to make a big deal of it. So you forgot to capitalize, forgot a period, forgot a closing quote, forgot the li, added a period instead of a comma before a li, and so on. I’m not going to add new rules to try to deal with these. So at the moment, I normalize them. I just fix them.
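To give a flavor of what "I just fix them" means, here is a small sketch of a normalization pass. It is not my actual normalizer; the function name normalize is illustrative, and it only covers two of the slips listed above (a period where a comma belonged before li, and a missing final period).

```python
import re


def normalize(text):
    """Quietly repair two common slips before the real parser sees the text."""
    text = text.strip()
    # A period directly before "li" is treated as a slip for a comma.
    text = re.sub(r"\.\s+li\b", ", li", text)
    # Add the terminal period if the writer forgot it.
    if text and text[-1] not in ".!?":
        text += "."
    return text
```

For example, normalize("jan li moku. li lape") gives "jan li moku, li lape.". Capitalization, closing quotes and a dropped li need more context than a regex has, so they are left out of this sketch.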

Then there is this:

meli li tawa en tan lon palisa.

Ignoring what it might mean, it’s a compound prepositional phrase, just like the English “No smoking in or around the school.” It could have been written:

meli li tawa lon palisa li tan lon palisa kin.

But that would just sound pedantic. Some stuff is sort of borderline.

How do you deal with noises?

jan li owi. => The guy said ouch!

I could fix it to:

jan li mu owi. => The guy made a noise like ouch.

But that looks pedantic and I don’t like the word mu all that much. It sounds too much like a cow and you have to add another word to indicate the actual sound.

How do you deal with defective names?

nimi mi li nimi ‘jan Laowi’

Fixing them would make parsing easier, but would get in the way of communication, especially if a name is already well known.

Posted in toki pona

Edge cases of grammar

These things will make your grammar more complicated, but you can expect them to show up in any community generated corpus, like, immediately.

Onomatopoeia. Fart noises and the like. In English, they get italicized and I’m sure someone has written a paper on what part of speech they are.

Direct speech. Thought you didn’t have embedded sentences? Now you do.

Fragments. If a sentence is cut off, either on purpose or accident, grammatically what can you do with it? If understanding a sentence is a process of parsing by applying syntactical rules, what are the rules for dealing with fragments?

Diglossia. If you mix English and another language, what are the rules for swapping parts out?

Errors. This one stumps me. When computers parse, if one word is off, the computer completely fails. It can’t do anything with that program. But for human speech, if ten things are wrong, we apply a set of syntax-like rules to fix it up and we don’t even notice. Sound absurd? This is essentially how modems worked with error correction and checksums. If you’ve ever used an application called ReSharper, it does a similar thing for programming languages. It uses static analysis to find syntax mistakes and suggest corrections.

Punctuation. Think you don’t need punctuation? Think again– a simple grammar can yield dozens of alternative parsings. Punctuation brings that down to a manageable level. If it needs to be audible punctuation like lojban, that’s another story.

Compound words. (and neologisms) Think you only have 1000 words? Almost immediately, phrasal compound words will appear. Interestingly, if you treat them as compound words, the parser does better glosses and you have fewer alternative (wrong) parsings. Toki pona & Klingon have this issue– since both languages have a fixed number of bound & unbound morphemes (one by design, one by community choice). The alternative of just imagining all these common word pairings to be “ad hoc” phrases is really just dishonest.

I’m running into all of these issues when trying to machine parse a toki pona corpus. That language only has about 10 rules in the formal grammar, but my parser just keeps getting more and more lines of code to deal with issues like the above.

Posted in conlang design, conlang use

Punctuating toki pona- Community Proposal

toki pona has a few constraints, without which the game of toki pona is rendered silly. Foremost is that there are only about 125 or so morphemes (fairly non-bound). This isn’t as much of a problem and I think tp community proposals can stick to this.

The next idea is that numbers, dates and so on are lacking, as if this were the language of an ancient tribe (despite missing a fully formed system of naming plants, animals and extended family relations). This is problematic for working with data on computers. Numbers and Dates are basic types, without them certain computer experiments are harder than necessary.

I’m writing a parser and I need to make a few modifications to make tp easily parsable. My parser does a two pass parse– 1st phase I normalize the text & make best efforts to add punctuation as described below. It is difficult and error prone. If I didn’t have to do this normalization, the parser would parse more text on the 1st try and get better glosses on the 1st try.

Summary
1) Phrasal compounds are joined with dashes. jan-pona. jan-pi-sijelo-pona.
2) Prefix numbers with #, e.g. #wan. If it is a two word number, it is hyphenated, e.g. #wan-tu.
3) Direct quotes are in << >>, e.g. jan li toki e << toki! >> (everyone else appears to use the English convention of single or double quotes, but I need an escape character, see below)
4) Prepositional phrases must start with a comma, e.g. mi li, lon ma ni. jan li moku, kepeken ilo.
5) Non toki pona text is escaped with double quotes. mi toki kepeken toki “English”

We have compound words. We pretend we don’t, but we do. These are lexemes, phrasal compound words. Compound words are joined by hyphens.

jan-pona = friend.
jan-pi-sijelo-pona = doctor.

Why? Because you can’t accurately machine gloss jan pona to friend. Why should we pretend that jan-pona is anything but a phrasal compound and gloss it as good person, healthy person, friend, etc.? Without hyphens, I have to gloss using a list of alternatives. With hyphens, I can dispense with a list of alternatives and home in on a single gloss.

Unhyphenated.
jan li ike li tawa jan pi sijelo pona li kama jan pona.

Hyphenated.
jan li ike li tawa jan-pi-sijelo-pona li kama jan pona.
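A small sketch of why the hyphens help the glosser. The two compound entries come from above; the tiny single-word dictionary and the names COMPOUNDS, WORDS and gloss are illustrative only, not a real lexicon.

```python
# Compound glosses from the proposal above.
COMPOUNDS = {
    "jan-pona": "friend",
    "jan-pi-sijelo-pona": "doctor",
}

# A deliberately tiny word list, just enough for the example sentences.
WORDS = {
    "jan": ["person"],
    "li": ["PRED"],
    "ike": ["bad"],
    "tawa": ["go", "to"],
    "kama": ["come", "become"],
    "pona": ["good"],
}


def gloss(sentence):
    """Hyphenated tokens get one gloss; bare words get all their alternatives."""
    out = []
    for token in sentence.rstrip(".").split():
        if "-" in token:
            out.append(COMPOUNDS.get(token, "<" + token + "?>"))
        else:
            out.append("/".join(WORDS.get(token, ["<" + token + "?>"])))
    return out
```

The hyphenated doctor comes back as a single gloss, while the trailing unhyphenated jan pona stays as two separately glossed words that the reader has to reassemble.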

We have “rovers”/syntactical infix. I don’t know what these are really called.

jan-mute-pi-sijelo-pona = doctors.
jan-pi-sijelo-pona-mute = doctors.

We need numbers. They shall be words prefixed by #.
#ala
#wan
#tu

#luka

I will have to look up 3,4,6,7,8,9 from the forum. I know there are many proposals, I’ll look for community ones and then I plan to implement the ones that are base 10, don’t introduce new words, positional and reasonably efficient, e.g. no worse than English in expressing large numbers.

Some numbers are legacy numbers with some degree of officialness and will have to be supported.

#wan-tu-tu = 4
#luka-luka = 10
#MMLW = 20+20+5+1

But I don’t recommend using legacy numbers if you are trying to communicate.

Watch this space!

We need direct quotes. They shall be wrapped in << >> (or the « » if you can find those keys on the keyboard)

jan li toki e << mi jo e soweli! >>
He said, “I have a dog.”

I hope I don’t regret this choice because < and > mean something in HTML and might cause problems in some content management systems. Oh well.

Anything in direct quote markers is syntactically a content word.

We need commas.
People currently add commas before or after la, but actually we don’t need them there. I have no opinion about what people do there. Also I have no opinion about commas in pi-phrases.

mi pali, kepeken ilo sona, lon tomo, pali tawa mani.
I work with computers in the office for money.

When there is nothing to distinguish a preposition from a content word, it is valid to parse every word after pali as a string of adverbs:

mi pali(kepeken ilo sona lon tomo pali tawa mani).

Humans can realize that is unlikely, but a machine can’t. Humans can parse invalid toki pona and realize that someone is mixing Russian and English and toki pona rules and, with some effort, realize the intended correct toki pona. This sort of parsing is a huge effort to implement. On the other hand, commas make parsing mechanically effortless.
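To show how mechanical it gets, here is a sketch that assumes every comma in the sentence introduces a prepositional phrase (so it would misfire on texts that also put commas around la). The name split_phrases is made up.

```python
def split_phrases(sentence):
    """Split a comma-punctuated sentence into its core and trailing phrases."""
    chunks = [chunk.strip() for chunk in sentence.rstrip(".").split(",")]
    return chunks[0], chunks[1:]
```

For jan li moku, kepeken ilo. this returns the core jan li moku and the single phrase kepeken ilo, with no need to guess whether kepeken is a preposition or one more modifier.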

We need an escape character
The corpus texts are full of mixed language material, from accidents in transliteration to people just trying to communicate. After transliterating to toki pona, normally the original is unrecognizable– it might as well be a completely new word. So toki pona texts that interact with the real world, will need to have foreign text. And that text should be in double quotes.

nimi mi li “Matthew Martin” li jan Mato.

Anything in double quotes syntactically is a content word.
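Both kinds of quoting can be handled in the tokenizer, before any grammar is involved. This is a sketch, not my actual tokenizer: the pattern and the name tokenize are illustrative, it accepts straight or curly double quotes, and it does not handle « » or nested quotes.

```python
import re

# Order matters: try the two quoted forms before falling back to plain words.
TOKEN = re.compile(
    r'<<.*?>>'            # direct speech: << ... >>
    r'|["“][^"”]*["”]'    # escaped foreign text in double quotes
    r'|[^\s.!?]+'         # an ordinary word
    r'|[.!?]'             # sentence punctuation as its own token
)


def tokenize(text):
    """Return tokens, keeping each quoted span together as one content word."""
    return TOKEN.findall(text)
```

Everything between the markers, spaces and punctuation included, arrives at the parser as one token that can sit wherever a content word can.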

Dates
The current date system is something like

tenpo suno wan, mun wan, sike suno wan = 1/1/1

You can find some variant of this on the wikia for toki pona. It uses legacy numbers and is too cumbersome for anyone to want to use it.

I’m going to recommend this format: y-m-d
S1-M1-T1
Because it will be easier to sort.

Also, for this to work, numbers have to be reasonably efficient and be able to cope with numbers from 1 to 2015.
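One caveat on sorting: with unpadded numbers, plain string sorting puts S10 before S2, so a program still has to pull the three fields out and compare them as numbers. A sketch, assuming the fields are written with ordinary digits as in S1-M1-T1 (the name parse_date is made up):

```python
import re

DATE = re.compile(r"S(\d+)-M(\d+)-T(\d+)")


def parse_date(text):
    """Turn 'S2015-M2-T14' into the tuple (2015, 2, 14)."""
    match = DATE.fullmatch(text)
    if match is None:
        raise ValueError("not a y-m-d date: " + text)
    return tuple(int(part) for part in match.groups())


def sort_dates(dates):
    # Year first, then month, then day, which is why y-m-d order is convenient.
    return sorted(dates, key=parse_date)
```

Because the tuple is already in year, month, day order, no further rearranging is needed before comparing.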

Watch this space!

Posted in machine assisted conlanging, toki pona

Robot Languages

By the way, it looks like Dothraki has a published spec. Now on to other topics.

As someone with the facial expressions of a robot, I’ve always been partial to robots and some of my earliest attempts at programming were to create chat bots and AI. I failed, of course. But now I have some ideas on how to make it work.

Our human brains have some sort of knowledge representation system: it turns our stage, the world around us, into facts represented by neurons linked by axons and dendrites, which chatter using neurochemicals. We lack a technology to accurately and usefully use a neurological model to represent reality. But, hey, we’ve got other ways to represent reality. For example, we use documents and relational databases to keep track of inventory and the business activities of all large businesses and governments in the world.

Normally, when this needs to be communicated, we use protocols like HTTP to send (often technology independent) serializations of database records across a wire. We then use UIs and binding to turn this into human consumable materials.

But let’s get back to robots. Robots are machines that would want to be like people, and thus use a natural language. That means they could possibly deal with people directly. But English is hard, so maybe a conlang or restricted version of English would be better.

Representations of reality:
Name – Phone Number
Joe – 555-1234
Jane – 444 – 5678

If this was toki pona, we could serialize this as:
nanpa pi jan Joe li 555 – 1234.

By some complicated system of equalities, we could work out that this is the same as:
jan Joe li jo e nanpa ni: 555 -1234

If the robot heard a sentence, it would attempt to use deserialization & equality checks to transform the utterance into a known data type:

jan Mato li jo e nanpa ni: 111 -8989 ==> Mato – 111 -8989
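A sketch of the round trip for the phone book. The "complicated system of equalities" is faked here with two hard-coded sentence patterns, and the names serialize, deserialize and PATTERNS are made up for the example.

```python
import re

# Sloppy spacing around the dash ("555 – 1234", "111 -8989") is normalized away.
DASH = re.compile(r"\s*[-–]\s*")

# Two equivalent ways of stating the same fact.
PATTERNS = [
    re.compile(r"nanpa pi jan (\w+) li ([\d-]+)\.?"),
    re.compile(r"jan (\w+) li jo e nanpa ni: ([\d-]+)\.?"),
]


def serialize(name, number):
    """Row -> sentence."""
    return "nanpa pi jan " + name + " li " + number + "."


def deserialize(utterance):
    """Sentence -> (name, number) row, or None if no known pattern fits."""
    text = DASH.sub("-", utterance.strip())
    for pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return match.group(1), match.group(2)
    return None
```

Both surface forms land on the same row, which is the equality check in miniature. A real system would need to derive the equivalent patterns rather than list them by hand.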

A lojban style processor could also answer utility questions like,

nanpa Jane li 444-5678 la ona li toki tawa mi.
If Jane’s number is 444-5678, then she’s talking to me.

And the robot would respond, after binding & processing pronouns:
jan Jane li toki tawa sina.
Indeed, Jane is talking to you.

Or utility questions might involve common computer tricks like, “How many digits are in Jane’s phone number? What is the sum of the digits in Jane’s phone number?” A human actually excels at this arbitrary discussion, whereas a robot has to be programmed for each exchange of that sort.

Pronouns seem like something that would be really, really hard for a computer. If my computer only had a knowledge representation system for the phone book, it would need to know who is a person, who is capable of having a phone number and so on. People excel at common sense, modern code doesn’t. Databases rely on nonce, unique names and variables that might be bound to anything are used only in limited scopes to make sure that they do only bind to 1 thing at a time.

Next, is the chat bot problem.

Chat bots respond to whatever you ask. Usually it’s modeled as a command. But human languages only sometimes use commands.

If Jane’s number is X, then she’s talking to me. (Implied, asking for confirmation)
I know Jane. (Implied, asking for additional information about Jane, e.g. Oh, you do? I know her too, her number is X)

Another thing a chat bot should be able to do, is serialize things into something that is suitable for saying over the phone. Most code dumps text to the screen, often in a grid format. A good robot would be able to tell a story in a way that takes into account attention span. A bad robot would read all 5000 phone numbers. A smart robot would say, after reading two, “and so on” or “do you want me to keep going or are you looking for someone in particular”

State– some of the best chat bots are sadly stateless. They don’t incorporate anything you say into their base of knowledge. Some do, but it’s kind of wonky– they just remember that after saying “Good day” people usually just repeat “Good day”.

A good robot takes all utterances and converts them into a system of knowledge.

My phone book robot, if I said:

mi jo e soweli.

Would interpret that as asking the database to create a new table like so:

who – inventory
jan Mato – soweli

And if two minutes later I asked:

mi jo e seme?

The robot should be able to look it up even though two minutes ago this robot only knew phone numbers.

This is the flip side of serialization– turning language back into the knowledge representations system.
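Here is a toy version of that exchange using Python’s built-in sqlite3 module. It only understands the one pattern mi jo e X; the function name hear, the inventory table layout, and binding mi to a speaker name passed in by the caller are all assumptions for the sake of the sketch.

```python
import sqlite3


def hear(db, speaker, utterance):
    """Handle 'mi jo e X.' (store a fact) and 'mi jo e seme?' (query it)."""
    words = utterance.rstrip(".?").split()
    if len(words) != 4 or words[:3] != ["mi", "jo", "e"]:
        return None  # not a sentence this toy understands
    # The table is created the first time the robot hears about owning things.
    db.execute("CREATE TABLE IF NOT EXISTS inventory (who TEXT, item TEXT)")
    if words[3] == "seme":
        rows = db.execute(
            "SELECT item FROM inventory WHERE who = ?", (speaker,)
        ).fetchall()
        return ["sina jo e " + item + "." for (item,) in rows]
    db.execute(
        "INSERT INTO inventory (who, item) VALUES (?, ?)", (speaker, words[3])
    )
    return []
```

Used with db = sqlite3.connect(":memory:"), telling it mi jo e soweli. and then asking mi jo e seme? gets back sina jo e soweli. The hard part, deciding what table a brand new kind of sentence should become, is exactly what this sketch skips.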

Anyhow, this has been done before. MS SQL had a natural English processor; it was probably similar to what I have described, although I bet it only dealt with turning English into SELECT statements and maybe turning the tables of data into English sentences. Turning English into tables that can be queried again is probably hard.

A tp fact database would rely heavily on equality tests:

mi jo e soweli lon tomo mi.
Does this factually contain the following?
mi jo e soweli. Yes.
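A very naive sketch of that containment test: a stored fact contains a claim if the core (everything before the first preposition) is identical and the claim’s prepositional phrases are a subset of the fact’s. The preposition list and the names parse_fact and entails are assumptions, it will misread tawa used as a verb, and it ignores negation, where dropping a phrase does not preserve truth.

```python
PREPOSITIONS = {"lon", "tawa", "tan", "kepeken", "sama"}


def parse_fact(sentence):
    """Split into a core tuple and a set of prepositional phrases."""
    core, phrases, current = [], [], None
    for word in sentence.rstrip(".").split():
        if word in PREPOSITIONS:
            current = [word]          # a preposition opens a new phrase
            phrases.append(current)
        elif current is not None:
            current.append(word)
        else:
            core.append(word)
    return tuple(core), frozenset(tuple(p) for p in phrases)


def entails(fact, claim):
    """True if the claim is the fact with zero or more prep phrases dropped."""
    fact_core, fact_phrases = parse_fact(fact)
    claim_core, claim_phrases = parse_fact(claim)
    return fact_core == claim_core and claim_phrases <= fact_phrases
```

So mi jo e soweli lon tomo mi. entails mi jo e soweli., but not the other way around.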

Anyhow, hopefully personal life will allow the free time to write such a thing. So to recap:

Knowledge representation system: E.g. relational tables.
Serialization system: E.g. turns rows and tables into sentences
Deserialization system: Creates tables and binds utterances to a table, then inserts 1 or more rows.
Persistence: All commands, factual or otherwise, become part of the system of knowledge.
Query language: Questions, or statements that prompt retrieving information and serializing it back to the interlocutor.
Utility: Processing tasks that are not really related to retrieving and updating a representation of knowledge. For example, answering if at least 3 people in the phone book have names starting with “G”
Equality and Transformations. Natural languages can serialize into many equivalent forms.

Posted in machine assisted conlanging

The 5 Word Language

Finally, I sort of have the time to do a quick review of the 5 word language. I’m a big fan of small languages (as in small vocab, small numbers of rules, small number of morphemes)

This is really, 5 morphemes. I think all small languages, in practice have about 2000 lexemes (set phrases that behave as words that you just have to memorize). But no need to quibble about the number of morphemes, this looks like it really is 5 morphemes. That is small.

What is totally awesome about this is that it is 2 morphemes short of what you can memorize in a single view: short term memory is about 7 (plus or minus 2) items.

The vocab is laid out in a grid and the entire grid is used. This compares to toki pona which ignores the diagonal– i.e reduplication in tp doesn’t mean anything.

I think logotome is a real word… shoot me if it isn’t… and the logotome of a language is the set of all possible words (or lexemes) that the phonotactic system lets you create. Toki pona’s logotome is huge– even with the small alphabet and CV(CV)(N) structure, you have like 10000s of possible short words. A five word language has 25 two word phrases,

Good & Potential Applications
This might be a useful conlang creation technique. Create a dozen small 5 word conlangs, then turn the best of those into a large language. If the large language was compatible with the small one, then you’d have a conlang with a core that someone could learn before they lose interest, which I suspect is about 2 maybe 3 hours.

I like the idea of using numbers: it has an obvious application for text messaging on phones (there is a (failed) app for that! It’s an emoji type conlang). If I were to learn/create a dozen of these languages though, I’d worry about having to remember that 42 means one thing in the 5 word language and another thing in the 7 word language. Already I constantly mix up telo, which in toki pona means water and in Russian means body. I read somewhere that when you hear a word that is the same in 3 languages (like, say, chocolate or tea), the area of your brain in charge of that word for each language lights up. So cross language interference may be a real thing to consider when making small languages that draw on the same phonotactics (or logotactics; I don’t see if the 5 word language has a spoken format).

Domain specific languages. A domain specific language is an idea borrowed from software development– where you create a mini-language to deal with a specific topic. Then the language can be optimized for talking about that specific topic. It’s like an extreme version of slang and technical jargon, which can feel like a mini language. Except English jargon still will follow English grammar and syntax. In the lexicon, words are available for diabetes and God, so I figure this language makes it easier to talk about those topics. And if you want to talk about something else that doesn’t suit the language’s lexicon, create another 5 word language!

Areas for Improvement
Like many combinatorial languages (i.e. a fixed set of morphemes that are combined in all possible combinations), this particular description doesn’t say much about grammar. Is the grammar isolating? Do we have bound morphology? (I.e. do any of the morphemes only occur in a fixed relationship to other morphemes? Do we have a tense suffix?) What are the basic sentence patterns? Is it SVO or OVS? Do we have prepositions or postpositions? Do we branch left (like Japanese), right (like English) (did I get that backwards?), or mixed (like toki pona)? Do we have part-of-speech rules, or are all words content words? Are some words “semantically bleached”, so that they don’t mean much on their own but mean something when in a sentence? Examples from English: the, to, in, of, have, going (future), etc.

Posted in conlang | 1 Comment

Undeveloped Public Domain Conlangs

Barsoomian is public domain, unless it is the most recent movie version.

But here is one that I just noticed: Parrot, from Doctor Dolittle:

“Ka-ka-oi-ee, fee-fee”
“Is the porridge hot yet?”

Oh, boy. How shall we do an interlinear gloss for that?

I’m going to guess ka-ka-oi-ee is a compound word meaning porridge and fee-fee means ‘now’ and reduplication means a tag question. Since reduplication indicates something you don’t know, then ka-ka would be “mystery” as in mystery-meat. So “mystery-food, now-now?”

And the dog speaks a constructed sign language.

And I’m going to guess that after page two the author lost interest in actually describing the fake languages. A pity.

But hey! It’s public domain. You are allowed to fully develop the language and sell it. Go forth and do so!

Posted in Uncategorized | Comments Off

Esoteric Buddhism and Conlangs

I’m still reading about Esoteric Buddhism, so I’m no expert and may mis-speak. However, while reading about esoteric Buddhism, I got a bunch of fake linguistics ideas.

Esoteric vs Exoteric Readings
The exoteric reading is the way you’d read a text and simply understand it. No secret messages. But if each thing and action is actually a symbol of something else, then you get an esoteric, secret reading. It’s post hoc magical thinking to think that there really was an encoded message, but people didn’t dismiss the idea of esoteric readings, so was there any benefit to the action? I think so, I think it works as a creativity device. Take a text, imagine that there was a secret interpretation and find it.

I think some languages and syntaxes are going to be more given to an esoteric reading. For example, in toki pona, most sentences follow the same form. So, as a contrived example, “Let’s go shopping” and “We attack at dawn” would both follow a subject, verb, direct object, prepositional phrases pattern.

Polysemy and homonymy help create intentional esoteric readings. So if a language really did define “Let’s=We” and “attack=go shopping” and “missing time marker = dawn”, we’d have an intentional esoteric reading. Without such a huge amount of homonymy, esoteric readings would be accidental or occasional at best. But post-hoc esoteric readings would be restricted only by your imagination.

So an esoteric reading is an act of communication system construction, a new mapping of meaning onto an existing syntactic structure.

Mantras
Mantras are magic formulas. Some are perfectly intelligible, usually invocations of powerful beings. Namu Amida Butsu! Hail Amida Buddha!

Some are nearly unintelligible, but when written in Chinese, the radicals do mean something. It’s like the canard about the word “crisis” having two symbols that each on their own mean something that has a clever relationship to the compound word. In a dharani (mantra w/o meaning), the Chinese is your substrate for an esoteric reading.

Again, this is an act of communication system creation, sort of like if you discovered an ancient text and by fiat decided it was a recipe for bread by matching up words with an English recipe. This of course would fall apart if you applied the trick to an additional undeciphered text.

Anyhow, what can you do with this? Maybe it’s good practice for language creation, which often stalls at picking a phonetic inventory. Take a nonsense phrase and do an esoteric reading. Your esoteric reading is a sort of mini-conlang.

Posted in conlang design | Comments Off

Language learning materials

So I’m working on improving my Russian. These things drive me crazy about learning materials:

The preposition pri means “during” (but also most of the other prepositions). And that counts as a definition. It is then followed by two, maybe three examples. And that is all. This works fine if you already know Russian and are looking to label and identify the rules you already know. Prepositions and cases from a learner’s standpoint are all chaos and unpredictability. I’d rather have a lengthy set of examples than some suspicious and long list of rules. Instrumental case: it’s the case to use for your profession (What the f*k?). It is also the case you use for certain “x and y” constructions. (What the f*k?) Dative case: it’s the case that you use in a sentence where you like things (What the f*k?). Just give me a lengthy list of samples.

The genitive case is the case you answer with should someone ask “Kovo?” I asked my Mum, who speaks not a word of Russian, “Kovo?” over and over, and she never gave me the genitive of any word. This rule works only if you are already fluent in Russian and need to be able to label a rule that you already know.

Single word repetitions. (You heard that word once, you’ve memorized it, right?) Should I fault books for not being flash card decks? I will. Why can’t they be creative and put words in, say, an 8×8 grid, so that you can review them in pseudo-random order?
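The grid idea is easy to try out for yourself. Below is a minimal Python sketch, not anything taken from a real textbook: `review_grid` and `reading_orders` are hypothetical names, the four sample words are just placeholders, and the seed is only there so the same page can be reproduced.

```python
import random


def review_grid(words, size=8, seed=None):
    """Shuffle the words and lay them out in a size x size grid.

    Cells left over are padded with empty strings.
    """
    if len(words) > size * size:
        raise ValueError("too many words for the grid")
    rng = random.Random(seed)
    cells = list(words) + [""] * (size * size - len(words))
    rng.shuffle(cells)
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def reading_orders(grid):
    """Three ways to walk the same grid: rows, columns, rows backwards."""
    by_rows = [w for row in grid for w in row if w]
    by_cols = [w for col in zip(*grid) for w in col if w]
    backwards = by_rows[::-1]
    return by_rows, by_cols, backwards


# Placeholder vocabulary (transliterated Russian: water, bread, house, book).
sample = ["voda", "khleb", "dom", "kniga"]
grid = review_grid(sample, size=2, seed=1)
for row in grid:
    print(row)
for order in reading_orders(grid):
    print(order)
```

One printed grid then gives several review passes: read across, read down, read backwards, and every word comes up next to different neighbours each time.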

Single demonstrations. Okay, let’s take an example outside of Russian: the Algonquian obviative. It takes a few pages to explain the obviative, and at the end of that explanation you will be utterly confused about how the deep structure works. Or the superficial structure, for that matter. You then get two example sentences. As a learner, I think I will need maybe 100 or more example sentences to illustrate a rule that no author can explain very well. An example from English would be dangling or stranded prepositions. (“There are some things I will not put up with!” To a fluent English speaker it sounds right. I don’t imagine even six pages of technical explanation would help a learner, but four or five hundred samples might.)

Charts. Here is a 1000×1000 chart of all the two-word phrases in English. Memorize them. Each cell in a chart generally, for me, feels like an entirely separate fact/skill, and its location in some grid is about as important as ordering the cells by number of letters, graphing them by which have the most straight lines vs curved lines, or other pretty but irrelevant details.

Posted in Learning Any Language | 4 Comments

Conlex- Here is what I hope it means

Conlex: a sort of activity for people creating new (or reviving dead/nearly dead) languages by actually speaking them into existence. It entails creating a language, especially the materials necessary for learners, and then doing what it takes to get a few other people to learn the language. Maybe that means teaching it to the toddler or the girlfriend, or promoting it for its attractive features (who knows, maybe it’s pretty, maybe it’s useful for silent communication in the dark). It entails creating new culture only to the extent that one hopes and expects learners to adopt those new customs. This is conlanging where the conlanger is not a god-king to be worshiped (a la Zamenhof), but a peasant, and the learners are kings.

Despite the internet being a wide and vast place, for some topics it seems rather small. In the area of non-natural languages, it is small. There are relatively few participants, and each non-natural language tends to carve off a new community, as it is hard to gain competency in more than a few languages.

So let’s say you are reviving Beothuk. Let’s imagine that you did an exercise I promote, which is to write up your own manifesto about what a new language should be. (Yes, go ahead and send me nastygrams about how I am forcing, yes forcing you, to write your own manifesto instead of slavishly copying mine, or Esperanto’s or Lojban’s or Tolkien’s. The people that make you face your freedom are the worst.) At the end of the exercise you say, hmm, this language should be alive. People should actually use it. Maybe even three people. (And mentally, you can re-do my hypothetical with a diary-con-language to be taught to toddlers in the home, or a conlang that relies on touch boards for the profoundly disabled, etc.)

(Yes, your manifesto might be different than my hypothetical. For example, if your manifesto is cribbed from Tolkien, well, it isn’t a conlex, and that is a topic for someone else to write about.)

So you go to Conlang-L and watch people talk about the joys of writing Tolkien-style reference grammars. They have to be hard, because that is the only way that they will get any respect, I guess. But in my hypothetical, you want the language to be human usable, and you end up with a lot of not very helpful and sometimes angry advice about how the language needs to be hard, huge, copyrighted, hostile to learners and not to be sullied by actual use. But the 2 people that are likely to be interested in Beothuk have no interest in fictional sound-change histories; they really just want an expressive, learner-oriented language with lots of teaching materials.

And you visit the auxlang lists, who are creating yet another Esperanto. Which is a fine project; people should learn from it. But the manifesto in this hypothetical calls for a language based on the remnant words of a lost people, so averaging the vocab of European languages and marking everything for part of speech is out! And 3 people is plenty; neo-Beothuk isn’t going to rule the world. So we keep moving.

And you visit more websites where people hawk languages with a built-in conculture that requires inflecting the pronouns and verbs for the social rank of your three parents, each of a different gender. Oh, joy, that is going to be fun to speak here on Earth. And again, you get not so helpful advice about how your language MUST have a conculture because language and culture are inseparable, and any language you write must be dripping with culture or you’re doing it wrong, and they will eventually get mad at you. Neo-Beothuk might include culture-light (say, a new form of honorifics or Beothuk Day), but if we don’t want to tell those 3 Beothuk fans to bugger off, we’ll have to be constrained by what they want. If they don’t want body paint, asking for it will get in the way of launching a new language.

Anyhow, there is not a corner of the internet that serves this sort of thing at the moment: not LCS, not Conlang-L, heavens to Betsy not Zompist. The Klingon, Na’vi, and toki pona communities all seemed to burst into existence despite the lack of a place where people who create languages for learners and indiscriminate language learners can meet.

I hope the equivalent of the “learn any language forum” for non-natural languages comes into existence and that a new learner and “fan-centric” approach comes with it.

Posted in conlang community building, conlang learning | 2 Comments

Fake Mantras and Fake Languages

In Hinduism, there was the idea that words said in a prestige language were magic. People had some pre-Saussurean ideas about sound and meaning, namely that there was something doggy about the sounds d-o-g and something catty about c-a-t. Since then, this has been shown to be nonsense; at best, words of similar meaning can cluster together in how they sound. I don’t have the examples handy, sorry. And of course, there is real morphology, where antidisestablishmentarian has a bunch of parts that mean something, while carpet doesn’t have parts, though it looks like it does.

Back to India. They imagined there were seed syllables, syllables that mean something (as if car in carpet really meant car!). The sounds were typically vowels and liquids, less typically any sort of consonant that completely blocks the passage of air. “kit cat” would be a lousy mantra. Try to chant it… it doesn’t roll off the tongue. But “lily” does. By this reasoning, you can have obstruent consonants at the beginning of a mantra, but not in the middle: nothing that blocks the breath.
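That rule of thumb is simple enough to sketch as a checker. This is only a spelling-based approximation in Python: it treats the letters p, t, k, b, d and g as the breath-blocking sounds, which is an assumption of mine and not real phonetics, and `is_chantable` is a made-up name.

```python
# Letters treated as stops; a rough spelling-based stand-in for phonetics.
STOP_LETTERS = set("ptkbdg")


def is_chantable(phrase):
    """True if no stop letter appears after the very first letter."""
    letters = [ch for ch in phrase.lower() if ch.isalpha()]
    return not any(ch in STOP_LETTERS for ch in letters[1:])


for candidate in ["kit cat", "lily", "pona"]:
    print(candidate, is_chantable(candidate))
```

By this test “kit cat” fails and “lily” passes, and a word like “pona” passes too, because its only stop is at the very start.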

So fast forward to now. Meditation is popular. We often do secular meditation using numbers: 1, 2, 3, 4, 5,… 10, repeat. If you go over 10, you know your mind has wandered. Or in traditional chanting, aka noisy meditation, we chant something, usually 2000-year-old Sanskrit phrases that are untranslatable nonsense, agrammatical strings of themes (bija syllables with some sort of symbolic meaning) or possibly bad Sanskrit made up by someone who didn’t actually read or write it (Mantra of Light, I’m looking at you). Some of them are names of god-like Bodhisattvas. I find it endlessly distracting that I’m chanting the name of an imaginary superman.

Another thing that happens with mantras is massive streamlining. Namu Amida Butsu turns into Nembutsu because people are trying to say it 10 times in one breath. Or to say it 100s of times, and they want to finish in time to go to work. Which brings up another point: mantras act as a sort of linguistic clock. If you want to meditate for 20 minutes but don’t have a clock, you can chant x times and on average hit 20 minutes. (Sort of like 1 Mississippi, 2 Mississippi, etc.)
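The x in “chant x times” is just the session length divided by how long one repetition takes. A tiny Python sketch, where the 4 seconds per repetition is an invented figure (time your own chanting) and `repetitions_for` is a hypothetical helper name:

```python
import math


def repetitions_for(minutes, seconds_per_repetition):
    """How many repetitions fill the session, rounded up."""
    if seconds_per_repetition <= 0:
        raise ValueError("seconds_per_repetition must be positive")
    return math.ceil(minutes * 60 / seconds_per_repetition)


# 20 minutes at an assumed 4 seconds per repetition: 1200 / 4 = 300
print(repetitions_for(20, 4))
```

Streamlining the mantra shrinks the seconds per repetition, so the same count finishes sooner, which is presumably part of the appeal when you need to get to work.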

English Mantras
Nonsense – “Ya ba da ba do da!” “Hi ho, hi ho, it’s off to work I go!”
Traditional Translations- “Homage to Amida Buddha” (bleh, unchantable)
Modern innovations- “love and peace…love and peace” (or “love and peace and brownies…”)

The downside of a mantra you understand is that you might get distracted by the content of the mantra.

Toki Pona Mantras
Assigning meanings to any percentage of the possible syllables runs the risk of creating new words. So if pon means good, and lon means the universe, no new meanings. If tila means “compassion”, oops, we’ve coined a new word, albeit one only for mantras. Grammar also poses a challenge. In toki pona, all utterances are supposed to be grammatical, else you aren’t doing toki pona. But a Hindu-style mantra might be something like:

pon(a) lon pilin pon(a)…etc.

And that isn’t grammatical. So it’s a community innovation, which may or may not bother you.

Grammatical toki pona mantras would be something like

o jan Puta Amita o tawa e mi tawa ma pona sina!

Anyhow, toki pona mantras will sound better if you drop the final vowels and/or n; it will add more vibration. This actually is a legit toki pona maneuver. Toki pona phonetics were designed to make it easy for anyone to say, so transformations of the language are legal. For example, you could still express toki pona with all the l’s pronounced as r’s, all the k’s pronounced as g’s, etc.
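Both maneuvers are mechanical enough to sketch in Python. The helper names are made up, and leaving words of one or two letters alone (so that mi or e doesn’t vanish) is my own assumption, not a community rule; the sketch drops only final vowels, not final n.

```python
VOWELS = "aeiou"
# l pronounced as r, k pronounced as g
RESPELL = str.maketrans({"l": "r", "k": "g"})


def drop_final_vowel(text):
    """Drop the last vowel of each word longer than two letters."""
    out = []
    for word in text.split():
        if len(word) > 2 and word[-1] in VOWELS:
            word = word[:-1]
        out.append(word)
    return " ".join(out)


def respell(text):
    """Rewrite every l as r and every k as g."""
    return text.translate(RESPELL)


print(drop_final_vowel("pona lon pilin pona"))
print(respell("toki pona li lili"))
```

The first call turns the mantra into “pon lon pilin pon”, the same shape as the pon(a) example, and the second gives “togi pona ri riri”.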

Other Conlang Mantras
One idea is to use articulation symbolism– assign symbolic meaning to each part of the tongue and mouth & construct magic words that have a nice mixture of symbols.

Post written from feedback on the Facebook conlang group, the Facebook toki pona group, and the toki pona forum.

Posted in conlang design, toki pona | Comments Off