Spelling systems and codes

[Update: See also my 2013 post: Kán yu andastánd wot aim seiing?]

In a comment on Stan Carey’s blog, I mentioned a regular spelling system that I once invented for my dialect of English.

This was never intended as a cure for the irregularities of English spelling. I’ve always been happier inventing things that are pointless but fun over anything of practical use, so I made it in many ways the opposite of what you’d want from a workable spelling reform. It’s very specific to my own dialect, contains complicated and sometimes ambiguous rules, and has the look of a completely alien language. These qualities give it a certain aesthetic appeal, but certainly not a practical one.

As I mentioned in my comment, I’ve lost my description of the system, but managed to find some sample text from which I can reverse engineer most of the rules. As far as consonants are concerned, there are only a few surprises. Among them:

  • ‘c’ for ‘sh’, ‘tc’ for ‘ch’; ‘j’ and ‘dj’ for the voiced counterparts.
  • ‘hs’ for unvoiced ‘th’, ‘hz’ for voiced ‘th’.
  • ‘yn’ for ‘ng’, or simply ‘ng’/‘nk’ when the g/k is pronounced.
  • Syllabic consonants treated as though preceded by schwa.
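This Python sketch makes the consonant rules concrete by spelling a sequence of consonant phonemes. It assumes the input is a list of IPA symbols, one per phoneme. The function name `spell_consonants` and the lookup table are illustrative only and were not part of the original system. The /j/ → ‘y’ entry is inferred from ‘rágyûli’ in the sample text, not from a stated rule.

```python
# Hypothetical sketch: input is a list of IPA consonant symbols, one per phoneme.
# Symbols not listed here (p, b, t, d, k, g, s, z, m, n, l, r, ...) pass through unchanged.
SPECIAL = {
    "ʃ": "c",    # 'sh'
    "tʃ": "tc",  # 'ch'
    "ʒ": "j",    # voiced counterpart of 'sh'
    "dʒ": "dj",  # voiced counterpart of 'ch'
    "θ": "hs",   # unvoiced 'th'
    "ð": "hz",   # voiced 'th'
    "j": "y",    # IPA /j/ as in 'yes' (assumption, inferred from 'rágyûli')
}


def spell_consonants(phonemes):
    """Spell a sequence of consonant phonemes using the rules above."""
    out = []
    for i, p in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p == "ŋ":
            # When the g/k is pronounced, write plain 'n' and let the following
            # g or k spell itself ('ng'/'nk'); otherwise use 'yn'.
            out.append("n" if nxt in ("g", "k") else "yn")
        else:
            out.append(SPECIAL.get(p, p))
    return "".join(out)


print(spell_consonants(["ŋ"]))       # yn
print(spell_consonants(["ŋ", "g"]))  # ng
print(spell_consonants(["θ"]))       # hs
```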

Vowels are more complicated, and this is where the specificity to my own Australian dialect comes in. Either alone or with certain diacritics, the vowels ‘a’ and ‘e’ represent front vowels (vowels pronounced at the front of the mouth), ‘u’ and ‘i’ represent mid (central) vowels, and ‘o’ represents back vowels. The vowel ‘o’ is also used as a length modifier. More specifically:

  • ‘a’ represents the vowel in ‘had’, ‘á’ the vowel in ‘head’, ‘e’ the vowel in ‘hid’ and ‘eo’ the vowel in ‘heed’.
  • ‘u’ represents the vowel in ‘hud’, ‘uo’ the vowel in ‘hard’, ‘úo’ the vowel in ‘hurt’, and ‘ûo’ the vowel in ‘hoot’.
  • ‘i’ represents schwa.
  • ‘o’ represents the vowel in ‘hot’, ‘oo’ the vowel in ‘hoard’, ‘ó’ the vowel in ‘hood’.
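The vowel scheme is regular enough to decode mechanically. The base letter gives the position (front, mid or back), and a trailing ‘o’ marks length. This Python sketch reads a spelling that way. It assumes the spellings are exactly those listed above, and `describe` is a hypothetical helper name.

```python
import unicodedata

# Keyword vowels and their spellings, as listed above.
VOWELS = {
    "had": "a", "head": "á", "hid": "e", "heed": "eo",
    "hud": "u", "hard": "uo", "hurt": "úo", "hoot": "ûo",
    "schwa": "i",
    "hot": "o", "hoard": "oo", "hood": "ó",
}

# Base letter -> rough tongue position, per the paragraph above.
PLACE = {"a": "front", "e": "front", "u": "mid", "i": "mid", "o": "back"}


def describe(spelling):
    """Return (position, is_long) for a vowel spelling such as 'úo'."""
    # NFD splits 'ú' into 'u' + a combining accent, so [0] is the bare letter.
    base = unicodedata.normalize("NFD", spelling)[0]
    # A trailing 'o' after the first (precomposed) letter is the length marker.
    is_long = unicodedata.normalize("NFC", spelling)[1:] == "o"
    return PLACE[base], is_long


for word, spelling in VOWELS.items():
    print(word, spelling, describe(spelling))
```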

Diacritics are dropped within diphthongs. For example, the word ‘oh’ might be regarded as ‘i’ followed by ‘û’, but we drop the diacritic and spell it ‘iu’. Likewise ‘air’ is not spelt ‘ái’, but ‘ai’. The diphthong illustrated by the word ‘ow’ cannot be spelt ‘ao’, as this is reserved for the vowel in ‘hard’, so it’s spelt ‘aw’ instead.
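The diphthong rule can be sketched the same way. The sketch assumes that a diphthong is spelt by joining its two component vowels with their diacritics removed. It also assumes that the components of the ‘ow’ diphthong are ‘a’ and ‘o’, which the ‘ao’ remark suggests but does not state outright. `diphthong` is a hypothetical name.

```python
import unicodedata


def strip_diacritics(s):
    """Remove combining accents: 'û' -> 'u', 'á' -> 'a'."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))


def diphthong(first, second):
    """Spell a diphthong from its two component vowel spellings."""
    plain = strip_diacritics(first) + strip_diacritics(second)
    # 'ao' is reserved, so the 'ow' diphthong is written 'aw' instead.
    return "aw" if plain == "ao" else plain


print(diphthong("i", "û"))  # iu  ('oh')
print(diphthong("á", "i"))  # ai  ('air')
print(diphthong("a", "o"))  # aw  ('ow')
```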

The length marker ‘o’ is dropped at the end of a word and within vowel clusters. A vowel cluster is two adjacent but distinct vowels or diphthongs, as in ‘chaos’ or ‘piano’. For example, ‘piano’ is not spelt ‘peoaonniu’, but rather ‘peanniu’ (the double ‘n’ marks the stressed syllable; we’ll get to that in a moment).
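The length-marker rule is easiest to see when a word is broken into consonant and vowel pieces. The token format below is an assumption made for illustration, as is the premise that no diphthong spelling ends in ‘o’. Stress marking is left out, so ‘piano’ comes out as ‘peaniu’ before the stressed ‘n’ is doubled.

```python
import unicodedata


def drop_length_markers(tokens):
    """Join (kind, text) tokens into a word, applying the length-marker rule.

    kind is "C" for a consonant string or "V" for a vowel/diphthong spelling.
    A long vowel (base letter + 'o') loses its 'o' at the end of the word or
    when it sits next to another vowel (a vowel cluster). Assumes no
    diphthong spelling ends in 'o'.
    """
    out = []
    for i, (kind, text) in enumerate(tokens):
        text = unicodedata.normalize("NFC", text)  # 'úo' -> two code points
        if kind == "V" and len(text) == 2 and text.endswith("o"):
            prev_vowel = i > 0 and tokens[i - 1][0] == "V"
            next_vowel = i + 1 < len(tokens) and tokens[i + 1][0] == "V"
            word_final = i == len(tokens) - 1
            if prev_vowel or next_vowel or word_final:
                text = text[0]
        out.append(text)
    return "".join(out)


print(drop_length_markers([("C", "p"), ("V", "eo"), ("V", "a"), ("C", "n"), ("V", "iu")]))  # peaniu
print(drop_length_markers([("C", "hz"), ("V", "ai"), ("C", "f"), ("V", "oo")]))             # hzaifo
```

The same rule accounts for ‘tû’ (‘to’) and ‘hzaifo’ (‘therefore’) in the sample text, while ‘skeom’ (‘scheme’) keeps its marker because the ‘eo’ is followed by a consonant.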

The first syllable in which the vowel is not schwa is assumed to be stressed. In other cases, stressed syllables are marked, usually by doubling the following consonant, or the first component of the following consonantal digraph (e.g. ‘hs’ becomes ‘hhs’). If stress is at the end of the word and there is no following consonant, a dummy ‘h’ is added. I also had a rule worked out for marking stress if it occurred on the first half of a vowel cluster, but I don’t think I ever found such a word, other than short ones like ‘chaos’ (‘kaeos’) that are covered by the default case of stress on the first sans-schwa syllable. So the rule was rather academic.
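Stress marking reduces to one operation. Doubling the first letter of whatever follows the stressed vowel handles both a lone consonant and the first half of a digraph. This sketch assumes the caller already knows where the stressed vowel ends. It rejects the vowel-cluster case, because that rule isn’t recorded here. The name `mark_stress` is hypothetical. The ‘okay’ → ‘iukaeh’ line only illustrates the dummy-‘h’ rule and is not a spelling from the original material.

```python
import unicodedata


def mark_stress(head, tail):
    """Mark a stressed syllable that isn't covered by the default rule.

    head: the word up to and including the stressed vowel.
    tail: the rest of the word.
    Doubling tail[0] covers a single consonant ('n' -> 'nn') and the first
    component of a digraph ('hs' -> 'hhs') alike.
    """
    if not tail:
        return head + "h"  # word-final stress, no following consonant
    if unicodedata.normalize("NFD", tail[0])[0] in "aeiou":
        raise ValueError("stress on the first half of a vowel cluster is not covered")
    return head + tail[0] + tail


print(mark_stress("pea", "niu"))     # peanniu
print(mark_stress("beko", "z"))      # bekozz
print(mark_stress("envá", "ntid"))   # envánntid
print(mark_stress("iukae", ""))      # iukaeh (illustrative only)
```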

On top of all that, and some more detail, I had rulings on various difficult cases. As an example text, let me use some excerpts from my comments on Stan’s post. I’ll probably make some mistakes.


I invented a regular spelling system for English once. It was completely unworkable, because I was doing it for fun and therefore workability was not a priority. I wanted it to have a completely alien feel. It was very specific to my own dialect and involved a convoluted scheme for marking the stressed syllables of a word.


Ue envánntid i rágyûli spáleyn sestim fo Englec wunts. Et woz kompleottle unwúokibil, bekozz ue woz dûeyn et fo fun and hzaifo wúokibellete woz not i praeorrite. Ue wontid et tû hav i kompleottle aelein feol. Et woz váre spisefek tû mae iun daeilákt and envolvd i konvilûotid skeom fo muokeyn hzi strást selibilz ov i wúod.

Is that sufficiently alien-looking, do you think?

The comment thread on Stan’s blog also touched on the subject of codes, and I’d like to share one substitution code of my own. This grew out of a Usenet discussion years ago about whether one could devise a code based on the same principle as Rot-13 (i.e. a mapping from each letter a-z to another letter a-z such that each letter maps to a letter that maps back to the original letter), but with the mappings chosen so that the encoded text is as pronounceable, and as plausible as a language, as possible.

There is a lot of wiggle room in that phrase “as pronounceable as possible”, which is not the same as “pronounceable”. For example, one can accept a few unpronounceable digraphs, because these could plausibly stand for something in a hypothetical language. Think of something slightly less pronounceable than Polish text looks to the average English monoglot and you’ll understand the standard I was aiming for.

The best mapping I came up with at the time is as follows:

a=u b=q c=x d=g e=o f=p g=d h=j i=y j=h k=t l=n m=v n=l o=e p=f q=b r=w s=z t=k u=a v=m w=r x=c y=i z=s
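The table is just thirteen swapped pairs, so an encoder takes only a few lines with Python’s `str.translate`. Here is a minimal sketch. The names `PAIRS`, `build_table`, `is_complete_involution` and `wekmex` are hypothetical. Every letter appears in exactly one pair, so the same function both encodes and decodes.

```python
import string

# The pairs from the table above; each pair swaps in both directions.
PAIRS = "au bq cx dg eo fp hj iy kt ln mv rw sz"


def is_complete_involution(pairs):
    """True if every letter a-z appears in exactly one pair.

    That is enough for the mapping to be its own inverse, with no letter
    left mapping to itself (as with Rot-13).
    """
    return sorted("".join(pairs.split())) == list(string.ascii_lowercase)


def build_table(pairs):
    src = dst = ""
    for a, b in pairs.split():
        src += a + b
        dst += b + a
    # Apply the same swaps to capitals; everything else passes through.
    return str.maketrans(src + src.upper(), dst + dst.upper())


WEKMEX = build_table(PAIRS)


def wekmex(text):
    """Encode or decode (the same operation) a piece of text."""
    return text.translate(WEKMEX)


print(wekmex("ROTVOC"))          # WEKMEX
print(wekmex("Stan, I should"))  # Zkul, Y zjeang
```

Capitals are mapped along with lower case. Punctuation, including the curly apostrophe in ‘don’t’, passes through untouched.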

I did not name this, but in writing this blog post I’ve been thinking about what to call it. It’s a vocalisable code in the same family as Rot-13, so ROTVOC would probably be a good name, and ROTVOC translated into itself comes out as WEKMEX. What do you think? To illustrate it, let me use another excerpt from my comments on Stan’s post.


Stan, I should be able to reconstruct most of it from the sample text that I have, combined with what I remember. I don’t think it’s worth a blog post all to itself, but seeing as you mention codes, I could perhaps write a post encompassing the spelling scheme plus one or two codes of my own. Definitely a possibility for the week to come.

Translated into Rot-13 (for comparison):

Fgna, V fubhyq or noyr gb erpbafgehpg zbfg bs vg sebz gur fnzcyr grkg gung V unir, pbzovarq jvgu jung V erzrzore. V qba’g guvax vg’f jbegu n oybt cbfg nyy gb vgfrys, ohg frrvat nf lbh zragvba pbqrf, V pbhyq creuncf jevgr n cbfg rapbzcnffvat gur fcryyvat fpurzr cyhf bar be gjb pbqrf bs zl bja. Qrsvavgryl n cbffvovyvgl sbe gur jrrx gb pbzr.
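For comparison, Rot-13 is a rotation. Every letter moves 13 places along the alphabet instead of being paired off arbitrarily. The sketch below uses a hypothetical `rot` helper and checks it against the `rot_13` text codec in Python’s standard library. It also shows why 13 is special: a rotation by n undoes itself only when 2n is a multiple of 26.

```python
import codecs
import string

LOWER = string.ascii_lowercase


def rot(n, text):
    """Rot-n: shift each letter n places around the alphabet."""
    shifted = LOWER[n:] + LOWER[:n]
    table = str.maketrans(LOWER + LOWER.upper(), shifted + shifted.upper())
    return text.translate(table)


sample = "Stan, I should be able to reconstruct most of it"
# The standard library's rot_13 codec gives the same result.
assert rot(13, sample) == codecs.encode(sample, "rot_13")

# Which rotations undo themselves? Only those where 2n is divisible by 26.
print([n for n in range(26) if rot(n, rot(n, LOWER)) == LOWER])  # [0, 13]
```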

Translated into WEKMEX:

Zkul, Y zjeang qo uqno ke woxelzkwaxk vezk ep yk pwev kjo zuvfno kock kjuk Y jumo, xevqylog rykj rjuk Y wovovqow. Y gel’k kjylt yk’z rewkj u qned fezk unn ke ykzonp, qak zooyld uz iea volkyel xegoz, Y xeang fowjufz rwyko u fezk olxevfuzzyld kjo zfonnyld zxjovo fnaz elo ew kre xegoz ep vi erl. Gopylykoni u fezzyqynyki pew kjo root ke xevo.

It’s not readily pronounceable (though some words come out quite beautifully), but if you think you can improve upon it within the given requirements then I would like to see what you can do. I doubt there’s any demand for an online WEKMEX encoder, but let me know if you’d like to see one.


7 Responses to “Spelling systems and codes”

  1. Stan Says:

    Fun and fascinating post. Does your spelling system have a name? It’s quite alien looking, but there are giveaways, like bekozz and wontid. Capitalisation and punctuation also give it a more familiar appearance than it would have without them.

    WEKMEX is a good name for the substitution code.

  2. Flesh-eating Dragon Says:

    I think that in the original document, I just used some descriptive phrase as the title, translated into the system of course. There might be recognisable words, but indecipherability wasn’t my goal so much as that aesthetically pleasing sense of strangeness that I often get from foreign language text.

    I found the sample text on a CD containing a backup copy of my website from early 2003. Why the explanatory document is missing I cannot explain.

  3. Ky Says:

    I have devised a regular phonetic-based spelling system, but the only thing I haven’t worked out yet is how to mark stress. I thought I should use an apostrophe like IPA, but that could contradict a bit… I am Australian too, so I am thinking of sending my PBSS to the government to see if they consider it.

  4. Flesh-eating Dragon Says:

    Really hope the bit about sending it to the government is a joke… :-)

    The most important thing is that you had fun devising it. As long as you did, good on you. Next, you could try inventing a language if you haven’t already.

  5. Jonathan D Says:

    They’re all substitution ciphers, but while your code and ROT-13 are in the same family in that they’re both involutions, “ROT” is the family of rotations, so it seems a bit odd to use it here. Perhaps “YLMEX” would be better?

  6. Flesh-eating Dragon Says:

    It’s a valid point, although not one that bothers me personally.

    You make me acutely aware that I don’t know the proper linguistic jargon for what I want to say. Basically, there are different ways in which new words can be made from bits of old ones: they can be made morphologically (preserving the relationship between each part and its meaning), but they can also be made … I dunno … insert adverb here … (symbolising a more abstract association with the entire original word). You prefer the former, which is a lot more conservative, but the latter is, in its place, a valid type of word formation as well.

    Of course, I’m not going to stop you from calling it whatever you like. :-)

  7. Jonathan D Says:

    You’re probably right about preferring one to the other, but in general I think the second one only works for people who think of the original word as a whole, either through unfamiliarity with the etymology or a certain sort of familiarity with the word. For me, Rot-3, Rot-13, Rot-47, etc. still read as a clear shorthand.
