Continuing the conversation about spaces between words in various scripts that was narrated in this post:
https://hunterfiftyfour.blogspot.com/2026/05/what-if-we-do-away-with-spaces-between.html
I further wondered about Chinese script. I'm told that Chinese is written with whole characters rather than with letters as its atomic units. So, I asked Claude about it. Here is the conversation:
Me:
What about Mandarin Chinese? I believe students learn whole 'words' as opposed to 'letters'?
Claude:
Yes, and Chinese is a fascinating case that actually throws a completely different light on our whole discussion!
Chinese Writing — A Fundamentally Different Philosophy
Chinese is not an alphabetic system at all. It is a logographic system — each character represents a morpheme (a unit of meaning), not a sound or a letter. So the question of spaces between words almost doesn't arise in the same way, because the unit of writing is already the unit of meaning.
This means Chinese sidestepped the entire problem you and the Irish monks were wrestling with — because they never went down the alphabetic road in the first place.
What Students Actually Learn
You're broadly right, but it's nuanced:
Characters first, not letters
A Chinese child learns individual characters — each with its own meaning, sound, and written form — simultaneously. There's no "sounding out" the way an English child does. You either know the character or you don't.
The scale is daunting
- Basic literacy requires around 2,000–3,000 characters
- A well-educated person knows 6,000–8,000
- The largest Chinese dictionaries contain over 50,000
- Compare this to English — 26 letters, and you can at least attempt any word
Radicals — the closest thing to an alphabet
Characters are built from components. The traditional index of radicals has 214 entries, and a character's radical usually hints at its meaning, while a second component often hints at its pronunciation. So there is *some* internal structure, but it's more like learning meaning-chunks than letters. A student learns to recognise radicals the way you might recognise that the prefix "bio-" means life: the cue is semantic, not phonetic.
The Space Question in Chinese
Here's where it gets directly relevant to your earlier point:
Modern written Chinese has no spaces between characters — and it works perfectly well. Each character is visually self-contained and equal in size, so the eye naturally segments them. The characters themselves are the boundaries.
So Chinese essentially solved the space problem by making every unit of writing visually atomic and self-delimiting — which is arguably more elegant than either spaces or CamelCase.
However, when Chinese is romanised — written in the Latin alphabet using the Pinyin system — spaces are used, because without the visual distinctiveness of characters, romanised Chinese becomes ambiguous continuous text. Which rather proves your point that spaces are a crutch specific to alphabetic systems.
Mandarin vs Written Chinese — An Important Distinction
This is worth noting because it surprises many people:
- Mandarin is a spoken language — the most widely spoken dialect/variety
- Written Chinese is largely consistent across dialects — a Cantonese speaker and a Mandarin speaker may not understand each other *speaking*, but can read the same newspaper
- The characters carry meaning independently of how they are pronounced in any given dialect
This makes written Chinese almost a meaning-transmission system that floats above spoken language — something no alphabetic system can claim. It's philosophically quite alien to how Europeans think about writing.
The Memory Question
You might wonder — isn't memorising thousands of characters a huge burden compared to 26 letters?
Yes and no. Research suggests that:
- Chinese children take longer to reach basic literacy than alphabetic-language children
- But once fluent, they read at comparable or faster speeds
- The meaning-direct nature of characters means less inferential work — you see the character, you get the meaning, without the intermediate phonetic decoding step
It's a higher upfront cost for a smoother runtime — not unlike learning sandhi in Sanskrit, actually.
Bringing It Back to Your Idea
What Chinese shows us is that there are at least three viable solutions to the word-boundary problem:
System vs. word-boundary method:
- Modern English: space
- Your proposed system / CamelCase: capital letter
- Classical Sanskrit: none (context and training)
- Written Chinese: visually self-contained characters (no boundary needed)
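The boundary methods listed above can be sketched in a few lines of Python. This is only an illustration, not part of the original conversation: the function names and sample strings are made up for the example, and the Chinese sentence is a rough rendering of "a stone fell into water". The Sanskrit case is left out because it has no mechanical rule to code. Note that the last function separates characters, the unit the passage describes, and does not try to group characters into multi-character words.

```python
import re


def split_on_spaces(text: str) -> list[str]:
    """Modern English: the space marks the word boundary."""
    return text.split()


def split_camel_case(text: str) -> list[str]:
    """CamelCase: each capital letter marks the start of a new word."""
    return re.findall(r"[A-Z][a-z]*", text)


def split_characters(text: str) -> list[str]:
    """Written Chinese: every character is already a separate unit."""
    return list(text)


# Hypothetical sample inputs, chosen only for illustration.
english = split_on_spaces("a stone fell into water")
camel = split_camel_case("AStoneFellIntoWater")
chinese = split_characters("石头掉进水里")

print(english)
print(camel)
print(chinese)
```

The first two calls return the same five words, while the third returns six separate characters without any marker having been written between them.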
Chinese is the most radical solution — it didn't patch the alphabetic system, it bypassed the problem entirely by using a completely different architecture of writing.
The irony is that written Chinese, one of the world's oldest writing systems in continuous use, and Java's CamelCase naming, one of the most modern conventions, both independently arrived at no spaces as their answer, just by completely different routes!
Me:
'The characters carry meaning independently of how they are pronounced in any given dialect.' What does this mean? Are characters homonyms or homophones across dialects?
Claude:
The excerpt you've highlighted is actually pointing at something even more interesting than homonyms or homophones. Let me untangle it.
What the Statement Actually Means
The claim is that a Chinese character carries its meaning regardless of how you pronounce it in your dialect. It is neither a homonym nor a homophone situation — it's something more fundamental.
Think of it this way. Look at this symbol:
3
An English speaker says three. A French speaker says trois. A Hindi speaker says teen. A German speaker says drei. But they all know what it means the moment they see it. The numeral is meaning-direct. It bypasses sound entirely.
Chinese characters work broadly like this — they are closer to semantic symbols than to phonetic transcriptions. The character carries the concept; the sound is a local dialect's way of vocalising that concept.
So Are They Homophones or Homonyms?
Not exactly — but here's how those concepts do appear in Chinese:
Homophones are rampant within a single dialect
Mandarin has a very limited set of syllables: around 400 basic ones, rising to ~1,200 once tones are counted. But there are over 50,000 characters, so many characters share the same sound within Mandarin itself. The famous example is the Lion-Eating Poet, a poem in Classical Chinese composed by the linguist Yuen Ren Chao and usually cited at 92 characters, in which every syllable, read in modern Mandarin, is shi in one tone or another. It is incomprehensible spoken aloud, but perfectly readable as text.
This is actually a weakness of the system that emerged over time — as the spoken language simplified its sounds, more and more characters became homophones within a dialect, making purely oral communication increasingly context-dependent.
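A small Python sketch can make the homophone point concrete. It takes the five syllables of the poem's title (施氏食獅史, romanised here as shī shì shí shī shǐ), strips the tone marks, and checks what is left. The helper name `strip_tones` is invented for this example, and the approach (Unicode decomposition, then dropping combining marks) is just one simple way to remove Pinyin tone marks.

```python
import unicodedata


def strip_tones(syllable: str) -> str:
    """Remove Pinyin tone marks by dropping Unicode combining characters."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Pinyin for the five characters of the poem's title.
title_pinyin = ["shī", "shì", "shí", "shī", "shǐ"]

# Distinct syllables when tones are kept, and when they are ignored.
with_tones = {unicodedata.normalize("NFC", s) for s in title_pinyin}
toneless = {strip_tones(s) for s in title_pinyin}

print(sorted(with_tones))
print(toneless)
```

With tones kept there are four distinct syllables for five characters. With tones removed only `shi` remains, which is why the text is so hard to follow by ear.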
Across dialects, the same character is neither homophone nor homonym
A Cantonese speaker and a Mandarin speaker pronounce the character for mountain (山) completely differently — shān in Mandarin, saan in Cantonese. They are not homophones. But the meaning is identical and instantly shared. So the character transcends the homophone/homonym framework altogether — it operates at the level of meaning, not sound.
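One way to picture this is as a record with a single meaning and several local readings. The sketch below is only a toy model, not a claim about how any dictionary or library stores characters: the `Symbol` class and the `shares_sound` helper are hypothetical names, and the readings are the ones quoted in this conversation.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    """A written symbol: one meaning, many possible local readings."""
    glyph: str
    meaning: str
    readings: dict[str, str] = field(default_factory=dict)


def shares_sound(symbol: Symbol) -> bool:
    """True only if every listed variety pronounces the symbol identically."""
    return len(set(symbol.readings.values())) == 1


mountain = Symbol("山", "mountain", {"Mandarin": "shān", "Cantonese": "saan"})
three = Symbol("3", "three", {"English": "three", "French": "trois",
                              "Hindi": "teen", "German": "drei"})

for symbol in (mountain, three):
    # The meaning is looked up without consulting any reading.
    print(symbol.glyph, symbol.meaning, shares_sound(symbol))
```

In both records the meaning field is shared and the readings differ, which is the sense in which the character is neither a homophone nor a homonym across dialects.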
The Deeper Implication
This makes written Chinese something quite unique — a lingua franca of meaning across a linguistically fragmented population. China has hundreds of mutually unintelligible spoken dialects. But educated speakers of all of them can read the same text, discuss it, and agree on its meaning — even if they could not understand each other speaking.
It is as if English, French, German, and Spanish speakers all agreed to write in the same symbols, each reading them aloud in their own language. The text would be universally readable; the spoken versions would be mutually incomprehensible.
Written Chinese is, in this sense, less a written form of speech and more a standalone meaning-transmission system that speech merely taps into locally.
Which, when you think about it, is a staggeringly elegant solution to the problem of linguistic diversity — one that no alphabetic system has ever managed to replicate.
Me:
Here is a question from ordinary, everyday life. When I read a sentence like 'A stone fell into water', I immediately turn a written word such as 'stone' into the image of a stone that I have in mind, then do the same with the word 'water', and then hear the sound of the stone falling into the water. First, is my sequence correct? Second, do Chinese characters eliminate the image-forming step altogether?
Claude:
First — Is Your Sequence Correct?
Your introspective account is remarkably close to what cognitive science and psycholinguistics suggest — but with some important nuances.
You are describing dual-coding, a theory proposed by Allan Paivio in the 1970s, which argues that the brain processes language through two parallel systems:
- A verbal system — for words, grammar, sequence
- A non-verbal system — for images, sounds, sensations, emotions
For concrete words like stone and water, both systems fire. For abstract words like justice or infinity, mostly only the verbal system activates — there is no ready image to summon.
But is it sequential, as you described?
Probably not strictly. Current neuroscience suggests it is more parallel and cascading than sequential. The moment you read stone, multiple things happen almost simultaneously:
- The visual word form is recognised
- Semantic memory activates — meaning, associations, category
- Perceptual simulation begins — weight, texture, grey colour, hardness
- Motor associations may even fire — how it feels to hold one
You experience this as a sequence because consciousness is a serial narrator — it tells you a story of one-thing-then-another. But underneath, it is closer to a simultaneous chorus that consciousness then narrates as a sequence.
What you described — stone image, water image, then sound of the splash — is your conscious mind's edited highlight reel of a much richer, messier, parallel process happening beneath awareness.
The sound is particularly interesting.
The fact that you spontaneously generated the sound of stone hitting water — without being prompted to — suggests your brain ran a mental simulation of the physical event. This is called embodied cognition — the idea that understanding language about physical events involves actually simulating those events in the brain, using the same neural systems that would process the real experience.
So "A stone fell into water" is not just parsed as a grammatical structure. Your brain ran the physics — and generated the expected sensory outcome. That is quite remarkable when you pause to consider it.
Second — Do Chinese Characters Eliminate Image Formation?
This is where it gets fascinating. The answer is: not quite. The image still forms, but Chinese characters may reach it more directly than alphabetic scripts do.
The iconic origins of characters
Many Chinese characters began as pictographs — direct visual representations of the thing they named. The character for mountain (山) looks like three peaks. The character for sun (日) was originally a circle with a dot — a drawn sun. The character for tree (木) looks like a trunk with branches above and roots below.
Over millennia these became stylised and abstract — but the visual-semantic link was baked in from the beginning. The character was always reaching toward the image, not toward the sound.
What brain imaging studies show
Neuroimaging research comparing Chinese and alphabetic readers found that Chinese readers activate a region called the middle frontal gyrus more strongly — an area associated with visual-spatial processing. Alphabetic readers activate it less, relying more on phonological processing regions.
This suggests Chinese readers are doing something closer to visual object recognition when reading — more like recognising a face or a diagram than decoding a sound sequence.
So for a Chinese reader seeing 石 (stone) — the path to meaning may be:
> Character recognised → meaning activated → image
Compared to an English reader seeing stone:
> Letters decoded → sound assembled → word recognised → meaning activated → image
The Chinese route potentially skips the phonological assembly step — going more directly from visual form to meaning, and from meaning to image.
But — and this is important
For highly educated Chinese readers, most characters are no longer consciously iconic. The character 石 doesn't look like a stone to a modern reader the way 山 suggests a mountain. The pictographic connection has faded with abstraction and stylisation over centuries.
So the image-formation step probably still happens — but the route to it is shorter and more visual than in alphabetic reading. Less detour through sound.
A Spectrum of Reading Routes
What emerges from both your questions together is something like a spectrum:
Script and route to meaning:
- English (alphabetic): Visual → Phonological decoding → Word → Meaning → Image/Simulation
- Chinese (logographic): Visual → Character recognition → Meaning → Image/Simulation
- Mathematical symbols: Visual → Meaning directly (no phonology needed)
- Road signs / icons: Visual → Meaning directly (no language at all)
Chinese sits closer to the direct-meaning end than English — which is precisely what makes it the remarkable system we discussed earlier.
The Deepest Point
Your introspection revealed something that cognitive scientists took decades to formalise — that reading is not passive decoding but active world-simulation. You don't just understand "a stone fell into water" — you briefly become a witness to it, your brain constructing a ghost of the experience.
This is why great literature feels lived-in rather than merely read. And it is also — circling back to our LLM discussion — precisely what a transformer does not do.
It processes the tokens. It predicts relationships. It generates statistically coherent continuations.
But it has never heard a stone hit water.
And that absence — that lack of embodied simulation — may be the most fundamental difference between what you did reading that sentence, and what an LLM does.