- The article suggests that the em-dash “—” is becoming a recognizable hallmark of AI-generated text.
- Author Lia Erisson shares that after the launch of OpenAI’s ChatGPT in 2022, she realized her writing style resembled AI-generated text: long sentences, predictable structures, and excessive use of em-dashes.
- The emergence of “AI detectors” in schools and publishing has led many to change their writing style to avoid suspicion of using AI.
- AI detectors evaluate text based on word unpredictability (“perplexity”), sentence structure variation (“burstiness”), and various other statistical markers.
- The author began avoiding long sentences, semicolons, groups of three ideas, and em-dashes for fear of being flagged.
- According to the article, LLMs use em-dashes frequently for two main reasons: training data and response optimization.
- Over 60% of GPT-3’s training data comes from web crawls, in which automated systems collect public text from the Internet.
- LLMs learn by predicting the next word in a linguistic sequence, thereby absorbing writing styles and grammatical structures.
- If a structure like the em-dash appears often enough in the data and is not adjusted post-training, it becomes an “instinct” of the model.
- Author Brent Csutoras once tried asking ChatGPT, Claude, and other models to stop using em-dashes but failed because this habit is deeply embedded in the AI’s output.
- Research by Freeburg shows that GPT-4.1 uses em-dashes 3.28 times more than average human writers in standard essays.
- According to this study, banning or limiting em-dashes through prompts is largely ineffective.
- One hypothesis attributes the habit to chatbot content moderation work carried out in Africa, where local varieties of English tend to use words like “delve” more frequently.
- However, the article notes that moderators primarily focus on removing toxic content rather than directly adjusting linguistic style.
- The author compares em-dash frequency in COCA, a modern mass media corpus, with that in OpenWebText, a dataset simulating AI training data.
- OpenWebText has a very high em-dash frequency: approximately 1,621.88 occurrences per million words.
- Another hypothesis involves implicit bias: em-dashes are common in literature and long essays but appear less in daily communication like emails or messages.
- Because LLMs are trained heavily on long and academic articles, they absorb em-dash usage more than the average person.
- Beyond data factors, models like Claude or ChatGPT are optimized to generate “clear” responses, and the em-dash is particularly suited for explaining and breaking down complex ideas.
- The author believes that as humans increasingly dodge the em-dash to avoid being seen as using AI, future LLMs might decrease their usage accordingly.
- However, the article warns that the dread of being “caught by AI” is changing the nature of writing: to sound “human,” many writers feel they must write less creatively.
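
The frequency figures cited above (occurrences per million words, and the 3.28 ratio between model and human text) rest on a simple normalization. Below is a minimal sketch in Python. It assumes that whitespace-separated tokens count as words and that only the U+2014 character counts as an em-dash. The function names are hypothetical, and the article does not describe the counting method actually used.

```python
EM_DASH = "\u2014"  # U+2014, written as an escape so the character is unambiguous


def em_dashes_per_million(text: str) -> float:
    """Return em-dash occurrences per million whitespace-separated words."""
    words = text.split()
    if not words:
        return 0.0
    return text.count(EM_DASH) / len(words) * 1_000_000


def usage_ratio(model_text: str, human_text: str) -> float:
    """Return how many times as often model_text uses em-dashes as human_text."""
    human_rate = em_dashes_per_million(human_text)
    if human_rate == 0:
        # Assumption: an undefined ratio is reported as an error, not infinity.
        raise ValueError("human_text contains no em-dashes; ratio is undefined")
    return em_dashes_per_million(model_text) / human_rate
```

A dash written without surrounding spaces fuses two words into one token, which changes the word count. Rates from this sketch are therefore comparable only when both texts are tokenized the same way.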
📌 Conclusion: A fascinating paradox of the AI era: language models are trained on human writing, but now humans are changing their style to avoid being mistaken for AI. The em-dash has become a prime example: GPT-4.1 uses it 3.28 times more than average human writers, and prompts are almost ineffective at eliminating it. More importantly, the author argues that the fear of AI detectors is eroding freedom of expression in writing, causing writers to avoid structures once considered signs of sophisticated and creative prose.
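
The two detector signals named in the summary, perplexity and burstiness, can be illustrated with a rough sketch. It assumes perplexity is computed from per-token probabilities supplied by some language model (not included here), and it treats burstiness as the coefficient of variation of sentence lengths. That is one simplification, not the method of any specific detector, and the function names are hypothetical.

```python
import math
import re
import statistics


def perplexity(token_probs: list[float]) -> float:
    """Perplexity: exp of the mean negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words."""
    # Assumption: sentences end at '.', '!' or '?'; real text needs a better splitter.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)
```

Under these definitions, lower perplexity means more predictable text, and a passage whose sentences are all the same length has a burstiness of 0.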
