Open Source Research
Why Plain Text Is Better for Writing
The Problem with Word Processors
Word processors such as Microsoft Word use a “what you see is what you get” (WYSIWYG) interface. The screen attempts to display the final printed form while you are writing.
This is certainly a simpler model for the writer: you don't need to think about what is happening inside your .docx file when you make a heading, bold some text, or change its colour. All that typographic complexity is handled behind the scenes, which encourages authors to focus on content. Sure, you can fiddle around with fonts, spacing, margins, indentation, page breaks, visual alignment, and so on. But this isn't needed for the vast majority of writing, which is semi-formal rather than publication-grade. Most of the time, when you write a Word document, it is to communicate something to a colleague, draft an article, or set out an assignment or a marking rubric for students.
When the stakes get high and we really need professional-looking presentation, we often turn to PDF, because it sets the typography and spacing in stone: once you make a PDF, its layout stays the same wherever it is opened. In fact, publishers spend a lot of money on professional software suites to convert the .docx file you give them into something that is consistent, functional, and beautiful. This is another part of the problem.
So word processors certainly fill a niche, but they are not made for professional writing. They inhabit a grey zone between sending a colleague an email, which is ephemeral, and professional publishing, which is stable but often inconvenient to create and read. So why is Word so common in the arts, social sciences, and humanities? Well, the most important reason for its popularity is that it is already the standard tool. If you write a document in any other format, colleagues may send you an exasperated email saying they can't open the file; please send it in Word! Students may pester you with emails because they can't open the format, or the file will just be ignored. Part of this problem is Microsoft Windows. Windows decides which program opens a file by its extension, so a file with an unfamiliar ending will not open with a double-click, even if it is plain text and can easily be opened and edited in any text editor, such as Notepad++. This creates an artificial barrier to entry for the easiest possible text interface standard: plain text.
Secondly, I think many scholars in these fields default to Microsoft Word because many academics in the 'humanities' (broadly defined) are either actively or passively anti-technology. I know because I was one of them. I took a perverse pride in being bad at technology, partly because it showed how seriously I took ideas and concepts rather than the surface-level detail of software implementation. Partly this is a matter of personal interest: scholars in the humanities are often not 'technical' people; they prefer to learn human languages rather than programming languages. They were often bad at mathematics in school and good at English (or whatever their mother tongue is; almost every country's curriculum has a 'native language' class where students learn writing and basic literary analysis). Perhaps it is 'sour grapes': we devalue that which we cannot do well. In order to better control our environment, we surround ourselves with what is comfortable and reject that which challenges us. In any event, the result is that humanities scholars are amongst the least 'technically minded' of anyone. Perhaps that is why we studiously demarcate 'quantitative' from 'qualitative' scholarship, as if by doing so we might design a world in which we need never be uncomfortable.
So let's take a closer look at what a docx file actually is, and why it is not suitable as a core technology in open source research.
What a .docx File Actually Is
A .docx file is not plain text. It is a compressed archive containing:
- Multiple XML files (these are themselves actually plain text)
- Embedded style definitions
- Font instructions
- Metadata
- Relationship files that link these parts together (plus any embedded binary media, such as images)
If you rename a .docx file to .zip and open it, you will see a complex directory structure of internal components.
Even a short essay can contain thousands of lines of hidden formatting instructions.
This is what a .docx document actually is: a zip file with at least half a dozen different nested files inside!
essay/
├── [Content_Types].xml
├── _rels/
│   └── .rels
├── docProps/
│   ├── app.xml
│   └── core.xml
└── word/
    ├── document.xml
    ├── styles.xml
    ├── numbering.xml
    ├── settings.xml
    ├── fontTable.xml
    ├── webSettings.xml
    ├── theme/
    │   └── theme1.xml
    └── _rels/
        └── document.xml.rels
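You do not even need to rename the file to check this. As a minimal sketch, Python's standard zipfile module can list the parts directly; the function name list_docx_parts is an illustrative choice, not part of any library.

```python
import zipfile


def list_docx_parts(path_or_file):
    """Return the names of the files stored inside a .docx archive."""
    # A .docx file is an ordinary ZIP archive, so the standard library
    # can open it as-is, without renaming it to .zip first.
    with zipfile.ZipFile(path_or_file) as archive:
        return sorted(archive.namelist())
```

Calling list_docx_parts("essay.docx") on a document of your own (the file name here is hypothetical) should return entries such as word/document.xml and word/styles.xml, matching the tree above.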
Why .docx Is Heavy and Slow
Because a .docx file stores layout instructions alongside content, it carries far more structural overhead than a plain text file. The fragment below says only "This is a test sentence." You can see that the content is wrapped in layer upon layer of structured, code-like markup. And this is just a small fraction of the complexity of Word!
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>This is a test sentence.</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
These layers have practical costs. A .docx file:
- Must be parsed and rendered by specialised software.
- Embeds style layers that may conflict or accumulate invisibly.
- Is not easily readable without compatible applications.
- Does not degrade gracefully across devices.
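The first point in the list above can be made concrete. The following is a minimal sketch, using only Python's standard library, of the parsing needed to recover the single sentence from the XML fragment shown earlier. The function name extract_paragraphs is an illustrative choice, and the code assumes a simple document: it ignores tables, footnotes, text boxes, and everything else a real Word file may contain.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML namespace, copied from the fragment shown earlier.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOCUMENT_XML = """\
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>This is a test sentence.</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
"""


def extract_paragraphs(document_xml):
    """Return the visible text of each <w:p> paragraph as a list of strings."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # The text of one paragraph may be split across several <w:t> runs,
        # so the pieces are collected and joined back together.
        runs = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


print(extract_paragraphs(DOCUMENT_XML))  # prints ['This is a test sentence.']
```

With a plain text file, the equivalent step is simply reading the file.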
A 50 KB Markdown file may unpack to several hundred kilobytes of XML inside a .docx file, due to styling metadata alone, although ZIP compression hides much of this overhead on disk.
This added complexity does not improve argument quality. It merely encodes visual formatting.
Separating Content from Presentation
Plain text systems separate writing from layout. The author writes structured content. Presentation is applied later through compilation or rendering.
The source file remains lightweight, readable, and adaptable. From the same source, one can generate HTML, EPUB, PDF, or other formats.
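To make the idea of "one source, many outputs" concrete, here is a deliberately tiny sketch in Python. It understands only two heading levels and plain paragraphs, and the names parse, to_html, and to_latex are illustrative. Mapping the top-level heading to \section is an arbitrary choice for the example, and no LaTeX special characters are escaped. Real converters such as Pandoc handle far more than this.

```python
import html

# A small structured source, reusing the Kuhn example from this article.
SOURCE = """\
# The Structure of Scientific Revolutions
## Introduction
Scientific paradigms shift when anomalies accumulate.
"""


def parse(source):
    """Turn the source into (kind, text) pairs, where kind is 'h1', 'h2' or 'p'."""
    blocks = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no content in this sketch
        if line.startswith("## "):
            blocks.append(("h2", line[3:]))
        elif line.startswith("# "):
            blocks.append(("h1", line[2:]))
        else:
            blocks.append(("p", line))
    return blocks


def to_html(blocks):
    """Render the parsed blocks as HTML, one element per line."""
    return "\n".join(
        f"<{kind}>{html.escape(text)}</{kind}>" for kind, text in blocks
    )


def to_latex(blocks):
    """Render the same blocks as LaTeX body text (no preamble, no escaping)."""
    commands = {"h1": "\\section", "h2": "\\subsection"}
    parts = []
    for kind, text in blocks:
        if kind in commands:
            parts.append(f"{commands[kind]}{{{text}}}")
        else:
            parts.append(text)
    return "\n\n".join(parts)


blocks = parse(SOURCE)
print(to_html(blocks))
print(to_latex(blocks))
```

The source is parsed once and the presentation is decided afterwards, so adding another output format means adding another small function rather than rewriting the document.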
Examples of Plain Text Writing
1. Simple .txt
The most basic form is a plain .txt file:
The Structure of Scientific Revolutions
1. Introduction
Scientific paradigms shift when anomalies accumulate...
No hidden formatting. No embedded styling. Just text.
2. Markdown (.md)
Markdown adds minimal structural markers while remaining human-readable:
# The Structure of Scientific Revolutions
## Introduction
Scientific paradigms shift when anomalies accumulate...
- Observation
- Hypothesis
- Crisis
- Revolution
The symbols (#, ##, -) indicate structure. They are visible and intuitive. The file remains readable even without rendering.
3. LaTeX
LaTeX is a structured typesetting system widely used in mathematics and physics. It is a standard method of writing academic research in many fields of natural science. This is because it deals with mathematical equations much better than word processors:
\section{Introduction}
Scientific paradigms shift when anomalies accumulate.
\begin{equation}
E = mc^2
\end{equation}
LaTeX emphasises logical structure and mathematical clarity. The author writes source; the system compiles it into formatted output.
4. Typst
Typst is a modern alternative to LaTeX, designed to be cleaner and more readable:
= Introduction
Scientific paradigms shift when anomalies accumulate.
$ E = m c^2 $
Again, the author writes structured source. The final layout is generated separately.
Writing as Source Code
In an open source research model, the research paper is treated as source code. It is written in structured plain text, versioned, reviewed, and compiled into multiple outputs.
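What "versioned and reviewed" means in practice is that every change to the text can be shown line by line. The sketch below uses Python's standard difflib module to compare two hypothetical revisions of a draft; the file names and the helper show_changes are made up for the example. Version control systems such as Git present changes to plain text in a similar unified diff format.

```python
import difflib

# Two hypothetical revisions of the same plain text draft.
draft_v1 = """\
# Introduction
Scientific paradigms shift when anomalies accumulate.
"""
draft_v2 = """\
# Introduction
Scientific paradigms shift when unresolved anomalies accumulate.
"""


def show_changes(old, new):
    """Return a unified diff between two versions of a text as one string."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="draft_v1.md",  # labels only; no files are read
        tofile="draft_v2.md",
    )
    return "".join(diff)


print(show_changes(draft_v1, draft_v2))
```

In the result, removed lines are prefixed with "-" and added lines with "+", so a reviewer can see exactly which sentence changed. A comparison like this works on a .docx file only after its contents have been unpacked and parsed.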
The intellectual work remains transparent. The presentation layer is flexible.
Writing becomes cleaner, more portable, more durable, and more adaptable to the web.