Open Source Research

Plain Text as Infrastructure

Open source research treats the research paper not as a fixed object, but as structured, searchable, versioned source material.

Plain text is not a stylistic preference. It is infrastructure. It enables accessibility, searchability, reproducibility, and transparent iteration.

In the digital age, format is not cosmetic. It determines whether knowledge can be discovered, shared, audited, and improved.

Why PDF Is the Wrong Standard

PDF is the traditional default format for academic publishing, treated as neutral, professional, and universal. In reality, it was designed to reproduce a printed page, which makes it one of the worst possible standards for publishing online.

A PDF is essentially a digital printout. It preserves layout, fonts, and pagination — features that made sense in the age of physical journals. But on the web, those features become limitations.

PDFs are heavy files, slow to load, and inefficient on low-bandwidth connections.
They do not adapt well to smartphones, tablets, or assistive technologies.
Search engines interpret their structure less reliably than that of web-native formats.
Internal structure — headings, tables, figures — is often visually clear but semantically opaque.

This is what a PDF looks like under the skin:

0000413077 00000 n
0000417138 00000 n
0000417332 00000 n

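These are entries from the file's cross-reference table: byte offsets that tell a PDF viewer where each internal object begins. None of it is the paper's actual content. If you can't read it, neither can a search engine without first reconstructing the text from layout instructions. A PDF is optimised for printing. Open source research is optimised for reading, searching, linking, and reusing knowledge online.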

What Is Plain Text?

Plain text is text stored without proprietary formatting or hidden binary structures. It can be opened, edited, and read by virtually any computer system. It is called 'plain' text because it is the simplest possible representation of letters as 'bytes' in a file. Save a file as plain text and you need never worry about version mismatch in a decade or two: plain text encodings are open standards, ASCII has been stable since the 1960s, and its backward-compatible successor UTF-8 will be with us for many decades more.
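To make 'letters as bytes' concrete, here is a minimal Python sketch. It assumes UTF-8, the most widely used plain text encoding on the web today; the helper name to_bytes is hypothetical.

```python
def to_bytes(text):
    """Return the byte values a UTF-8 plain text file stores for `text`."""
    return list(text.encode("utf-8"))

# Each ASCII letter is stored as a single byte.
print(to_bytes("Plain text"))
# → [80, 108, 97, 105, 110, 32, 116, 101, 120, 116]

# UTF-8 is backward compatible: ASCII text encodes to identical bytes.
print("Plain text".encode("ascii") == "Plain text".encode("utf-8"))  # → True

# Characters outside ASCII take more than one byte.
print(to_bytes("é"))  # → [195, 169]
```

Nothing in the file needs a particular program to decode it: any system that knows the encoding can turn those numbers back into letters.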

This is what plain text looks like:

That's right: it's simply text, with no magic. Search engines love this!

Common plain text formats include:

.txt
.md
.html
.csv
.json
.tex
.typ
.xml

These formats are human-readable. They are transparent. They do not depend on a specific corporation’s software to be opened.

Why Plain Text Is King for Search

Search engines work best when text is structured and machine-readable. Plain text provides exactly that.

Unlike PDFs, which often embed text inside complex layout instructions, plain text exposes the content directly. Headings are headings. Lists are lists. Tables can be described structurally rather than visually.

This dramatically improves:

Search indexing
Accessibility tools such as screen readers
Text-to-speech systems
Machine analysis and citation extraction
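Because plain text exposes structure directly, a few lines of code can recover it. The following is a minimal Python sketch, assuming the source is written in Markdown with '#'-style headings; the function name index_sections is hypothetical. It builds a toy search index that maps each word to the section headings it appears under, a task that is much harder to do reliably from a PDF's layout instructions.

```python
import re
from collections import defaultdict

# Assumption: Markdown ATX headings, e.g. "# Methods" or "## Results".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def index_sections(markdown_text):
    """Map each lowercase word in the body text to the headings it appears under."""
    index = defaultdict(set)
    current = "(preamble)"  # section name for text before the first heading
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(2).strip()
            continue  # heading lines name sections; they are not indexed as body text
        for word in re.findall(r"[a-z0-9]+", line.lower()):
            index[word].add(current)
    return index

paper = "# Methods\nWe sampled plain text files.\n\n# Results\nPlain text was faster."
print(sorted(index_sections(paper)["plain"]))  # → ['Methods', 'Results']
```

A real search engine does far more, but the point stands: the headings are simply there in the text, waiting to be read.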

When research is written in structured plain text and rendered as HTML, it becomes native to the web rather than an image of a printed page.

Why HTML Is an Ideal Online Format

HTML is plain text. It is lightweight, searchable, and universally supported. It adapts automatically to screen size. It supports hyperlinks, embedded data, scalable graphics, and semantic markup.

Properly structured HTML:

Loads quickly on any device
Scales to smartphones without reformatting
Allows deep linking to specific sections
Is fully indexable by search engines
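Deep linking works because semantic HTML labels each section. As a minimal sketch, assuming section headings carry id attributes (a common convention among Markdown-to-HTML converters), Python's standard html.parser module can list every linkable section of a page. The class name SectionLinks and the URL https://example.org/paper.html are hypothetical.

```python
from html.parser import HTMLParser

class SectionLinks(HTMLParser):
    """Collect (heading text, deep link) pairs for headings that carry an id."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._current_id = None  # id of the heading being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current_id = dict(attrs).get("id")  # None if the heading has no id
            self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)  # includes text inside nested tags like <em>

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._current_id is not None:
            title = "".join(self._text).strip()
            self.links.append((title, f"{self.base_url}#{self._current_id}"))
            self._current_id = None

page = ('<h1 id="intro">Introduction</h1><p>Body text.</p>'
        '<h2 id="methods">Methods <em>used</em></h2><h2>No anchor</h2>')
parser = SectionLinks("https://example.org/paper.html")
parser.feed(page)
parser.close()
for title, url in parser.links:
    print(title, url)
```

The heading without an id is skipped: it is still readable, but nobody can link straight to it, which is why consistent ids matter.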

HTML is not a compromise. It is the natural publishing format of the internet.

Why Plain Text Is Better for Writing

Word processors such as Microsoft Word use “what you see is what you get” (WYSIWYG) interfaces. They focus on visual layout rather than logical structure.

This encourages authors to think about fonts, spacing, and formatting instead of argument, structure, and clarity.

Writing in plain text (for example in Markdown, LaTeX, or Typst) separates content from presentation. The author focuses on headings, sections, citations, and logical structure. The output is compiled afterwards into HTML, EPUB, or even PDF. You can generate whatever final format you need, although web-native plain text formats such as HTML remain the better choice for the internet.

This mirrors good programming practice: write clean source code, then compile or render it into the final form.
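The source-then-render workflow can be sketched in a few lines. This minimal Python example is an illustration, not a real converter: it handles only '#' headings and paragraphs, and the function name render_html is hypothetical. In practice a dedicated tool such as Pandoc converts full Markdown or LaTeX into HTML, EPUB, PDF, and many other formats.

```python
import html
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def render_html(source):
    """Render a tiny Markdown subset (headings and paragraphs) to HTML."""
    blocks, paragraph = [], []

    def flush():
        # Join the pending paragraph lines into one escaped <p> element.
        if paragraph:
            blocks.append("<p>" + html.escape(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    for line in source.splitlines():
        stripped = line.strip()
        match = HEADING.match(stripped)
        if match:
            flush()
            level = len(match.group(1))  # number of '#' characters
            text = html.escape(match.group(2).strip())
            blocks.append(f"<h{level}>{text}</h{level}>")
        elif not stripped:
            flush()  # a blank line ends the current paragraph
        else:
            paragraph.append(stripped)
    flush()
    return "\n".join(blocks)

print(render_html("# Results\n\nPlain text & HTML\nscale well."))
```

The source never mentions fonts or margins; those decisions belong to the renderer, and the same source could be fed to a different renderer for a different format.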

Version Control and Transparent Change

Plain text has another decisive advantage: it can be tracked line by line.

Version control systems such as Git compare changes between plain text files with precision. Every insertion, deletion, and modification is visible.
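The kind of line-level comparison Git performs can be illustrated with Python's standard difflib module, which produces the same unified diff format Git displays, although Git uses its own diff algorithms internally. The helper name line_diff and the file name paper.md are hypothetical.

```python
import difflib

def line_diff(old_text, new_text, name="paper.md"):
    """Return a unified, line-by-line diff between two versions of a text."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
        lineterm="",  # lines from splitlines() carry no newline characters
    ))

old = "Plain text as infrastructure\nPDF is a printout.\nHTML is native to the web."
new = "Plain text as infrastructure\nPDF is a digital printout.\nHTML is native to the web."
print("\n".join(line_diff(old, new)))
```

Only the edited sentence appears with - and + markers; the unchanged lines around it are shown as context.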

Binary formats like Word’s .docx files do not allow meaningful line-by-line diffing. A .docx file is a ZIP archive of XML parts, so changes are hidden inside compressed internal structures and long-term revision histories become opaque.

In an open source research model, versioning is central. Major releases and minor updates alike must be transparent and easy to track. Plain text enables:

Full revision histories
Public change logs
Clear authorship of specific edits
'Forking' and collaborative improvement
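Clear authorship of specific edits follows from the same line-level comparison. Below is a minimal Python sketch of a simplified 'blame': it walks a hypothetical list of (author, text) revisions in order and credits each line of the final text to the revision that last introduced or changed it. Git offers this through git blame with its own algorithm; the blame function here is hypothetical and illustrative only.

```python
from difflib import SequenceMatcher

def blame(revisions):
    """Given (author, text) revisions in chronological order, return
    (author, line) pairs crediting each final line to its last editor."""
    attributed = []  # (author, line) pairs for the current version
    for author, text in revisions:
        new_lines = text.splitlines()
        old_lines = [line for _, line in attributed]
        matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                updated.extend(attributed[i1:i2])  # unchanged lines keep their author
            else:
                # 'insert' and 'replace' lines belong to this revision's author;
                # for 'delete', new_lines[j1:j2] is empty, so nothing is added.
                updated.extend((author, line) for line in new_lines[j1:j2])
        attributed = updated
    return attributed

history = [
    ("alice", "Title\nIntro\nMethod"),
    ("bob", "Title\nIntro\nMethod\nResults"),
    ("carol", "Title\nIntroduction\nMethod\nResults"),
]
for author, line in blame(history):
    print(f"{author:>6}  {line}")
```

Because each revision is plain text, every line can be traced back to a person and a moment, which is exactly what a binary format obscures.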

Knowledge is not static. It evolves. Plain text makes that evolution visible.