Roman Numerals Guide: Rules, Conversion, and Why They Still Matter

Roman numerals appear on clock faces, in film credits, in the names of monarchs (Charles III, Louis XIV), in Super Bowl logos, and at the end of copyright notices: MMXXVI. They have outlasted the Western Roman Empire by fifteen centuries and show no sign of disappearing. If you work with publishing metadata, legal documents, or entertainment data, you will eventually need to parse or generate them correctly. Use the Roman Numeral Converter to follow the examples in this guide.

The Seven Basic Symbols

The entire Roman numeral system is built from seven letters. Each maps to a fixed Arabic value:

Symbol  Value  Origin
I       1      One finger tally
V       5      Open hand (V shape between thumb and fingers)
X       10     Two crossed hands, or two V shapes
L       50     Half of C (an early form)
C       100    Latin centum (hundred)
D       500    Half of M (an early form)
M       1000   Latin mille (thousand)

Notice the pattern: each symbol is either five times or twice the previous one. This alternating structure — 1, 5, 10, 50, 100, 500, 1000 — keeps the number of symbols per digit small.

The Additive Rule

When a symbol is followed by a symbol of equal or lesser value, the values are added together. Reading left to right, you accumulate:

  • III = 1 + 1 + 1 = 3
  • VIII = 5 + 1 + 1 + 1 = 8
  • XXX = 10 + 10 + 10 = 30
  • LXXX = 50 + 10 + 10 + 10 = 80
  • DCCC = 500 + 100 + 100 + 100 = 800

In strict modern usage, I, X, C, and M may repeat at most three times in a row, and V, L, and D never repeat. This is why 4 is written IV rather than IIII. Clock faces famously use IIII anyway, usually explained as a matter of visual balance. Popular origin stories credit a royal preference passed on to clockmakers, or a Roman reluctance to write IV because it begins the Latin spelling of Jupiter (IVPITER). Whether either story is true is debated; the practice is not.

The Subtractive Rule

When a symbol of lesser value appears before a symbol of greater value, the lesser is subtracted from the greater. This is called subtractive notation, and the rules are strict:

  • Only I, X, and C can be used subtractively. V, L, and D cannot.
  • I can only precede V (making 4) or X (making 9).
  • X can only precede L (making 40) or C (making 90).
  • C can only precede D (making 400) or M (making 900).

The six subtractive pairs in full:

Roman  Value  Rule
IV     4      I before V
IX     9      I before X
XL     40     X before L
XC     90     X before C
CD     400    C before D
CM     900    C before M

A common mistake is treating any smaller-before-larger as valid — for instance writing IL for 49. That is incorrect. 49 is XLIX (XL + IX). Similarly, IC is not 99; the correct form is XCIX.
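The pair rules above can be checked mechanically. The following Python sketch is a minimal illustration, not a full validator. The helper name invalid_subtractive_pairs is hypothetical, the input is assumed to contain only the seven uppercase symbols, and only adjacent pairs are inspected, so repetition errors such as IIII are out of scope here.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# The six subtractive pairs permitted in standard notation.
ALLOWED_PAIRS = {"IV", "IX", "XL", "XC", "CD", "CM"}


def invalid_subtractive_pairs(numeral: str) -> list[str]:
    """Return every adjacent smaller-before-larger pair that is not allowed.

    Hypothetical helper: assumes `numeral` holds only uppercase I, V, X, L, C, D, M.
    """
    bad = []
    for left, right in zip(numeral, numeral[1:]):
        is_subtractive = ROMAN_VALUES[left] < ROMAN_VALUES[right]
        if is_subtractive and left + right not in ALLOWED_PAIRS:
            bad.append(left + right)
    return bad


print(invalid_subtractive_pairs("IL"))    # ['IL']
print(invalid_subtractive_pairs("XLIX"))  # []
```

An empty list means no illegal pair was found; it does not by itself prove the numeral is in standard form.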

Worked Example: MCMLXXXIV = 1984

The year 1984 is one of the most cited examples because it combines subtractive pairs in the hundreds (CM) and the units (IV) with a purely additive tens group (LXXX). Breaking it down token by token:

Token  Value                          Running total
M      +1000                          1000
CM     +900 (C before M: 1000 − 100)  1900
L      +50                            1950
XXX    +30 (10 + 10 + 10)             1980
IV     +4 (I before V: 5 − 1)         1984

The algorithm: scan left to right. If the current symbol is less than the next symbol, add the difference (next − current) and skip two symbols. Otherwise add the current symbol's value and advance one.
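The scan described above can be sketched as a small Python function that records each step. The function name trace is an assumption for illustration, and the input is assumed to be an already-validated uppercase numeral. Unlike the table, this sketch emits repeated symbols one at a time (X, X, X rather than XXX).

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def trace(numeral: str) -> list[tuple[str, int, int]]:
    """Return (token, value, running_total) rows for a validated numeral."""
    rows = []
    total = 0
    i = 0
    while i < len(numeral):
        curr = ROMAN_VALUES[numeral[i]]
        nxt = ROMAN_VALUES[numeral[i + 1]] if i + 1 < len(numeral) else 0
        if curr < nxt:
            # Subtractive pair: consume two symbols, add the difference.
            token, value, step = numeral[i:i + 2], nxt - curr, 2
        else:
            # Additive symbol: consume one symbol, add its value.
            token, value, step = numeral[i], curr, 1
        total += value
        rows.append((token, value, total))
        i += step
    return rows


for token, value, total in trace("MCMLXXXIV"):
    print(f"{token:<3} +{value:<5} {total}")
```

The final row's running total is the converted value, 1984 for this input.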

Limits of the System

The Roman numeral system has three hard constraints that any implementation must handle:

  • No zero. Standard Roman notation has no symbol for zero, so 0 cannot be represented. Converting 0 should raise an error or return an empty string, not "O", which is a letter and not a numeral symbol.
  • No negatives. The system has no minus sign. Negative inputs are undefined.
  • Maximum 3999. With standard symbols, the largest expressible number is MMMCMXCIX (3999). Four M's in a row (MMMM) is non-standard in strict notation, though widely accepted in informal use. Some implementations allow it; most strict validators reject it.

For numbers beyond 3999, medieval manuscripts used a vinculum, a horizontal bar over a symbol that multiplies its value by 1000: V̅ = 5,000, X̅ = 10,000, M̅ = 1,000,000. This convention is rarely used in modern contexts and most converters do not support it. If you need to represent large Roman numerals programmatically, document the convention you follow.
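As one possible way to document such a convention, the Python sketch below writes the thousands part with a bar over each symbol once the value reaches 4000. Everything here is an assumption made for illustration: the function name to_roman_vinculum, the choice of U+0305 COMBINING OVERLINE to draw the bar, the switch-over point of 4000, and the upper bound of 3,999,999. Historical usage varied, so treat it as one convention among several.

```python
VAL_SYM = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"),
    (1, "I"),
]

OVERLINE = "\u0305"  # COMBINING OVERLINE, placed after the letter it covers


def _basic(n: int) -> str:
    """Greedy conversion for 0..3999 (0 yields an empty string)."""
    out = []
    for value, symbol in VAL_SYM:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_roman_vinculum(n: int) -> str:
    """Hypothetical extension: barred symbols count as thousands."""
    if isinstance(n, bool) or not isinstance(n, int) or not (1 <= n <= 3_999_999):
        raise ValueError(f"Expected an integer in 1..3999999, got {n!r}")
    if n < 4000:
        return _basic(n)  # plain standard form, no bar needed
    thousands, rest = divmod(n, 1000)
    barred = "".join(ch + OVERLINE for ch in _basic(thousands))
    return barred + _basic(rest)


print(to_roman_vinculum(5000))       # V with an overline
print(to_roman_vinculum(1_000_000))  # M with an overline
```

Whether the overline renders correctly depends on the font, which is one more reason to state the convention explicitly wherever the output is stored.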

Common Conversion Bugs

Roman numeral converters have a surprisingly rich bug surface. These are the failures most frequently encountered in production code:

  • Accepting IIII as valid. Many naive parsers sum all symbols left to right and accept IIII = 4. A strict converter should reject it. Decide upfront whether your converter is strict (rejects IIII) or lenient (accepts it). Clock-face data sources often use IIII.
  • Rejecting MMMM. The symmetric problem: some strict validators reject 4000 even when the caller explicitly needs it. If your domain goes above 3999 (e.g., year metadata through 4000+), either document the limit or extend to MMMM.
  • Lowercase input. Roman numerals in the wild appear in both cases: book front matter typically uses lowercase page numbers (e.g., xiv), while credits and inscriptions usually use uppercase (e.g., MMXXVI). A robust parser should either normalize to uppercase before parsing or explicitly handle lowercase. Mixing case within a single numeral (e.g., Mcmxcix) should be normalized too.
  • Lookalike characters. Cyrillic has characters visually identical to Roman numeral letters: Cyrillic "С" (U+0421) looks like Latin "C" (U+0043); Cyrillic "І" (U+0406) looks like Latin "I" (U+0049). In text scraped from Eastern European sources, these substitutions cause silent parse failures. Validate that every character is one of the seven ASCII letters (in either case) before parsing, and report the offending code point when it is not.
  • Invalid subtractive pairs. Inputs like IL (49), VX (5 before 10), LC (50 before 100), or DM (500 before 1000) are not valid standard Roman numerals. A strict converter must detect and reject them.
  • Off-by-one at 0 and 4000. Boundary testing — 0, 1, 3999, 4000 — catches most range errors.
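Several of the bugs above come down to decisions that are easier to audit when they are explicit in code. The Python sketch below is one hedged way to do that. The names clean_numeral and is_valid, the lenient flag, and the LENIENT pattern (which allows up to four repeats, so IIII and MMMM pass) are all assumptions for illustration, not an established standard.

```python
import re
import unicodedata

ALLOWED_CHARS = set("IVXLCDMivxlcdm")

# Strict: at most three repeats. Lenient (assumed convention): up to four.
STRICT = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
LENIENT = re.compile(r"M{0,4}(CM|CD|D?C{0,4})(XC|XL|L?X{0,4})(IX|IV|V?I{0,4})")


def clean_numeral(raw: str) -> str:
    """Strip whitespace, reject lookalikes, and normalize to uppercase."""
    text = raw.strip()
    for ch in text:
        # Check before uppercasing so non-ASCII letters cannot fold into ASCII.
        if ch not in ALLOWED_CHARS:
            name = unicodedata.name(ch, "UNNAMED CHARACTER")
            raise ValueError(
                f"Unexpected character U+{ord(ch):04X} ({name}) in {raw!r}"
            )
    return text.upper()


def is_valid(raw: str, lenient: bool = False) -> bool:
    """Validate in strict mode by default, or lenient mode on request."""
    try:
        text = clean_numeral(raw)
    except ValueError:
        return False
    pattern = LENIENT if lenient else STRICT
    # Every group is optional, so the empty string must be excluded explicitly.
    return bool(text) and pattern.fullmatch(text) is not None


print(is_valid("IIII"))                # False
print(is_valid("IIII", lenient=True))  # True
print(is_valid("M\u0421M"))            # False (Cyrillic lookalike)
```

Keeping the strict and lenient patterns side by side makes the difference between the two modes visible in review, and invalid pairs such as IL are rejected in both.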

Practical Use Cases Today

Roman numerals persist in specific domains that value tradition, prestige, or visual distinctiveness:

  • Clock faces. Analog clocks commonly use Roman numerals. The IIII convention for 4 is nearly universal on clock faces, while the rest of the numerals follow standard form. A clock-face parser must accept IIII.
  • Book front matter. Prefaces, forewords, and tables of contents in traditionally typeset books use lowercase Roman numerals for page numbers (i, ii, iii, iv...) before the main text begins. PDF processing tools often need to handle this numbering scheme separately from Arabic-numbered body pages.
  • Monarchs and popes. "Charles III" and "Benedict XVI" are stored and searched as text. Parsing the ordinal requires handling Roman numerals embedded in a name string.
  • Super Bowl and Olympics. The NFL has numbered Super Bowls with Roman numerals since Super Bowl V (1971), applying them retroactively to the first four games. The one exception is Super Bowl 50 (2016), which used Arabic numerals instead of "L"; the league returned to Roman numerals with Super Bowl LI. Databases of sports events must store and convert both representations.
  • Copyright notices. Film and television production companies often give the copyright year in Roman numerals, for example MMXXVI for 2026. This is a long-standing industry tradition. Parsing copyright metadata from video files or scraped HTML frequently requires a converter.
  • Movie and series sequels. Franchises like Rocky, Star Wars, and Fast and Furious use Roman numerals for installment numbers. Canonical title matching in entertainment databases must normalize these.
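For names and titles such as "Charles III" or "Rocky IV", the numeral usually sits at the end of the string. The Python sketch below shows one way to split it off. The function names split_trailing_numeral and roman_value are hypothetical, and the sketch assumes the numeral is the last space-separated token and is written in uppercase strict form. It cannot tell a numeral from a genuine letter, so "Malcolm X" is read as 10; real pipelines need a domain-specific allowlist or context check.

```python
import re
from typing import Optional

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
STRICT = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def roman_value(numeral: str) -> int:
    """Sum a validated numeral, subtracting a symbol that precedes a larger one."""
    total = 0
    # Pad with a space so the last symbol has a "next" worth 0.
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN_VALUES[ch]
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total


def split_trailing_numeral(title: str) -> tuple[str, Optional[int]]:
    """Return (name, number) if the last token is a strict uppercase numeral."""
    head, _, last = title.strip().rpartition(" ")
    if head and last and STRICT.fullmatch(last):
        return head, roman_value(last)
    return title, None


print(split_trailing_numeral("Charles III"))    # ('Charles', 3)
print(split_trailing_numeral("Super Bowl LI"))  # ('Super Bowl', 51)
print(split_trailing_numeral("Toy Story"))      # ('Toy Story', None)
```

Requiring uppercase keeps ordinary words such as "mix" or "did" from being misread as numerals.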

Code Examples

The following implementations cover both conversion directions with strict validation. These are production-quality starting points, not minimal sketches.

JavaScript / TypeScript

// Arabic to Roman (strict, 1–3999)
const VAL_SYM: [number, string][] = [
  [1000, "M"], [900, "CM"], [500, "D"], [400, "CD"],
  [100,  "C"], [90,  "XC"], [50,  "L"], [40,  "XL"],
  [10,   "X"], [9,   "IX"], [5,   "V"], [4,   "IV"],
  [1,    "I"],
];

function toRoman(n: number): string {
  if (!Number.isInteger(n) || n < 1 || n > 3999) {
    throw new RangeError(`Roman numerals are defined for integers 1–3999, got ${n}`);
  }
  let result = "";
  for (const [value, symbol] of VAL_SYM) {
    while (n >= value) {
      result += symbol;
      n -= value;
    }
  }
  return result;
}

// Roman to Arabic (strict validator)
const ROMAN_VALUES: Record<string, number> = {
  I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000,
};

// Validates strict standard form before parsing
const VALID_ROMAN = /^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/;

function fromRoman(s: string): number {
  const upper = s.toUpperCase();
  if (!VALID_ROMAN.test(upper) || upper.length === 0) {
    throw new Error(`Invalid Roman numeral: "${s}"`);
  }
  let result = 0;
  for (let i = 0; i < upper.length; i++) {
    const curr = ROMAN_VALUES[upper[i]];
    const next = ROMAN_VALUES[upper[i + 1]] ?? 0;
    if (curr < next) {
      result += next - curr;
      i++; // skip the next symbol, already consumed
    } else {
      result += curr;
    }
  }
  return result;
}

// Usage
console.log(toRoman(1984));    // "MCMLXXXIV"
console.log(fromRoman("MMXXVI")); // 2026

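The regex VALID_ROMAN encodes all the structural rules: M repeats 0–3 times, hundreds can be CM/CD/D+0–3 C's, tens can be XC/XL/L+0–3 X's, and units follow the same pattern. It rejects IIII, IL, VX, and every other non-standard form in a single check. Because every group is optional, the pattern also matches the empty string, which is why fromRoman checks the length separately.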

Python

import re

VAL_SYM = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100,  "C"), (90,  "XC"), (50,  "L"), (40,  "XL"),
    (10,   "X"), (9,   "IX"), (5,   "V"), (4,   "IV"),
    (1,    "I"),
]

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50,
                "C": 100, "D": 500, "M": 1000}

VALID_ROMAN = re.compile(
    r"^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)

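def to_roman(n: int) -> str:
    # bool is a subclass of int, so exclude it explicitly (True would otherwise become "I")
    if isinstance(n, bool) or not isinstance(n, int) or not (1 <= n <= 3999):
        raise ValueError(f"Roman numerals are defined for integers 1–3999, got {n}")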
    result = []
    for value, symbol in VAL_SYM:
        while n >= value:
            result.append(symbol)
            n -= value
    return "".join(result)

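def from_roman(s: str) -> int:
    upper = s.upper()
    # fullmatch, not match: "$" also matches before a trailing newline,
    # so "XIV\n" would pass validation and then fail on the lookup below
    if not upper or not VALID_ROMAN.fullmatch(upper):
        raise ValueError(f"Invalid Roman numeral: {s!r}")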
    result = 0
    chars = list(upper)
    i = 0
    while i < len(chars):
        curr = ROMAN_VALUES[chars[i]]
        nxt  = ROMAN_VALUES[chars[i + 1]] if i + 1 < len(chars) else 0
        if curr < nxt:
            result += nxt - curr
            i += 2
        else:
            result += curr
            i += 1
    return result

# Usage
print(to_roman(1984))       # MCMLXXXIV
print(from_roman("MMXXVI")) # 2026

Regex Validation Only

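If you only need to check whether a string is a valid Roman numeral without converting it, the regex is sufficient once it also rules out the empty string. Every group in the pattern is optional, so a leading lookahead (?=.) is added to require at least one character:

const VALID_ROMAN = /^(?=.)M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/i;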

// true — standard form
VALID_ROMAN.test("MCMLXXXIV") // 1984
VALID_ROMAN.test("MMXXVI")    // 2026
VALID_ROMAN.test("XIV")       // 14

// false — non-standard or invalid
VALID_ROMAN.test("IIII")      // rejected
VALID_ROMAN.test("IL")        // rejected
VALID_ROMAN.test("VX")        // rejected
VALID_ROMAN.test("MMMM")      // rejected (4000 is out of range)
VALID_ROMAN.test("")           // rejected (empty string)

Why Roman Numerals Persist

From a purely functional standpoint, Arabic numerals are strictly better: they support zero, arbitrary magnitude, and decimal arithmetic without special cases. Roman numerals persist for reasons that have nothing to do with arithmetic efficiency:

  • Prestige and gravitas. MMXXVI looks more authoritative than 2026 in a legal disclaimer or film credit. The letterforms carry cultural weight accumulated over millennia.
  • Disambiguation. In a document that already uses Arabic numerals for page counts, version numbers, and footnote references, using Roman numerals for a different sequence (front matter, act numbers, appendices) prevents visual collision.
  • Search-visible edition numbers. Recurring events that appear in search results benefit from a distinctive, recognizable edition label. Super Bowls and Olympic Games are numbered with Roman numerals; the number appears in news headlines and structured data alike.
  • Tradition-locked domains. Legal deposit requirements, clock manufacturing standards, and liturgical typesetting have entrenched Roman numerals through inertia. No individual actor has enough incentive to change what everyone else in the domain already knows.

For a developer, the practical lesson is: you will encounter Roman numerals in data, and your parser needs to handle both strict and lenient inputs with clear, documented behavior. Use the Roman Numeral Converter to verify edge cases as you build. For related numeral system work, see the guides on number base conversion and data converters.

