YAML: The Complete Guide for 2026
YAML is everywhere in modern development. Kubernetes manifests, Docker Compose files, GitHub Actions workflows, Ansible playbooks, CI/CD pipelines — if you work with infrastructure or DevOps, you read and write YAML daily. Yet YAML is also one of the most misunderstood formats. Its deceptive simplicity hides a complex specification with subtle gotchas that have caused production outages and security vulnerabilities.
This guide covers YAML from the ground up: syntax fundamentals, data types, advanced features like anchors and multi-line strings, real-world usage patterns, security pitfalls, and the differences between YAML versions. Whether you are writing your first Kubernetes manifest or debugging a tricky indentation error, this is the reference you need.
What Is YAML and Its History
YAML stands for "YAML Ain't Markup Language" (a recursive acronym; it originally stood for "Yet Another Markup Language"). Clark Evans first proposed it in 2001 and designed it together with Ingy döt Net and Oren Ben-Kiki. YAML 1.0 arrived in 2004, YAML 1.1 in 2005, and YAML 1.2 in 2009. The latest revision of the 1.2 specification, 1.2.2, was published in 2021.
The design goal was a data serialization format that is human-readable first and machine-parseable second. Unlike XML and JSON, which prioritize unambiguous machine parsing, YAML prioritizes how the document looks to a human editor. Indentation replaces braces. Quotes are optional for most strings. Comments are first-class. The result is a format that looks clean on screen but carries a complex specification underneath.
YAML vs JSON vs TOML
Before diving into syntax, it helps to understand where YAML fits relative to the other popular configuration and data formats.
| Feature | YAML | JSON | TOML |
|---|---|---|---|
| Comments | Yes (#) | No | Yes (#) |
| Multi-line strings | Yes (\| and >) | No | Yes (triple quotes) |
| Anchors/references | Yes (& and *) | No | No |
| Indentation-sensitive | Yes | No | No |
| Multiple documents | Yes (---) | No | No |
| Implicit type coercion | Yes (gotcha) | No | No |
| Best for | Complex config, IaC | APIs, data exchange | App configuration |
Choose YAML when you need deep nesting, anchors for DRY configuration, multi-document files, or when the ecosystem requires it (Kubernetes, Ansible). Choose JSON for APIs and machine-to-machine data. Choose TOML for flat-to-moderate configuration where simplicity and unambiguous parsing matter most.
Basic Syntax: Scalars and Data Types
Every YAML document is built from three primitives: scalars (single values), sequences (lists), and mappings (key-value pairs). Let us start with scalars.
Strings
Strings are the most common value type and can be written several ways:
# Unquoted (plain) strings — no quotes needed for most text
name: John Doe
city: San Francisco
# Single-quoted — no escape sequences, literal backslashes
path: 'C:\Users\dev\config'
regex: '\d+\.\d+'
# Double-quoted — supports escape sequences like \n, \t, \"
greeting: "Hello,\nWorld!"
tab_separated: "col1\tcol2\tcol3"
The rule of thumb: use plain strings when there is no ambiguity, single quotes when you need literal backslashes, and double quotes when you need escape sequences or when the string starts with a special character.
Numbers, Booleans, and Null
# Integers
port: 8080
negative: -42
octal: 0o755 # YAML 1.2 octal notation
hex: 0xFF # hexadecimal
# Floats
pi: 3.14159
scientific: 6.626e-34
infinity: .inf
not_a_number: .nan
# Booleans (YAML 1.2: only true/false)
debug: true
verbose: false
# Null (multiple representations)
value1: null
value2: ~
value3: # empty value is also null
Pay special attention to booleans. In YAML 1.1, the words yes, no, on, off, y, n are all interpreted as booleans. YAML 1.2 restricts booleans to only true and false, but many parsers still use 1.1 rules. When in doubt, quote your strings.
Collections: Sequences and Mappings
Sequences (Lists)
Sequences are ordered lists of values, indicated by a dash and space:
# Block style sequence
fruits:
  - apple
  - banana
  - cherry
# Nested sequences
matrix:
  - [1, 2, 3]
  - [4, 5, 6]
  - [7, 8, 9]
# Sequence of mappings
employees:
  - name: Alice
    role: engineer
  - name: Bob
    role: designer
  - name: Carol
    role: manager
Mappings (Dictionaries)
Mappings are unordered collections of key-value pairs:
# Simple mapping
server:
  host: 0.0.0.0
  port: 8080
  workers: 4
# Nested mappings
database:
  primary:
    host: db-primary.example.com
    port: 5432
  replica:
    host: db-replica.example.com
    port: 5432
Block Style vs Flow Style
YAML offers two notation styles. Block style uses indentation and newlines. Flow style uses braces and brackets, similar to JSON:
# Block style (preferred for configuration files)
server:
  host: 0.0.0.0
  port: 8080
  features:
    - logging
    - metrics
    - tracing
# Flow style (compact, JSON-like)
server: {host: 0.0.0.0, port: 8080, features: [logging, metrics, tracing]}
# You can mix styles: flow inside block is common
endpoints:
  - {path: /api/users, method: GET}
  - {path: /api/users, method: POST}
  - {path: /api/health, method: GET}
Block style is more readable for configuration files. Flow style is useful for short, self-contained values that would waste vertical space in block style. Many Kubernetes examples use flow style for label selectors and small inline objects.
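To make the equivalence of the two styles concrete, here is a minimal Python sketch. It leans on one assumption to stay dependency-free: a flow-style document in which every string is double-quoted is also valid JSON, so the standard-library json module can stand in for a YAML parser. A real YAML parser would accept the unquoted form as well. The variable names are illustrative only.

```python
import json

# Flow-style document with every string double-quoted, which makes it valid JSON too.
flow_document = (
    '{"server": {"host": "0.0.0.0", "port": 8080, '
    '"features": ["logging", "metrics", "tracing"]}}'
)

# The structure a parser builds from the equivalent block-style document.
expected = {
    "server": {
        "host": "0.0.0.0",
        "port": 8080,
        "features": ["logging", "metrics", "tracing"],
    }
}

parsed = json.loads(flow_document)
print(parsed == expected)  # True
```

Both notations end up as the same nested dictionaries and lists, so the choice between them is purely about readability.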
Multi-Line Strings: Literal and Folded Blocks
YAML's multi-line string handling is one of its most powerful and most confusing features. There are two block scalar styles, each with modifiers for chomping trailing newlines.
Literal Block Scalar (|)
Preserves newlines exactly as written. Each line break in the YAML becomes a line break in the parsed string:
# Literal block: preserves line breaks
script: |
  #!/bin/bash
  echo "Starting deployment"
  kubectl apply -f manifests/
  echo "Deployment complete"
# The parsed value is:
# "#!/bin/bash\necho \"Starting deployment\"\nkubectl apply -f manifests/\necho \"Deployment complete\"\n"
Folded Block Scalar (>)
Folds newlines into spaces, turning a paragraph into a single long line. Empty lines become actual newlines:
# Folded block: newlines become spaces
description: >
  This is a long description that spans
  multiple lines in the YAML file but will
  be parsed as a single paragraph with
  spaces replacing the line breaks.
# The parsed value is:
# "This is a long description that spans multiple lines..."
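The difference between the two styles can be modeled in a few lines of Python. This is a simplified sketch, not a parser: it assumes the block content has already been de-indented into a list of lines, that default (clip) chomping applies, and that there are no more-indented lines or leading blank lines, both of which real YAML treats specially. The function names are made up for illustration.

```python
def literal_block(lines):
    """Model of `|` with default chomping: keep every line break."""
    return "\n".join(lines) + "\n"


def folded_block(lines):
    """Model of `>` with default chomping.

    A single line break between two text lines becomes a space;
    a run of N blank lines becomes N newlines.
    """
    result = ""
    blank_run = 0
    for line in lines:
        if line == "":
            blank_run += 1
            continue
        if result:
            result += "\n" * blank_run if blank_run else " "
        result += line
        blank_run = 0
    return result + "\n"


lines = ["first line", "second line", "", "new paragraph"]
print(repr(literal_block(lines)))  # 'first line\nsecond line\n\nnew paragraph\n'
print(repr(folded_block(lines)))   # 'first line second line\nnew paragraph\n'
```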
Chomping Indicators
Control what happens with the trailing newline after the block content:
# Clip (default): single trailing newline
clip: |
  text here
# Strip (-): no trailing newline
strip: |-
  text here
# Keep (+): preserve all trailing newlines
keep: |+
  text here

# (two trailing newlines preserved)
The |- (literal + strip) combination is especially common in Kubernetes ConfigMaps and CI/CD pipelines where you want exact content without a trailing newline.
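As a rough mental model, the three indicators differ only in how they treat the newlines at the end of the block. The Python sketch below assumes the raw block content, including all of its trailing newlines, is already available as a string. The chomp function is a hypothetical helper, not part of any YAML library.

```python
def chomp(raw_content, indicator=""):
    """Apply a chomping indicator to block content that still has its trailing newlines."""
    if indicator == "+":  # keep: leave every trailing newline in place
        return raw_content
    stripped = raw_content.rstrip("\n")
    if indicator == "-":  # strip: remove every trailing newline
        return stripped
    if indicator == "":   # clip (default): exactly one trailing newline
        return stripped + "\n" if stripped else ""
    raise ValueError(f"unknown chomping indicator: {indicator!r}")


raw = "text here\n\n\n"
print(repr(chomp(raw)))       # 'text here\n'
print(repr(chomp(raw, "-")))  # 'text here'
print(repr(chomp(raw, "+")))  # 'text here\n\n\n'
```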
Anchors and Aliases
Anchors (&) and aliases (*) let you define a value once and reuse it throughout the document. This is YAML's mechanism for DRY (Don't Repeat Yourself) configuration:
# Define an anchor with &
defaults: &default_settings
  timeout: 30
  retries: 3
  log_level: info
# Reference it with *
development:
  <<: *default_settings
  log_level: debug # Override one value
staging:
  <<: *default_settings
  timeout: 60 # Override one value
production:
  <<: *default_settings
  retries: 5
  timeout: 120
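The << key is the merge key, which merges the anchored mapping into the current mapping. Keys written explicitly in the current mapping override the merged values, wherever they appear relative to the merge key. The merge key comes from the YAML 1.1 type repository rather than the 1.2 core schema, so support depends on the parser, but it is widely implemented. This pattern is used extensively in CI/CD configurations to avoid repeating common job settings.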
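In terms of plain data structures, a merge behaves like a shallow dictionary merge in which the explicitly written keys win. The Python sketch below models that with ordinary dicts. The merge helper is hypothetical and only mirrors the semantics; it is not how any particular parser implements them.

```python
default_settings = {"timeout": 30, "retries": 3, "log_level": "info"}


def merge(base, explicit):
    """Model of `<<: *base` plus explicit keys: explicit keys always win."""
    return {**base, **explicit}


development = merge(default_settings, {"log_level": "debug"})
production = merge(default_settings, {"retries": 5, "timeout": 120})

print(development)  # {'timeout': 30, 'retries': 3, 'log_level': 'debug'}
print(production)   # {'timeout': 120, 'retries': 5, 'log_level': 'info'}
```

The merge is shallow: if a merged key holds a nested mapping and the current mapping defines the same key, the nested mapping is replaced as a whole rather than merged key by key.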
# Anchors on individual values
max_connections: &max_conn 100
database:
  pool_size: *max_conn # Resolves to 100
  max_overflow: *max_conn # Also 100
Important: Anchors and aliases are a document-level feature. They cannot reference values across separate YAML documents (separated by ---), and they cannot reference values in other files.
Tags and Custom Types
YAML tags explicitly declare the type of a value, overriding the parser's automatic type detection:
# Force a value to be a specific type
explicit_string: !!str 123 # String "123", not integer
explicit_int: !!int "456" # Integer 456, not string
explicit_float: !!float "3.14" # Float 3.14
explicit_bool: !!bool "true" # Boolean true
explicit_null: !!null "" # Null
# Useful for the Norway problem
country_code: !!str NO # String "NO", not boolean false
version: !!str 1.0 # String "1.0", not float
Tags are essential when you need to prevent YAML's implicit type resolution from misinterpreting your data. The !!str tag is the most commonly used, typically to force numeric-looking values to remain as strings.
Multiple Documents in One File
YAML supports multiple documents in a single file, separated by --- (document start) and optionally terminated by ... (document end):
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_URL: postgres://localhost/mydb
---
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: web
          image: myapp:latest
This is a fundamental pattern in Kubernetes, where a single YAML file often contains multiple related resources. The --- separator tells the parser to start a fresh document. You can pass such a file to kubectl apply -f, and each document is processed as a separate resource.
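To see what "a fresh document" means in practice, here is a deliberately naive Python sketch that splits a stream into its documents. It assumes that a line consisting only of --- is always a separator, ignores the optional ... terminator, and drops empty documents, so it is an illustration rather than a parser. With PyYAML installed, yaml.safe_load_all is the proper way to iterate over the documents in a stream. The split_documents name is made up for this example.

```python
def split_documents(stream):
    """Naively split a YAML stream into one string per document."""
    documents, current = [], []
    for line in stream.splitlines():
        if line.rstrip() == "---":
            # A separator closes the document collected so far, if any.
            if current:
                documents.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        documents.append("\n".join(current))
    return documents


stream = """---
apiVersion: v1
kind: ConfigMap
---
apiVersion: v1
kind: Service
---
apiVersion: apps/v1
kind: Deployment
"""

documents = split_documents(stream)
print(len(documents))  # 3
```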
YAML in Practice
Kubernetes Manifests
Kubernetes is probably the most visible consumer of YAML in the development ecosystem:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api
    tier: backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: myregistry/api:v2.1.0
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
Docker Compose
services:
  web:
    build: .
    ports:
      - "8080:8080"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgres://db:5432/myapp
    depends_on:
      db:
        condition: service_healthy
    deploy:
      replicas: 2
      resources:
        limits:
          memory: 512M
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: myapp
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  pgdata:
GitHub Actions
name: CI Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18, 20, 22]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: npm
      - run: npm ci
      - run: npm test
      - run: npm run build
Ansible Playbooks
---
- name: Configure web servers
  hosts: webservers
  become: true
  vars:
    http_port: 80
    max_clients: 200
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present
        update_cache: true
    - name: Copy nginx config
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/sites-available/default
      notify: restart nginx
    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
        enabled: true
  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
Common Gotchas and Pitfalls
The Norway Problem
In YAML 1.1, the following bare words are all interpreted as booleans:
# YAML 1.1 boolean values (all of these are booleans, NOT strings):
# true, True, TRUE, yes, Yes, YES, on, On, ON, y, Y
# false, False, FALSE, no, No, NO, off, Off, OFF, n, N
# This means country codes break:
countries:
  - GB # string "GB"
  - US # string "US"
  - NO # BOOLEAN false (not the string "NO")!
  - FR # string "FR"
# Fix: quote the values
countries:
  - "GB"
  - "US"
  - "NO" # Now correctly a string
  - "FR"
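The resolution rule behind this behavior is essentially a pattern match on the unquoted scalar. The Python sketch below models it with two regular expressions: one following the boolean type definition from YAML 1.1 and one following the YAML 1.2 core schema, which also accepts the capitalized spellings of true and false. It is a model of the specifications, not of any specific parser; real implementations may recognize only a subset of the 1.1 words. The is_boolean helper is hypothetical.

```python
import re

# Plain scalars that resolve to booleans under each version of the rules.
YAML_11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
YAML_12_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")


def is_boolean(plain_scalar, version="1.1"):
    """Return True if an unquoted scalar would resolve to a boolean."""
    pattern = YAML_11_BOOL if version == "1.1" else YAML_12_BOOL
    return pattern.fullmatch(plain_scalar) is not None


for code in ["GB", "US", "NO", "FR"]:
    print(code, is_boolean(code, "1.1"), is_boolean(code, "1.2"))
# GB False False
# US False False
# NO True False
# FR False False
```

A check like this can be handy in a linter that flags unquoted values likely to surprise someone reading the file.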
Indentation Errors
YAML uses spaces only, never tabs. Inconsistent indentation is the most common source of YAML parse errors:
# WRONG: sibling keys at different indent levels
server:
  host: localhost
   port: 8080 # Error: unexpected indentation
# WRONG: tabs instead of spaces
server:
	host: localhost # Error: tabs are not allowed
# CORRECT: consistent 2-space indentation
server:
  host: localhost
  port: 8080
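Tab characters are hard to spot by eye, so a small pre-commit style check can help. The Python sketch below reports the lines whose leading whitespace contains a tab. It only looks at indentation and knows nothing else about YAML, and find_tab_indentation is a hypothetical helper name.

```python
def find_tab_indentation(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    offenders = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-space, non-tab character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            offenders.append(number)
    return offenders


sample = "server:\n\thost: localhost\n  port: 8080\n"
print(find_tab_indentation(sample))  # [2]
```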
Unquoted Strings That Look Like Other Types
# Unquoted values do not always parse as strings:
version: 1.0 # Float 1.0, not string "1.0"
version: 1.2.3 # String "1.2.3" (not a valid number)
time: 12:30 # Sexagesimal number 750 in YAML 1.1!
zipcode: 01onal # String (not a valid number)
zipcode: 00501 # Octal 321 in YAML 1.1, decimal 501 in YAML 1.2; the leading zeros are lost either way
# Always quote values that should remain strings:
version: "1.0"
time: "12:30"
zipcode: "00501"
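The two numeric surprises above are plain arithmetic, which the following Python sketch reproduces. It models only the conversion step, assuming the scalar has already been recognized as a YAML 1.1 leading-zero or colon-separated integer. Both function names are illustrative.

```python
def yaml11_leading_zero_int(text):
    """YAML 1.1 reads an all-digit scalar with a leading zero as octal."""
    return int(text, 8)


def yaml11_sexagesimal_int(text):
    """YAML 1.1 reads colon-separated digit groups as a base-60 integer."""
    value = 0
    for group in text.split(":"):
        value = value * 60 + int(group)
    return value


print(yaml11_leading_zero_int("00501"))    # 321
print(yaml11_sexagesimal_int("12:30"))     # 750
print(yaml11_sexagesimal_int("1:30:00"))   # 5400
```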
Colon in Values
# A colon followed by a space starts a mapping value
# This breaks:
message: Error: something went wrong # Parse error: the second ": " is read as another mapping indicator
# Fix: quote the entire value
message: "Error: something went wrong"
# Or use a block scalar
message: |
  Error: something went wrong
YAML Security: safe_load vs load
YAML has a well-documented security vulnerability in many parser implementations. The full YAML specification allows tags that can instantiate arbitrary objects, which means loading untrusted YAML can execute arbitrary code.
# DANGEROUS — this executes a system command in Python's PyYAML
!!python/object/apply:os.system
args: ['rm -rf /']
# Another attack vector
!!python/object/apply:subprocess.check_output
args: [['cat', '/etc/passwd']]
The fix is simple but critical: always use safe loading functions.
# Python — ALWAYS use safe_load
import yaml
# DANGEROUS: an unsafe loader can construct arbitrary Python objects
# data = yaml.load(content, Loader=yaml.UnsafeLoader)
# SAFE — restricts to basic YAML types
data = yaml.safe_load(content)
# For writing
output = yaml.safe_dump(data)
// Node.js — js-yaml defaults to safe mode
const yaml = require('js-yaml');
const data = yaml.load(content); // Safe by default since js-yaml v4
// yaml.load(content, { schema: yaml.DEFAULT_SCHEMA }) // Explicitly safe
# Ruby — use safe_load
require 'yaml'
data = YAML.safe_load(content) # Safe
# data = YAML.load(content) # Dangerous in older Ruby versions
YAML 1.2 vs 1.1 Differences
YAML 1.2 (2009) cleaned up several problematic behaviors from 1.1. The most important changes:
Booleans: YAML 1.2 only recognizes true and false as booleans. The 1.1 values yes, no, on, off, y, n are treated as plain strings. This fixes the Norway problem.
Octals: YAML 1.2 uses 0o777 for octal (matching modern language conventions). YAML 1.1 used 0777 (leading zero), which was confusing since most people think of leading-zero numbers as decimal.
JSON compatibility: YAML 1.2 is a strict superset of JSON. Every valid JSON document is also valid YAML 1.2. This was not guaranteed in 1.1.
Sexagesimal numbers: YAML 1.1 interprets 12:30 as the number 750 (12 * 60 + 30). YAML 1.2 treats it as the string "12:30".
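Parser reality: Despite YAML 1.2 being published in 2009, many widely used parsers still default to 1.1 behavior. Python's PyYAML uses 1.1 rules. For 1.2 behavior in Python, use ruamel.yaml; strictyaml takes a different route and avoids the problem by parsing a restricted subset of YAML with no implicit typing. Go's gopkg.in/yaml.v3 and Node.js's js-yaml v4 largely follow YAML 1.2, although yaml.v3 keeps some 1.1 behavior for backward compatibility.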
YAML Processing in Different Languages
Python
import yaml  # PyYAML (YAML 1.1)
# pip install pyyaml
# Read YAML
with open("config.yaml") as f:
    config = yaml.safe_load(f)
# Write YAML
with open("output.yaml", "w") as f:
    yaml.safe_dump(config, f, default_flow_style=False)
# For YAML 1.2 compliance, use ruamel.yaml:
# pip install ruamel.yaml
from ruamel.yaml import YAML
ry = YAML()
with open("config.yaml") as f:
    config = ry.load(f)
Node.js
// npm install js-yaml
const yaml = require('js-yaml');
const fs = require('fs');
// Read YAML
const config = yaml.load(fs.readFileSync('config.yaml', 'utf8'));
// Write YAML
const output = yaml.dump(config, { indent: 2, lineWidth: 120 });
fs.writeFileSync('output.yaml', output);
Go
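// go get gopkg.in/yaml.v3
package main
import (
	"log"
	"os"
	"gopkg.in/yaml.v3"
)
// Config mirrors the "server" mapping in config.yaml.
type Config struct {
	Server struct {
		Host string `yaml:"host"`
		Port int    `yaml:"port"`
	} `yaml:"server"`
}
func main() {
	data, err := os.ReadFile("config.yaml")
	if err != nil {
		log.Fatalf("read config: %v", err)
	}
	var config Config
	// Unmarshal reports syntax errors and type mismatches; do not ignore them.
	if err := yaml.Unmarshal(data, &config); err != nil {
		log.Fatalf("parse config: %v", err)
	}
	log.Printf("server: %s:%d", config.Server.Host, config.Server.Port)
}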
Ruby
require 'yaml'
require 'date' # Date must be loaded before it can be listed in permitted_classes
# Read YAML (safe_load for untrusted input)
config = YAML.safe_load(File.read('config.yaml'))
# With permitted classes for custom types
config = YAML.safe_load(
  File.read('config.yaml'),
  permitted_classes: [Date, Time]
)
# Write YAML
File.write('output.yaml', config.to_yaml)
Frequently Asked Questions
What is YAML and what does it stand for?
YAML stands for "YAML Ain't Markup Language" (a recursive acronym). It is a human-readable data serialization format used for configuration files, data exchange, and infrastructure-as-code. YAML uses indentation to represent structure, supports comments, and is the standard format for Kubernetes, Docker Compose, GitHub Actions, and Ansible.
What is the difference between YAML and JSON?
YAML supports comments, multi-line strings, anchors/aliases for reuse, and uses indentation instead of braces. JSON uses braces and brackets, requires quoted keys, has no comments, and is more compact for machine-to-machine exchange. YAML 1.2 is a superset of JSON, meaning any valid JSON is also valid YAML. YAML is preferred for human-edited configuration; JSON is preferred for APIs and data interchange.
What is the Norway problem in YAML?
The Norway problem is when YAML 1.1 interprets the country code "NO" as boolean false instead of the string "NO". YAML 1.1 treats many bare words as booleans: yes/no, on/off, true/false, y/n. The fix is to always quote strings that could be misinterpreted. YAML 1.2 resolved this by only recognizing true/false as booleans, but many parsers still default to 1.1 behavior.
Why should I use yaml.safe_load instead of yaml.load in Python?
When used with an unsafe loader, Python's yaml.load() function can execute arbitrary code through YAML tags like !!python/object/apply:os.system. That was the default behavior before PyYAML 5.1, and since PyYAML 6.0 the Loader argument is mandatory, so the choice is always explicit. Loading YAML from untrusted sources this way is a critical security vulnerability. yaml.safe_load() restricts parsing to safe standard types (strings, numbers, lists, dicts) and prevents code execution. Always use yaml.safe_load() unless you need custom object deserialization from trusted input only.
What is the difference between YAML 1.1 and YAML 1.2?
YAML 1.2 (2009) improved several areas: booleans are limited to true/false only (no more yes/no/on/off), octals use 0o prefix, sexagesimal numbers like 12:30 are treated as strings, and JSON compatibility is guaranteed. Many parsers like PyYAML still use 1.1 rules. Use ruamel.yaml for Python, gopkg.in/yaml.v3 for Go, or js-yaml v4 for Node.js for YAML 1.2 support.
Conclusion
YAML is the dominant configuration format for cloud-native infrastructure, CI/CD pipelines, and DevOps automation. Its readability makes it excellent for configuration that humans maintain, and its features like anchors, multi-line strings, and multi-document support address real needs in complex configurations.
However, YAML's implicit type coercion, indentation sensitivity, and security implications mean you must approach it with awareness. Quote strings that could be misinterpreted. Always use safe loading functions. Validate your YAML before deploying it. Understand whether your parser uses YAML 1.1 or 1.2 rules.
Master these fundamentals and YAML becomes a reliable, expressive tool rather than a source of mysterious bugs.