Sample Data Files for Download

Download free sample data files for development, testing, and data processing. Our library covers four major data formats — CSV, XML, JSON, and YAML — with files ranging from basic structures to 100,000-row datasets, real-world API responses, Kubernetes manifests, and deliberately malformed files for testing error handling. Every file is free — no sign-up required.

Choose your data format

Sample XML Files

The standard markup language for structured data interchange, configuration, and enterprise systems. Our XML collection includes 11 files covering basic structures, deep nesting, namespace declarations, XSD schema validation, XSLT transformations, CDATA sections, processing instructions, Unicode content, and a 5 MB large dataset. Ideal for testing DOM/SAX/StAX parsers, XPath queries, XSLT processors, and schema validators.

Quick Downloads | Size | Features
Basic structure | 1 KB | Simple nested elements
Namespaces | 1 KB | Multiple xmlns declarations
XSLT pair (ZIP) | 3 KB | XML + XSLT stylesheet
International characters | 3 KB | CJK, Arabic, Cyrillic, emoji
Large dataset | 5 MB | Performance testing

View all 11 XML sample files →
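As a quick sketch of what these files are for, here is how a namespaced XML sample might be exercised with Python's standard-library ElementTree. The element and namespace names below are hypothetical, not the actual schema of the downloads:

```python
# Sketch: parsing namespaced XML with Python's stdlib ElementTree.
# The element/namespace names here are hypothetical examples.
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:bk="http://example.com/books">
  <bk:book id="1"><bk:title>Sample</bk:title></bk:book>
  <bk:book id="2"><bk:title>Another</bk:title></bk:book>
</catalog>"""

root = ET.fromstring(xml_text)
ns = {"bk": "http://example.com/books"}  # prefix -> URI map for findall()
titles = [t.text for t in root.findall(".//bk:title", ns)]
print(titles)  # ['Sample', 'Another']
```

The same pattern applies to the downloadable namespace sample: declare the prefix-to-URI map once, then use it in every XPath-style query.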


Sample CSV Files

The universal format for tabular data exchange. Our CSV collection includes 10 files with datasets from 100 to 100,000 rows, covering comma-delimited, tab-delimited (TSV), and pipe-delimited variants. Includes mixed data types, quoted fields with embedded commas and newlines, Unicode characters, files with no header row, empty/null field patterns, and deliberately malformed data. Ideal for testing spreadsheet imports, database loading, ETL pipelines, and data analysis tools.

Quick Downloads | Rows | Delimiter
Basic data set | 100 | Comma
Quoted fields | 200 | Comma
Tab-separated (TSV) | 200 | Tab
Empty fields | 150 | Comma
Large dataset | 100,000 | Comma

View all 10 CSV sample files →
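The quoted-fields sample exists because embedded commas and newlines are where naive CSV parsing breaks. A minimal sketch with Python's standard-library csv module, using inline data with hypothetical column names:

```python
# Sketch: reading quoted CSV fields (embedded commas and newlines)
# with Python's stdlib csv module. Column names are hypothetical.
import csv
import io

data = 'id,name,notes\n1,"Doe, Jane","Line one\nLine two"\n'
rows = list(csv.DictReader(io.StringIO(data)))

print(rows[0]["name"])   # Doe, Jane  (the comma stays inside the field)
print(rows[0]["notes"])  # spans two lines but is a single field
```

A parser that splits on raw commas or raw newlines will mangle exactly this kind of record, which is what the sample is designed to catch.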


Sample JSON Files

The dominant format for REST APIs, web applications, and configuration. Our JSON collection includes 11 files with simple objects, deeply nested structures, paginated API responses, error response objects, GeoJSON geospatial data, JSON Lines (JSONL) for streaming, a 10,000-record large dataset, and malformed JSON for error handling. Ideal for testing API clients, JSON parsers, config loaders, and data transformation pipelines.

Quick Downloads | Size | Use Case
Simple object | 1 KB | Basic parsing
API response (paginated) | 5 KB | REST API client testing
GeoJSON | 8 KB | Map/geospatial testing
JSON Lines (JSONL) | 70 KB | Log parsing, streaming
Large dataset | 3.7 MB | Performance testing

View all 11 JSON sample files →
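JSON Lines files are read one record per line rather than as a single document, which is what makes them suitable for streaming and log processing. A minimal sketch with hypothetical field names:

```python
# Sketch: streaming a JSON Lines (JSONL) sample one record at a time.
# Field names are hypothetical; a real file object is iterated the same way.
import io
import json

jsonl = '{"event": "login", "user": 1}\n{"event": "logout", "user": 1}\n'
events = [json.loads(line) for line in io.StringIO(jsonl) if line.strip()]

print(len(events))          # 2
print(events[0]["event"])   # login
```

Because each line is parsed independently, memory use stays flat no matter how large the file grows — the point of the 70 KB JSONL sample.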


Sample YAML Files

The standard configuration format for DevOps and infrastructure-as-code. Our YAML collection includes 10 files with real-world examples — Kubernetes Deployments, Docker Compose stacks, GitHub Actions CI/CD workflows, Ansible playbooks, OpenAPI/Swagger API specs, plus advanced features like anchors/aliases, multi-document files, and complex nested data. Ideal for testing YAML parsers, linters, config loaders, and infrastructure tools.

Quick Downloads | Size | Platform / Feature
Basic config | 2 KB | App settings
Kubernetes deployment | 3 KB | Container orchestration
Docker Compose | 3 KB | Multi-container apps
GitHub Actions | 4 KB | CI/CD pipeline
OpenAPI spec | 9 KB | API documentation

View all 10 YAML sample files →
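The anchors/aliases feature mentioned above lets one YAML node be defined once and reused. A hypothetical config fragment (not the exact contents of the downloadable sample) illustrating the pattern:

```yaml
# Anchors (&) define a reusable node; aliases (*) reference it;
# the merge key (<<) folds it into a mapping. Keys are hypothetical.
defaults: &defaults
  retries: 3
  timeout: 30

service_a:
  <<: *defaults
  port: 8080

service_b:
  <<: *defaults
  port: 9090
```

Parsers that support aliases will expand both services to include `retries` and `timeout`, which is exactly what the anchors/aliases sample lets you verify.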


Data format comparison

Not sure which format to use for testing? This comparison covers the key differences:

Format | Type | Structure | Best For
CSV | Tabular | Rows and columns, flat structure | Spreadsheets, database import, data analysis, ETL
XML | Hierarchical markup | Tags, attributes, namespaces | Enterprise systems, SOAP APIs, config, document markup
JSON | Hierarchical data | Key-value pairs, arrays, objects | REST APIs, web apps, config, data exchange
YAML | Hierarchical config | Indentation-based, human-readable | Kubernetes, Docker, CI/CD, infrastructure-as-code

Which format should you download?

The right format depends on your testing scenario:

  • Testing a REST API client? Use JSON — our paginated API response and error response files simulate real REST endpoints.
  • Testing spreadsheet or database import? Use CSV — available in comma, tab, and pipe-delimited variants with up to 100,000 rows.
  • Testing XML parsers or XSLT? Use XML — includes namespaces, schemas, CDATA, XSLT pairs, and a 5 MB performance test file.
  • Testing DevOps config or infrastructure tools? Use YAML — real-world Kubernetes, Docker Compose, GitHub Actions, and Ansible examples.
  • Testing error handling? Every format includes a deliberately malformed file — malformed CSV, malformed XML, malformed JSON, and malformed YAML.
  • Testing Unicode and internationalization? Our CSV, XML, and JSON collections each include files with multi-language content (CJK, Arabic, Cyrillic, Hindi, and more).
  • Performance and stress testing? Download our large datasets — 100K-row CSV (10 MB), 5 MB XML, or 10K-record JSON (3.7 MB).

Sample files in other categories

Need files beyond data formats? Browse our other sample file categories.

FAQs about sample data files

Which data format is best for testing?

It depends on what you’re building. CSV is best for testing tabular data workflows — spreadsheet imports, database bulk loading, and data analysis tools. JSON is the standard for REST APIs, frontend data binding, and modern web applications. XML is essential for enterprise systems, SOAP services, and applications using schema validation or XSLT transformation. YAML is the go-to for DevOps configuration — Kubernetes, Docker, CI/CD pipelines, and infrastructure-as-code tools.

What is the difference between JSON, XML, and YAML?

All three are hierarchical data formats, but they serve different ecosystems. JSON uses braces and brackets ({"key": "value"}), is compact and easy to parse in JavaScript, and dominates REST APIs. XML uses tags (<key>value</key>), supports schemas, namespaces, and XSLT, and dominates enterprise and document systems. YAML uses indentation (key: value), supports comments and anchors, and dominates DevOps configuration. CSV is flat (tabular) rather than hierarchical and is best for spreadsheet-style data.
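The JSON and XML shapes shown above can be parsed with nothing but the Python standard library, and both yield the same value. A small sketch (YAML is omitted here only because parsing it needs a third-party library such as PyYAML):

```python
# Sketch: the same key-value record in JSON and XML syntax,
# parsed with stdlib modules only.
import json
import xml.etree.ElementTree as ET

as_json = '{"key": "value"}'
as_xml = "<key>value</key>"

json_value = json.loads(as_json)["key"]
xml_value = ET.fromstring(as_xml).text

print(json_value == xml_value)  # True
```

The syntaxes differ, but for this flat record the information content is identical — the differences only start to matter with attributes, namespaces, comments, and anchors.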

Can I convert between these data formats?

Yes, with some caveats. JSON ↔ YAML conversion is straightforward since YAML is a superset of JSON — tools like yq and Python’s yaml and json libraries handle this easily. XML ↔ JSON conversion is possible but lossy — XML attributes, namespaces, and mixed content don’t map cleanly to JSON. CSV ↔ JSON/XML conversion works for flat data but requires defining a structure for the hierarchical output. Libraries like Python’s pandas, xmltodict, and csv modules handle most of these conversions.
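The flat CSV → JSON case is simple enough to sketch with the standard library alone. The column names below are hypothetical; the point is that each CSV row becomes one flat JSON object, with no hierarchy to decide on:

```python
# Sketch: flat CSV -> JSON conversion with stdlib csv + json.
# Every row becomes a flat object; note values stay strings unless cast.
import csv
import io
import json

csv_text = "id,name\n1,Alice\n2,Bob\n"
records = list(csv.DictReader(io.StringIO(csv_text)))
json_text = json.dumps(records, indent=2)
print(json_text)
```

Going the other way (hierarchical JSON → CSV) is where the caveat bites: nested objects must first be flattened into columns, and that mapping is a design decision, not a mechanical step.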

Do you have malformed files for testing error handling?

Yes — every format in our collection includes a deliberately malformed file with documented errors. The malformed CSV has inconsistent columns and unclosed quotes. The malformed XML has mismatched tags and unescaped characters. The malformed JSON has trailing commas, single quotes, and comments. The malformed YAML has tab indentation, duplicate keys, and the Norway boolean problem. Each file is designed to test your parser’s error handling, error messages, and recovery strategies.
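This is the kind of check the malformed files are meant to exercise. A minimal sketch using a trailing comma, one of the defects listed above for the malformed JSON sample:

```python
# Sketch: verifying a parser rejects malformed input with a usable error.
# A trailing comma is one of the defects in the malformed JSON sample.
import json

malformed = '{"a": 1,}'  # trailing comma is invalid JSON
try:
    json.loads(malformed)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    # A good error report includes position info for recovery/debugging.
    print(f"line {err.lineno}, col {err.colno}: {err.msg}")
```

A useful error-handling test asserts not just that parsing fails, but that the failure carries enough position information to act on.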

Are these data files safe to download?

Yes, all sample data files on this site are clean, verified, and safe. They are plain text files containing only structured sample data — no macros, no scripts, no executable code, no external entity references (in XML), and no tracking. Every file is safe for use in any development, testing, or production environment.

Can I use these files for database testing?

Yes. The CSV large dataset (100,000 rows) is ideal for testing bulk import operations like PostgreSQL’s COPY, MySQL’s LOAD DATA INFILE, or SQLite’s .import. The JSON large dataset (10,000 records) works well for testing document database imports (MongoDB, CouchDB). The XML large dataset (5 MB) tests XML-based data ingestion pipelines. For SQL-specific testing, see our sample SQL files.
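As a minimal stand-in for `COPY` or `.import`, here is a sketch of bulk-loading CSV rows into SQLite with the standard library. The inline data, table, and column names are hypothetical; the large CSV dataset would be loaded the same way from a file handle:

```python
# Sketch: bulk-loading CSV rows into SQLite via executemany(),
# roughly what SQLite's .import does. Names here are hypothetical.
import csv
import io
import sqlite3

csv_text = "id,name\n1,Alice\n2,Bob\n"
rows = [(r["id"], r["name"]) for r in csv.DictReader(io.StringIO(csv_text))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(count)  # 2
```

With the 100,000-row sample, `executemany()` inside a single transaction is the fair comparison point against the database's native bulk loader.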
