Sample Data Files for Download

Download free sample data files for development, testing, and data processing. Our library covers four major data formats — CSV, XML, JSON, and YAML — with files ranging from basic structures to 100,000-row datasets, real-world API responses, Kubernetes manifests, and deliberately malformed files for testing error handling. Every file is free — no sign-up required.

Choose your data format

Sample XML Files

The standard markup language for structured data interchange, configuration, and enterprise systems. Our XML collection includes 11 files covering basic structures, deep nesting, namespace declarations, XSD schema validation, XSLT transformations, CDATA sections, processing instructions, Unicode content, and a 5 MB large dataset. Ideal for testing DOM/SAX/StAX parsers, XPath queries, XSLT processors, and schema validators.

Quick Downloads | Size | Features
Basic structure | 1 KB | Simple nested elements
Namespaces | 1 KB | Multiple xmlns declarations
XSLT pair (ZIP) | 3 KB | XML + XSLT stylesheet
International characters | 3 KB | CJK, Arabic, Cyrillic, emoji
Large dataset | 5 MB | Performance testing

View all 11 XML sample files →
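As a quick sketch of what these files are for, here is how a namespaced XML sample might be exercised with Python's standard-library ElementTree. The element and namespace names below are hypothetical, not the actual schema of the downloads:

```python
# Sketch: parsing namespaced XML with Python's stdlib ElementTree.
# The element/namespace names here are hypothetical examples.
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:bk="http://example.com/books">
  <bk:book id="1"><bk:title>Sample</bk:title></bk:book>
  <bk:book id="2"><bk:title>Another</bk:title></bk:book>
</catalog>"""

root = ET.fromstring(xml_text)
ns = {"bk": "http://example.com/books"}  # prefix -> URI map for findall()
titles = [t.text for t in root.findall(".//bk:title", ns)]
print(titles)  # ['Sample', 'Another']
```

The same pattern applies to the downloadable namespace sample: declare the prefix-to-URI map once, then use it in every XPath-style query.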


Sample CSV Files

The universal format for tabular data exchange. Our CSV collection includes 10 files with datasets from 100 to 100,000 rows, covering comma-delimited, tab-delimited (TSV), and pipe-delimited variants. Includes mixed data types, quoted fields with embedded commas and newlines, Unicode characters, files with no header row, empty/null field patterns, and deliberately malformed data. Ideal for testing spreadsheet imports, database loading, ETL pipelines, and data analysis tools.

Quick Downloads | Rows | Delimiter
Basic data set | 100 | Comma
Quoted fields | 200 | Comma
Tab-separated (TSV) | 200 | Tab
Empty fields | 150 | Comma
Large dataset | 100,000 | Comma

View all 10 CSV sample files →
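The quoted-fields sample exists because embedded commas and newlines are where naive CSV parsing breaks. A minimal sketch with Python's standard-library csv module, using inline data with hypothetical column names:

```python
# Sketch: reading quoted CSV fields (embedded commas and newlines)
# with Python's stdlib csv module. Column names are hypothetical.
import csv
import io

data = 'id,name,notes\n1,"Doe, Jane","Line one\nLine two"\n'
rows = list(csv.DictReader(io.StringIO(data)))

print(rows[0]["name"])   # Doe, Jane  (the comma stays inside the field)
print(rows[0]["notes"])  # spans two lines but is a single field
```

A parser that splits on raw commas or raw newlines will mangle exactly this kind of record, which is what the sample is designed to catch.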


Sample JSON Files

The dominant format for REST APIs, web applications, and configuration. Our JSON collection includes 11 files with simple objects, deeply nested structures, paginated API responses, error response objects, GeoJSON geospatial data, JSON Lines (JSONL) for streaming, a 10,000-record large dataset, and malformed JSON for error handling. Ideal for testing API clients, JSON parsers, config loaders, and data transformation pipelines.

Quick Downloads | Size | Use Case
Simple object | 1 KB | Basic parsing
API response (paginated) | 5 KB | REST API client testing
GeoJSON | 8 KB | Map/geospatial testing
JSON Lines (JSONL) | 70 KB | Log parsing, streaming
Large dataset | 3.7 MB | Performance testing

View all 11 JSON sample files →
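JSON Lines files are read one record per line rather than as a single document, which is what makes them suitable for streaming and log processing. A minimal sketch with hypothetical field names:

```python
# Sketch: streaming a JSON Lines (JSONL) sample one record at a time.
# Field names are hypothetical; a real file object is iterated the same way.
import io
import json

jsonl = '{"event": "login", "user": 1}\n{"event": "logout", "user": 1}\n'
events = [json.loads(line) for line in io.StringIO(jsonl) if line.strip()]

print(len(events))          # 2
print(events[0]["event"])   # login
```

Because each line is parsed independently, memory use stays flat no matter how large the file grows — the point of the 70 KB JSONL sample.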


Sample YAML Files

The standard configuration format for DevOps and infrastructure-as-code. Our YAML collection includes 10 files with real-world examples — Kubernetes Deployments, Docker Compose stacks, GitHub Actions CI/CD workflows, Ansible playbooks, OpenAPI/Swagger API specs, plus advanced features like anchors/aliases, multi-document files, and complex nested data. Ideal for testing YAML parsers, linters, config loaders, and infrastructure tools.

Quick Downloads | Size | Platform / Feature
Basic config | 2 KB | App settings
Kubernetes deployment | 3 KB | Container orchestration
Docker Compose | 3 KB | Multi-container apps
GitHub Actions | 4 KB | CI/CD pipeline
OpenAPI spec | 9 KB | API documentation

View all 10 YAML sample files →
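The anchors/aliases feature mentioned above lets one YAML node be defined once and reused. A hypothetical config fragment (not the exact contents of the downloadable sample) illustrating the pattern:

```yaml
# Anchors (&) define a reusable node; aliases (*) reference it;
# the merge key (<<) folds it into a mapping. Keys are hypothetical.
defaults: &defaults
  retries: 3
  timeout: 30

service_a:
  <<: *defaults
  port: 8080

service_b:
  <<: *defaults
  port: 9090
```

Parsers that support aliases will expand both services to include `retries` and `timeout`, which is exactly what the anchors/aliases sample lets you verify.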


Data format comparison

Not sure which format to use for testing? This comparison covers the key differences:

Format | Type | Structure | Best For
CSV | Tabular | Rows and columns, flat structure | Spreadsheets, database import, data analysis, ETL
XML | Hierarchical markup | Tags, attributes, namespaces | Enterprise systems, SOAP APIs, config, document markup
JSON | Hierarchical data | Key-value pairs, arrays, objects | REST APIs, web apps, config, data exchange
YAML | Hierarchical config | Indentation-based, human-readable | Kubernetes, Docker, CI/CD, infrastructure-as-code

Which format should you download?

The right format depends on your testing scenario:

  • Testing a REST API client? Use JSON — our paginated API response and error response files simulate real REST endpoints.
  • Testing spreadsheet or database import? Use CSV — available in comma, tab, and pipe-delimited variants with up to 100,000 rows.
  • Testing XML parsers or XSLT? Use XML — includes namespaces, schemas, CDATA, XSLT pairs, and a 5 MB performance test file.
  • Testing DevOps config or infrastructure tools? Use YAML — real-world Kubernetes, Docker Compose, GitHub Actions, and Ansible examples.
  • Testing error handling? Every format includes a deliberately malformed file — malformed CSV, malformed XML, malformed JSON, and malformed YAML.
  • Testing Unicode and internationalization? Our CSV, XML, and JSON collections each include files with multi-language content (CJK, Arabic, Cyrillic, Hindi, and more).
  • Performance and stress testing? Download our large datasets — 100K-row CSV (10 MB), 5 MB XML, or 10K-record JSON (3.7 MB).

Sample files in other categories

Need files beyond data formats? Browse our other sample file categories.

FAQs about sample data files

Which data format is best for testing?

It depends on what you’re building. CSV is best for testing tabular data workflows — spreadsheet imports, database bulk loading, and data analysis tools. JSON is the standard for REST APIs, frontend data binding, and modern web applications. XML is essential for enterprise systems, SOAP services, and applications using schema validation or XSLT transformation. YAML is the go-to for DevOps configuration — Kubernetes, Docker, CI/CD pipelines, and infrastructure-as-code tools.

What is the difference between JSON, XML, and YAML?

All three are hierarchical data formats, but they serve different ecosystems. JSON uses braces and brackets ({"key": "value"}), is compact and easy to parse in JavaScript, and dominates REST APIs. XML uses tags (<key>value</key>), supports schemas, namespaces, and XSLT, and dominates enterprise and document systems. YAML uses indentation (key: value), supports comments and anchors, and dominates DevOps configuration. CSV is flat (tabular) rather than hierarchical and is best for spreadsheet-style data.
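The JSON and XML shapes shown above can be parsed with nothing but the Python standard library, and both yield the same value. A small sketch (YAML is omitted here only because parsing it needs a third-party library such as PyYAML):

```python
# Sketch: the same key-value record in JSON and XML syntax,
# parsed with stdlib modules only.
import json
import xml.etree.ElementTree as ET

as_json = '{"key": "value"}'
as_xml = "<key>value</key>"

json_value = json.loads(as_json)["key"]
xml_value = ET.fromstring(as_xml).text

print(json_value == xml_value)  # True
```

The syntaxes differ, but for this flat record the information content is identical — the differences only start to matter with attributes, namespaces, comments, and anchors.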

Can I convert between these data formats?

Yes, with some caveats. JSON ↔ YAML conversion is straightforward since YAML is a superset of JSON — tools like yq and Python’s yaml and json libraries handle this easily. XML ↔ JSON conversion is possible but lossy — XML attributes, namespaces, and mixed content don’t map cleanly to JSON. CSV ↔ JSON/XML conversion works for flat data but requires defining a structure for the hierarchical output. Libraries like Python’s pandas, xmltodict, and csv modules handle most of these conversions.
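The flat CSV → JSON case is simple enough to sketch with the standard library alone. The column names below are hypothetical; the point is that each CSV row becomes one flat JSON object, with no hierarchy to decide on:

```python
# Sketch: flat CSV -> JSON conversion with stdlib csv + json.
# Every row becomes a flat object; note values stay strings unless cast.
import csv
import io
import json

csv_text = "id,name\n1,Alice\n2,Bob\n"
records = list(csv.DictReader(io.StringIO(csv_text)))
json_text = json.dumps(records, indent=2)
print(json_text)
```

Going the other way (hierarchical JSON → CSV) is where the caveat bites: nested objects must first be flattened into columns, and that mapping is a design decision, not a mechanical step.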

Do you have malformed files for testing error handling?

Yes — every format in our collection includes a deliberately malformed file with documented errors. The malformed CSV has inconsistent columns and unclosed quotes. The malformed XML has mismatched tags and unescaped characters. The malformed JSON has trailing commas, single quotes, and comments. The malformed YAML has tab indentation, duplicate keys, and the Norway boolean problem. Each file is designed to test your parser’s error handling, error messages, and recovery strategies.
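This is the kind of check the malformed files are meant to exercise. A minimal sketch using a trailing comma, one of the defects listed above for the malformed JSON sample:

```python
# Sketch: verifying a parser rejects malformed input with a usable error.
# A trailing comma is one of the defects in the malformed JSON sample.
import json

malformed = '{"a": 1,}'  # trailing comma is invalid JSON
try:
    json.loads(malformed)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    # A good error report includes position info for recovery/debugging.
    print(f"line {err.lineno}, col {err.colno}: {err.msg}")
```

A useful error-handling test asserts not just that parsing fails, but that the failure carries enough position information to act on.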

Are these data files safe to download?

Yes, all sample data files on this site are clean, verified, and safe. They are plain text files containing only structured sample data — no macros, no scripts, no executable code, no external entity references (in XML), and no tracking. Every file is safe for use in any development, testing, or production environment.

Can I use these files for database testing?

Yes. The CSV large dataset (100,000 rows) is ideal for testing bulk import operations like PostgreSQL’s COPY, MySQL’s LOAD DATA INFILE, or SQLite’s .import. The JSON large dataset (10,000 records) works well for testing document database imports (MongoDB, CouchDB). The XML large dataset (5 MB) tests XML-based data ingestion pipelines. For SQL-specific testing, see our sample SQL files.
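As a minimal stand-in for `COPY` or `.import`, here is a sketch of bulk-loading CSV rows into SQLite with the standard library. The inline data, table, and column names are hypothetical; the large CSV dataset would be loaded the same way from a file handle:

```python
# Sketch: bulk-loading CSV rows into SQLite via executemany(),
# roughly what SQLite's .import does. Names here are hypothetical.
import csv
import io
import sqlite3

csv_text = "id,name\n1,Alice\n2,Bob\n"
rows = [(r["id"], r["name"]) for r in csv.DictReader(io.StringIO(csv_text))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(count)  # 2
```

With the 100,000-row sample, `executemany()` inside a single transaction is the fair comparison point against the database's native bulk loader.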
