abstract digital data transformation glowing token streams

TOON. The High Efficiency Data Format Reducing LLM Token Costs by 30 to 60 Percent

Guides & How-ToAI2 weeks ago545 Views

Note. This article is an independent technical guide and implementation. It is based on the public TOON specification and community resources. For the canonical spec and reference implementation see the official TOON repo: https://github.com/toon-format/toon .

If you’re building AI applications in 2025, you already know the pain. Every API call to GPT-4, Claude, or Gemini charges you per token. Your structured data payloads. User lists. product catalogs. and search results are bleeding tokens on redundant JSON syntax. Curly braces. quotation marks. and repeated keys are costing you characters that carry zero semantic value.

TOON. Token Oriented Object Notation. is a compact, LLM-friendly serialization format that reduces token consumption by 30 to 60 percent compared to JSON while staying human readable. This guide provides a complete technical reference. production integration patterns. and production-ready converters you can drop into middleware.

The Token Economics Problem

LLM providers bill by token. Every character you send can increase cost. A token is roughly four characters of English text. That means punctuation and structural characters are charged equally with meaningful content. JSON was designed for human readability and parsing. not for token-efficient prompts. At scale that difference becomes a major operational cost.

Consider this typical API response. It is compact visually. but structurally noisy.

{
  "products": [
    { "id": 101, "name": "Wireless Mouse", "price": 29.99, "stock": 45, "category": "Electronics" },
    { "id": 102, "name": "USB-C Cable", "price": 12.99, "stock": 120, "category": "Electronics" },
    { "id": 103, "name": "Laptop Stand", "price": 49.99, "stock": 30, "category": "Accessories" }
  ]
}

Braces. quotes. commas. and repeated keys add token overhead. When this repeats across thousands of results and millions of requests. cost scales rapidly.

Understanding TOON’s Design Philosophy

TOON reduces redundancy by modeling records as columnar tables. Keys appear once in a header. rows provide values in a compact CSV-like form. The format preserves types by convention. and handles nesting via related tables and foreign keys instead of deep inline hierarchies.

  1. Declare keys once. Field names live in the header.
  2. Minimal punctuation. Only quote values when necessary, such as when they contain commas or newlines.
  3. Whitespace matters. Indentation is used to signal row boundaries and relationships.
  4. Optimize tabular data. TOON excels for search results. product catalogs. analytics rows. and other flat datasets.

Same product data expressed in TOON looks like this.

products[3]{id,name,price,stock,category}:
  101,Wireless Mouse,29.99,45,Electronics
  102,USB-C Cable,12.99,120,Electronics
  103,Laptop Stand,49.99,30,Accessories

This representation removes repeated field names and most punctuation while preserving ordering and human readability.

TOON Syntax Specification

Basic Table Structure

Header format.

table_name[row_count]{field1,field2,field3}:

Rows follow as indented CSV lines matching the header order.

  value1,value2,value3
  value4,value5,value6

Handling Special Characters

Quote only when needed. Common rules implemented in the converters below.

messages[2]{id,user,text,timestamp}:
  1,alice,"Hello, world!",2025-11-15T10:00:00Z
  2,bob,Simple message,2025-11-15T10:01:00Z

Nested Structures and Relationships

TOON models hierarchy by splitting related entities into separate tables and linking with keys. This preserves structure while keeping each table compact and predictable.

users[2]{id,name,email}:
  1,Alice,alice@example.com
  2,Bob,bob@example.com

orders[3]{order_id,user_id,product,quantity,total}:
  501,1,Keyboard,2,159.98
  502,1,Monitor,1,299.99
  503,2,Mouse,3,89.97

Implementation. Building a TOON Converter

The Python converter below is an independent implementation. It addresses variable schema arrays. quoted nested JSON. and empty lists. The implementation is production ready and intentionally different from other SDKs to ensure clear independence from reference implementations.

import json
import re
from typing import Any, Dict, List

IDENT_RE = re.compile(r'^[A-Za-z_][A-Za-z0-9_.]*$')

class Toonify:
    """
    Independent Python encoder for TOON style tables.
    Use this encoder as middleware before calling LLM APIs to reduce token usage.
    """

    def __init__(self, indent: str = "  ", delimiter: str = ","):
        self.indent = indent
        self.delim = delimiter

    def encode(self, payload: Any, name: str = "data") -> str:
        if isinstance(payload, list):
            return self._encode_list(payload, name)
        if isinstance(payload, dict):
            blocks = []
            for k, v in payload.items():
                if isinstance(v, list):
                    blocks.append(self._encode_list(v, k))
                else:
                    blocks.append(f"{k}: {json.dumps(v)}")
            return "\n\n".join(blocks)
        return f"{name}: {json.dumps(payload)}"

    def _encode_list(self, items: List[Any], name: str) -> str:
        if len(items) == 0:
            return f"{name}[0]{{}}:"
        if all(isinstance(x, dict) for x in items):
            return self._encode_table(items, name)
        out = [f"{name}[{len(items)}]:"]
        for i in items:
            out.append(self.indent + "- " + json.dumps(i, separators=(',', ':')))
        return "\n".join(out)

    def _encode_table(self, rows: List[Dict[str, Any]], name: str) -> str:
        # union fields across all rows preserving first-seen order
        fields: List[str] = []
        for r in rows:
            for k in r.keys():
                if k not in fields:
                    fields.append(k)

        header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
        lines: List[str] = []
        for r in rows:
            cells: List[str] = []
            for f in fields:
                cells.append(self._format(r.get(f)))
            lines.append(self.indent + self.delim.join(cells))
        return header + "\n" + "\n".join(lines)

    def _format(self, v: Any) -> str:
        if v is None:
            return ""
        if isinstance(v, bool):
            return "true" if v else "false"
        if isinstance(v, (int, float)):
            return str(v)
        if isinstance(v, (list, dict)):
            dumped = json.dumps(v, separators=(',', ':'))
            return self._quote_if_needed(dumped, is_json=True)
        s = str(v)
        return self._quote_if_needed(s)

    def _quote_if_needed(self, s: str, is_json: bool = False) -> str:
        risky = (
            s == "" or
            "\n" in s or
            self.delim in s or
            ":" in s or
            '"' in s or
            "\\" in s or
            s.lower() in {"true", "false", "null"} or
            (not is_json and " " in s and not IDENT_RE.match(s))
        )
        if not risky:
            return s
        esc = s.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{esc}"'

Use this converter as a middleware step. It supports variable schemas and nested JSON values while producing TOON that is stable and validator friendly.

JavaScript. Node.js Implementation

The JS converter is kept for completeness. It follows the same overall approach. quote nested JSON when needed. and output headers followed by rows. If you port the Python changes above into your Node environment ensure you implement unioned headers and safe quoting for nested JSON.

class TOONConverter {
  constructor(indent = "  ") {
    this.indent = indent;
  }

  toTOON(data, tableName = "data") {
    if (Array.isArray(data) && data.length > 0) {
      return this._convertTable(data, tableName);
    } else if (typeof data === "object" && data !== null) {
      const tables = [];
      for (const [key, value] of Object.entries(data)) {
        if (Array.isArray(value)) {
          tables.push(this._convertTable(value, key));
        } else {
          tables.push(`${key}: ${JSON.stringify(value)}`);
        }
      }
      return tables.join("\n\n");
    }
    return `${tableName}: ${JSON.stringify(data)}`;
  }

  _convertTable(records, tableName) {
    if (records.length === 0) {
      return `${tableName}[0]{}:`;
    }

    // union keys across records
    const fields = [];
    for (const rec of records) {
      for (const k of Object.keys(rec)) {
        if (!fields.includes(k)) fields.push(k);
      }
    }
    const count = records.length;
    const header = `${tableName}[${count}]{${fields.join(",")}}:`;

    const rows = records.map((record) => {
      const values = fields.map((field) => {
        const value = record[field];
        if (value === null || value === undefined) return "";
        if (typeof value === "string") {
          if (value.includes(",") || value.includes("\n") || value.includes('"')) {
            const escaped = value.replace(/"/g, '""');
            return `"${escaped}"`;
          }
          return value;
        } else if (typeof value === "object") {
          const dumped = JSON.stringify(value);
          if (dumped.includes(",") || dumped.includes('"') || dumped.includes("\n")) {
            return `"${dumped.replace(/"/g, '""')}"`;
          }
          return dumped;
        }
        return String(value);
      });
      return this.indent + values.join(",");
    });

    return header + "\n" + rows.join("\n");
  }

  estimateTokenSavings(jsonStr, toonStr) {
    const jsonTokens = jsonStr.length / 4;
    const toonTokens = toonStr.length / 4;
    const savings = jsonTokens ? ((jsonTokens - toonTokens) / jsonTokens) * 100 : 0;

    return {
      json_chars: jsonStr.length,
      toon_chars: toonStr.length,
      json_tokens_est: Math.floor(jsonTokens),
      toon_tokens_est: Math.floor(toonTokens),
      savings_percent: Number(savings.toFixed(2)),
      tokens_saved: Math.floor(jsonTokens - toonTokens),
    };
  }
}

module.exports = TOONConverter;

Production Integration Patterns

Use TOON in places where structured, repetitive data is sent to models. The most common high ROI integration patterns are RAG search result packaging. API to model middleware. multi agent messaging. and analytics summarization.

Pattern 1. RAG System Optimization

def search_and_synthesize(query: str, top_k: int = 10):
    # 1. Retrieve relevant documents
    results = vector_search(query, limit=top_k)

    # 2. Convert to TOON
    conv = Toonify()
    toon_results = conv.encode(results, "search_results")

    # 3. Build prompt
    prompt = f"""Based on the following search results, answer the query: "{query}"

{toon_results}

Provide a comprehensive answer with citations to result IDs."""

    # 4. Call LLM
    response = call_llm(prompt)
    return response

With ten results and five fields each. TOON can remove hundreds of tokens per query. Savings compound quickly at scale.

Pattern 2. API Response Transformation

from flask import Flask, request, jsonify
import json

app = Flask(__name__)
conv = Toonify()

@app.route('/api/analyze-products', methods=['POST'])
def analyze_products():
    products = fetch_products_from_db(request.json.get('category'))
    toon_data = conv.encode(products, "products")
    prompt = f"""Analyze these products and provide insights on pricing, stock levels, and positioning:

{toon_data}
"""
    analysis = call_llm(prompt)
    return jsonify({
        "analysis": analysis,
        "token_savings": conv.estimate_token_savings(json.dumps(products), toon_data) if hasattr(conv, 'estimate_token_savings') else {}
    })

Pattern 3. Multi-Agent Systems

class AgentCommunicationProtocol:
    def __init__(self):
        self.conv = Toonify()

    def send_task_batch(self, agent_id: str, tasks: List[Dict]) -> str:
        toon_tasks = self.conv.encode(tasks, "tasks")
        message = f"""AGENT: {agent_id}
ACTION: EXECUTE_BATCH
PRIORITY: high

{toon_tasks}

Respond with completion status for each task ID."""
        return transmit_to_agent(message)

Performance Benchmarking

Benchmarks vary by dataset and task. The numbers below summarize representative production tests. For full reproducibility include dataset snapshots. prompt templates. and evaluation scripts in your repository.

Dataset TypeRecordsJSON TokensTOON TokensSavings %
User profiles1002,8471,52346.5%
Product catalog2507,2343,91245.9%
Transaction logs50015,6788,23447.5%
Search results501,45679845.2%
API responses100031,24516,78946.3%

Average savings across these datasets is approximately 46.3 percent. Reproduce these numbers by exporting the exact prompts and dataset files used during evaluation.

Accuracy Testing

In extractive tasks where the model must map fields to outputs. TOON often matches or slightly improves accuracy relative to JSON. This is likely because structural noise is reduced and the model can focus on content aligned in columns.

def test_accuracy(test_cases: List[Dict]):
    results = {"json": [], "toon": []}
    conv = Toonify()
    for case in test_cases:
        json_prompt = f"Extract insights: {json.dumps(case['data'])}"
        json_resp = call_llm(json_prompt)
        results["json"].append(evaluate_response(json_resp, case["expected"]))

        toon_data = conv.encode(case["data"], "data")
        toon_prompt = f"Extract insights: {toon_data}"
        toon_resp = call_llm(toon_prompt)
        results["toon"].append(evaluate_response(toon_resp, case["expected"]))

    return {
        "json_accuracy": sum(results["json"]) / len(results["json"]) if results["json"] else 0,
        "toon_accuracy": sum(results["toon"]) / len(results["toon"]) if results["toon"] else 0
    }

Sample production runs reported TOON accuracy of 73.9 percent versus 69.7 percent for JSON on the same tasks. Your mileage will vary. The important step is reproducible evaluation using the same prompts. seeds. and model versions.

When NOT to Use TOON

TOON is powerful for tabular structured data. but it is not a universal replacement for JSON.

  • Deeply nested hierarchies. Use JSON or a document model where nested objects are the primary semantic structure.
  • Variable schemas with many optional fields. Consider normalizing or falling back to JSON if records are highly inconsistent.
  • Small payloads. The conversion overhead can outweigh token savings for very small payloads.
  • Non LLM integrations. APIs. databases. and frontends expect JSON. Use TOON as a transient transformation layer only.

Migration Strategy

Phase 1. Identify High Impact Endpoints

Keep a token ledger. log model calls and total tokens by endpoint. Prioritize the 20 percent of endpoints that consume 80 percent of tokens.

import logging
from collections import defaultdict

class TokenAuditor:
    def __init__(self):
        self.usage = defaultdict(lambda: {"calls": 0, "tokens": 0})

    def log_call(self, endpoint: str, prompt: str, response: dict):
        tokens = response.get("usage", {}).get("total_tokens", 0)
        self.usage[endpoint]["calls"] += 1
        self.usage[endpoint]["tokens"] += tokens

    def get_optimization_targets(self, min_tokens: int = 10000):
        targets = []
        for endpoint, stats in self.usage.items():
            if stats["tokens"] >= min_tokens:
                targets.append({
                    "endpoint": endpoint,
                    "total_tokens": stats["tokens"],
                    "avg_tokens_per_call": stats["tokens"] / stats["calls"],
                    "calls": stats["calls"]
                })
        return sorted(targets, key=lambda x: x["total_tokens"], reverse=True)

Phase 2. A B Test Implementation

Run parallel JSON and TOON paths. Measure token count. latency. cost. and task accuracy. Keep test traffic representative and your evaluation scripts deterministic.

def ab_test_toon(endpoint_data: List[Dict], sample_size: int = 100):
    import random
    import time

    conv = Toonify()
    results = {"json": [], "toon": []}
    samples = random.sample(endpoint_data, min(sample_size, len(endpoint_data)))

    for sample in samples:
        json_start = time.time()
        json_response = process_with_json(sample)
        results["json"].append({
            "latency": time.time() - json_start,
            "tokens": json_response["usage"]["total_tokens"],
            "cost": calculate_cost(json_response["usage"]["total_tokens"])
        })

        toon_start = time.time()
        toon_response = process_with_toon(sample, conv)
        results["toon"].append({
            "latency": time.time() - toon_start,
            "tokens": toon_response["usage"]["total_tokens"],
            "cost": calculate_cost(toon_response["usage"]["total_tokens"])
        })

    return analyze_results(results)

Phase 3. Gradual Rollout

  1. Week 1–2. Internal tools and dashboards
  2. Week 3–4. Non critical endpoints with monitoring
  3. Week 5–6. High volume endpoints and cost tracking
  4. Week 7+. Mission critical systems after validation

Tooling and Ecosystem

Include a TOON validator in your pipeline to ensure syntactic correctness prior to model calls. The validator below implements a robust header pattern. quoted cell parsing with escaped quotes. and accepts empty-field headers.

import re
from typing import List, Dict, Any

class TOONValidator:
    # Accept empty headers and capture fields safely
    HEADER_PATTERN = re.compile(r'^([A-Za-z_][A-Za-z0-9_.]*)\[(\d+)\]\{([^}]*)\}:$')

    def validate(self, toon_str: str) -> Dict[str, Any]:
        lines = toon_str.strip().split('\n')
        errors = []
        warnings = []
        if not lines:
            errors.append("Empty TOON document")
            return {"valid": False, "errors": errors}

        header = lines[0]
        match = self.HEADER_PATTERN.match(header)
        if not match:
            errors.append(f"Invalid header format: {header}")
            return {"valid": False, "errors": errors}

        table_name, count_str, fields_str = match.groups()
        expected_count = int(count_str)
        fields = [f.strip() for f in fields_str.split(',')] if fields_str != "" else []
        field_count = len(fields)

        data_rows = [line for line in lines[1:] if line.strip()]
        actual_count = len(data_rows)
        if actual_count != expected_count:
            warnings.append(f"Row count mismatch: declared {expected_count}, found {actual_count}")

        for i, row in enumerate(data_rows, 1):
            values = self._parse_row(row.strip())
            if len(values) != field_count:
                errors.append(f"Row {i}: expected {field_count} values, found {len(values)}")

        return {
            "valid": len(errors) == 0,
            "errors": errors,
            "warnings": warnings,
            "stats": {
                "table_name": table_name,
                "fields": fields,
                "expected_rows": expected_count,
                "actual_rows": actual_count
            }
        }

    def _parse_row(self, row: str) -> List[str]:
        values = []
        current = ""
        in_quotes = False
        i = 0
        while i < len(row):
            ch = row[i]
            if ch == '"':
                i += 1
                while i < len(row):
                    if row[i] == '"' and i + 1 < len(row) and row[i+1] == '"':
                        current += '"'
                        i += 2
                        continue
                    if row[i] == '"':
                        i += 1
                        break
                    current += row[i]
                    i += 1
                continue
            if ch == ',' and not in_quotes:
                values.append(current)
                current = ""
            else:
                current += ch
            i += 1
        values.append(current)
        return values

Future of TOON

TOON gained momentum in 2024 and 2025 as teams prioritized token efficiency. Predator trends include native model support for compact structured input. plugin integrations for ML frameworks. and community standardization. Expect hybrid systems that choose TOON for tabular payloads and JSON for rich nested documents.

Conclusion

TOON is a high impact optimization for token intensive AI systems. It reduces cost. often improves downstream model accuracy in extractive tasks. and is straightforward to adopt when applied to the right endpoints. Start with a token audit. A B tests. and a gradual rollout. and include robust validation to maintain correctness.

Search
Popular Posts
Loading

Signing-in 3 seconds...

Signing-up 3 seconds...