Legal AI Research Lab

Kaipsul

Kaipsul is an independent AI research lab. Our current project focuses on context engineering.

Important Notice: MIRC (Research Preview)

Memory-Isolated Recursive Compression ("MIRC") utilizes algorithmic segmentation and probabilistic compression. Output constitutes a lossy representation, not a verbatim reproduction. Users must independently verify all output against original source text. MIRC is provided strictly for research and evaluation purposes. Do not rely on compressed output for legal filings, evidentiary support, or factual verification.

MIRC

Memory-Isolated Recursive Compression for document pre-processing

What is MIRC?

MIRC is a document pre-processing tool that segments large texts into memory-isolated chunks, compresses them in parallel using on-device AI, and then reconstructs the compressed output.

The output is designed for downstream AI processing: it aims to preserve semantic structure while reducing token count.

The Process

  1. Chunking: Segment the document into predefined memory chunks

  2. Compression: Process each chunk independently using on-device AI

  3. Reconstruction: Concatenate processed segments into a unified file

  4. Downstream Integration: Output is formatted for LLM inference
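The four-step process above can be sketched in Swift. This is an illustrative outline, not the actual Kaipsul source: all names are hypothetical, and the "compression" step is a whitespace-collapsing stand-in for the real on-device model call.

```swift
// Hypothetical sketch of the MIRC pipeline; names are illustrative,
// not the actual Kaipsul API.

// 1. Chunking: split a document into fixed-size, independent chunks.
func makeChunks(_ text: String, size: Int) -> [String] {
    var chunks: [String] = []
    var start = text.startIndex
    while start < text.endIndex {
        let end = text.index(start, offsetBy: size,
                             limitedBy: text.endIndex) ?? text.endIndex
        chunks.append(String(text[start..<end]))
        start = end
    }
    return chunks
}

// 2. Compression: placeholder for the on-device model call.
// A real implementation would send each chunk to a language model.
func compressChunk(_ chunk: String) async -> String {
    chunk.split(whereSeparator: \.isWhitespace).joined(separator: " ")
}

// 2–3. Compress chunks in parallel, then reconstruct in order.
func mircCompress(_ document: String, chunkSize: Int = 4096) async -> String {
    let chunks = makeChunks(document, size: chunkSize)
    var results = [String?](repeating: nil, count: chunks.count)
    await withTaskGroup(of: (Int, String).self) { group in
        for (i, chunk) in chunks.enumerated() {
            group.addTask { (i, await compressChunk(chunk)) }
        }
        for await (i, compressed) in group {
            results[i] = compressed   // index preserves original chunk order
        }
    }
    // 3. Reconstruction: concatenate into a unified output.
    return results.compactMap { $0 }.joined(separator: "\n")
}
```

Because tasks can finish in any order, each result is written back by index so reconstruction preserves the original document order.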

Document Length and AI Performance

Why compression matters for downstream processing

Technical Challenge

As documents grow longer, a language model must spread its attention over many more tokens. This dilutes the attention any single token receives, degrading retrieval accuracy and instruction following.

Many documents exceed practical context limits. Even when a document is technically processable, performance degrades with length.

Compression as Pre-processing

MIRC increases signal density. By reducing token count while retaining semantic pointers, it allows downstream AI models to allocate attention more effectively.

Designed for AI Systems: Compressed output serves as an intermediate format. Large documents are compressed into a dense representation, enabling inference by systems that would otherwise be constrained by context window limits.

Research Findings

Empirical results from MIRC implementation on benchmark documents

Document Type           Document                                Compression   Characters     Chunks
Supreme Court Opinion   SFFA v. Harvard                         84.2%         483K -> 76K    162
Federal Legislation     Consolidated Appropriations Act, 2018   83.7%         848K -> 138K   284
Federal Legislation     One Big Beautiful Bill Act (2025)       84.7%         330K -> 51K    111
Special Counsel Report  Mueller Report Volume II                86.5%         622K -> 84K    208
Supreme Court Opinion   Dobbs v. Jackson                        85.6%         429K -> 62K    144
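The reported percentages follow directly from the before/after character counts. A minimal Swift helper (hypothetical, not part of MIRC) shows the arithmetic:

```swift
// Compression percentage from before/after character counts.
func compressionPercent(before: Double, after: Double) -> Double {
    (1.0 - after / before) * 100.0
}

// With the rounded figures for SFFA v. Harvard (483K -> 76K characters),
// this gives approximately 84.3%, consistent with the reported 84.2%
// (the K counts above are rounded).
let sffa = compressionPercent(before: 483, after: 76)
```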

Implementation

Reference implementation for macOS

MIT License - Open Source

Research Preview (v0.1.0)

The reference implementation is written in Swift using Apple's Foundation Models framework. All processing runs on-device, with chunks compressed in parallel via actor-based concurrency.
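The actor-based concurrency mentioned above can be illustrated with a small sketch. This is not the MIRC source; the actor name and its placeholder "compression" are hypothetical, and a real implementation would invoke Apple's Foundation Models framework where noted.

```swift
import Foundation

// Illustrative sketch: an actor gives the compressor's mutable state its
// own isolation domain, so parallel tasks cannot race on it.
actor ChunkCompressor {
    private var processed = 0   // actor-isolated mutable state

    func compress(_ chunk: String) -> String {
        processed += 1
        // A real implementation would call the on-device model here
        // (e.g. via Apple's Foundation Models framework).
        return chunk.trimmingCharacters(in: .whitespacesAndNewlines)
    }

    var count: Int { processed }
}
```

Callers outside the actor must `await` each call, and the Swift runtime serializes access to `processed` even when many chunk tasks run concurrently.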

Requirements: Apple Silicon (M1+) - macOS 26.0+ - Apple Intelligence