Most compression systems trade fidelity for size. Datasent refuses that bargain. Here is how exact reconstruction is guaranteed, mathematically, for every data type.

Lossless compression has existed for decades. ZIP files,PNG images, FLAC audio — they all promise to give you back exactly what you putin. So what is different about Datasent?
The difference is architectural. Classical lossless codecs— Huffman, LZ77, arithmetic coding — work by finding statistical redundancy in a raw byte stream. They produce an opaque compressed blob. The raw data isstill transmitted, just in a smaller envelope. The receiver decompresses and gets the original back.
Datasent does not compress a byte stream. It fits a structured mathematical model to the data, stores the model parameters and the residual — the difference between model and reality — and transmits only those. The raw data never moves. The receiver regenerates the predicted componentlocally and adds the residual to recover the original.

"The key result: reconstruction is exact byconstruction, not by approximation. If B⁻¹ is deterministic and the residual isstored exactly, the decoded output is identical to the input for every segment,every time"
Every segment of a dataset becomes a token. That token contains four elements: a model family identifier, a coefficient matrix capturing the fitted structure, an exact in teger residual capturing everythingthe model missed, and metadata sufficient to regenerate the basis deterministically.
The coefficient matrix captures local trend, curvature,and spectral characteristics — the kind of features that data preprocess in gpipelines normally spend compute deriving from raw inputs. The residualcaptures local deviations. Together, they contain all the information needed for perfect reconstruction.
Because the basis is deterministic — the same metadata always generates the same basis, on any machine, in any environment — the receiver does not need to receive the basis itself. They only need the metadata to regenerate it locally. This is what makes raw-data-local transmissionpossible.
Real datasets are structurally heterogeneous. Atime-series might follow a smooth polynomial trend for most of its length, then switch to high-frequency oscillation, then go flat. A single model family cannot capture all of this efficiently.
Datasent solves this with a dynamic programming algorithm that jointly optimises segment boundaries and model family assignments acrossthe full dataset, minimising total description length. The supported model families span polynomial bases, Fourier and cosine transforms, wave letdecompositions, data-driven PCA bases, and autoregressive models — with a rawidentity baseline that ensures the encoding can never be larger than theoriginal data.
No configuration is required. The MDL objective selects the right model automatically, per segment, based on what the data actually contains.

The lossless guarantee applies uniformly across every data type that Datasent handles. Tabular data, time-series, sensor streams, audio,images, video, text, embeddings, and graph structures all pass through the same canonicalisation pipeline and emerge as structured token streams. Each token stream satisfies the exact reconstruction theorem independently, and their composition is lossless for the full multimodal dataset.
This universality is not an accident. It is a consequence of the composition principle: every modality decomposes into integer matrices,and every integer matrix is tokenised using the same primitive. There is nospecial-case code for audio versus tabular data, no separate lossy path forimages. The same mathematical guarantee holds end to end.
Most systems force a trade-off between efficiency and accuracy. Compression reduces size but sacrifices fidelity. Secure transfer protects data, but only after it has already been exposed.
Datasent removes this compromise entirely.
By encoding data against a shared model basis and transmitting only what cannot be predicted, it ensures that nothing essential is lost — and nothing sensitive is exposed. The original data remains at its source, yet can be reconstructed exactly when authorised, with no approximation or degradation.

John Rhye
Position