Apple researchers find vision-language models leak task-irrelevant information through logits
A new study systematically examines what information can be extracted from model internals, revealing that even constrained representations like top-k logits retain sensitive data not visible in standard outputs.
- Apple researchers published a study examining information leakage from vision-language model internals, comparing what data survives compression through different representational bottlenecks
- The work demonstrates that top-k logits—typically considered less informative than full residual stream projections—can still leak task-irrelevant information from image queries
- The research uses vision-language models as a testbed to systematically probe information retention across different model layers and compression methods
Apple Machine Learning Research has published a study examining how much information can be extracted from the internal representations of vision-language models, even when those representations appear constrained. The work, authored by Fedzechkina, Gualdoni, Ramos, and Williamson, systematically compares information retention across different compression levels within model architectures.
The researchers focused on two natural information bottlenecks: low-dimensional projections derived from the residual stream using tuned lens techniques, and the final top-k logits that typically influence model outputs. Their key finding is that top-k logits—simpler and more accessible than raw residual stream data—still retain task-irrelevant information from image-based queries, sometimes leaking as much sensitive data as direct projections of the full residual stream.
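To make the top-k bottleneck concrete, here is a minimal synthetic sketch, not the paper's actual method or data: it assumes a task-irrelevant binary attribute that slightly shifts a few logit positions, keeps only the top-k logit values (as a top-k API would), and trains a simple linear probe on them. The attribute, vocabulary size, and shift magnitude are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 400 queries, vocabulary of 50 logits each.
# A task-irrelevant binary attribute (e.g., image background type)
# slightly shifts a handful of logit positions. This is a synthetic
# stand-in for leakage, not the study's real data.
n, vocab, k = 400, 50, 10
attr = rng.integers(0, 2, n)               # the "irrelevant" attribute
logits = rng.normal(size=(n, vocab))
logits[:, :5] += attr[:, None] * 1.5       # attribute leaks into 5 logits

# Bottleneck: retain only the top-k logit values and their indices,
# mimicking an interface that exposes top-k scores alone.
top_idx = np.argsort(logits, axis=1)[:, -k:]
top_val = np.take_along_axis(logits, top_idx, axis=1)

# Probe features: sparse vector holding each query's top-k values
# at their vocabulary positions, zeros elsewhere.
feats = np.zeros((n, vocab))
np.put_along_axis(feats, top_idx, top_val, axis=1)

# Plain logistic-regression probe trained by gradient descent.
w, b = np.zeros(vocab), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    g = p - attr
    w -= 0.1 * feats.T @ g / n
    b -= 0.1 * g.mean()

acc = ((feats @ w + b > 0) == attr).mean()
print(f"probe accuracy from top-{k} logits: {acc:.2f}")
```

In this toy setup the probe recovers the attribute well above the 0.5 chance level, illustrating the paper's broader point: even a heavily compressed view of the output distribution can carry information that has nothing to do with the task.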
The study treats vision-language models as a testbed to understand the broader problem of unintentional or malicious information leakage. Model users and owners may assume that certain outputs or internal states are inaccessible or contain only task-relevant information, but this work suggests those assumptions merit scrutiny. Even bottlenecks designed to compress and filter information can become vectors for privacy violations.
Apr 24, 2026 · arXiv cs.AI