Apple researchers find vision-language models leak task-irrelevant information through logits
A new study systematically examines what information can be extracted from model internals, revealing that even constrained representations like top-k logits retain sensitive data not visible in standard outputs.
- Apple researchers published a study examining information leakage from vision-language model internals, comparing what data survives compression through different representational bottlenecks
- The work demonstrates that top-k logits—typically considered less informative than full residual stream projections—can still leak task-irrelevant information from image queries
- The research uses vision-language models as a testbed to systematically probe information retention across different model layers and compression methods
Apple Machine Learning Research has published a study examining how much information can be extracted from the internal representations of vision-language models, even when those representations appear constrained. The work, authored by Fedzechkina, Gualdoni, Ramos, and Williamson, systematically compares information retention across different compression levels within model architectures.
The researchers focused on two natural information bottlenecks: low-dimensional projections derived from the residual stream using tuned lens techniques, and the final top-k logits that typically influence model outputs. Their key finding is that top-k logits—simpler and more accessible than raw residual stream data—still retain task-irrelevant information from image-based queries, sometimes leaking as much sensitive data as direct projections of the full residual stream.
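To make the top-k bottleneck concrete, here is a minimal synthetic sketch, not the paper's actual method or data: it assumes a task-irrelevant binary attribute that slightly shifts a few logit positions, keeps only the top-k logit values (as a top-k API would), and trains a simple linear probe on them. The attribute, vocabulary size, and shift magnitude are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 400 queries, vocabulary of 50 logits each.
# A task-irrelevant binary attribute (e.g., image background type)
# slightly shifts a handful of logit positions. This is a synthetic
# stand-in for leakage, not the study's real data.
n, vocab, k = 400, 50, 10
attr = rng.integers(0, 2, n)               # the "irrelevant" attribute
logits = rng.normal(size=(n, vocab))
logits[:, :5] += attr[:, None] * 1.5       # attribute leaks into 5 logits

# Bottleneck: retain only the top-k logit values and their indices,
# mimicking an interface that exposes top-k scores alone.
top_idx = np.argsort(logits, axis=1)[:, -k:]
top_val = np.take_along_axis(logits, top_idx, axis=1)

# Probe features: sparse vector holding each query's top-k values
# at their vocabulary positions, zeros elsewhere.
feats = np.zeros((n, vocab))
np.put_along_axis(feats, top_idx, top_val, axis=1)

# Plain logistic-regression probe trained by gradient descent.
w, b = np.zeros(vocab), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    g = p - attr
    w -= 0.1 * feats.T @ g / n
    b -= 0.1 * g.mean()

acc = ((feats @ w + b > 0) == attr).mean()
print(f"probe accuracy from top-{k} logits: {acc:.2f}")
```

In this toy setup the probe recovers the attribute well above the 0.5 chance level, illustrating the paper's broader point: even a heavily compressed view of the output distribution can carry information that has nothing to do with the task.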
The study treats vision-language models as a testbed to understand the broader problem of unintentional or malicious information leakage. Model users and owners may assume that certain outputs or internal states are inaccessible or contain only task-relevant information, but this work suggests those assumptions merit scrutiny. Even bottlenecks designed to compress and filter information can become vectors for privacy violations.
Apr 24, 2026 · arXiv cs.AI