Apple presents parallel RNN training, improved state space models, and unified vision models at ICLR 2026
The company is showcasing five research contributions at the conference in Rio de Janeiro, including a framework for parallelized RNN training that achieves a 665× speedup and enables competitive 7-billion-parameter language models.
- Apple will present research at ICLR 2026 on parallel RNN training (ParaRNN), improved state space models with tool-use access, unified image understanding and generation (Manzano), 3D scene generation from photos, and protein folding approaches.
- The ParaRNN paper, accepted as an oral presentation at ICLR, describes a framework that achieves a 665× speedup over sequential RNN training and enables 7-billion-parameter classical RNNs whose language modeling performance is competitive with transformers.
- State space model research shows SSMs fail on complex long-form generation due to bounded memory but can achieve length generalization when given external tool access for arithmetic, reasoning, and coding tasks.
- Manzano, a unified multimodal model, uses a hybrid vision tokenizer with separate adapters for image understanding and generation to reduce performance trade-offs between the two capabilities.
Apple researchers are presenting five major research contributions at the Fourteenth International Conference on Learning Representations (ICLR 2026) in Rio de Janeiro this week, advancing work in efficient sequence modeling, multimodal vision-language systems, and 3D generation.
The centerpiece of Apple's presentation is ParaRNN, a framework for parallelizing training of recurrent neural networks. Historically, RNNs have been efficient for inference but difficult to scale due to the sequential nature of their computation. Apple's new approach achieves a 665× speedup over traditional sequential RNN training, making it feasible to train classical RNNs with up to 7 billion parameters. Benchmarking shows these large-scale RNNs achieve language modeling performance competitive with transformers of comparable size, potentially opening new architectural choices for practitioners building models under computational constraints. The ParaRNN codebase has been released as open source.
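The article does not spell out ParaRNN's mechanism, but a standard building block for computing recurrences in parallel is the associative-scan trick: a linear recurrence h_t = a_t·h_{t-1} + b_t composes associatively, so all T prefix states can be computed in O(log T) parallel steps rather than T sequential ones. The sketch below is illustrative only (not Apple's implementation) and emulates the parallel scan serially:

```python
# Linear recurrence: h[t] = a[t] * h[t-1] + b[t], with h[-1] = 0.
# Applying (a1, b1) then (a2, b2) composes to (a2*a1, a2*b1 + b2),
# and this combine is associative -- which is what lets a parallel
# prefix scan replace the strictly sequential loop.

def combine(left, right):
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)

def sequential(a, b):
    # Baseline: the ordinary one-step-at-a-time recurrence.
    h, out = 0.0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out

def hillis_steele_scan(pairs):
    # Inclusive scan; the combines inside each while-iteration are
    # independent of one another, so on parallel hardware each level
    # runs concurrently, giving O(log T) depth.
    xs = list(pairs)
    d = 1
    while d < len(xs):
        xs = [xs[i] if i < d else combine(xs[i - d], xs[i])
              for i in range(len(xs))]
        d *= 2
    return [h for (_, h) in xs]

a = [0.5, 0.9, 1.1, 0.7]
b = [1.0, -2.0, 0.5, 3.0]
parallel = hillis_steele_scan(list(zip(a, b)))
assert all(abs(x - y) < 1e-9 for x, y in zip(sequential(a, b), parallel))
```

Note this trick applies directly only to linear recurrences; extending it to the nonlinear cells of classical RNNs at billion-parameter scale is precisely the kind of gap a framework like ParaRNN would need to close.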
In parallel work on state space models (SSMs), Apple researchers identify and address a fundamental limitation: SSMs excel at long-context inference due to fixed-size memory and linear computational scaling, but this same constraint prevents them from solving complex problems that exceed the model's capacity, even with chain-of-thought generation. The research demonstrates that providing SSMs with external tool access—such as memory tools or code execution—enables them to generalize to arbitrary problem length and complexity on arithmetic, reasoning, and coding tasks.
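The interaction pattern described above — a bounded-memory model offloading unbounded work to an external tool — can be illustrated with a toy harness. Everything here (the `CALL` protocol, function names, the fixed-size state tuple) is hypothetical and is not Apple's interface; it only shows why tool access sidesteps the fixed-memory limit:

```python
# Toy sketch (hypothetical protocol, not Apple's implementation): a model
# with a fixed-size recurrent state cannot hold an arbitrarily long
# computation in memory, so it emits a tool call and the external tool
# does the unbounded work.

def bounded_model_step(token, state, tools):
    """state is a fixed-size tuple, independent of input length."""
    if token.startswith("CALL "):            # model-issued tool call
        name, arg = token[5:].split(":", 1)
        return state, tools[name](arg)       # tool result fed back to the model
    return state, None

# Sandboxed arithmetic "tool" -- no builtins exposed to the expression.
tools = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}))}

state = (0.0, 0.0)  # fixed-size state: never grows with the problem
expr = "+".join(str(i) for i in range(1, 101))  # a 100-term sum
_, result = bounded_model_step("CALL calc:" + expr, state, tools)
assert result == "5050"
```

The point of the toy: the state tuple stays the same size no matter how long `expr` grows, mirroring how an SSM's fixed memory can stay bounded while the tool absorbs the length-dependent work.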
Apple's Manzano model tackles a design trade-off in unified vision-language systems. Many existing multimodal models must choose between strong image understanding and strong generation. Manzano uses a hybrid tokenizer with separate lightweight adapters that feed a shared semantic space: one adapter produces continuous embeddings for understanding, the other produces discrete tokens for generation. A unified autoregressive language model predicts both text and image tokens, with an auxiliary diffusion decoder converting image tokens to pixels. This design achieves state-of-the-art results among unified models while remaining competitive with specialist systems, particularly on text-heavy evaluations.
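The two-adapter idea above can be sketched in miniature. All shapes, names, and the toy quantizer below are invented for illustration (Manzano's actual architecture is not public beyond the description here): one shared vision feature feeds a continuous path for understanding and a codebook-quantized discrete path for generation.

```python
import random

# Minimal sketch with invented shapes: a single shared vision feature is
# routed through two lightweight adapters -- continuous embeddings for the
# understanding path, nearest-neighbor codebook indices (discrete tokens)
# for the autoregressive generation path.

random.seed(0)
DIM, CODES = 4, 8
codebook = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(CODES)]

def continuous_adapter(feat):
    # Understanding path: keep features continuous (identity stand-in
    # for a small learned projection).
    return feat

def discrete_adapter(feat):
    # Generation path: vector-quantize to the nearest codebook entry
    # and return its index as a discrete image token.
    dists = [sum((f - c) ** 2 for f, c in zip(feat, code))
             for code in codebook]
    return min(range(CODES), key=dists.__getitem__)

shared_feature = [0.3, -1.2, 0.7, 0.0]    # output of the shared tokenizer
emb = continuous_adapter(shared_feature)  # fed to the LM for understanding
tok = discrete_adapter(shared_feature)    # predicted token for generation
assert emb == shared_feature and 0 <= tok < CODES
```

Because both adapters read the same shared feature, the understanding and generation paths stay aligned in one semantic space, which is the property the article credits with reducing the trade-off between the two capabilities.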
Apple is also demonstrating local LLM inference on Apple silicon using the MLX framework and techniques for fast 3D scene synthesis from single images. The company is sponsoring affinity group events supporting underrepresented groups in machine learning research.