Research · Apr 24, 2026

Text embeddings replace domain knowledge in algorithm selection across seven problem classes

Researchers propose ZeroFolio, a feature-free method that selects algorithms using pretrained text embeddings rather than manual feature engineering, outperforming hand-crafted approaches in 10 of 11 tested scenarios.

Trust score: 79 · Hype: Low

1 source

TL;DR
  • ZeroFolio uses pretrained text embeddings instead of hand-crafted features to select algorithms across diverse problem domains including SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems.
  • The method outperformed random forest baselines trained on domain-specific features in 10 of 11 test scenarios with a single configuration, and all 11 scenarios with two-seed voting.
  • Key design choices, identified through an ablation study, include inverse-distance weighting, random line shuffling, and Manhattan distance.
  • Combining embeddings with traditional hand-crafted features via soft voting yielded further improvements on competitive scenarios.

A research team led by Stefan Szeider has proposed ZeroFolio, a domain-agnostic approach to algorithm selection that eliminates the need for hand-engineered features. Rather than extracting problem-specific characteristics, the method treats raw instance files as plain text, encodes them with pretrained embeddings, and applies weighted k-nearest neighbors for solver selection.

The core innovation rests on an empirical observation: pretrained language model embeddings capture structural distinctions between problem instances without explicit domain knowledge or task-specific fine-tuning. This permits the same three-step pipeline—serialize, embed, select—to work across unrelated problem classes.
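The serialize–embed–select pipeline can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `toy_embed` function below is a hypothetical hashing-based stand-in for a pretrained text embedder, and the solver names in the usage are invented.

```python
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a pretrained text embedder (hypothetical):
    hashes character trigrams of the raw instance file into a
    fixed-size, L2-normalized vector."""
    vec = np.zeros(dim)
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def select_solver(instance_text, train_texts, train_best_solvers, k=3):
    """Serialize -> embed -> select: pick a solver by weighted
    k-nearest neighbors over embedded training instances."""
    q = toy_embed(instance_text)
    X = np.stack([toy_embed(t) for t in train_texts])
    dists = np.abs(X - q).sum(axis=1)          # Manhattan distance
    idx = np.argsort(dists)[:k]
    votes: dict[str, float] = {}
    for i in idx:
        w = 1.0 / (dists[i] + 1e-9)            # inverse-distance weighting
        s = train_best_solvers[i]
        votes[s] = votes.get(s, 0.0) + w
    return max(votes, key=votes.get)
```

Because the instance is treated purely as text, the same function works for a DIMACS CNF file, an ASP program, or an MPS model; only the training pool changes.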

The authors evaluated ZeroFolio on 11 scenarios spanning seven distinct combinatorial optimization domains: satisfiability, maximum satisfiability, quantified Boolean formulas, answer set programming, constraint satisfaction, mixed-integer programming, and graph problems. Against random forest classifiers built on conventional hand-crafted features, ZeroFolio outperformed baselines in 10 of 11 scenarios using a single fixed hyperparameter set, and in all 11 scenarios when ensemble voting with two random seeds was applied.

Ablation analysis identified three critical design decisions: inverse-distance weighting for neighbor contribution, random line shuffling during text preprocessing, and Manhattan distance as the similarity metric. On datasets where both approaches showed comparable performance, combining embeddings with traditional features through soft voting produced measurable gains.
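Two of these ideas are simple enough to sketch directly: shuffling the lines of the serialized instance before embedding, and soft-voting the embedding-based selector against a feature-based one. The function names and the equal 0.5/0.5 weighting below are illustrative assumptions, not details from the paper.

```python
import random

def shuffle_lines(instance_text: str, seed: int = 0) -> str:
    """Randomly permute the lines of a serialized instance before
    embedding -- one of the preprocessing choices the ablation flags."""
    lines = instance_text.splitlines()
    random.Random(seed).shuffle(lines)
    return "\n".join(lines)

def soft_vote(embed_scores: dict, feature_scores: dict) -> str:
    """Average per-solver scores from an embedding-based selector and
    a feature-based selector (assumed equal weights), then take the
    argmax."""
    solvers = set(embed_scores) | set(feature_scores)
    avg = {s: 0.5 * embed_scores.get(s, 0.0)
              + 0.5 * feature_scores.get(s, 0.0)
           for s in solvers}
    return max(avg, key=avg.get)
```

Shuffling plausibly discourages the embedder from latching onto incidental line ordering, while soft voting lets hand-crafted features break ties on scenarios where the two signal sources are comparably strong.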

Sources
  1. arXiv (cs.AI): Algorithm Selection with Zero Domain Knowledge via Text Embeddings

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.