ArtScale: Autoregressive Super-Resolution for Art Paintings via Multi-Scale Vision-Language Guidance
Abstract
Heritage digitization can require extreme super-resolution (SR) for inspecting brushwork, craquelure, and pigment aging beyond native capture limits. Most SR models are trained for fixed scale factors and degrade when extrapolated beyond them, while training directly for extreme scales is expensive. We present ArtScale, a scale-space autoregressive framework that reaches large magnifications by chaining intermediate steps while reusing a frozen SR backbone. To limit semantic drift at high magnification, ArtScale adds multi-scale vision-language guidance: a vision-language model (VLM) generates art-aware prompts conditioned on the current and previous scale states. We fine-tune the prompt extractor with preference alignment based on Group Relative Policy Optimization (GRPO) to discourage repetitive or generic prompts. Experiments show improved 4× restoration and more stable behavior under recursive zooming.
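The scale-space autoregressive chaining described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ArtScale's implementation: the backbone (`nearest_backbone`, a pixel-repeat stand-in for a frozen SR model), the prompt extractor (`describe_state`, which only reports image sizes instead of querying a VLM), the 4× backbone factor, and all function names are hypothetical. It shows how a fixed-factor model reaches a larger magnification (16× as two 4× steps) and how each step's prompt depends on both the current and the previous scale state.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def nearest_backbone(img: Image, factor: int, prompt: str) -> Image:
    """Hypothetical stand-in for a frozen fixed-factor SR model.

    It only repeats pixels (nearest-neighbour) and ignores the prompt; a real
    backbone would use the prompt as text conditioning.
    """
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def describe_state(current: Image, previous: Optional[Image], step: int) -> str:
    """Hypothetical prompt extractor conditioned on the current and previous
    scale states (ArtScale uses a VLM here; this sketch only reports sizes)."""
    h, w = len(current), len(current[0])
    if previous is None:
        return f"step {step}: full painting, {w}x{h}"
    ph, pw = len(previous), len(previous[0])
    return f"step {step}: zoomed from {pw}x{ph} to {w}x{h}"


def num_steps(target_scale: int, backbone_scale: int) -> int:
    """Number of chained backbone calls needed to reach target_scale exactly."""
    if backbone_scale < 2:
        raise ValueError("backbone_scale must be at least 2")
    steps, scale = 0, 1
    while scale < target_scale:
        scale *= backbone_scale
        steps += 1
    if scale != target_scale:
        raise ValueError("target_scale must be a power of backbone_scale in this sketch")
    return steps


def autoregressive_zoom(
    img: Image,
    target_scale: int,
    backbone_scale: int = 4,  # assumed fixed training factor of the frozen backbone
    backbone: Callable[[Image, int, str], Image] = nearest_backbone,
    prompter: Callable[[Image, Optional[Image], int], str] = describe_state,
) -> Tuple[Image, List[str]]:
    """Chain a frozen fixed-factor backbone; each step is conditioned on a
    prompt built from the current and previous scale states."""
    previous: Optional[Image] = None
    current = img
    prompts: List[str] = []
    for step in range(num_steps(target_scale, backbone_scale)):
        prompt = prompter(current, previous, step)
        prompts.append(prompt)
        # The output of this step becomes the input state of the next one.
        previous, current = current, backbone(current, backbone_scale, prompt)
    return current, prompts


out, prompts = autoregressive_zoom([[0.0, 1.0], [1.0, 0.0]], target_scale=16)
print(len(out), len(out[0]))  # 32 32
print(prompts)  # ['step 0: full painting, 2x2', 'step 1: zoomed from 2x2 to 8x8']
```

Chaining k calls of a 4× backbone gives an overall factor of 4^k (three calls reach 64×), which is why this sketch restricts target scales to powers of the backbone factor. Passing the previous state to the prompter is the hook where multi-scale guidance would enter.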