EIPO and curiosity

Jun 3, 2026 · 2 min read · ai rl curiosity

This LinkedIn post poking fun at the #GreatAIReplacement reminded me of a now-ancient paper (2022), “Redeeming Intrinsic Rewards via Constrained Optimization,” on mathematical approaches to governing the balance between exploration (curiosity) and exploitation (using known approaches). The paper covers the authors' approach to optimizing how models are trained to play various Atari games. The Extrinsic-Intrinsic Policy Optimization (EIPO) algorithm they proposed did seem to improve over other reinforcement learning algorithms.
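To make the exploration/exploitation trade-off a little more concrete, here is a toy sketch in Python. It is not the paper's algorithm: the function names, the additive mixing formula, and the update rule are all assumptions of mine, meant only to illustrate the general idea of scaling a curiosity bonus up or down depending on whether it seems to be helping the game score.

```python
def mixed_reward(extrinsic: float, intrinsic: float, weight: float) -> float:
    """Combine the game's own reward with a curiosity bonus scaled by `weight`.

    Hypothetical helper: a plain weighted sum, assumed for illustration.
    """
    return extrinsic + weight * intrinsic


def update_weight(
    weight: float,
    curious_score: float,
    baseline_score: float,
    step_size: float = 0.1,
) -> float:
    """Nudge the curiosity weight based on how a curious agent compares
    with a baseline agent that only chases the game score.

    Hypothetical rule (not taken from the paper): if curiosity is costing
    game score, shrink the weight; if it is helping, grow it. The weight
    is clamped so it never goes negative.
    """
    gap = curious_score - baseline_score
    return max(0.0, weight + step_size * gap)


if __name__ == "__main__":
    w = 1.0
    # The curious agent scored 5 while the baseline scored 10,
    # so the curiosity weight is reduced.
    w = update_weight(w, curious_score=5.0, baseline_score=10.0)
    print(w, mixed_reward(extrinsic=1.0, intrinsic=2.0, weight=w))
```

The point of the sketch is only that "how curious should the agent be" can be treated as a number that gets adjusted by feedback rather than fixed by hand.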

It’s interesting to think of curiosity as a mathematically definable property. I tend to view it in a slightly less quantifiable frame. This essay, “Curiosity Is No Solo Act,” has an interesting take that I like, proposing that curiosity is communal. There’s a quote in the piece: “‘the eternal convergence of the world within any one thing,’ writes Carl Mika, such that ‘one thing is never alone and all things actively construct and compose it.’ From this perspective of deep holism, talk of knowing any one thing is ‘minimally useful.’ As such, knowledge is not properly propositional but instead procedural; it is less concerned with knowing what than with knowing how. And its wisdom lies in ‘sharing’ more than ‘stating.’”

It’s difficult to reconcile the contemplative nature of this approach with the functional approach to optimized knowledge, but I keep coming back to the idea that doing great things requires community, and community comes down to culture. While operational efficiency is king, I’ll be curious to see how companies that align with this more humanistic approach to curiosity, and the value it brings, perform compared with those that choose functional optimization.