AIxDeFi: When the Agent Sets the Lending Rate
From static rules to AI-based interest-rate policies: how reinforcement learning can make DeFi lending protocols smarter, safer, and more adaptive.
I recently had the opportunity to present our latest research paper at the MARBLE 2025 conference, where it was nominated for the Best Paper Award. This work is based on the Master’s thesis of Hanxiao, a talented student in the Banking and Finance program at the University of Zurich, whom I had the pleasure of advising.
The thesis explores the intersection of AI and decentralized finance (DeFi)—a space that’s growing rapidly but remains underexplored from a machine learning perspective. In this blog series, I’ll dive into concrete applications of AI in DeFi that I and my students have worked on over the last few years.
Upcoming Posts in the Series
Part I: Reinforcement Learning for Interest Rate Optimization in DeFi Lending
Part II: MEV Detection Using Graph-Based Representation Learning
Part III: Early Detection of Pump-and-Dump Schemes with LLMs
Part IV: AI Agents in Blockchain Gaming and Play-to-Earn Economies
Part V: AI-Powered Arbitrage and Perpetual Trading Agents
Part VI: Designing MEV Bots with AI Agents
Introduction: Why AI in DeFi Lending?
Decentralized Finance (DeFi) refers broadly to financial services built on blockchain technology. These services range from trading and lending to asset management and beyond. Unlike traditional finance (TradFi), DeFi eliminates the need for financial intermediaries. Instead, all services are executed automatically by computer programs known as smart contracts, which run transparently on the blockchain.
DeFi lending protocols like Aave, Compound, and Morpho account for 30–40% of the total value locked (TVL) in the DeFi ecosystem. They enable collateralized lending: for example, a user can deposit ETH tokens as collateral and borrow USDC tokens against it.
Unlike TradFi, all lending logic—including interest rate models, risk parameters, and liquidation thresholds—is coded directly into smart contracts. These contracts operate autonomously and transparently, acting as lenders and risk managers. Liquidity for lending comes from LPs (liquidity providers) who deposit tokens into a liquidity pool and earn rewards in return.

While this rule-based infrastructure is elegant and transparent, it struggles under stress conditions like market crashes or stablecoin depegs. The interest rate logic in most protocols today is static and hardcoded, making it slow to react to volatile market dynamics.
This is where AI—and specifically Reinforcement Learning (RL)—can help.
Background
What is Machine Learning (ML)?
Machine learning is a field of artificial intelligence that involves training algorithms to learn from data and make predictions or decisions without being explicitly programmed.
There are three main types of machine learning:
Supervised Learning: The model learns from labeled data.
Example in DeFi: Predicting whether a user will get liquidated based on historical LTV (loan-to-value), volatility, and collateral data.
Example in MEV: Classifying transactions as MEV-extractive or benign based on labeled past events.
Unsupervised Learning: The model identifies patterns or clusters in data without labels.
Example in DeFi: Clustering wallets by borrowing behavior to detect Sybil attacks.
Example in MEV: Grouping similar transaction patterns to discover unknown arbitrage strategies.
Reinforcement Learning (RL): The model (agent) learns by interacting with an environment and receiving rewards or penalties based on its actions.
Example in DeFi: Adjusting interest rates to maximize capital efficiency and minimize risk.
Example in MEV: Training a bot to optimize sandwich attack strategies based on block conditions.
What is Reinforcement Learning (RL)?
In RL, an agent interacts with an environment in a sequence of time steps. At each step, the agent:
Observes the current state of the environment
Takes an action
Receives a reward based on how beneficial the action was
Moves to the next state
The goal is to learn a policy (a strategy) that maximizes the expected cumulative reward over time.
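To make this loop concrete, here is a minimal, self-contained toy sketch in Python: the "environment" is a randomly drifting pool utilization, and the agent nudges a rate up or down to keep utilization near a target. It is purely illustrative and not the protocol model used in the paper.

```python
import random

# Toy illustration of the RL loop: the "environment" is a randomly drifting pool
# utilization, and the agent nudges a rate up or down to keep utilization near 80%.
# Purely illustrative; not the protocol model used in the paper.
class ToyLendingEnv:
    def reset(self) -> float:
        self.utilization = 0.5
        return self.utilization

    def step(self, rate_change: float):
        # Higher rates discourage borrowing, pushing utilization down (a crude stylization).
        self.utilization += random.uniform(-0.05, 0.05) - 0.5 * rate_change
        self.utilization = min(max(self.utilization, 0.0), 1.0)
        reward = -abs(self.utilization - 0.8)  # the closer to the 80% target, the higher the reward
        return self.utilization, reward, False  # next state, reward, done

env = ToyLendingEnv()
state = env.reset()
for t in range(100):
    action = random.choice([-0.01, 0.0, 0.01])  # a trained agent would pick this from its policy
    state, reward, done = env.step(action)
```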
Interest Rates in DeFi Lending

In DeFi lending protocols, interest rates are typically determined by the utilization rate — the ratio of borrowed tokens to the total supplied tokens in a lending pool.
This is known as the utilization rate model, and it works as follows:
As utilization (U) increases — meaning more of the available liquidity is being borrowed — the interest rate also increases to reflect rising demand and risk.
Protocols often implement a kinked interest rate curve:
Up to a certain optimal utilization threshold (U*, e.g., 90%), the interest rate increases gradually.
Beyond U*, the rate rises steeply to incentivize liquidity provision and discourage excessive borrowing, helping maintain protocol stability.
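Such a kinked curve is easy to express in code. The sketch below uses illustrative slope and threshold parameters, not the settings of any specific Aave pool:

```python
def kinked_borrow_rate(utilization: float,
                       base_rate: float = 0.0,
                       slope_low: float = 0.04,
                       slope_high: float = 0.75,
                       optimal_utilization: float = 0.9) -> float:
    """Piecewise-linear ("kinked") borrow rate curve; parameter values are illustrative."""
    if utilization <= optimal_utilization:
        # Gentle slope below the kink.
        return base_rate + slope_low * (utilization / optimal_utilization)
    # Steep slope above the kink to attract liquidity and discourage further borrowing.
    excess = (utilization - optimal_utilization) / (1.0 - optimal_utilization)
    return base_rate + slope_low + slope_high * excess

print(kinked_borrow_rate(0.50))  # below the kink: ~2.2% APR
print(kinked_borrow_rate(0.95))  # above the kink: rate jumps to ~41.5% APR
```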
This rule-based model is simple and transparent but often reacts too slowly to fast-changing market conditions — especially during stress events. That's where adaptive models, like reinforcement learning, come into play.
Methodology: Applying RL to DeFi Lending
Let’s now explore how RL can be applied to set the interest rate in DeFi lending protocols.
Defining the State: The state captures key market conditions from Aave lending pools. Each state snapshot at time t includes:
Liquidity Metrics:
Available liquidity
Total liquidity
Utilization rate (borrowed / supplied)
Debt Metrics (D_t):
Total variable debt
Deposit volume
Borrow volume
Interest Rate Indicators (I_t):
Borrow and deposit rates
Rate momentum and volatility
Risk Indicators (R_t):
Loan-to-value ratio
Volatility of liquidity and rates
Example: At time t, the pool has 75% utilization, increasing volatility, and shrinking available liquidity. These state features suggest rising borrower pressure.
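In code, a state snapshot can be represented as a simple feature vector. The sketch below uses a dataclass with hypothetical field names and values matching the example above:

```python
from dataclasses import dataclass

@dataclass
class PoolState:
    """One daily state snapshot of a lending pool (field names and values are illustrative)."""
    available_liquidity: float   # tokens still available to borrow
    total_liquidity: float       # total tokens supplied to the pool
    total_variable_debt: float   # outstanding variable-rate debt
    borrow_rate: float           # current variable borrow rate
    deposit_rate: float          # current liquidity (deposit) rate
    rate_volatility: float       # rolling volatility of the borrow rate
    ltv: float                   # loan-to-value ratio

    @property
    def utilization(self) -> float:
        return 1.0 - self.available_liquidity / self.total_liquidity

# The example from the text: 75% utilization with shrinking available liquidity.
state = PoolState(available_liquidity=2_500, total_liquidity=10_000,
                  total_variable_debt=7_500, borrow_rate=0.031,
                  deposit_rate=0.018, rate_volatility=0.004, ltv=0.72)
print(state.utilization)  # 0.75
```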
Action Space: The agent adjusts:
Δr_t: Change in liquidity rate (for depositors)
Δb_t: Change in variable borrow rate (for borrowers)
Actions are expressed as relative changes from previous values, enabling stable and granular updates.
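Applying a relative action to the previous rates could look like the following; the clipping bound and the constraint that the borrow rate stays above the deposit rate are illustrative assumptions:

```python
def apply_action(prev_deposit_rate: float, prev_borrow_rate: float,
                 delta_r: float, delta_b: float,
                 max_step: float = 0.005) -> tuple[float, float]:
    """Apply relative rate changes, clipped to a small step for stability (illustrative bounds)."""
    delta_r = max(-max_step, min(max_step, delta_r))
    delta_b = max(-max_step, min(max_step, delta_b))
    new_deposit_rate = max(0.0, prev_deposit_rate + delta_r)
    new_borrow_rate = max(new_deposit_rate, prev_borrow_rate + delta_b)  # keep borrow rate above deposit rate
    return new_deposit_rate, new_borrow_rate

print(apply_action(0.018, 0.031, delta_r=0.002, delta_b=-0.001))  # (0.02, 0.03)
```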
Reward Function: We define a composite reward function to balance capital efficiency, protocol stability, and user experience:
1. Utilization Penalty
Encourages rates that keep utilization near an optimal target.
2. Borrower-Lender Trade-off
Balances affordable borrowing with attractive LP returns.
3. Rate Stability Penalty
Discourages large rate swings that destabilize behavior.

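The exact reward formulas are given in the paper; the sketch below only illustrates how such a composite reward could be assembled, with weights, targets, and functional forms chosen for illustration rather than taken from the paper:

```python
def composite_reward(utilization: float,
                     borrow_rate: float, deposit_rate: float,
                     prev_borrow_rate: float, prev_deposit_rate: float,
                     u_target: float = 0.9,
                     w_util: float = 1.0, w_spread: float = 0.5, w_smooth: float = 2.0) -> float:
    """Illustrative composite reward; weights, target, and functional forms are assumptions."""
    # 1. Utilization penalty: quadratic distance from the optimal utilization target.
    utilization_penalty = (utilization - u_target) ** 2
    # 2. Borrower-lender trade-off: reward attractive LP yields, penalize expensive borrowing.
    tradeoff = deposit_rate - borrow_rate
    # 3. Rate stability penalty: discourage large jumps between consecutive steps.
    stability_penalty = abs(borrow_rate - prev_borrow_rate) + abs(deposit_rate - prev_deposit_rate)
    return -w_util * utilization_penalty + w_spread * tradeoff - w_smooth * stability_penalty
```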
Approach
We evaluated three RL methods for optimizing interest rates:
CQL (Conservative Q-Learning)
A risk-averse method that avoids actions not seen in the historical dataset. Useful for cautious deployment but tends to underperform in volatile or rapidly changing environments.
BC (Behavior Cloning)
A supervised learning baseline that directly mimics the historical interest rate actions of Aave. Stable but lacks adaptability or optimization capability.
TD3-BC (Twin Delayed Deep Deterministic Policy Gradient + Behavior Cloning)
A hybrid combining the stability of behavior cloning with the optimization power of deep RL. It learns entirely from logged data, improving on the historical policy through reward-driven optimization while a behavior-cloning term keeps its actions close to those actually observed. This model performed best in our evaluation.
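For intuition, here is a minimal PyTorch sketch of the TD3-BC actor update from Fujimoto & Gu (2021), where the behavior-cloning term keeps the learned policy close to the logged actions; the network sizes and hyperparameters are placeholders, not our trained configuration:

```python
import torch
import torch.nn as nn

# Placeholder networks: 12-dimensional state, 2-dimensional action (Δ deposit rate, Δ borrow rate).
actor = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(12 + 2, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(actor.parameters(), lr=3e-4)
alpha = 2.5  # TD3-BC trade-off coefficient between the RL objective and behavior cloning

def td3bc_actor_update(states: torch.Tensor, dataset_actions: torch.Tensor) -> None:
    """One actor update: maximize Q while staying close to the logged (historical) actions."""
    policy_actions = actor(states)
    q_values = critic(torch.cat([states, policy_actions], dim=-1))
    lam = alpha / q_values.abs().mean().detach()                 # adaptive scaling from the TD3-BC paper
    bc_loss = ((policy_actions - dataset_actions) ** 2).mean()   # behavior-cloning regularizer
    loss = -lam * q_values.mean() + bc_loss                      # RL term + BC term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example call with a random mini-batch standing in for logged transitions.
td3bc_actor_update(torch.randn(32, 12), torch.rand(32, 2) * 2 - 1)
```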
Data Collection and Pre-processing
We used on-chain data from AaveScan on Ethereum, covering WETH and WBTC Aave pools from March 2021 to February 2025. Key variables such as utilization rate, APY, and LTV were derived and normalized. We also calculated rolling volatilities and momentum indicators to capture market dynamics. All data was aggregated at daily frequency to support stable and efficient model training.
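As a rough sketch, this feature engineering can be expressed in a few lines of pandas; the file name, column names, and window sizes below are assumptions rather than the exact pipeline:

```python
import pandas as pd

# Assumed input: raw pool snapshots with columns
# ["timestamp", "total_liquidity", "total_variable_debt", "borrow_rate", "deposit_rate"].
df = pd.read_csv("aave_weth_pool.csv", parse_dates=["timestamp"])

daily = df.set_index("timestamp").resample("1D").last().dropna()    # aggregate to daily frequency
daily["utilization"] = daily["total_variable_debt"] / daily["total_liquidity"]
daily["rate_volatility"] = daily["borrow_rate"].rolling(7).std()    # 7-day rolling volatility
daily["rate_momentum"] = daily["borrow_rate"].diff(7)               # 7-day momentum
daily = (daily - daily.mean()) / daily.std()                        # z-score normalization
```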
Key Findings
Interest Rate Responsiveness: TD3-BC produces smoother and more frequent adjustments, allowing the protocol to respond to market changes proactively rather than reactively.
Bad Debt Mitigation: The RL agent raises rates earlier under stress, discouraging risky borrowing and helping to avoid undercollateralized positions.
Improved LP Yields: Median liquidity rates were higher and more consistent with TD3-BC, especially in thinly traded markets like WBTC.
Stress Scenario Robustness: TD3-BC responded more effectively to major DeFi shocks such as:
The USDC depeg (March 2023)
The FTX collapse (November 2022)
The ETH crash (August 2024)
Despite never being explicitly trained on these events, the model exhibited stress-aware behavior.
Why It Matters
This study presents the first empirical application of offline reinforcement learning to interest rate policy in DeFi, trained entirely on historical Aave market data. The results show that:
Protocols can become more adaptive and efficient
RL agents can act as automated rate governors
This approach offers a credible path to dynamic, real-time risk control
In a landscape where on-chain governance is often too slow to respond to liquidity crises, embedding intelligence in the protocol layer may be key to resilience.
Closing Thoughts
Reinforcement Learning introduces a powerful new approach for rate governance in decentralized lending protocols. However, it also comes with risks and open questions:
RL agents can be unintuitive and hard to audit—especially in safety-critical systems like financial protocols.
Without proper constraints, they may over-optimize in unexpected ways, increasing rate volatility or misaligning incentives.
And most importantly, they require governance integration: RL decisions should be bounded or monitored by human-defined parameters.
That said, RL offers a compelling solution to the slow reaction times of traditional DAO-based parameter updates. In a world of volatile DeFi markets and systemic contagion, giving protocols the ability to "learn" and "adapt" in real time could be a game-changer.
Link to our paper: https://arxiv.org/pdf/2506.00505