Installation and Configuration

Installation

Install RDAgent: choose the option that matches how you plan to use it

  • For general users: install with pip install rdagent.

  • For development users: see the development documentation.

Install Docker: RDAgent is designed for research and development, acting like a human researcher and developer. It can write and run code in various environments, primarily using Docker for code execution. This keeps the remaining dependencies simple. Users must ensure Docker is installed before attempting most scenarios. Please refer to the official 🐳Docker page for installation instructions. Ensure the current user can run Docker commands without using sudo. You can verify this by executing docker run hello-world.

LiteLLM Backend Configuration (Default)

Note

🔥 Attention: We now provide experimental support for DeepSeek models! You can use DeepSeek’s official API for cost-effective and high-performance inference. See the configuration example below for DeepSeek setup.

Option 1: Unified API base for both models

# Set to any model supported by LiteLLM.
CHAT_MODEL=gpt-4o
EMBEDDING_MODEL=text-embedding-3-small
# Configure unified API base
# The backend api_key fully follows the convention of litellm.
OPENAI_API_BASE=<your_unified_api_base>
OPENAI_API_KEY=<replace_with_your_openai_api_key>

Option 2: Separate API bases for Chat and Embedding models

# Set to any model supported by LiteLLM.

# CHAT MODEL:
CHAT_MODEL=gpt-4o
OPENAI_API_BASE=<your_chat_api_base>
OPENAI_API_KEY=<replace_with_your_openai_api_key>

# EMBEDDING MODEL:
# Taking SiliconFlow as an example; you can use other providers.
# Note: embedding requires litellm_proxy prefix
EMBEDDING_MODEL=litellm_proxy/BAAI/bge-large-en-v1.5
LITELLM_PROXY_API_KEY=<replace_with_your_siliconflow_api_key>
LITELLM_PROXY_API_BASE=https://api.siliconflow.cn/v1

Configuration Example: DeepSeek Setup

Many users encounter configuration errors when setting up DeepSeek. Here’s a complete working example:

# CHAT MODEL: Using DeepSeek Official API
CHAT_MODEL=deepseek/deepseek-chat
DEEPSEEK_API_KEY=<replace_with_your_deepseek_api_key>

# EMBEDDING MODEL: Using SiliconFlow for embedding since DeepSeek has no embedding model.
# Note: embedding requires litellm_proxy prefix
EMBEDDING_MODEL=litellm_proxy/BAAI/bge-m3
LITELLM_PROXY_API_KEY=<replace_with_your_siliconflow_api_key>
LITELLM_PROXY_API_BASE=https://api.siliconflow.cn/v1

Necessary parameters include:

  • CHAT_MODEL: The model name of the chat model.

  • EMBEDDING_MODEL: The model name of the embedding model.

  • OPENAI_API_BASE: The base URL of the API. If EMBEDDING_MODEL does not start with litellm_proxy/, this is used for both chat and embedding models; otherwise, it is used for CHAT_MODEL only.

Optional parameters (required if your embedding model is provided by a different provider than CHAT_MODEL):

  • LITELLM_PROXY_API_KEY: The API key for the embedding model, required if EMBEDDING_MODEL starts with litellm_proxy/.

  • LITELLM_PROXY_API_BASE: The base URL for the embedding model, required if EMBEDDING_MODEL starts with litellm_proxy/.

Note: If you are using an embedding model from a provider different from the chat model, remember to add the litellm_proxy/ prefix to the EMBEDDING_MODEL name.
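The selection logic described above can be sketched as a small helper. The function below is purely illustrative (it is not part of RD-Agent) and assumes the convention stated here: a litellm_proxy/ prefix on EMBEDDING_MODEL switches the embedding model to the LITELLM_PROXY_* variables, while everything else falls back to the unified OPENAI_* variables.

```python
def resolve_embedding_endpoint(env: dict) -> tuple:
    """Return (api_base, api_key) for the embedding model.

    Mirrors the convention described above: if EMBEDDING_MODEL starts
    with 'litellm_proxy/', the LITELLM_PROXY_* variables are used;
    otherwise the unified OPENAI_* variables cover both models.
    (Illustrative sketch, not RD-Agent's actual implementation.)
    """
    model = env.get("EMBEDDING_MODEL", "")
    if model.startswith("litellm_proxy/"):
        return env["LITELLM_PROXY_API_BASE"], env["LITELLM_PROXY_API_KEY"]
    return env["OPENAI_API_BASE"], env["OPENAI_API_KEY"]

# Example: an Option 2 style configuration
env = {
    "EMBEDDING_MODEL": "litellm_proxy/BAAI/bge-large-en-v1.5",
    "LITELLM_PROXY_API_BASE": "https://api.siliconflow.cn/v1",
    "LITELLM_PROXY_API_KEY": "sk-proxy",
    "OPENAI_API_BASE": "https://api.openai.com/v1",
    "OPENAI_API_KEY": "sk-openai",
}
base, key = resolve_embedding_endpoint(env)
print(base)  # https://api.siliconflow.cn/v1
```

With an Option 1 style configuration (no litellm_proxy/ prefix), the same helper returns the unified OPENAI_API_BASE and OPENAI_API_KEY for both models.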

The CHAT_MODEL and EMBEDDING_MODEL parameters will be passed into LiteLLM’s completion function.

Therefore, when using models from different providers, first review LiteLLM's interface documentation; the model names must match those recognized by LiteLLM.

Additionally, you need to set the parameters required by the respective model provider, and the parameter names must align with those expected by LiteLLM.

For example, if you are using a DeepSeek model, you need to set as follows:

# For some models LiteLLM requires a prefix to the model name.
CHAT_MODEL=deepseek/deepseek-chat
DEEPSEEK_API_KEY=<replace_with_your_deepseek_api_key>
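The provider prefix convention shown above can be sketched as follows. This is a hypothetical helper (not part of RD-Agent or LiteLLM) that only illustrates how a prefixed model name such as deepseek/deepseek-chat splits into a provider and a model; the assumption that a bare name is treated as OpenAI-style is an illustration, not a guarantee about LiteLLM's routing.

```python
def split_model_name(model: str) -> tuple:
    """Split a LiteLLM-style model name into (provider, model).

    LiteLLM commonly routes by a provider prefix, e.g.
    'deepseek/deepseek-chat' -> provider 'deepseek'. A bare name such
    as 'gpt-4o' has no explicit prefix and is assumed OpenAI-style
    here. (Illustrative helper only.)
    """
    if "/" in model:
        provider, _, name = model.partition("/")
        return provider, name
    return "openai", model

print(split_model_name("deepseek/deepseek-chat"))  # ('deepseek', 'deepseek-chat')
print(split_model_name("gpt-4o"))                  # ('openai', 'gpt-4o')
```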

In addition, when you are using reasoning models, the response may include the model's thought process. In this case, you need to set the following environment variable:

REASONING_THINK_RM=True
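To illustrate what removing the thought process means, here is a sketch that strips a <think>...</think> block, a format many reasoning models use. This mimics what a setting like REASONING_THINK_RM is meant to achieve; RD-Agent's actual implementation may differ.

```python
import re

def remove_think(text: str) -> str:
    """Strip <think>...</think> blocks from a model response.

    Illustrative only: shows the effect a flag like REASONING_THINK_RM
    is meant to have on reasoning-model output.
    """
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>Let me reason step by step...</think>The answer is 42."
print(remove_think(raw))  # The answer is 42.
```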

For more details on LiteLLM requirements, refer to the official LiteLLM documentation.

Configuration Example 2: Azure OpenAI Setup

If you're using Azure OpenAI, below is a working example using the Python SDK, following the LiteLLM Azure OpenAI documentation:

from litellm import completion
import os

# Set Azure OpenAI environment variables
os.environ["AZURE_API_KEY"] = "<your_azure_api_key>"
os.environ["AZURE_API_BASE"] = "<your_azure_api_base>"
os.environ["AZURE_API_VERSION"] = "<version>"

# Make a request to your Azure deployment
response = completion(
    model="azure/<your_deployment_name>",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)

To align with the Python SDK example above, set CHAT_MODEL to the same azure/<your_deployment_name> value and move the corresponding os.environ variables into your local .env file as follows:

cat << EOF > .env
# CHAT MODEL: Azure OpenAI via LiteLLM
CHAT_MODEL=azure/<your_deployment_name>
AZURE_API_BASE=https://<your_azure_base>.openai.azure.com/
AZURE_API_KEY=<your_azure_api_key>
AZURE_API_VERSION=<version>

# EMBEDDING MODEL: Using SiliconFlow via litellm_proxy
EMBEDDING_MODEL=litellm_proxy/BAAI/bge-large-en-v1.5
LITELLM_PROXY_API_KEY=<your_siliconflow_api_key>
LITELLM_PROXY_API_BASE=https://api.siliconflow.cn/v1
EOF

This configuration allows you to call Azure OpenAI through LiteLLM while using an external provider (e.g., SiliconFlow) for embeddings.

If your Azure OpenAI API key also supports an embedding model, you can refer to the following configuration example:

cat << EOF > .env
EMBEDDING_MODEL=azure/<model deployment supporting embedding>
CHAT_MODEL=azure/<your deployment name>
AZURE_API_KEY=<replace_with_your_azure_api_key>
AZURE_API_BASE=<your_unified_api_base>
AZURE_API_VERSION=<azure api version>
EOF

Execution Environment Configuration

Coder Environment Configuration (Docker vs. Conda)

RD-Agent’s coders can execute code in different environments. You can control this behavior by setting environment variables in your .env file. This is useful for switching between a local Conda environment and an isolated Docker container.

To configure the environment, add the corresponding line to your .env file based on the scenario you are running.

For the Model (Quant) Scenario:

The execution environment is determined by the MODEL_COSTEER_ENV_TYPE variable, which is read from rdagent/components/coder/model_coder/conf.py.

  • To use Docker (recommended for isolated execution):

    MODEL_COSTEER_ENV_TYPE=docker
    
  • To use Conda (for running in a local Conda environment):

    MODEL_COSTEER_ENV_TYPE=conda
    

For the Data Science Scenario:

The execution environment is determined by the DS_CODER_COSTEER_ENV_TYPE variable, which is read from rdagent/components/coder/data_science/conf.py.

  • To use Docker (recommended for isolated execution):

    DS_CODER_COSTEER_ENV_TYPE=docker
    
  • To use Conda (for running in a local Conda environment):

    DS_CODER_COSTEER_ENV_TYPE=conda
    
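The effect of these *_COSTEER_ENV_TYPE settings can be sketched as a small dispatch function. This is a hypothetical illustration (the real logic lives in RD-Agent's coder configuration modules), and the command strings are placeholders rather than the actual invocations.

```python
def select_runner(env_type: str):
    """Return a (hypothetical) runner callable for the given env type.

    Sketches how a *_COSTEER_ENV_TYPE setting could select between an
    isolated Docker container and a local Conda environment.
    """
    runners = {
        "docker": lambda cmd: f"docker run ... {cmd}",   # placeholder image/options
        "conda": lambda cmd: f"conda run -n <env> {cmd}",  # placeholder env name
    }
    try:
        return runners[env_type]
    except KeyError:
        raise ValueError(f"unsupported env type: {env_type!r}")

run = select_runner("docker")
print(run("python train.py"))  # docker run ... python train.py
```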

Custom Time Segment Configuration (Train / Valid / Test)

RD-Agent now supports user-defined time segments for training, validation, and testing (backtesting). Users can customize these segments via environment variables in the .env file, depending on the scenario being executed.

This feature allows greater flexibility when running experiments on different time ranges without modifying code or YAML configurations.

Fin-Factor Scenario

When running the fin_factor scenario, you can configure the time segments using the following environment variables. These variables are read by the Factor-related PropSettings and directly affect the execution process.

Add the following entries to your .env file as needed:

QLIB_FACTOR_TRAIN_START=<train start date, default is 2008-01-01>
QLIB_FACTOR_TRAIN_END=<train end date, default is 2014-12-31>
QLIB_FACTOR_VALID_START=<valid start date, default is 2015-01-01>
QLIB_FACTOR_VALID_END=<valid end date, default is 2016-12-31>
QLIB_FACTOR_TEST_START=<test / backtest start date, default is 2017-01-01>
QLIB_FACTOR_TEST_END=<test / backtest end date, default is 2020-12-31>
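Since misordered or overlapping dates are an easy mistake when customizing these variables, here is a small hypothetical checker (not part of RD-Agent) that verifies the three segments are internally valid and chronologically non-overlapping, using the defaults above as input.

```python
from datetime import date

def check_segments(segments: dict) -> bool:
    """Verify train/valid/test segments are ordered and non-overlapping.

    `segments` maps a stage name to a (start, end) pair of ISO date
    strings, e.g. the QLIB_FACTOR_* defaults shown above.
    (Illustrative helper, not part of RD-Agent.)
    """
    order = ["train", "valid", "test"]
    parsed = {k: tuple(date.fromisoformat(d) for d in v) for k, v in segments.items()}
    for stage in order:
        start, end = parsed[stage]
        if start > end:  # segment ends before it starts
            return False
    for earlier, later in zip(order, order[1:]):
        if parsed[earlier][1] >= parsed[later][0]:  # segments overlap
            return False
    return True

defaults = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-12-31"),
}
print(check_segments(defaults))  # True
```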

Fin-Model Scenario

When running the fin_model scenario, the model training, validation, and testing time segments can be configured independently via the following environment variables:

QLIB_MODEL_TRAIN_START=<train start date, default is 2008-01-01>
QLIB_MODEL_TRAIN_END=<train end date, default is 2014-12-31>
QLIB_MODEL_VALID_START=<valid start date, default is 2015-01-01>
QLIB_MODEL_VALID_END=<valid end date, default is 2016-12-31>
QLIB_MODEL_TEST_START=<test / backtest start date, default is 2017-01-01>
QLIB_MODEL_TEST_END=<test / backtest end date, default is 2020-12-31>

These settings are used during model training and evaluation and directly impact the execution workflow.

Fin-Quant Scenario

When running the fin_quant scenario, RD-Agent supports configuring time segments for factor, model, and quant stages simultaneously.

Note: The QLIB_QUANT_* variables are only used for front-end UI display purposes and do not affect the actual execution process.

You may configure the following variables in your .env file:

QLIB_FACTOR_TRAIN_START=<train start date, default is 2008-01-01>
QLIB_FACTOR_TRAIN_END=<train end date, default is 2014-12-31>
QLIB_FACTOR_VALID_START=<valid start date, default is 2015-01-01>
QLIB_FACTOR_VALID_END=<valid end date, default is 2016-12-31>
QLIB_FACTOR_TEST_START=<test / backtest start date, default is 2017-01-01>
QLIB_FACTOR_TEST_END=<test / backtest end date, default is 2020-12-31>

QLIB_MODEL_TRAIN_START=<train start date, default is 2008-01-01>
QLIB_MODEL_TRAIN_END=<train end date, default is 2014-12-31>
QLIB_MODEL_VALID_START=<valid start date, default is 2015-01-01>
QLIB_MODEL_VALID_END=<valid end date, default is 2016-12-31>
QLIB_MODEL_TEST_START=<test / backtest start date, default is 2017-01-01>
QLIB_MODEL_TEST_END=<test / backtest end date, default is 2020-12-31>

QLIB_QUANT_TRAIN_START=<train start date, default is 2008-01-01>
QLIB_QUANT_TRAIN_END=<train end date, default is 2014-12-31>
QLIB_QUANT_VALID_START=<valid start date, default is 2015-01-01>
QLIB_QUANT_VALID_END=<valid end date, default is 2016-12-31>
QLIB_QUANT_TEST_START=<test / backtest start date, default is 2017-01-01>
QLIB_QUANT_TEST_END=<test / backtest end date, default is 2020-12-31>

This setup allows the front-end to display consistent segment information across different stages while keeping execution logic unchanged.

Configuration (deprecated)

To run the application, please create a .env file in the root directory of the project and add environment variables according to your requirements.

If you are using this deprecated version, you should set BACKEND to rdagent.oai.backend.DeprecBackend.

BACKEND=rdagent.oai.backend.DeprecBackend

Here are some other configuration options that you can use:

OpenAI API

Here is a standard configuration for users of the OpenAI API.

OPENAI_API_KEY=<your_api_key>
EMBEDDING_MODEL=text-embedding-3-small
CHAT_MODEL=gpt-4-turbo

Azure OpenAI

The following environment variables are standard configuration options for users of the Azure OpenAI API.

USE_AZURE=True

EMBEDDING_OPENAI_API_KEY=<replace_with_your_azure_openai_api_key>
EMBEDDING_AZURE_API_BASE=  # The endpoint for the Azure OpenAI API.
EMBEDDING_AZURE_API_VERSION=  # The version of the Azure OpenAI API.
EMBEDDING_MODEL=text-embedding-3-small

CHAT_OPENAI_API_KEY=<replace_with_your_azure_openai_api_key>
CHAT_AZURE_API_BASE=  # The endpoint for the Azure OpenAI API.
CHAT_AZURE_API_VERSION=  # The version of the Azure OpenAI API.
CHAT_MODEL=  # The model name of the Azure OpenAI API.

Use Azure Token Provider

If you are using the Azure token provider, set the CHAT_USE_AZURE_TOKEN_PROVIDER and EMBEDDING_USE_AZURE_TOKEN_PROVIDER environment variables to True, then use the environment variables provided in the Azure Configuration section.

☁️ Azure Configuration

  • Install Azure CLI:

    curl -L https://aka.ms/InstallAzureCli | bash

  • Log in to Azure:

    az login --use-device-code

  • Exit and re-login to your environment (this step may not be necessary).

Configuration List

  • OpenAI API Setting

    Configuration Option                 Meaning                                                      Default Value
    OPENAI_API_KEY                       API key for both chat and embedding models                   None
    EMBEDDING_OPENAI_API_KEY             Use a different API key for the embedding model              None
    CHAT_OPENAI_API_KEY                  Use a different API key for the chat model                   None
    EMBEDDING_MODEL                      Name of the embedding model                                  text-embedding-3-small
    CHAT_MODEL                           Name of the chat model                                       gpt-4-turbo
    EMBEDDING_AZURE_API_BASE             Base URL for the Azure OpenAI API (embedding)                None
    EMBEDDING_AZURE_API_VERSION          Version of the Azure OpenAI API (embedding)                  None
    CHAT_AZURE_API_BASE                  Base URL for the Azure OpenAI API (chat)                     None
    CHAT_AZURE_API_VERSION               Version of the Azure OpenAI API (chat)                       None
    USE_AZURE                            True if you are using Azure OpenAI                           False
    CHAT_USE_AZURE_TOKEN_PROVIDER        True if you are using an Azure token provider in chat        False
    EMBEDDING_USE_AZURE_TOKEN_PROVIDER   True if you are using an Azure token provider in embedding   False

  • Global Setting

    Configuration Option   Meaning                                      Default Value
    max_retry              Maximum number of times to retry             10
    retry_wait_seconds     Number of seconds to wait before retrying    1
    log_trace_path         Path to the log trace file                   None
    log_llm_chat_content   Flag to indicate if chat content is logged   True

  • Cache Setting

    Configuration Option       Meaning                                         Default Value
    dump_chat_cache            Flag to indicate if chat cache is dumped        False
    dump_embedding_cache       Flag to indicate if embedding cache is dumped   False
    use_chat_cache             Flag to indicate if chat cache is used          False
    use_embedding_cache        Flag to indicate if embedding cache is used     False
    prompt_cache_path          Path to the prompt cache                        ./prompt_cache.db
    max_past_message_include   Maximum number of past messages to include      10

Loading Configuration

For users’ convenience, we provide a CLI interface called rdagent, which automatically runs load_dotenv() to load environment variables from the .env file. However, this feature is not enabled by default for other scripts. We recommend users load the environment with the following steps:

  • ⚙️ Environment Configuration
    • Place the .env file in the same directory as the .env.example file.
      • The .env.example file contains the environment variables required for users of the OpenAI API. (Note that .env.example is only an example; .env is the file that is actually used.)

    • Export each variable in the .env file:

      export $(grep -v '^#' .env | xargs)
      
    • If you want to change the default environment variables, refer to the configuration above and edit the .env file.
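What load_dotenv() and the export one-liner achieve can be sketched with a minimal .env parser. This is a simplified stdlib-only illustration: real parsers such as python-dotenv also handle quoting, variable expansion, and multi-line values.

```python
def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines from .env-style text.

    A minimal sketch of what `load_dotenv()` (python-dotenv) or
    `export $(grep -v '^#' .env | xargs)` achieve; real parsers
    handle quoting and multi-line values as well.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments, like grep -v '^#'
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = "# comment\nCHAT_MODEL=gpt-4o\nEMBEDDING_MODEL=text-embedding-3-small\n"
print(parse_dotenv(sample))
```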