RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation
Liheng Zhang*, Lexi Pang*, Hang Ye, Xiaoxuan Ma, Yizhou Wang
Peking University
*Equal contribution
After cloning this repository, run
conda env create -f environment.yml
conda activate richcontrol
For example image generation, run
python run_richcontrol.py
After that, the generated images will appear in the results directory.
Feel free to add various images and create your own prompts for controllable generation!
Image input format can be found in configs/image_config.yaml.
Model configurations can be found in configs/model_config.yaml. Note that the Appearance-Rich Prompting module is disabled in the default config. Please update the configurations if you would like to use a prompt model.
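If you prefer to toggle options programmatically rather than editing the YAML by hand, the config files can be loaded and rewritten with PyYAML. A minimal sketch, assuming PyYAML is installed; the key name use_prompt_model is a hypothetical placeholder, so check configs/model_config.yaml for the actual option names:

```python
# Minimal sketch: load a YAML config, set one top-level key, write it back.
# NOTE: the key "use_prompt_model" is a hypothetical placeholder; consult
# configs/model_config.yaml for the actual option names.
import os
import tempfile

import yaml


def set_config_flag(path, key, value):
    """Load a YAML config file, set a top-level key, and save it."""
    with open(path) as f:
        config = yaml.safe_load(f) or {}
    config[key] = value
    with open(path, "w") as f:
        yaml.safe_dump(config, f)
    return config


# Demo on a throwaway file; point `path` at configs/model_config.yaml
# in the actual repository instead.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model_config.yaml")
    with open(path, "w") as f:
        f.write("seed: 42\n")
    cfg = set_config_flag(path, "use_prompt_model", True)
    print(cfg)
```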
To download our dataset, click here. As described in the paper, the dataset comprises 150 image-prompt pairs spanning 7 condition types ("canny edge", "depth map", "HED edge", "normal map", "scribble drawing", "human pose", "segmentation mask") and 7 semantic categories ("animals": 58, "humans": 26, "objects": 20, "buildings": 16, "vehicles": 12, "scenes": 10, "rooms": 8). The dataset is organized as follows:
images
    canny
        beetle_canny
            condition.png
        cat_cartoon
            condition.png
        ...
    depth
        bedroom_depth
            condition.png
        castle_cartoon
            condition.png
        ...
    hed
        ...
    normal
        ...
    pose
        ...
    scribble
        ...
    seg
        ...
image_config_dataset.yaml
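The layout above is uniform, so every condition image can be collected with a simple glob over images/&lt;condition_type&gt;/&lt;sample_name&gt;/condition.png. A minimal sketch (the demo builds a throwaway copy of the layout; point it at the real images folder after downloading the dataset):

```python
# Minimal sketch of iterating the dataset layout: each condition image
# lives at images/<condition_type>/<sample_name>/condition.png.
import pathlib
import tempfile


def list_condition_images(images_dir):
    """Yield (condition_type, sample_name, path) for every condition.png."""
    root = pathlib.Path(images_dir)
    for path in sorted(root.glob("*/*/condition.png")):
        yield path.parent.parent.name, path.parent.name, path


# Demo on a throwaway copy of the layout; substitute the real "images"
# folder from the downloaded dataset.
with tempfile.TemporaryDirectory() as d:
    for sub in ["canny/beetle_canny", "depth/bedroom_depth"]:
        p = pathlib.Path(d, sub)
        p.mkdir(parents=True)
        (p / "condition.png").touch()
    found = [(c, s) for c, s, _ in list_condition_images(d)]
    print(found)  # [('canny', 'beetle_canny'), ('depth', 'bedroom_depth')]
```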
For clarity, we split the entire dataset into 7 folders corresponding to 7 condition types. The file data_prompt-driven.yaml contains all metadata used in our evaluation experiments, where each entry includes the image file path, an inversion prompt, and a generation prompt:
- condition_image: canny/beetle_canny/condition.png
  inversion_prompt: a canny edge map of a volkswagen beetle
  prompt: a cartoon of a volkswagen beetle
- ...
Although our method does not require DDIM inversion, we include the inversion_prompt field to facilitate comparisons with baseline methods that rely on DDIM inversion.
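Each metadata entry can be read back with PyYAML as a list of dictionaries with the three keys shown above. A minimal sketch, assuming PyYAML is installed (the inline SAMPLE stands in for the full metadata file):

```python
# Minimal sketch of reading the evaluation metadata: each entry carries
# an image path, an inversion prompt, and a generation prompt.
import yaml

# Inline stand-in for the full metadata file shipped with the dataset.
SAMPLE = """
- condition_image: canny/beetle_canny/condition.png
  inversion_prompt: a canny edge map of a volkswagen beetle
  prompt: a cartoon of a volkswagen beetle
"""

entries = yaml.safe_load(SAMPLE)
for entry in entries:
    print(entry["condition_image"], "->", entry["prompt"])
```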
Our dataset is based on datasets from prior work including Ctrl-X, FreeControl, Plug-and-Play, and ADE20K. We gratefully acknowledge their contributions to the community.
Our code is inspired by repositories including Ctrl-X, Plug-and-Play, and Restart sampling. We thank their contributors for sharing these valuable resources with the community.
For any questions and discussions, please contact Liheng Zhang (zhangliheng@stu.pku.edu.cn).
If you use our code in your research, please cite the following work.
@misc{zhang2025richcontrolstructureappearancerichtrainingfree,
      title={RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation},
      author={Liheng Zhang and Lexi Pang and Hang Ye and Xiaoxuan Ma and Yizhou Wang},
      year={2025},
      eprint={2507.02792},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.02792},
}

