🦦 OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction

by Huang Huang*, Fangchen Liu*, Letian Fu*, Tingfan Wu, Mustafa Mukadam, Jitendra Malik, Ken Goldberg, Pieter Abbeel at UC Berkeley and Meta (*equal contribution).

[Paper] | [Project Page]

This repo contains the official implementation of Otter: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction. We have also released a PyTorch implementation.

For further information, please contact Huang Huang, Fangchen Liu, or Letian Fu, or open an issue on GitHub!

Updates

2025-03-05: Initial code release.
WIP: instructions on training, inference.
WIP: release pretrained models.

Training

python scripts/train.py --config.save_dir=<...>

Contributing

Experimental code and training/eval scripts should go in experiments/<your_name>. To change files outside of your experiments directory, please open a pull request.

To enable code checks and auto-formatting, please install pre-commit hooks:

pre-commit install

Environment

conda create -n otter_jax python=3.10
conda activate otter_jax
pip install -e .
pip install -r requirements.txt
conda install -c conda-forge cudatoolkit=11.8
conda install -c conda-forge cudnn=8.9

For GPU:

pip install --upgrade "jax[cuda11_pip]==0.4.20" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

For TPU:

pip install --upgrade "jax[tpu]==0.4.20" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html

See the JAX GitHub page for more details on installing JAX.
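After installing, it can help to confirm that the pinned JAX version is the one actually in the environment and that JAX sees the expected accelerator. The snippet below is a minimal sketch and not part of this repo: the helper names `installed_version` and `describe_jax_install` are hypothetical, and the pinned version is taken from the install commands above. It uses only the standard library plus `jax.devices()`.

```python
from importlib import metadata
from typing import Optional

# Version pinned by the pip install commands in this section.
PINNED_JAX = "0.4.20"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_jax_install() -> str:
    """Summarize the local JAX install (hypothetical helper, not from the repo)."""
    version = installed_version("jax")
    if version is None:
        return "jax is not installed"
    if version != PINNED_JAX:
        return f"jax {version} is installed, expected {PINNED_JAX}"
    # Imported lazily so the check still runs when jax is missing.
    import jax

    devices = ", ".join(str(d) for d in jax.devices())
    return f"jax {version} OK, devices: {devices}"


if __name__ == "__main__":
    print(describe_jax_install())
```

If the reported devices are CPU only on a GPU or TPU machine, the accelerator-specific install step most likely did not take effect.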

License

This project is under the Apache 2.0 license. See LICENSE for details.

Acknowledgement

We thank the authors of Octo for providing an easy-to-use codebase for training vision-language-action models.

Citation

Please give us a star 🌟 on GitHub to support us!

If you find our work inspiring or use our code, please cite:

@article{huang2025otter,
    title={Otter: A Vision-Language-Action Model with Text-Aware Feature Extraction},
    author={Huang Huang and Fangchen Liu and Letian Fu and Tingfan Wu and Mustafa Mukadam and Jitendra Malik and Ken Goldberg and Pieter Abbeel},
    journal={arXiv preprint arXiv:2503.15980},
    year={2025}
}
