FANTASIA (Functional ANnoTAtion based on embedding space SImilArity)

Pipeline for functional annotation of proteins using language models

Info

Members (researchers): Ana Rojas

Research Groups: Computational Biology and Bioinformatics (CBBIO.CABD)

Contact Email: frapercan1@alum.us.es, gemma.martinez@ibe.upf-csic.es, a.rojas.m@csic.es, rosa.fernandez@ibe.upf-csic.es

Tool Repository: https://github.com/CBBIO/FANTASIA

Documentation: https://fantasia.readthedocs.io/en/latest/

Publications DOI: https://doi.org/10.1101/2024.02.28.582465, https://doi.org/10.1101/2024.02.14.580341

Application domain:

Applications of Computational Biology, Artificial Intelligence, Data Analysis, Function prediction, Functional annotation, Functional genomics, Large language models, Model organisms

Technical details
Type of application
  • Command line pipeline
  • Galaxy tool
  • Singularity container
Software compatibility
  • Linux
Hardware requirements
  • A typical peptide FASTA file with 20k sequences needs around 50 GB of RAM; memory requirements grow with the number of sequences.
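
To get a rough idea of the memory a given input might need, the figure above can be scaled by the number of sequences in the FASTA file. The sketch below is only an illustration: it assumes RAM grows linearly with the sequence count, which the text does not state, and the function names are hypothetical and not part of FANTASIA.

```python
import math

# Single reference point taken from the hardware requirements:
# about 50 GB of RAM for a FASTA file with 20k sequences.
REFERENCE_SEQUENCES = 20_000
REFERENCE_RAM_GB = 50


def count_fasta_sequences(lines):
    """Count FASTA records; each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def estimate_ram_gb(n_sequences):
    """Scale the reference figure linearly (an assumption) and round up."""
    return math.ceil(n_sequences * REFERENCE_RAM_GB / REFERENCE_SEQUENCES)


def estimate_ram_for_file(path):
    """Hypothetical helper: estimate RAM in GB for the FASTA file at `path`."""
    with open(path) as handle:
        return estimate_ram_gb(count_fasta_sequences(handle))
```

Calling estimate_ram_for_file on a proteome before submitting a job gives a first guess for the memory request; the real footprint may differ, so treat the result as a starting point rather than a guarantee.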
Programming language
  • BASH
  • Python
Type of containerization
  • Singularity/Apptainer
Wrapper type
  • None
Input file formats
  • FASTA
Output file formats
  • Other
Compatibility with other tools
  • Output files are already formatted for use as input to the topGO R package (GO enrichment).
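
For downstream checks in Python, a gene-to-GO mapping file can be loaded into a dictionary. The sketch below assumes the layout that topGO's readMappings function reads by default: one line per gene, with the gene identifier, a tab, and a comma-separated list of GO terms. Whether FANTASIA's output files follow exactly this layout is an assumption here and should be checked against the documentation; parse_gene2go is a hypothetical name.

```python
def parse_gene2go(lines):
    """Parse 'gene<TAB>GO:1, GO:2' lines into a {gene: [GO terms]} dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        gene_id, _, terms = line.partition("\t")
        # Split the GO list on commas and drop surrounding whitespace.
        mapping[gene_id] = [t.strip() for t in terms.split(",") if t.strip()]
    return mapping


example = [
    "gene1\tGO:0008150, GO:0003674\n",
    "gene2\tGO:0005575\n",
]
annotations = parse_gene2go(example)
```

A dictionary like this makes it easy to count annotated genes or terms per gene before running the enrichment in R.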