I only found out what these embedding models do the other day, and Google just dropped a new 308M-param one!
I really like how they’re open-weighting(?) these smaller models that can run on super low-end hardware. They dropped Gemma 3 270M a few weeks ago and whilst it’s “dumb”, it’s very easy to fine-tune in places like Google Colab.
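For example, a quick supervised fine-tune with Hugging Face’s TRL library looks something like this. This is a rough sketch, not any official Colab recipe: the model id (`google/gemma-3-270m`) and the toy dataset are my assumptions.

```python
# Minimal SFT sketch with TRL; assumes you've accepted the Gemma license
# on Hugging Face and are logged in. Model id is my guess, not verified.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# A tiny toy dataset in the conversational format SFTTrainer understands.
train_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ]},
])

trainer = SFTTrainer(
    model="google/gemma-3-270m",  # assumed Hugging Face model id
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="gemma-270m-ft", max_steps=10),
)
trainer.train()
```

At 270M params this fits comfortably in the free Colab T4 tier, which is presumably the point of shipping models this small.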
Pretty excited to see how apps could use these on-device models for search features:
“enables you to build applications using techniques such as Retrieval Augmented Generation (RAG) and semantic search that run directly on your hardware”
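Something like this seems to be all local semantic search takes. A sketch, assuming the sentence-transformers library and guessing `google/embeddinggemma-300m` as the model id:

```python
# On-device semantic search sketch: embed docs and a query, rank by
# cosine similarity. Model id is my assumption; requires accepting the
# model license on Hugging Face.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

docs = [
    "How to reset your password",
    "Quarterly sales report, Q3",
    "Setting up two-factor authentication",
]
query = "I forgot my login credentials"

# Encode everything into dense vectors, then score each doc against the query.
doc_emb = model.encode(docs)
query_emb = model.encode(query)
scores = util.cos_sim(query_emb, doc_emb)  # shape: 1 x len(docs)

best = scores.argmax().item()
print(docs[best], float(scores[0][best]))
```

No server round-trip, no API key: the whole index can live on the device, which is what makes this interesting for app search features.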