
Coffee Cluster


About

Coffee Cluster is a hobby project I built while taking UC Berkeley's Data 144 (Data Mining and Analytics). The initial technical goal was to use clustering algorithms to find coffee beans similar to ones you have enjoyed, based on each bean's textual description and evaluation plus a handful of numerical and categorical features.
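
The repository's actual pipeline isn't reproduced here, so what follows is only a minimal sketch of the idea under stated assumptions. It uses a tiny hypothetical dataset (`BEANS`, with made-up `desc`, `roast`, and `rating` fields) and a simple bag-of-words text representation standing in for whatever text features the project really used. The numeric column is min-max scaled, the categorical one is one-hot encoded, and the beans are clustered with plain k-means using a deterministic farthest-first initialization. Beans that land in the same cluster as one you liked become recommendations.

```python
import math
import re
from collections import Counter

# Hypothetical toy data; the real project's dataset and column names may differ.
BEANS = [
    {"name": "Bean A", "desc": "bright citrus floral jasmine", "roast": "light", "rating": 93},
    {"name": "Bean B", "desc": "citrus lemon floral tea-like", "roast": "light", "rating": 92},
    {"name": "Bean C", "desc": "dark chocolate nutty smoky", "roast": "dark", "rating": 88},
    {"name": "Bean D", "desc": "chocolate caramel nutty sweet", "roast": "medium", "rating": 90},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_features(beans):
    """Concatenate text, numeric, and categorical features into one vector per bean."""
    vocab = sorted({tok for b in beans for tok in tokenize(b["desc"])})
    roasts = sorted({b["roast"] for b in beans})
    lo = min(b["rating"] for b in beans)
    hi = max(b["rating"] for b in beans)
    vectors = []
    for b in beans:
        counts = Counter(tokenize(b["desc"]))
        # Bag-of-words term counts, L2-normalized so long descriptions don't dominate.
        text = [counts[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in text)) or 1.0
        text = [x / norm for x in text]
        # Min-max scale the numeric rating into [0, 1].
        rating = [(b["rating"] - lo) / (hi - lo) if hi > lo else 0.0]
        # One-hot encode the categorical roast level.
        roast = [1.0 if b["roast"] == r else 0.0 for r in roasts]
        vectors.append(text + rating + roast)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-first seeding: a deterministic stand-in for random k-means++ seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def recommend(beans, liked_name, k=2):
    """Return the other beans that share a cluster with the bean you liked."""
    labels = kmeans(build_features(beans), k)
    idx = next(i for i, b in enumerate(beans) if b["name"] == liked_name)
    return [b["name"] for i, b in enumerate(beans) if labels[i] == labels[idx] and i != idx]
```

With this toy data, `recommend(BEANS, "Bean A")` returns `["Bean B"]`, since both are light roasts with citrus and floral descriptions. Concatenating the three feature groups without weighting is a simplifying choice; in practice you would likely tune how much each group contributes to the distance.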

For content delivery, I wanted a simple, clean, functional UI. Since web development was not the main focus of the project, I prioritized rapid prototyping, and Snowflake's Streamlit was the first thing that came to mind. Having worked with other front-end frameworks, I think it's a beautiful idea that you can write plain Python code to generate interactive web apps. It reminds me of PyTorch, where your front-end Python code invokes C++ functions and you never have to worry about that layer of abstraction.

  1. GitHub repo: https://github.com/ronyw7/coffee-cluster/
  2. Streamlit app: https://coffee-cluster.streamlit.app/

Coffee Cluster + LLM?

With the more recent rise of LLMs, I think a more compelling use case might be retrieval-augmented generation (RAG): store each coffee bean's vector in a vector database and have the model query it. We could then integrate this with an online store, where customers receive detailed, compelling recommendations as they browse their shopping carts or product pages. Given its relatively low cost, this should be feasible even for small businesses and roasters.
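
As a rough sketch of the retrieval half of that idea, the code below assumes a hypothetical in-memory `STORE` of made-up bean vectors in place of a real embedding model and vector database. It ranks beans by cosine similarity and packs the top matches into a prompt. The `build_prompt` helper and its prompt wording are hypothetical, and the step that sends the prompt to an LLM is deliberately left out, since it depends on whichever model provider you choose.

```python
import math

# Hypothetical in-memory "vector store": bean name -> (vector, description).
# In practice the vectors would come from an embedding model and live in a vector database.
STORE = {
    "Bean A": ([0.9, 0.1, 0.0], "Light roast; floral, citrus."),
    "Bean B": ([0.8, 0.3, 0.1], "Light roast; berry, grapefruit."),
    "Bean C": ([0.1, 0.2, 0.9], "Dark roast; cedar, dark chocolate."),
}


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, store, top_k=2):
    """Return the top_k (name, description) pairs most similar to query_vec."""
    scored = sorted(store.items(), key=lambda item: cosine(query_vec, item[1][0]), reverse=True)
    return [(name, desc) for name, (_, desc) in scored[:top_k]]


def build_prompt(cart_item, query_vec, store, top_k=2):
    """Retrieval step of RAG: ground the LLM prompt in similar catalog entries."""
    # Exclude the item already in the cart so we only recommend other beans.
    candidates = {n: v for n, v in store.items() if n != cart_item}
    context = "\n".join(f"- {name}: {desc}" for name, desc in retrieve(query_vec, candidates, top_k))
    return (
        f"A customer has {cart_item} in their cart.\n"
        f"Similar beans from our catalog:\n{context}\n"
        "Write a short, friendly recommendation using only the beans listed above."
    )
```

For example, `build_prompt("Bean A", STORE["Bean A"][0], STORE, top_k=1)` includes only Bean B, the closest remaining vector, as context. Grounding the prompt in retrieved catalog entries is what keeps the model recommending beans the store actually sells rather than inventing them.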