(Content shamelessly stolen from the project's Devpost.)
Technology Used
Tagging system to be added.
Raison D'être
We all struggle with staying focused online.
Honk is a browser extension that helps users notice when they get off topic while studying. It uses ML to determine whether the user is going down a rabbit hole or rapidly jumping between topics in a way that isn't conducive to learning. Honk also offers accountability features, such as texting a friend when the user is unproductive.
Infrastructure
The frontend is built with Next.js (React), using Chakra UI for components. We implemented phone-number login with Firebase Auth. Our main product is a Chrome extension that interfaces with our website through an iframe (a hacky way to avoid most extension-specific work).
The backend is a FastAPI REST microservice that serves both keyword extraction and document similarity analysis. For extraction, we initially used YAKE! for keywords (drawn from statistical features, not pretrained on a corpus!) and spaCy for entity extraction. KeyBERT (BERT embeddings plus cosine similarity, in short) proved more effective at n-gram extraction, likely thanks to the pretraining. For document similarity, we used MiniLM, a lightweight distilled transformer model (created via "deep self-attention distillation", a process I don't quite understand yet).
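For the curious, here's a minimal sketch of what the two endpoints could look like. The endpoint names, request shapes, and the specific MiniLM checkpoint are assumptions for illustration, not the project's actual API.

```python
# Hypothetical sketch of the FastAPI microservice: one endpoint for KeyBERT
# keyword extraction, one for MiniLM document similarity.
from fastapi import FastAPI
from pydantic import BaseModel
from keybert import KeyBERT
from sentence_transformers import SentenceTransformer, util

app = FastAPI()
kw_model = KeyBERT()  # BERT embeddings + cosine similarity under the hood
st_model = SentenceTransformer("all-MiniLM-L6-v2")  # a distilled MiniLM

class Document(BaseModel):
    text: str

class DocumentPair(BaseModel):
    a: str
    b: str

@app.post("/keywords")
def keywords(doc: Document):
    # Extract the top 1- and 2-gram keyphrases with their relevance scores.
    pairs = kw_model.extract_keywords(
        doc.text, keyphrase_ngram_range=(1, 2), stop_words="english", top_n=5
    )
    return {"keywords": [{"phrase": p, "score": float(s)} for p, s in pairs]}

@app.post("/similarity")
def similarity(pair: DocumentPair):
    # Embed both documents with MiniLM, then compare via cosine similarity.
    emb = st_model.encode([pair.a, pair.b])
    return {"similarity": float(util.cos_sim(emb[0], emb[1]))}
```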
We structured the project as a monorepo and use Docker Compose with GitHub Actions to automagically deploy to Google Cloud Run on push. There's a random MongoDB instance in our composition for no reason (I don't know why I felt the need to mention that). Caddy serves as a reverse proxy.
Challenges
This project was actually a massive pivot. We initially planned to make a VR hack for watching movies with friends but ran into a multitude of problems. Feeling thoroughly burnt out around dinner (19:00 EST), we pivoted to Honk, leaving us with about 12 hours to complete our hack.
We had a JS REST API haphazardly attached to our frontend via Next.js magic that communicated directly with the Python microservice. All of the DB (Firebase) work was done through JS, which meant we had to serialize vectors before transferring them. Numpy was rather uncooperative: sending the JSON programmatically rather than through Postman somehow surfaced issues with little-endian notation and structures that were being parsed as strings, so we had to parse everything twice. Then that broke for whatever reason, but it seems to be working now, so ¯\_(ツ)_/¯.
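A minimal sketch of the kind of fix that sidesteps both problems described above (the variable names and vector size are made up). Numpy arrays aren't JSON-serializable, and raw byte dumps are byte-order dependent, so converting to a plain Python list avoids endianness entirely:

```python
import json
import numpy as np

vec = np.random.rand(384).astype(np.float32)  # e.g. an embedding vector

# json.dumps({"embedding": vec}) raises TypeError: ndarray is not serializable,
# and vec.tobytes() would be little-endian on most machines. Plain floats are
# portable:
payload = json.dumps({"embedding": vec.tolist()})

# On the receiving end, rebuild the array from the parsed list. If the JSON
# arrives double-encoded (a JSON string nested inside JSON, as we hit above),
# a second json.loads recovers the structure.
decoded = json.loads(payload)
restored = np.asarray(decoded["embedding"], dtype=np.float32)
```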
Through brilliantly inefficient code we managed to hit the Firebase free-tier limit. We were excited to finally have the opportunity to spend some GCP credit, but it seems we'll fail to hit even the $5 mark on the Blaze plan.