The Problem Every Crate Digger Knows
You are at a flea market or a charity shop. You are flicking through a crate and you pull out a record you half-recognise. Is it a common repress worth nothing, or an original pressing that collectors are hunting? Is the want-to-have ratio on Discogs making this a find? Is anyone selling it on Vinted right now, and for how much? You have about thirty seconds before someone else walks past.
CrateVision is built for exactly this moment. It is a Telegram bot that takes a photo of a vinyl record, identifies it using computer vision, cross-references it across multiple data sources, and tells you whether to buy it — all in under a minute.
How It Works
Step 1: Photograph the Record
Send a photo to the CrateVision Telegram bot — cover, label, or both. The bot accepts both compressed Telegram photos and uncompressed image documents. No special setup is needed; if you have Telegram, you have CrateVision.
Step 2: Computer Vision Identification
The image is sent to a vision-language model (Qwen2.5-VL-7B, with fallback to Qwen3-VL-8B and Gemma 3-27B if the primary is unavailable). The model is given a strict instruction: read only the text that is clearly visible on the cover or label. Artist name, title, record label, catalogue number, year, format — extracted directly from what is printed, nothing guessed or inferred.
The result is a structured JSON object that feeds the next steps in the pipeline. If the image is not a vinyl record, the bot says so immediately rather than returning a meaningless result.
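As a rough illustration of what that structured object might look like on the receiving end, here is a minimal sketch. The field names and the `is_vinyl` flag are assumptions made for this example; the article lists the extracted fields but not the actual keys CrateVision uses.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RecordInfo:
    # Hypothetical field names mirroring the fields listed in the text.
    artist: Optional[str] = None
    title: Optional[str] = None
    label: Optional[str] = None
    catalogue_number: Optional[str] = None
    year: Optional[str] = None
    format: Optional[str] = None


def parse_vision_reply(raw: str) -> Optional[RecordInfo]:
    """Return a RecordInfo, or None when the model reports a non-record image."""
    data = json.loads(raw)
    # Assumption: the prompt asks the model to set "is_vinyl" to false
    # when the photo does not show a record.
    if not data.get("is_vinyl", True):
        return None
    known = {f.name for f in fields(RecordInfo)}
    # Ignore unexpected keys; anything the model could not read stays None.
    return RecordInfo(**{k: v for k, v in data.items() if k in known})
```

Leaving unread fields as `None` rather than filling them in keeps the "nothing guessed or inferred" rule visible to the later steps.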
Step 3: Discogs Lookup
The extracted artist and title are searched against the Discogs database. The bot retrieves full release details: the have/want ratio (how many collectors own it versus how many want it), lowest current sale price, number of copies for sale, pressing details, tracklist, genres, and cover art.
The have/want ratio is the single most important number for assessing a record’s desirability. A release with 500 haves and 2,000 wants has four collectors looking for it for every one who owns it, so it is actively sought. A release with 10,000 haves and 200 wants is everywhere and worth very little. CrateVision surfaces this signal immediately.
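The arithmetic behind the two examples can be sketched in a few lines. The function name and the choice to express the signal as wants per owner (so that higher means more desirable) are assumptions for illustration, not CrateVision's actual code.

```python
def wants_per_owner(have: int, want: int) -> float:
    """Wants divided by haves; above 1.0 means more people want it than own it."""
    if have <= 0:
        # No recorded owners: treat any demand as unbounded, none as zero.
        return float("inf") if want > 0 else 0.0
    return want / have


print(wants_per_owner(500, 2000))    # 4.0
print(wants_per_owner(10000, 200))   # 0.02
```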
Step 4: Vinted Price Check
Discogs shows what collectors pay in a specialist marketplace. Vinted shows what the general second-hand market looks like — faster-moving, less curated, often more reflective of what a record will actually sell for in a week. CrateVision checks both, giving the current price range across listings, so you know the floor and ceiling of what you could realistically flip it for.
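The floor and ceiling can be derived from whatever listing prices come back. The sketch below assumes the prices have already been fetched and converted to a single currency; how CrateVision actually retrieves Vinted listings is not described here.

```python
from typing import List, Optional, Tuple


def price_range(prices: List[float]) -> Optional[Tuple[float, float]]:
    """Return (floor, ceiling) across listings, or None if nothing usable."""
    # Drop zero or negative values, which usually indicate bad listing data.
    valid = [p for p in prices if p > 0]
    if not valid:
        return None
    return min(valid), max(valid)
```

Returning `None` for an empty result lets the caller say "no current listings" instead of reporting a misleading zero.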
Step 5: BPM and Musical Key
Via the GetSongBPM API, CrateVision retrieves the tempo and musical key for the release. For DJs, this is immediately practical: a record at 127 BPM in A minor tells you exactly where it fits in a set before you have heard it. The danceability score provides an additional signal for whether a record works on a floor.
Step 6: Sample Connections via WhoSampled
CrateVision queries WhoSampled for each track on the release. It surfaces two types of connections: samples the album itself contains (what earlier music it draws from), and samples taken from the album by later artists (who has flipped these tracks). A record that has been heavily sampled by hip-hop producers is often more collectable than its original pressing price suggests — the cultural footprint matters to collectors and to buyers who know the music.
Step 7: AI Verdict
All of this data (Discogs have/want ratio, price data, Vinted listings, pressing details) is passed to a 72B language model with a clear analytical brief. The model acts as a market analyst for the crate digger, weighing the signals in order of importance and returning one of three verdicts:
- BUY — high demand, good resale value, pick it up
- MILD — decent record with some demand, worth it if cheap
- SKIP — common, low demand, not worth the space
Along with the verdict, the bot returns a brief reasoning (why the numbers support this call) and a context note — interesting facts about the record from the model’s knowledge, notable samples, why it matters to collectors, things worth knowing that the data alone does not tell you.
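One way to keep the three-verdict contract strict is to validate the model's reply before showing it. This sketch assumes the model is asked to answer in JSON with `verdict`, `reasoning` and `context` keys; those key names are hypothetical.

```python
import json
from enum import Enum
from typing import Tuple


class Verdict(Enum):
    BUY = "BUY"
    MILD = "MILD"
    SKIP = "SKIP"


def parse_verdict(raw: str) -> Tuple[Verdict, str, str]:
    """Parse the analyst reply into (verdict, reasoning, context note)."""
    data = json.loads(raw)
    # Verdict(...) raises ValueError for anything outside the three labels,
    # so an off-script answer is caught here rather than sent to the user.
    verdict = Verdict(str(data["verdict"]).strip().upper())
    return verdict, data.get("reasoning", ""), data.get("context", "")
```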
Step 8: Automatic Instagram Posting
Every BUY or otherwise notable result is queued for posting to Instagram. The bot generates a caption with the artist, title, year, genre, Discogs stats, BPM and verdict, and posts it alongside the album cover art. This turns a crate-digging session into a live Instagram feed with no manual effort from the user.
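Caption generation of this kind is mostly string assembly. The layout below is an assumption made for illustration; only the list of fields comes from the description above.

```python
from typing import Optional


def build_caption(artist: str, title: str, year: str, genre: str,
                  have: int, want: int, bpm: Optional[int], verdict: str) -> str:
    """Assemble a plain-text caption from the fields named in the text."""
    lines = [
        f"{artist} - {title} ({year})",
        f"Genre: {genre}",
        f"Discogs: {have} have / {want} want",
    ]
    # BPM is optional because a tempo lookup may return nothing.
    if bpm is not None:
        lines.append(f"BPM: {bpm}")
    lines.append(f"Verdict: {verdict}")
    return "\n".join(lines)
```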
The Correction Flow
Vision models occasionally misread partially obscured text or mistake a reissue for an original. CrateVision handles this with a correction command. Typing "correction: it's the 1971 UK first pressing, not the reissue" sends the previous result and the correction text to the LLM, which updates the extracted data and re-runs the full pipeline with the corrected information. No need to retake the photo or start again.
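Detecting the correction message is the simple part of this flow. A minimal sketch, assuming the bot matches on a case-insensitive "correction:" prefix (the exact matching rule is not stated in the text):

```python
import re
from typing import Optional

# Matches "correction:" at the start of a message, ignoring case and
# leading whitespace, and captures everything after the colon.
_CORRECTION = re.compile(r"^\s*correction:\s*(.+)$", re.IGNORECASE | re.DOTALL)


def extract_correction(message: str) -> Optional[str]:
    """Return the correction text, or None if this is not a correction."""
    match = _CORRECTION.match(message)
    if not match:
        return None
    text = match.group(1).strip()
    return text or None
```

The returned text would then be passed to the LLM together with the previous result, as described above.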
Personal Dashboard and Stats
Every search is logged to a personal history. Users can call up their stats via the /mystats command to see their total searches, their BUY/MILD/SKIP breakdown, and their most-searched artists. A web dashboard provides a fuller history view, useful for tracking patterns across a digging session or comparing finds across different markets.
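The verdict breakdown maps naturally onto a single grouped query. The `searches` table and its columns are hypothetical; the article only says that history lives in SQLite.

```python
import sqlite3
from typing import Dict


def verdict_breakdown(conn: sqlite3.Connection, user_id: int) -> Dict[str, int]:
    """Count a user's BUY/MILD/SKIP results from a hypothetical searches table."""
    rows = conn.execute(
        "SELECT verdict, COUNT(*) FROM searches WHERE user_id = ? GROUP BY verdict",
        (user_id,),
    ).fetchall()
    # Start from zero so verdicts the user has never received still appear.
    counts = {"BUY": 0, "MILD": 0, "SKIP": 0}
    counts.update(dict(rows))
    return counts
```

Total searches is then the sum of the three counts, and a similar `GROUP BY artist` query would give the most-searched artists.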
The Architecture
CrateVision is built on python-telegram-bot and runs as a persistent polling process on the theoracle server. The vision and analysis models are served via the HuggingFace Inference API with automatic fallback across multiple models if the primary is unavailable. Discogs and Vinted are queried via their respective APIs, BPM data comes from GetSongBPM, and sample connections come from WhoSampled. A SQLite database handles search history and the Instagram post queue. A lightweight Flask dashboard runs as a background thread alongside the bot process.
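The automatic fallback across models can be sketched independently of any particular client library. Here `call` is a stand-in for whatever function sends a request to a named model; it is an assumption for this example, not part of the HuggingFace API.

```python
from typing import Callable, Optional, Sequence


def call_with_fallback(models: Sequence[str], call: Callable[[str], str]) -> str:
    """Try each model in order and return the first successful reply."""
    last_error: Optional[Exception] = None
    for name in models:
        try:
            return call(name)
        except Exception as exc:
            # Remember the failure and move on to the next model in the list.
            last_error = exc
    raise RuntimeError("all models failed") from last_error
```

With the vision models named earlier, the list would be ordered primary first: Qwen2.5-VL-7B, then Qwen3-VL-8B, then Gemma 3-27B.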
The entire pipeline, from image to verdict, typically completes in under sixty seconds.
Why This Is a Good Example of Applied AI
CrateVision is worth noting not because it does anything technically exotic, but because it assembles existing tools into something genuinely useful for a specific human activity. Computer vision for text extraction. API aggregation for market data. A large language model for synthesis and judgement. Telegram for the interface because that is where the user already is, at a flea market with their phone.
This is the pattern that makes practical AI applications work: identify a specific moment in someone’s day where better information would change their decision, then build the shortest possible pipeline from that moment to that information.
If you have a similar use case — a specific decision that needs faster, better-structured data — get in touch. We build AI pipelines that connect real-world inputs to actionable outputs.