About

Project
GlyphMotion

The Architects

Team Rhythm: Shitij leads the architecture and the web platform (the main target) and built ~90% of the core processing pipeline. Sayan ports pipeline upgrades to the standalone script and GUI and leads ongoing maintenance for both.

Shitij Halder

Creative Web Developer & Full Stack Engineer

Lead Architect · Web Platform

DIT University, Dehradun · Originally from Siliguri, West Bengal

The mind behind the entire web platform, backend pipeline, and the visual identity of GlyphMotion. Built 90% of the processing pipeline, designed the cloud architecture with Google Drive integration, and engineered the asynchronous video processing backend from scratch. Makes web design feel more like cinema than code.

Open Source & Projects

GlyphMotion Web · ShutterBug · SKY Healthcare · Realme Narzo 30 (salaa) · Redmi 6 (cereus) · git-cherry-picking

Skills

HTML/CSS/JS · Python · Full Stack · AOSP · AI/ML · C

Sayan Sarkar

Psychology Student · Android Developer & Open Source Contributor

Co-Architect · Script & GUI

Singur Government General Degree College · Originally from Kolkata, West Bengal

The one who found the spark and lit the fuse. Built the initial YOLOv8 script with CLI arguments and now handles about 80% of the maintenance for the standalone script and GUI. A psychology student who debugs code the same way he reads human behavior: with alarming precision.

Open Source & Projects

GlyphMotion Script · GlyphMotion GUI · SpotDeck · TG Media Bot · Redmi 6A (cactus) · Redmi Note 11 (spes) · Redmi 13 5G (breeze) · Nothing Phone 3a (asteroids)

Skills

Python · AOSP/Linux · Shell · C · Photography · Kernel Dev

Project GlyphMotion is not just code. It is years of obsession, sleep debt, debugging scars, and the emotional volatility of shipping something that actually matters. This page reflects that heartbeat.

Who should use GlyphMotion?

Creators: make visual-first videos with tracked overlays while keeping original media quality and sound.
Students: learn practical computer-vision workflows by exploring a real pipeline built under real constraints.
Researchers: prototype detection + tracking ideas quickly with a reproducible stack across web, CLI, and GUI.

How to start in 2 minutes

01 · Upload
Choose your input clip and send it to the pipeline.
02 · Process
GlyphMotion runs detection/tracking and applies the configured workflow.
03 · Export
Download the processed output with final visuals and preserved audio.

Pipeline Anatomy (6 steps)

01 · Input
02 · Decode
03 · Detect
04 · Annotate
05 · Encode
06 · Mux

The Origin Tea ☕

// How two Class 12th students built a computer vision pipeline out of spite

This page means one simple thing: when resources are low but intent is high, two people can still build production-grade systems by sharing knowledge, dividing roles well, and refusing to quit.

Timeline Chapters: 8+ major phases
Core Builds: Web • CLI • GUI
Fuel Source: WiFi + caffeine + rage
Chapter 01 — The Spark

The Reel That Started It All

One fine day, Sayan was doom-scrolling Instagram reels at full goblin mode — as any civilized engineer does — and stumbled on a reel showcasing something called YOLO (You Only Look Once). Some guy had written about 30-40 lines of code that could take a video, run object detection, and spit out a processed version. It was basic, hardcoded, zero CLI arguments, just raw script-path-dependency-chaos vibes. But it worked. And Sayan got instantly hooked.

The catch? The guy asked viewers to comment "Link" to get the code. Let that sink in — you're showcasing an open-source project, freely available to the entire planet, and you're gatekeeping it behind Instagram comments for engagement farming. Elite cringe. Hall-of-fame gatekeeping.

Sayan actually commented "Link." The creator never replied. Our man got ghosted harder than a left-swipe on Bumble.

Sayan told Shitij the whole saga. They were both furious. Two broke students, one shared frustration, and a mutual verdict: "Fine, we'll build our own from scratch."

Then: Gatekept reel + hardcoded snippet. Now: A public, documented, multi-platform pipeline anyone can study and run.
Chapter 02 — The Setup

Post-Boards, Pre-Purpose

Context: Both of them had just finished their CBSE Class 12th board exams. That post-boards limbo where you've got no school, no college admissions yet, and an alarming amount of free time. They were already building random side quests — but now they had a real boss fight.

GTX 1650: Shitij's Laptop
RTX 3050 4GB: Sayan's Laptop
16GB RAM: Both

Neither of them had GPUs remotely close to "adequate" for real-time neural inference. But inadequate hardware has never stopped determined developers. It just makes the journey more... cinematic and slightly unhinged.

Then: Post-board confusion and random side quests. Now: A focused mission with real product direction.
Chapter 03 — First Blood

Sayan Writes the First Script

Sayan got to work first. He loaded the YOLOv8 model, got basic object detection running, and, crucially, added proper argparse support with command-line arguments. No more editing file paths in the source code like a caveman. The script could accept input via CLI flags, which meant normal humans could run it without summoning dark terminal rituals.
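For a sense of what that argparse layer might have looked like, here is a minimal sketch. The flag names and defaults are illustrative assumptions, not the actual GlyphMotion interface; only the use of argparse and YOLOv8 comes from the story above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser. Every flag name here is a hypothetical example."""
    parser = argparse.ArgumentParser(
        description="Run object detection on a video (illustrative sketch)."
    )
    parser.add_argument("--input", required=True, help="path to the source video")
    parser.add_argument("--output", default="output.mp4", help="path for the annotated video")
    parser.add_argument("--model", default="yolov8n.pt", help="YOLOv8 weights file")
    parser.add_argument("--conf", type=float, default=0.25, help="minimum detection confidence")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--input", "clip.mp4"])
print(f"Would process {args.input} -> {args.output} with {args.model}")
```

The point of the design is the same one the story makes: paths and settings live in flags, so nobody has to edit the source to run a different clip.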

He passed it to Shitij. Two immediate problems surfaced:

Problem #1: Processing Time

With a GTX 1650 and an RTX 3050 4GB, inference was painfully slow. Every frame through YOLOv8 felt like an eternity. But they pushed forward anyway.

Problem #2: No Audio

YOLO — and computer vision models in general — don't care about audio. The output video was completely silent. Not exactly production-ready.

Then: Slow detection and silent output. Now: Proper CLI + engineered media pipeline with full output fidelity.
Chapter 04 — Audio

FFmpeg Enters the Chat

They needed the original audio preserved in the output. Enter FFmpeg — the Swiss army knife of multimedia processing. They engineered a pipeline that would extract the original audio from the source video, process frames through YOLOv8 for visual annotations, encode them back into a video using libx264 with CRF-based compression, then multiplex the original audio stream back in.
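The three FFmpeg steps described above (extract audio, encode with libx264 and CRF, mux the audio back) can be sketched roughly as follows. This is an assumption-laden illustration, not the project's real commands: the file names, the CRF value of 18, and the intermediate `.mka` audio container are all hypothetical choices.

```python
import subprocess
from typing import List


def extract_audio_cmd(source: str, audio_out: str) -> List[str]:
    # -vn drops the video; -c:a copy keeps the original audio stream untouched.
    return ["ffmpeg", "-y", "-i", source, "-vn", "-c:a", "copy", audio_out]


def encode_video_cmd(annotated_in: str, video_out: str, crf: int = 18) -> List[str]:
    # libx264 with CRF: a lower CRF means higher quality and larger files.
    return ["ffmpeg", "-y", "-i", annotated_in, "-an",
            "-c:v", "libx264", "-crf", str(crf), "-pix_fmt", "yuv420p", video_out]


def mux_cmd(video_in: str, audio_in: str, final_out: str) -> List[str]:
    # Take video from the first input and audio from the second, without re-encoding.
    return ["ffmpeg", "-y", "-i", video_in, "-i", audio_in,
            "-map", "0:v:0", "-map", "1:a:0", "-c", "copy", final_out]


def run_pipeline(source: str, annotated: str, final_out: str) -> None:
    """Run the three steps in order. Requires ffmpeg on PATH."""
    for cmd in (extract_audio_cmd(source, "audio.mka"),
                encode_video_cmd(annotated, "video.mp4"),
                mux_cmd("video.mp4", "audio.mka", final_out)):
        subprocess.run(cmd, check=True)
```

Here `annotated` stands for whatever intermediate file the detection step writes. Copying the audio stream instead of re-encoding it is what keeps the original sound intact, although some audio codecs may not be accepted by every output container.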

And just like that — they had their first fully working script. A video goes in, a tracked-and-annotated video with original audio comes out. Voilà.
Then: Frames-only output with missing soul. Now: FFmpeg-powered muxing with original audio restored.
Chapter 05 — Going Remote

Shitij Builds the Web Platform

The script worked locally. But they wanted to use it remotely — because why not? Shitij, being deep into web development, decided to build an entire web platform around the pipeline. Upload a video through the browser, the backend processes it using the exact same pipeline, and the result gets served back.

Over the next few months, Shitij built out the full stack: the frontend interface, the Flask-based backend with async processing, Google Drive integration for storage, automatic GitHub Pages deployment via PyGithub, Telegram bot integration, real-time SSE status updates, PWA support, admin dashboards — the whole nine yards. 90% of the web platform was Shitij's work, with Sayan providing continuous support (emotional and technical — because building this was, in his own words, "diabolical but fire").
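The real-time SSE status updates mentioned above rely on a simple text format. The sketch below shows how a status message could be serialized for Server-Sent Events; the event name `status` and the payload fields are hypothetical, and the real backend may structure its messages differently.

```python
import json
from typing import Iterable, Iterator


def format_sse(data: dict, event: str = "status") -> str:
    """Serialize one Server-Sent Events message (event name is an assumption)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def status_stream(updates: Iterable[dict]) -> Iterator[str]:
    # A Flask view could return Response(status_stream(...), mimetype="text/event-stream").
    for update in updates:
        yield format_sse(update)


for message in status_stream([{"stage": "detect", "progress": 50}]):
    print(message, end="")
```

Each message ends with a blank line, which is how the browser's EventSource knows one event is complete.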

Then: Local-only script. Now: Async web platform with automation, deployments, and real-time status.
Chapter 06 — Zero Budget

$0 Infrastructure. 100% Uptime.

Here's the thing nobody talks about: they had zero money. No budget for domains. No budget for servers. No budget for cloud hosting. Two students running a computer vision project with nothing but their laptops, their WiFi, and whatever caffeine they could get their hands on.

But they had one ace: the GitHub Student Developer Pack. Through it, they scored a free domain via name.com. That gave the project an actual web presence.

But GitHub Pages — while giving them 100% frontend uptime for free — doesn't support any backend. No server-side processing, no APIs, nothing. Just static files. For a project that needs to process videos through a neural network? That's a dealbreaker. Or so it seemed.

The Cloudflare Tunnel Hack

Shitij searched hard for a way to make the backend work. The solution? Cloudflare Tunnel, which can expose a service running on a local machine to the internet without port forwarding, without a static IP, and without paying a dime.

The architecture: GitHub Pages hosts the frontend 24/7 with 100% uptime. Video metadata is fetched by polling a videos.json file in the GlyphMotion GitHub repository — no backend needed for that. When they needed to process a video or test something, they'd simply spin up the Cloudflare tunnel from their laptop, and the backend would be live on the internet. Turn it off when done.
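The polling idea can be sketched as below. The real frontend presumably does this in the browser; Python is used here only for illustration. The `id` field and the 30-second interval are assumptions, since the actual schema of videos.json is not described on this page.

```python
import json
import time
import urllib.request
from typing import Dict, List


def fetch_videos(url: str) -> List[dict]:
    """Download and parse the JSON metadata file."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def new_entries(seen: Dict[str, dict], current: List[dict]) -> List[dict]:
    """Return entries whose 'id' has not been seen before, and remember them."""
    fresh = [video for video in current if video["id"] not in seen]
    for video in fresh:
        seen[video["id"]] = video
    return fresh


def poll(url: str, interval_s: float = 30.0) -> None:
    """Check the metadata file forever, reporting only videos that are new."""
    seen: Dict[str, dict] = {}
    while True:
        for video in new_entries(seen, fetch_videos(url)):
            print("new video:", video["id"])
        time.sleep(interval_s)
```

Because the metadata is just a static file in a repository, reading it needs no backend at all, which is why the frontend can stay useful while the tunnel is switched off.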

Integrating GitHub Pages with Cloudflare was the biggest challenge — but once cracked, it gave them something nobody else has: a GitHub Pages site with an on-demand backend. When Shitij showed this setup to someone in the industry, their reaction was genuine shock.

100% open source. Zero extra cost beyond their laptops, WiFi, and coffee. Every single tool in the stack — free. The entire project exists because two students refused to let budget be an excuse.
Then: No money, no server, no obvious path. Now: Clever on-demand architecture with $0 infra burn.
Chapter 06.5 — The Crash

The Night Everything Felt Broken

Right after the first stable release landed, the internet suddenly felt like it had flatlined. During the Cloudflare outage window, traffic dropped off a cliff and the project looked dead from the outside.

At 2AM, far from home in Malda, that hit hard enough to bring tears. Real tears. Confusion, panic, grief — then a bad spiral where chunks of documentation and uncommitted work got deleted. For 2-3 days, both of them were emotionally cooked. Later they learned it was the outage, not the end. They rebuilt, recovered, and got back to shipping.

Some projects are built with money. This one was built with stubbornness, friendship, and the kind of resilience you only discover after your worst midnight moment.
Then: Panic, grief, and accidental deletions. Now: Stronger recovery mindset, tougher version, tighter bond.
Chapter 07 — The Split

Three Forks, One Pipeline

Now there were three versions of the same core pipeline that needed to coexist:

Web Platform

The full cloud solution with upload, processing, and automated deployment. Primarily maintained by Shitij.

Standalone CLI

The raw command-line script. No web server needed. Pipeline changes are ported from the web version. About 80% of its maintenance is handled by Sayan.

Desktop GUI

Sayan's cross-platform desktop app for Linux & Windows. Visual interface without needing a terminal.

Since the web platform's processing pipeline was always the most complete, changes flowed downstream: web → standalone → GUI. Sayan handled the majority of the standalone and GUI maintenance, while Shitij kept the web platform evolving.

Then: One script doing everything badly. Now: Three focused builds sharing one evolving core.
Chapter 08 — The Backstory

Before GlyphMotion: The Android Years

Long before GlyphMotion existed, both of them were deep in the Android custom ROM scene. They started on devices that shared the same board — Redmi 6 (cereus) for Shitij and Redmi 6A (cactus) for Sayan — building and maintaining custom ROMs with proper authorship and kernel optimizations.

Later, Shitij maintained ROMs for the Realme Narzo 30 (salaa) — a device so cursed it should come with a therapy coupon. Sayan moved to the Redmi Note 11 (spes), which he still maintains to this day, and later picked up the Redmi 13 5G (breeze) and even the Nothing Phone 3a (asteroids).

Beyond ROMs, they've contributed to various open-source projects — SpotDeck (a Spotify Car Thing replacement), Telegram media bots, kernel trees, vendor blobs — the whole Android development ecosystem. This background in low-level system work and open-source collaboration is exactly what made GlyphMotion possible.

Then: ROM/device trenches and kernel chaos. Now: That same low-level discipline powering computer-vision engineering.
Present Day

From a 30-Line Script Reel to a Full Pipeline

What started as frustration over a gatekept Instagram reel turned into a full-scale, multi-platform computer vision pipeline with asynchronous processing, CUDA acceleration, cloud integration, a production web app, a standalone CLI, and a desktop GUI.

Two students. Two mid-tier laptops. One shared goal. Zero handouts. Everything you see on this site — from the pipeline architecture to the documentation to the changelogs — was built from the ground up. Not because someone asked them to, but because some guy on Instagram wouldn't share a link.

Then: Anger at gatekeeping. Now: A fully shipped story proving open knowledge can beat budget.

Builder Notes & Gratitude

// What this journey taught us, and who helped us stand back up

If You're Building With Zero Budget

Use free tiers like puzzle pieces, not as a full solution. Combine static hosting, on-demand backend, and storage automation.
Optimize for resilience, not aesthetics first: backups, commit discipline, and reproducible scripts save your sanity.
Substitute effort for money: documentation, scripts, and repeatable workflows become your compounding assets.
Ask shamelessly, search deeply, experiment fast. Most paywalled answers can be reconstructed with patience and grit.
Build in public when possible — your future collaborators are watching your consistency, not your budget.

Credits We Owe

FFmpeg Project
To Fabrice Bellard and the global FFmpeg contributors — your work made our audio/video pipeline real.
YOLO Lineage
To Joseph Redmon, Ali Farhadi, and the broader YOLO research lineage; and to the Ultralytics team for practical modern implementations.
Open Source Foundations
Python core devs, Flask/Pallets, Linux maintainers, OpenCV contributors, and the entire tool ecosystem we stood on.
Infrastructure Allies
GitHub, the GitHub Student Developer Pack, Cloudflare tooling, and every free-tier service that gave two students a fighting chance.
Communities & Strangers
Stack Overflow answers, random forum comments, issue threads, docs writers, and tutorial creators we may never personally meet.
Each Other
When one of us crashed, the other carried the build. This project exists because friendship stayed online when the internet felt offline.

Why Open Source?

Because we know what it feels like to be locked out when you're hungry to learn. Open source is not just code sharing — it's dignity sharing. It's telling the next broke, stubborn student that they don't need permission to build something meaningful. If this pipeline helps even one person skip the gatekeeping we faced, every sleepless night was worth it.

“Everything in this world is free—if you know how to question, search, and work through chaos. Even starting from zero, the impossible today is what people will pay billions for tomorrow.” — Shitij Halder
“Never let money minded capitalism hit your Open-source-project” — Sayan Sarkar

Cost vs Capability

Infra Cost: $0 baseline stack with student-tier leverage
Processing Model: Asynchronous pipeline with queue-based stages
Deployment Reach: Web + Standalone CLI + Desktop GUI

Architectural Decisions

Decision 01: Web platform stays the canonical source of pipeline evolution. Why: one main branch prevents feature drift and keeps downstream ports consistent.
Decision 02: Separate stages for decode, inference, and encode with FFmpeg muxing. Why: better throughput, cleaner recovery points, and preserved audio integrity.
Decision 03: Hybrid static+on-demand backend infrastructure. Why: near-zero operating cost while still enabling full pipeline execution when needed.
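As a rough picture of Decision 02, the sketch below chains stages through bounded queues, one worker thread per stage. It is a generic pattern under stated assumptions, not GlyphMotion's actual code: the stage functions are stand-ins for decode, inference, and encode, and the queue size of 8 is arbitrary.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the stream


def start_stage(fn: Callable, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Start a worker that applies fn to each item and forwards the result."""
    def worker() -> None:
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # pass the end marker downstream
                return
            outbox.put(fn(item))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def run_stages(items: Iterable, fns: List[Callable], maxsize: int = 8) -> list:
    """Push items through every stage in order and collect the results."""
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(fns) + 1)]
    threads = [start_stage(fn, queues[i], queues[i + 1]) for i, fn in enumerate(fns)]

    def feed() -> None:
        for item in items:
            queues[0].put(item)
        queues[0].put(_DONE)

    threading.Thread(target=feed, daemon=True).start()
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


# Stand-in stages: "decode" adds 1, "detect" doubles the value.
print(run_stages(range(5), [lambda x: x + 1, lambda x: x * 2]))
```

Bounded queues keep a slow stage from being buried under frames produced by a faster one, which matters when memory on mid-tier hardware is tight.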

Open Questions (Still Solving)

Optimization frontier: how far can we reduce latency and memory pressure without compromising tracking stability?
Quality frontier: current output quality scores around 83 VMAF (on the 0-100 scale) in our observed runs; the long-term target is 86-89+, especially for near-4K preservation.
Hard challenge: can we improve perceptual quality and VMAF together while keeping inference practical on mid-tier hardware?
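One common way to measure a VMAF score is FFmpeg's `libvmaf` filter, sketched below. This assumes an FFmpeg build with libvmaf enabled and two videos of matching resolution and frame rate; the page does not say which tool the team actually uses for its measurements.

```python
import subprocess
from typing import List


def vmaf_cmd(distorted: str, reference: str) -> List[str]:
    # libvmaf expects the processed (distorted) video as the first input and
    # the original reference as the second; ffmpeg logs the resulting score.
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", "libvmaf", "-f", "null", "-"]


def measure_vmaf(distorted: str, reference: str) -> None:
    """Run the comparison. Requires ffmpeg with libvmaf on PATH."""
    subprocess.run(vmaf_cmd(distorted, reference), check=True)
```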

Hard Limits We Accept (for now)

GPU tier reality: optimization targets are designed around mid-tier hardware, so some premium-grade throughput remains out of reach for now.
Inference-speed tradeoff: aggressive quality settings can increase processing time, especially on longer and higher-resolution inputs.
Compression tension: pushing near-4K quality while keeping practical output sizes is still an active balancing problem.

What We Won't Compromise On

Open-source transparency: knowledge should be shareable, inspectable, and extendable.
Reproducibility: behavior should be explainable and repeatable across runs and environments.
Audio preservation: output should not lose the original media soul.
Maintainability: architecture decisions should favor long-term evolution over short-term hacks.

Lessons We Paid For

Always commit before panic.
If there is no backup, there is no mercy.
Free infra is powerful, but only if your architecture is smarter than your outages.
A prototype proves possibility; discipline proves longevity.
When motivation dies, routine must take over.
Letter to Future Us · 18 March 2026

If you're reading this later...

Remember the nights when everything felt broken, the days when the page looked empty, and the moments we questioned whether this was even worth continuing.

We built anyway. We learned anyway. We came back anyway. If future success ever makes us forget the grind, read this page again and stay humble, hungry, and kind to the next builder.

Contributors Welcome

Docs: tighten setup guides, architecture diagrams, and troubleshooting for first-time users.
Tests: add reproducible validation for pipeline stages, regressions, and output consistency.
Perf: profile bottlenecks, improve queue/encode behavior, and push quality-efficiency trade-offs smarter.
UI: polish usability and onboarding while preserving the current design language.
Want to contribute? Open an issue

Thanks Wall (Founding External Contributors)

Reserved slots: this wall will list the first external contributors who meaningfully improve docs, tests, performance, or UI.
How to get listed: open an issue, take ownership, submit a quality PR, and help us move the quality/performance frontier forward.
Slot 01 · Open
Slot 02 · Open
Slot 03 · Open
Slot 04 · Open
Slot 05 · Open

Two Builders. One Shared Obsession.

A story people can relate to: two friends, limited resources, and relentless execution over excuses.

Shitij · Architecture + Platform
Sayan · Script + GUI + Maintenance
Still Building...

Build Stack Snapshot

YOLOv8 · FFmpeg · Flask · SSE · GDrive · Cloudflare Tunnel

Privacy & Data Retention

Admin-side deletion policy: when a video is deleted from our admin portal, both the stored video data and the actual video file are permanently deleted from Google Drive to save space.
Local processing cleanup: videos on our laptops are manually deleted once processing is done. We take full responsibility for this manual cleanup cycle.
How to request deletion: share the video link (you can copy it using the copy button). That link is enough for us to trace and delete the related data.
Location analytics scope: location tracking is limited to country and state only, used only to understand where GlyphMotion is being used most. This data is never published. Raw/core tracking data is manually cleaned at regular intervals; only aggregate stats are retained.
