Projects
A selection of things I've built: production work, research experiments, and small explorations.
Projects
-
Road Track Identification with U-Net
A low-latency tool that uses image segmentation to identify road tracks. It is built on the U-Net architecture and intended as an educational exploration of computer vision in this problem space.
-
Research Paper Search
Context-aware paper search powered by sentence transformers. Researchers can input natural-language queries and retrieve more relevant abstracts than with keyword-only search.
-
Swahili Language Model (MLM)
A masked language model pre-trained on a large Swahili corpus in a self-supervised fashion: training randomly masks 15% of the tokens, and the model learns to predict them.
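The masking step described above can be sketched roughly as follows. This is a minimal illustration and not the project's training code: the `mask_tokens` helper, the `[MASK]` string, whitespace tokenization, and the sample sentence are all assumptions made for the example. It also always substitutes the mask token, whereas BERT-style recipes sometimes keep or randomly replace a selected token.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed placeholder; real tokenizers define their own


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide roughly `mask_prob` of the tokens and record what was hidden.

    Returns the masked sequence plus a {position: original_token} mapping,
    which is what the model would be asked to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token so short inputs still yield a training signal.
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = rng.sample(range(len(tokens)), n_to_mask)

    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]
        masked[pos] = MASK_TOKEN
    return masked, labels


# Hypothetical sample sentence, split on whitespace for simplicity.
sentence = "habari ya asubuhi rafiki yangu karibu sana nyumbani kwetu leo".split()
masked, labels = mask_tokens(sentence, seed=0)
print(masked)
print(labels)
```

A real pipeline would operate on subword IDs from the model's tokenizer instead of whole words, but the bookkeeping (masked inputs plus the original tokens as labels) has the same shape.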
Gists & notebooks
-
Fine-tuning BERT-uncased for Swahili text classification
A walkthrough of fine-tuning a BERT-uncased MLM for multi-class text classification on a Swahili dataset.
-
N-gram language modelling
A brief illustration of language modelling using the NLTK library and its datasets.
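For a feel of what n-gram language modelling involves, here is a small bigram sketch. It deliberately uses only the Python standard library in place of NLTK so that it runs without downloads. The function names, the `<s>`/`</s>` boundary markers, and the toy corpus are assumptions for illustration, not the notebook's contents.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with boundary markers so sentence starts and ends are modelled.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def bigram_probability(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev); 0.0 if prev is unseen."""
    following = counts.get(prev)
    if not following:
        return 0.0
    return following[curr] / sum(following.values())


# Toy corpus, invented for the example.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram_model(corpus)
print(bigram_probability(model, "the", "cat"))
print(bigram_probability(model, "cat", "sat"))
```

This estimate assigns zero probability to any unseen word pair, so a practical model would normally add smoothing on top of it.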
-
Pillow image verification
A quick approach to verifying real images during classification: each image is resized to 256×256 and then checked for green pigment, which keeps the check fast.
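One way such a check might look with Pillow is sketched below. The definition of a "green" pixel (green channel strictly larger than red and blue), the `min_green` threshold, and the function names are assumptions made for this example. The gist itself may use a different rule.

```python
from PIL import Image


def green_fraction(image, size=(256, 256)):
    """Return the share of pixels whose green channel dominates red and blue."""
    # Downscale first so the per-pixel loop touches a fixed, small pixel count.
    small = image.convert("RGB").resize(size)
    raw = small.tobytes()  # interleaved R, G, B bytes for an RGB image

    green_pixels = 0
    for i in range(0, len(raw), 3):
        r, g, b = raw[i], raw[i + 1], raw[i + 2]
        if g > r and g > b:  # assumed definition of "green pigment"
            green_pixels += 1
    return green_pixels / (size[0] * size[1])


def has_enough_green(image, min_green=0.05):
    """Hypothetical pass/fail rule: at least `min_green` of pixels are green."""
    return green_fraction(image) >= min_green


# Synthetic solid-colour images stand in for real photos here.
green_image = Image.new("RGB", (64, 64), (0, 200, 0))
red_image = Image.new("RGB", (64, 64), (200, 0, 0))
print(has_enough_green(green_image), has_enough_green(red_image))
```

For files on disk, the same functions could be applied to the result of `Image.open(path)`. The threshold would need tuning on real data.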