# Real-time learning in data streams using stochastic gradient descent

In a typical supervised learning problem we have a fixed volume of training data with which to fit a model. Imagine, instead, we have a constant stream of data. Now, we could dip into this stream and take a sample from which to build a static model. But let’s also imagine that, from time to time, there is a change in the observations arriving. Ideally we want a model that automatically adapts to changes in the data stream and updates its own parameters. This is sometimes called “online learning.”

We’ve looked at a Bayesian approach to this before in Adaptively Modelling Bread Prices in London Throughout the Napoleonic Wars. Bayesian methods come with significant computational overhead because one must fit and continuously update a joint probability distribution over all the variables. Today, let’s look at a more lightweight alternative: stochastic gradient descent.
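To make the idea concrete, here is a minimal sketch of online SGD for a one-feature linear model. Everything in it — the learning rate, the simulated stream, and the mid-stream “regime change” — is invented for illustration, not taken from the post’s actual data.

```python
import random

# One-feature linear model y ≈ w*x + b, updated one observation at a time.
w, b = 0.0, 0.0
lr = 0.1  # learning rate (chosen for this toy example)

def sgd_update(x, y):
    """Take a single (x, y) observation from the stream and nudge the
    parameters down the gradient of the squared error."""
    global w, b
    err = (w * x + b) - y
    w -= lr * err * x
    b -= lr * err

# Simulate a stream whose true slope drifts from 2.0 to 3.0 halfway through.
random.seed(0)
for t in range(2000):
    x = random.uniform(0, 1)
    true_w = 2.0 if t < 1000 else 3.0   # the "regime change"
    y = true_w * x + random.gauss(0, 0.05)
    sgd_update(x, y)

print(f"slope after drift: {w:.2f}, intercept: {b:.2f}")
```

Because each update costs a handful of arithmetic operations, the model tracks the new regime without ever being refit from scratch — which is the whole appeal over the Bayesian machinery.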

In this post I’ll demonstrate one way to use Bayesian methods to build a ‘dynamic’ or ‘learning’ regression model. When I say ‘learning,’ I mean that it can elegantly adapt to new observations without requiring a refit.

Today we lay our scene in London between the years 1760 and 1850. The good people of London have bread and they have wheat, which is, of course, what bread is made from. One would expect the price of bread and the price of wheat to be closely correlated, and generally they are, though in periods of turmoil things can come unstuck. And Europe, between 1760 and 1850, was a very eventful place.

# Archaeological Illustration of Bayesian Learning

This is the first of a series of posts on Bayesian learning. The learning mechanism which is built into Bayes’ theorem can be used in all sorts of clever ways and has obvious uses in machine learning. Today we’ll build our intuition of how the mechanism works. In the next post, we’ll see how we can use that mechanism to build adaptive regression models.

Bayesian statistics works the same way as inductive reasoning; in fact, Bayes’ theorem is a mathematical form of inductive reasoning. It took me a long time to understand this simple but profound idea. Let me say it again: Bayes’ theorem gives us a mathematical way to do inductive reasoning: it allows us to take a piece of evidence, weigh it against our prior suspicions, and arrive at a new set of beliefs. And if we have more evidence, we can do it over and over again.

Let’s do an example. It occurred to me that archaeology is an excellent illustration of how this works.

## We are archaeologists

We’re excavating the site of a Roman village. Our job is to try and pinpoint what year(s) the village was likely inhabited. In probability terms, we want to work out `p(inhabited)` for a range of years. As we uncover relics like pottery and coins, we use that evidence to refine our beliefs about when the village was inhabited.
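The dig can be sketched as a grid approximation: put a prior over candidate habitation years, then apply Bayes’ rule once per artefact, using each posterior as the prior for the next find. All the specifics below — the year grid, the coin dates, the decay model for likelihoods — are numbers I invented for illustration.

```python
# Candidate habitation years and a flat prior: before digging, no idea.
years = list(range(0, 401, 10))          # 0-400 AD, in 10-year steps
prior = [1.0 / len(years)] * len(years)

def likelihood_coin(year, minted):
    """P(finding this coin | village inhabited in `year`).
    A coin can't turn up before it was minted; afterwards the chance
    decays as coins fall out of circulation (a made-up model)."""
    if year < minted:
        return 0.001   # tiny: perhaps an intrusive find
    return 0.9 ** ((year - minted) / 10)

def update(prior, lik):
    """One application of Bayes' rule: prior × likelihood, renormalised."""
    post = [p * lik(y) for p, y in zip(prior, years)]
    total = sum(post)
    return [p / total for p in post]

# First find: a coin minted in 120 AD.
posterior = update(prior, lambda y: likelihood_coin(y, minted=120))
# Second find: apply Bayes' rule again, with yesterday's posterior
# as today's prior.
posterior = update(posterior, lambda y: likelihood_coin(y, minted=150))

best = years[posterior.index(max(posterior))]
print(best)  # → 150
```

Notice the mechanism at work: the first coin rules out everything before 120 AD, and the second sharpens the peak further — each piece of evidence reshapes the belief without discarding what came before.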

# Bayesian Vitalstatistix: What Breed of Dog was Dogmatix?

In the Asterix comics we are never told just what breed Obelix’s little dog Dogmatix is. In fact, he is specifically described as being of an “undefined breed” in his first appearance, in Asterix and the Banquet.

I’ve compiled four hypotheses:

• West Highland White Terrier: In the first two Asterix movies, the part of Dogmatix was played by a West Highland white terrier (a “Westie”). Of all present-day dog breeds, Westies look the most like Dogmatix. There is one major hole in the Westie hypothesis: it’s extremely unlikely that a Westie would be found in Gaul circa 50BC, because the breed simply didn’t exist back then.

• Melitan: A melitan was a breed of lapdog popular with Romans and Greeks in ancient times. They were small companion dogs, typically pampered things who existed solely to amuse their owners, typically noblewomen. “The Melitan was a small, fluffy, spitz-type dog, commonly white in colour… with a very appealing, pointed, fox-like muzzle.” In appearance, they sound nothing like Dogmatix. But the Melitan hypothesis is very strong for another reason: these small dogs actually existed as pets in 50BC. There’s no record of them amongst Gauls, however. We’d have to imagine that rowdy, warring Obelix came into possession of a small Roman lapdog during a raid.

• A Gallic war dog, a.k.a. Irish Wolfhound: What we call an Irish wolfhound is a very ancient breed of dog which Celts were known to use as guard dogs and in combat. Caesar is supposed to have mentioned them in his Commentaries on the Gallic War, though I couldn’t find the reference. This is the only breed of dog which we might realistically find in a Gaulish village in 50BC, which makes it a good hypothesis for Dogmatix. The obvious flaw: wolfhounds are huge. Maybe Dogmatix could be a wolfhound, but he’d have to be an anemic, albino, malformed runt of a wolfhound. He is known to accompany Obelix on raids and bite Romans.

• Schnauzer: The last hypothesis: Dogmatix could be a schnauzer. This is unlikely for so many reasons: (1) schnauzers didn’t exist in 50BC; (2) they aren’t native to Gaul/France; (3) they look nothing like Dogmatix except, except, except that schnauzers have whiskers that perhaps vaguely resemble Dogmatix’s unusual moustache.

Which breed does Dogmatix belong to…?

Our goal is to determine which of these four breeds Dogmatix belongs to. First, we need some data, then we need a method for decision making.
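One way to make the decision-making method explicit is to treat the four breeds as competing hypotheses, give each a prior, and score a piece of evidence with a likelihood. Every number below is an invented stand-in of mine, chosen only to mirror the arguments above, not data from the comics.

```python
# Priors: how plausible is it that each breed existed in 50BC Gaul at all?
priors = {
    "westie":    0.05,  # breed didn't exist yet
    "melitan":   0.40,  # real ancient lapdog, though Roman not Gaulish
    "wolfhound": 0.50,  # the one breed actually found in Gaul
    "schnauzer": 0.05,  # didn't exist yet either
}
# Likelihoods: P(looks like Dogmatix | breed) — the appearance evidence.
likelihood = {
    "westie":    0.90,  # strong resemblance
    "melitan":   0.10,  # sounds nothing like him
    "wolfhound": 0.05,  # far too huge
    "schnauzer": 0.30,  # only the moustache matches
}

# Bayes' rule: posterior ∝ prior × likelihood, then normalise.
unnorm = {b: priors[b] * likelihood[b] for b in priors}
total = sum(unnorm.values())
posterior = {b: p / total for b, p in unnorm.items()}

for breed, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{breed:10s} {p:.2f}")
```

The interesting feature is the tension the numbers encode: the Westie scores high on appearance but is nearly ruled out by history, while the wolfhound is historically sound but a poor visual match — exactly the trade-off the prose above wrestles with.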

# Machine learning literary genres from 19th century seafaring, horror and western novels

So I got interested in ‘digital humanities’ and I wanted to try out some of their procedures for doing literary criticism with computers. One basic problem they have is how to classify texts so that, firstly, they can be searched for and found (this is the librarian’s problem), but also so that they can be grouped according to content or genre and compared with one another (a literary critic’s job). And I realised that the internet is itself a vast text classification and retrieval problem on a scale comparable to the Borgesian infinite library. The Dewey decimal system is simply not going to cut it. I learned that even as I submit these words to this blog, search engine webcrawlers are sucking them all up and throwing them into vast term-document matrices, which they use to match my words to search phrases by calculating cosine similarities and other mathematical measures of correspondence.
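The term-document matching mentioned above can be sketched in a few lines: represent each text as a sparse bag-of-words vector and score matches with cosine similarity. The toy documents and query are my own, purely for illustration.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "documents" as term-count vectors (one column each of a
# term-document matrix).
doc   = Counter("the whale sank the whaling ship".split())
query = Counter("whale ship".split())
other = Counter("cowboys rode across the dusty plain".split())

print(cosine(doc, query) > cosine(doc, other))  # → True
```

Real search engines add term weighting (e.g. tf-idf) on top of this, but the geometric idea — documents as vectors, relevance as angle — is the same.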

How to do things with words, with machines. Awesome geek stuff! The problem I’ve made up for today: train a machine to classify a book into one of three literary genres: Seafaring, Gothic Horror or Western.
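As a first taste of the problem, here is a tiny Naive Bayes classifier over bags of words. The one-line training “books” are stand-ins I invented — the real posts would use actual novel texts — but the mechanics (word counts per genre, smoothed log-likelihoods) are the standard approach.

```python
import math
from collections import Counter

# One invented stand-in "book" per genre; real training data would be
# full novel texts.
train = [
    ("seafaring", "the captain set sail and the crew harpooned the whale"),
    ("horror",    "a ghost haunted the crumbling castle at midnight"),
    ("western",   "the sheriff drew his revolver on the dusty frontier"),
]

counts = {g: Counter(text.split()) for g, text in train}
vocab = set(w for c in counts.values() for w in c)

def score(genre, words):
    """Log P(words | genre) with add-one smoothing, flat genre prior."""
    c = counts[genre]
    total = sum(c.values())
    return sum(math.log((c[w] + 1) / (total + len(vocab))) for w in words)

def classify(text):
    words = text.split()
    return max(counts, key=lambda g: score(g, words))

print(classify("the crew spotted a whale off the bow"))  # → seafaring
```

With real corpora the vocabularies run to tens of thousands of terms, but the classifier is otherwise exactly this shape.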