You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.



I wanted to see how well the vector space model of documents I’ve been using for PCA works at classifying individual books. [Note at the outset: this post swings back from the technical stuff about halfway through, if you’re sick of the charts.] While the separation looks pretty nice at the genre level, some of my earlier experiments with PCA, as well as some of what I read in the Stanford Literature Lab’s Pamphlet One, made me suspect individual books would be sloppier. There are a couple of different ways to ask this question. One is to just drop the books as individual points on top of the separated genres, so we can see how they fit into the established space. For example, we can make all the books in LCC subclass “BF” (psychology) blue and all those in “QE” (geology) red, overlaying them on a chart of the first two principal components like the ones I’ve been using for the last two posts:
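For readers who want to try the overlay idea themselves, here is a minimal sketch in Python with NumPy. It is not the code behind the charts in these posts. It assumes each genre and each book is a row of word frequencies over a shared vocabulary, and it uses synthetic random data as a stand-in for the real corpus. The function names `fit_pca` and `project`, and the arrays `bf_books` and `qe_books`, are hypothetical names chosen for illustration. The key step is that the principal components are fit on the genre-level vectors only, and the individual books are then projected into that already-established space.

```python
import numpy as np


def fit_pca(genre_vectors, n_components=2):
    """Fit PCA on genre-level vectors (one row per genre)."""
    mean = genre_vectors.mean(axis=0)
    centered = genre_vectors - mean
    # Rows of vt are the principal axes, ordered by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(vectors, mean, components):
    """Place vectors (genres or individual books) in the fitted space."""
    return (vectors - mean) @ components.T


# Synthetic stand-in data: 6 made-up genres over a 50-word vocabulary.
rng = np.random.default_rng(0)
n_words = 50
genre_vectors = rng.random((6, n_words))

# Hypothetical books: noisy copies of genre 0 ("BF") and genre 1 ("QE").
bf_books = genre_vectors[0] + 0.1 * rng.standard_normal((5, n_words))
qe_books = genre_vectors[1] + 0.1 * rng.standard_normal((5, n_words))

# Fit on genres only, then drop the books into the same space.
mean, components = fit_pca(genre_vectors)
genre_xy = project(genre_vectors, mean, components)
bf_xy = project(bf_books, mean, components)
qe_xy = project(qe_books, mean, components)
```

From here, a scatter plot of `genre_xy` with `bf_xy` drawn in blue and `qe_xy` in red on top gives the kind of overlay described above. Because the books never influence the fit, how far they scatter around their genre's point is a fair picture of how sloppy individual books are in a space built from genres.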