Category Theory: Motivation and Foundations
Point of post: This post is to give the motivation behind category theory. Why someone would want to study it, glibly (and this is on purpose) what category theory is, and a perfunctory look at the old set theory assumed and the new stuff needed to study category theory.
Disclaimer: All that is said below is the viewpoint of a non-categorist, and is easily (I again easily) open for disagreement. This post is just the result of my own personal introspection, and does not claim to capture (fully or partially) any semblance of what category theory is. If you have any comments, please for my sake, let me know in a comment!
My topology course this term has necessitated the learning of category theory. So, I will post little semi-lessons as I myself read about the subject. These aren’t supposed to be comprehensive, just my ways of putting things in my own words.
Why do Category Theory?
This is the question all new mathematics faces: “What’s the point? Where can I use it? Is it even relevant?” Honestly, all legitimate questions, all especially applicable to something as general, as mind-bogglingly abstruse as category theory. So, what is the point?
The argument I, not being a categorist, would make for category theory being a useful subject is the same old song-and-dance given by lienar algebra, topology, and abstract analysis. Yes, while in the reality of most human beings lives the only vector space is the usual , the only topological space is with the usual topology, and the only analysis is taking derivatives of real polynomials, that isn’t all there is in the world. Whether or not a pragmatist would like to face it there is a theoretical need to study the vector space of differentiable functions (existence and uniqueness theorems in ODEs) , there is a need to study Riemannian manifolds (general relativity) , and there is a need to talk about measure theory (viz. Probability Theory, viz. Borel-Cantelli lemma). And with this diversity, comes commonalities. One studies abstract vector spaces because something like the Dimension Theorem not only applies to but to all the vector spaces modeled on it. One studies general topological spaces because paracompactness is not a specific concept, and one studies abstract analysis because most theorems of convergence apply in arbitrary metric spaces. Thus, by treating subjects holistically “We study topology” instead of specifically “We study the topology of the real line with the usual order topology” we are able to apply theorems in the former to specific cases like the latter, but moreover move seamlessly from specific case to specific case with most of our theory intact. In essence, why prove something forty times for forty specific examples when you can prove it once in general, and merely apply it to each of the forty specific examples? So, it makes sense to study particular mathematical fields generally and save time and effort by taking advantage of the commonalities among specific examples.
The advantage to this abstraction is actually two-fold though. Not only do we save time and effort, but in most cases the theories become more clear, apparent, even obvious when viewed from a certain level of abstraction. This may seem contradictory to some: “How can something more abstract, more general be easier to understand?” Well, in general it isn’t. It’s not that it’s easier to “understand” it’s that when you look at things from a stripped-bare (only the information necessary to what you are really focusing) perspective the “route” of proof, or the underlying ideas are more forthcoming. One could analogize it in the following way. Suppose that I sent you on a scavenger hunt looking for an apple, but I let you choose one of two places for the scavenger hunt to take place. You can choose to conduct the scavenger hunt in/at a) a stone maze with unadorned walls and floors or b) the national “Red apple-shaped things convention”. There are clearly pros and cons to both (assuming you visualize them in the same way I do). Clearly b) has the pro that it’s not a maze, things are more easily laid out and understood. That said, in a vast room full of red apple-shaped objects you might be misled, the goal obscured. You could spend ten minutes walking towards “the apple” only to discover that it’s merely an antique ‘sculpture’ of an apple. Thus, while the ‘floor plan’ of the red apple-shaped convention might be easier to navigate, with so much ‘stuff’ the goal may be obscured. Conversely, clearly a) has the opposite advantages and disadvantages, namely, it’s ‘floor plan’ is difficult, hard to navigate, hard to understand. That said, when you see the red-on-grey appearance of the apple on the maze, you’ll know immediately that you’ve found it. There is nothing to confuse you, to lead you down the wrong path. In essence, it’s easy to find it because in the maze there isn’t much to look for.
This is completely analogous to the general case/specific example dichotomy. In the general case the material may be harder, more difficult to understand, but when you haven’t been given much information you don’t have much to work with. If all you know about compact subspaces of topological spaces is that every open cover has a finite subcover, you surely know where to start when confronted with a compact subspace problem. Conversely, the specific case example may be (and this is a gross generalization viz. Tychonoff’s Plank) easier to understand, but with so much ‘stuff’ the method of approach, or patterns may become obfuscated. Take the example before, you’re now trying to prove that the Cantor set is compact. So, what do you do? You could prove it’s closed and bounded and apply the Heine-Borel theorem. You could take an arbitrary open cover and produce a finite subcover. You could use the L.U.B. property of the real line somehow. You could show that every infinite subset has a limit point. You could show that every sequence in it has a subsequential limit. You could…I digress. When given a lot of information, it’ sometimes not so easy to ‘see what to do’, you’re misled with an overabundance of information. This is the problem. And the tenet of abstractification is that the ‘maze’ scenario is easier to deal with.
In fact (I’ve tricked you again), the use of abstractification is three-fold. This last part possible may be the most useful. With this bare minimal approach to subjects, not only are the proofs of theorems more clear, but so are new theorems themselves. In a more general setting the patterns among theorems, or examples may seem more clear. To those in statistics this can be thought of as eliminating “lurking variables”. When you’re in and you look at a class of sets that has property , it’s not so obvious as to which property they share in common is the causal one. But, in a more general sense most classes of sets have very few (once again a hyperbole) commonalities, and so it’s more clear to what the causal pattern might be. Thus, the abstraction can actually help to lead to more theorems, and so we can get a diagram like this:
One might start to wonder why I labeled this “Category Theory” now, considering I have said nary a word about it. Well, category theory in some naive sense may be thought of as the ultimate abstraction. Instead of looking at the commonalities among certain structures (what do all topological spaces have in common? what do all groups have in common?) it looks at the commonalities among the structures themselves (what commonalities to vector spaces, groups, topological spaces, metric spaces etc. have in common). Thus, in essence we’re able to take the above paragraphs and via category theory apply them to many, many different fields of mathematics
What is Category Theory?
Category theory, in a holistic informal sense, can be thought of as the study of the structures in mathematics. For example, consider the following list
where “S.P.M” stands for “Structure Preserving Maps”. As one can infer, all of these objects have a kind of structure and a map between two such objects which preserves this structure, and in a sense gives an idea of equality when there are such maps going both directions. Even more though, we note that the composition of any of these S.P.M.s is still a S.P.M.
This kind of commonality is what category theory is all about. Things like quotient structures, completions, S.P.M.s are the bread-and-butter of what a categorist hopes to do.
Set Theory Foundation
Unfortunately, when one begins to talk about category theory they run into some set-theoretic difficulties. Namely, things like talking about the “set of all sets”, which anyone familiar with Russel’s Paradox knows is a set-theoretic no-no. So, to really talk about category theory we first need to broaden our definitions of what and what is not a set. Namely, we need to make explicit what a set can and cannot do, and extend the definitions to collections which are impervious to the kind of aforementioned contradictions.
We begin by assuming, as almost all math subjects do, the naive “You know what a set is” definition of a set. We then assumes that it has certain properties
a) For any set and any property we have that there exists a set
b) For each set we can form the power set defined as the set of all subsets of
For any sets and we can form the following sets:
c) The set whose members are precisely
d) The ordered pair
e) The union of and ,
f) The intersection of and ,
h) The Cartesian product
i) The relative complement
i) The set
Also, for any family of sets we may form the sets
k) The image of the family, namely
l) The union of the family
m) The intersection of the family
n) The Cartesian product of the family
o) The disjoint union of the family
But, as we stated earlier the concept of a set isn’t going to be sufficient for our categorical purposes, we need some kind of collection which circumvents our Rusell’s Paradox problem. This collection is called a class. The properties of a class are
a) The members of a class are sets.
b) For every property one can form the class
Now, we can then with this definition form the class of all sets, called the universe, and denoted by . Since it’s clear that we can perform ordinary operations on classes (union, product, intersection, etc.) we can make sense of things like functions. Thus, you’ll notice that we sidestepped the definition of a family of sets above, we can now give it. A family of sets is a function
Where the function is called an indexing function. If is a set, then is said to be set indexed.
Now, we also require that
c) If are classes then so is the -tuple .
d) Every set is a class, which is the same thing as saying the elements of every set are themselves sets.
Note though that this is not a two-way street. Every set is a class, but not every class is a set. There are things called proper sets which are classes that cannot be members of any set. This is how we avoid Rusell’s paradox and speak of things like “The class of all topological spaces” or “The class of all sets”, etc. With this definition we have the following theorem
Theorem: Let be a proper class and a set. Then, there exists no surjection
Proof: Suppose so, then we can form the family of sets and so by k) we may form the image set . But, this is evidently , which contradicts that is not a set.
Thus, we now have all the set-theoretic machinery needed to start category theory
1. Adámek, Jiří, Horst Herrlich, and George E. Strecker. “Foundations.” Abstract and Concrete Categories: the Joy of Cats. New York: Wiley, 1990. Print.