Manifolds in Data Science — A Brief Overview

11 Jul. 2023

 

What is this thing?

Data science requires an insightful understanding of data. As more and more data accumulates, it becomes harder to answer the following question:

How do I spatially represent my data in an accurate and meaningful way?

I claim that a super useful step in answering this question is understanding what a manifold is. Here’s the good news: It’s very likely you already understand what a manifold is. Manifolds are visual by nature, so everyday examples are abundant.

In this article I will:

  1. Explain what a manifold is and give a conceptual definition.
  2. Visualize examples of manifolds in various contexts.
  3. Show how manifolds are used in data science.

What is a Manifold?

Manifolds describe a vast number of geometric surfaces. To be a manifold, a surface has to satisfy one important rule, and the best way to understand that rule is through an example. Manifolds exist in any dimension, but for the sake of simplicity, let’s think about a surface sitting in three-dimensional space.

Suppose there is a small ant walking along a manifold in three dimensions. This manifold could be curvy, twisty, or even have holes in it. Now here’s the rule: From the point of view of the ant, everywhere it walks should look like a flat plane.

Does this rule sound familiar? If you’re looking for an application, here is one that all of us can relate to: we live on a manifold! The surface of the Earth is roughly a sphere, yet from where we stand it looks flat. A sphere is one of the simplest examples of a manifold in three dimensions.
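To put rough numbers on the ant's point of view, here is a minimal sketch in Python. It measures how far a sphere drops below the flat plane touching it at the ant's starting point, compared with how far the ant has walked. The function name `tangent_plane_gap` and the Earth radius constant are assumptions chosen for illustration, not something from a particular library.

```python
import math

def tangent_plane_gap(radius, angle):
    """Compare a sphere with the flat plane touching it at the starting point.

    `angle` (radians, measured at the sphere's centre) says how far the ant
    has walked from the starting point. Returns (distance walked, gap), where
    gap is how far the sphere has dropped below the plane.
    """
    walked = radius * angle                  # arc length travelled on the surface
    gap = radius * (1.0 - math.cos(angle))   # drop below the touching plane
    return walked, gap

EARTH_RADIUS_KM = 6371.0  # approximate mean radius, for illustration only

for walked_km in (1000.0, 100.0, 10.0, 1.0):
    angle = walked_km / EARTH_RADIUS_KM
    walked, gap = tangent_plane_gap(EARTH_RADIUS_KM, angle)
    print(f"walked {walked:7.1f} km   gap {gap * 1000:12.3f} m   gap/walked {gap / walked:.6f}")
```

The ratio in the last column keeps shrinking as the walk gets shorter, which is one way to say that a small enough patch of the sphere is indistinguishable from a plane.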

Examples of Manifolds

Here are some common examples of manifolds. Note that the manifold is only the surface of these objects and not the interior.

Basic surfaces that are manifolds.

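Can you think of surfaces that break this rule? These surfaces will have problems at some “sharp” points. (Strictly speaking, a sharp edge or peak only rules out a smooth manifold, while a pinch point like the hourglass’s fails even the looser topological definition.) Below are the first few that come to mind for me: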

  1. A cube. If you walk along a side and get to an edge, things will be too sharp and will no longer look like a plane.
  2. The landscape of a mountain. Assuming that the peak is perfectly sharp, at this point things will not look like a plane.
  3. An hourglass. Assuming that the intersection of the two halves is a single point, the rule will be broken here.
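The same kind of check shows what goes wrong at a sharp point. The sketch below compares a smooth bowl with a cone whose tip plays the role of the perfectly sharp mountain peak. Both surfaces and the helper `gap_ratio` are hypothetical choices made for this illustration, and the flat plane z = 0 is used as the candidate plane because of the symmetry of both shapes.

```python
import math

def bowl(x, y):
    """Smooth surface z = x^2 + y^2 (a paraboloid)."""
    return x * x + y * y

def cone(x, y):
    """Surface with a sharp tip at the origin: z = sqrt(x^2 + y^2)."""
    return math.hypot(x, y)

def gap_ratio(surface, step):
    """Height above the plane z = 0 after moving `step` away from the origin,
    divided by `step`. A ratio that shrinks to zero means the surface looks
    more and more like that plane as we zoom in."""
    return surface(step, 0.0) / step

for step in (1.0, 0.1, 0.01, 0.001):
    print(f"step {step:6.3f}   bowl {gap_ratio(bowl, step):.4f}   cone {gap_ratio(cone, step):.4f}")
```

For the bowl the ratio shrinks along with the step, so zooming in makes it look flat. For the cone the ratio stays the same at every scale, so no amount of zooming makes the tip look like a plane.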

Intuition for thinking about manifolds

The common theme of these examples is that they are somewhat smooth — meaning that there are no sharp spikes or edges. The overall shape of the object can be amorphous, which is nice when describing datasets that don’t have rigid boundaries.

Manifolds in Data Science

Data can come from a variety of spaces: the space of all images, for example, or a range of prices and other numerical values. These high-dimensional spaces are complex and cannot always be visualized. However, the data itself may come from a special subset of the space that can be represented by a manifold.

Thus, manifolds can act as a stepping stone from a complex space to a simpler, smoother subset.

Manifold of handwritten digits as a two-dimensional representation.

Classification problems are prime examples, where we are specifically looking for a manifold that separates two types of data. This separating surface is often called the decision boundary.

Classification problems involve finding manifolds.
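As a toy illustration of a separating manifold, the sketch below uses a hand-picked boundary rather than a learned one: a circle of radius 1 in the plane. The names `score` and `classify` and the labels "inner" and "outer" are assumptions made up for this example.

```python
import math

def score(x, y, radius=1.0):
    """Signed score: negative inside the circle, positive outside, zero on it."""
    return math.hypot(x, y) - radius

def classify(x, y):
    """Assign a label according to which side of the boundary the point is on."""
    return "inner" if score(x, y) < 0 else "outer"

samples = [(0.1, 0.2), (-0.5, 0.5), (1.5, 0.0), (-1.2, -1.2)]
for x, y in samples:
    print(f"({x:+.2f}, {y:+.2f}) -> {classify(x, y)}")

# Points on the boundary are described by a single number, the angle t,
# even though each of them is stored as two coordinates.
boundary = [(math.cos(t), math.sin(t)) for t in (0.0, 1.0, 2.0)]
print([abs(round(score(x, y), 12)) for x, y in boundary])
```

The boundary here is the set of points where the score is zero. It is a one-dimensional manifold (a curve) living in a two-dimensional space, and a classifier trained on real data would have to estimate such a boundary from labeled samples instead of being handed it.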

Other times we may be interested in “unraveling” the data to a lower dimension. Consider sampling from a spiral-shaped manifold and learning how to unwrap it from the three-dimensional representation below into a two-dimensional planar representation.

(a) Dataset spatial representation. (b) Smooth surface approximation.
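To make the unwrapping idea concrete, here is a minimal sketch that assumes we already know the formula for the spiral, which a learning algorithm working from samples would not. The sheet is rolled up along the spiral r = t, and the functions `roll`, `unroll`, and `dist` are hypothetical helpers written for this illustration.

```python
import math

def roll(t, height):
    """Point on a spiral-shaped sheet in 3D; (t, height) are its two
    intrinsic coordinates."""
    return (t * math.cos(t), height, t * math.sin(t))

def unroll(t, height):
    """Flatten the sheet: replace the angle t by the arc length measured
    along the spiral r = t, and keep the height unchanged."""
    arc = 0.5 * (t * math.sqrt(1.0 + t * t) + math.asinh(t))
    return (arc, height)

def dist(p, q):
    """Straight-line (Euclidean) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two points that are neighbours on the sheet: both distances nearly agree.
a, b = (6.0, 0.0), (6.05, 0.1)
print(dist(roll(*a), roll(*b)), dist(unroll(*a), unroll(*b)))

# Two points one full turn apart: close in 3D, far apart along the sheet.
c, d = (6.0, 0.0), (6.0 + 2.0 * math.pi, 0.0)
print(dist(roll(*c), roll(*d)), dist(unroll(*c), unroll(*d)))
```

The second pair is why straight-line distance in three dimensions can mislead: the two points sit on adjacent layers of the roll, but an ant walking on the sheet would have to go all the way around to get from one to the other. The planar representation keeps that walking distance.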

Defining a Manifold

Unfortunately, like most geometric objects, the manifolds that data comes from are generally not easy to define analytically. Many tasks in machine learning are concerned with learning a manifold representation of the data and then using that representation to make predictions about the remaining space. If you are interested in this branch of machine learning research, look into manifold learning.

Conclusion

Looking at data is extremely satisfying for many people, and understanding the geometric structure of data comes along with this. Manifolds are a natural way to describe the surfaces that data tends to lie on. Once you have a manifold to describe your data, you can make predictions about the remaining space.

Thanks for reading! It would be great to hear your feedback on this article and what you would want to see in the future! Feel free to write any questions in the comments below.
