Clustering Techniques for Anime Hair Extraction

Waifu is Laifu

Krieger is by far my favourite character in Archer. Undoubtedly a satire on “Q” (the gadget wizard from James Bond), he builds all the fun toys for Archer, Lana and the gang to use on their missions. However, unlike Q, Krieger has many quirks that make him special, such as his loving partner Mitsuko Miyazumi, who happens to be a hologram waifu.

For those who don’t know, waifu is a term used to denote an imaginary anime partner — generally in the shape of a body pillow to which one forms a romantic attachment. While not every anime watcher has a waifu, the anime community remains extremely dedicated to assigning and cataloguing personas for anime characters. This was pointed out to me by Jerry Li (a future contributor to this blog), when he showed me a site which has some extremely detailed profiles including but not limited to: height, weight, birthday, occupation, school, classes, favourite foods, likes, dislikes, and a picture for each. However, I didn’t see much documentation on hair colour/style, most likely because it is so immediately apparent from the picture.

Thus, I set out to create a semi-supervised algorithm to extract the regions of hair in a given anime portrait using clustering techniques.

The clustering techniques used were DBSCAN and k-means (plus convex hulls, if you count those), along with pre-trained facial detection algorithms. This article is meant predominantly as a Python tutorial to see what one can do when fooling around with scikit-learn, NumPy and OpenCV.

Summary of Approach


In plain English, the logic of this program is the following:

  1. Reduce noise in image
  2. Find a face
  3. Look where there is probably hair
  4. Look for other pixels with colours like your supposed hair region

The first step reduces the original image of many colours, shades and hues down to 12 base colours which approximate the original image. This is obtained by a clustering process known as k-means. The next step then uses available information on the general location of the face to define a facial region composed mainly of skin. This region is then extended to look above a hypothetical scalp-line to guess the most likely region in which to find hair in the image.


Before beginning it is useful to think about what limitations this approach would cause. A few come to mind:

  • Determining the perfect k-value is expensive but extremely important: too many colours and they will be hard to group; too few colours and everything will be over-grouped.
  • Other objects in the picture may have similar colour to hair and may be incorrectly identified as hair.
  • If no face is detected the approach will not work.

Indeed, we will find cases where all of these happen, but using a naive algorithm as a first step towards solving this problem allows us to profile the difficulty of the project. A human can run this program on various portraits and select which ones work/fail, gaining valuable insight before moving on to more complicated ML methods. If we can live with that, we can now get down to the nitty-gritty of implementing the approach.

Writing the Code

All of this is put up here on my GitHub in the form of a notebook, should you wish to run it alongside the comments and discussion on this page.

This tutorial focuses only on the clustering/data-manipulation aspect of this problem; it does not include anything on the scraping or the pre-processing facial extraction. If that part interests you, it can be found in the GitHub repo. For now it should suffice to note that there is a folder called faces which contains two subfolders, pngs and jsons, where pngs contains anime face shots and jsons contains corresponding JSON files with facial-region information. With that noted, we get started by shotgunning all the modules we need at the start.

The most confusing of these packages to install is cv2, which for the longest time had eluded me, until I found out that when installing with Anaconda it is hosted under the name opencv even though it is imported as cv2. Hopefully this small addendum can save a mild amount of pain for others later.

With the packages imported, we then read in all the files; for the purpose of this example we will only deal with one filepath f leading to an image location.
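As a sketch of that setup (the helper name is my own invention; the folder names follow the faces/pngs and faces/jsons layout described above), pairing one image with its facial-region JSON might look like:

```python
def paired_paths(stem, root="faces"):
    """Pair one anime face shot with its facial-region metadata.
    Assumes the pngs/jsons folder layout described above."""
    return (f"{root}/pngs/{stem}.png", f"{root}/jsons/{stem}.json")

# e.g. a single example filepath f, as used throughout this tutorial
f, meta = paired_paths("0001")
```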

Colour Reduction via K-Means Clustering

Colour approximation via k-means clustering.

As mentioned earlier, image reduction is performed using k-means. This is a fairly standard procedure and even an example in the scikit-learn docs. For a brief primer on k-means I refer the reader to this succinct YouTube video, which explains it with an example. The following block reads in an image and stores it in a dictionary as the original representation of the image. The image is then passed to a function which reshapes the data from the NumPy array representation of {y-location, x-location, rgb value} of shape (168, 100, 3) into an array of RGB values. These RGB values are shuffled and passed to the scikit-learn k-means wrapper, which tries its best to reduce the image into the requested k=12 colours. The output is the reduced RGB data reshaped back into the original image dimensions. A side-by-side comparison is shown to the right to visualize the approximation.
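A minimal sketch of that reduction step, using scikit-learn's KMeans on a synthetic image in place of a real portrait (the function name is mine; the notebook's exact code differs):

```python
import numpy as np
from sklearn.cluster import KMeans

def reduce_colours(img, k=12, seed=0):
    """Approximate an (H, W, 3) RGB image using only k colours."""
    h, w, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)  # one row per pixel
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    # replace each pixel with the centre of its assigned cluster
    reduced = km.cluster_centers_[km.labels_].astype(np.uint8)
    return reduced.reshape(h, w, 3), km.labels_.reshape(h, w)

rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(168, 100, 3), dtype=np.uint8)
approx, labels = reduce_colours(demo, k=12)
```

The returned label image (which pixel belongs to which colour cluster) is what the later masking steps operate on.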

The approximation looks surprisingly good — it’s hard to believe it is formed of only k=12 colours! As a visual aid to the reader, the following block decomposes the image into the multiple colours which now represent the image.

Split apart, the process seems pretty remarkable. It is clear that some of these regions are distinctly part of the hair, giving our approach hope. The next step is to collect the regions using an educated guess of where the hair is most likely to be.
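The decomposition itself is just per-label masking; a sketch, reusing the sort of label image a k-means pass would produce (names and toy arrays are mine):

```python
import numpy as np

def colour_layers(labels, img):
    """Split an image into one layer per k-means label, blacking out
    every pixel that belongs to a different cluster."""
    layers = []
    for i in np.unique(labels):
        layer = np.zeros_like(img)
        layer[labels == i] = img[labels == i]
        layers.append(layer)
    return layers

labels = np.array([[0, 1], [1, 0]])
img = np.full((2, 2, 3), 200, dtype=np.uint8)
layers = colour_layers(labels, img)
```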

Face Boundary Extraction via OpenCV & Convex Hull

Extraction of skin region from colours obtained from k-means clustering.

The facial properties obtained in the preprocessing portion of this project (as mentioned before the code will be provided at the end) return 4 values defining a bounding rectangle on the face given by:

  • (x, y) the location of the upper-left corner of the rectangle.
  • (w, h) the dimensions of the rectangle.

The goal then is to find the outline of the face within the box. The first step is identifying the skin colour, which we assume to be the dominant colour located within the box.
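Picking the skin colour out of the bounding box can be sketched as a simple mode over the k-means labels (a hypothetical helper, not the notebook's exact code):

```python
import numpy as np

def dominant_label(labels_img, x, y, w, h):
    """Most frequent k-means label inside the face bounding box,
    assumed to be the skin colour."""
    box = labels_img[y:y + h, x:x + w].ravel()
    values, counts = np.unique(box, return_counts=True)
    return values[np.argmax(counts)]

labels = np.zeros((8, 8), dtype=int)
labels[2:6, 2:6] = 3            # the "skin" cluster fills most of the box
skin = dominant_label(labels, 2, 2, 4, 4)
```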

Core points obtained from the DBSCAN clustering algorithm.

While we were able to successfully extract the skin, it is riddled with sparse flecks and outlines. This noise results from the edge colours acting as dislocated shades/accents rather than a representation of the colour region itself. To remove these we use a technique called DBSCAN:

  • (D)ensity
  • (B)ased
  • (S)patial
  • (C)lustering of
  • (A)pplications with
  • (N)oise.

The idea behind DBSCAN is remarkably intuitive. The algorithm is as follows

  1. Each pixel in our image is assigned a positional value (its position in the NumPy array)
  2. For each point, look at a surrounding neighbourhood governed by a circle of radius eps
  3. Check how many pixels exist in this neighbourhood and introduce a min_samps cutoff to enforce a desired minimum density for the neighbourhood

The output of running DBSCAN on the image is shown to the left, produced by setting SKIN_DEMONSTRATION=True for demonstration purposes. It is set to False afterwards.
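The three steps above map directly onto scikit-learn's DBSCAN; a sketch that drops sparse flecks from a boolean skin mask (the eps and min_samps values here are illustrative, not tuned):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def denoise_mask(mask, eps=2.0, min_samps=5):
    """Keep only dense clusters of mask pixels; DBSCAN marks sparse
    flecks as noise (label -1) and we drop them."""
    pts = np.column_stack(np.nonzero(mask))        # pixel positions
    labels = DBSCAN(eps=eps, min_samples=min_samps).fit_predict(pts)
    clean = np.zeros_like(mask)
    kept = pts[labels != -1]
    clean[kept[:, 0], kept[:, 1]] = True
    return clean

mask = np.zeros((20, 20), dtype=bool)
mask[5:12, 5:12] = True   # one dense 7x7 blob
mask[0, 19] = True        # an isolated fleck of noise
clean = denoise_mask(mask)
```

The dense blob survives intact while the lone fleck is discarded as noise.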

Convex hull of face obtained from DBSCAN.

Any clusters which remain after DBSCAN are then checked for intersection with the position of the facial region. In this way we pick up the face and neck, or hands positioned over the face, as part of the approximate facial region.

Next we seek the convex hull — the polygon with minimal perimeter which can contain every single point in the cluster. The region of the polygon is then filled with ones so as to create a solid mask of the face in the image.

Eigenvector being stretched by its eigenvalue

With a defined region of the face, we now need to find the scalp-line where the hair is most likely to be positioned. The following script performs this task by creating images 1.1 and 1.3 times larger than the original size. Note that the position of a point in a scaled image lies along the diagonal of the vector defining its original position, as shown to the right. Using this, we can devise a method to centre the new images with enlarged masks around the face of the old image, given by its COM (centre of mass).

The COM of the face mask is, for our case, trivially the average of all (x, y) values over the pixels in the mask. Using this, we recentre our images while placing a bias on rescaling the y term, so as to exclude the area beneath the original COM, which would have included the neck.
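A simplified sketch of the COM and the upward-biased rescale (the notebook's actual recentring differs; here I simply keep only scaled pixels at or above the COM, so the neck is excluded):

```python
import numpy as np

def centre_of_mass(mask):
    """Average (row, col) over every pixel in the mask."""
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()

def scalp_candidates(mask, scale=1.3):
    """Scale the mask's pixel positions about its COM, keeping only
    points at or above the COM: a rough guess at the scalp region."""
    cy, cx = centre_of_mass(mask)
    ys, xs = np.nonzero(mask)
    ny = np.round(cy + (ys - cy) * scale).astype(int)
    nx = np.round(cx + (xs - cx) * scale).astype(int)
    out = np.zeros_like(mask)
    ok = (ny >= 0) & (ny <= cy) & (nx >= 0) & (nx < mask.shape[1])
    out[ny[ok], nx[ok]] = True
    return out

face = np.zeros((10, 10), dtype=bool)
face[4:8, 4:8] = True
band = scalp_candidates(face, scale=2.0)
```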

Finally, with this sample of hair we search the rest of the image for other pixels containing colours from the scalp-line. The process is demonstrated on the right. Leftmost is the scalp-line region as obtained above. The next image shows all the clustered colours which contribute more than 5% of the total area in the scalp-line. The penultimate image restores the clustered colours to their original RGB values. The final image serves as a reminder of what we were extracting from.
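That 5% area cut can be sketched as a frequency filter over the k-means labels inside the scalp-line mask (the helper name and toy arrays are mine):

```python
import numpy as np

def hair_labels(labels_img, scalp_mask, min_frac=0.05):
    """Clusters covering more than min_frac of the scalp-line region
    are taken as hair colours; return a mask of every such pixel."""
    region = labels_img[scalp_mask]
    values, counts = np.unique(region, return_counts=True)
    keep = values[counts / region.size > min_frac]
    return np.isin(labels_img, keep)

labels = np.array([[0, 0, 1],
                   [0, 2, 1],
                   [2, 2, 1]])
scalp = np.array([[True, True, False],
                  [True, False, False],
                  [False, False, False]])
hair = hair_labels(labels, scalp)   # cluster 0 dominates the scalp band
```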

A Data Oriented View

One might be tempted to ask, “Was that really so hard? Do we really expect all those problems from the beginning to occur?” An appreciation for the nuances can be gained from viewing the raw RGB data. Below, all RGB values are plotted in HSV space, with a slight opacity added to convey density. The leftmost image is the full 3D-space plot, while the remaining images are 2D projections along each of the HSV axes. In row order, they correspond to:

  1. The original colour distribution of the original image
  2. The final colour distribution of the image obtained from the predicted hair region
  3. The original colour distribution after k-means was performed
  4. The final colour distribution of the k-means cluster describing the predicted hair region

The points at the origin (0, 0, 0) in each plot are simply the non-existent values and can be ignored.
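For reference, the RGB-to-HSV mapping behind these plots can be done with the standard-library colorsys module (matplotlib's scatter then takes the three columns as axes; the plotting code itself is omitted here):

```python
import colorsys
import numpy as np

def to_hsv(pixels):
    """Map (N, 3) 0-255 RGB rows to HSV triples, each channel in [0, 1]."""
    return np.array([colorsys.rgb_to_hsv(*(p / 255.0)) for p in pixels])

# pure red and pure black: black lands at the origin discussed above
pts = to_hsv(np.array([[255, 0, 0], [0, 0, 0]]))
```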

HSV plots of colour distribution

We see then that the k-means clusters help to create a more organic boundary, separating colours via a non-linear decision boundary. This is extremely useful, if not necessary, given that the shades often change as the hair falls downward. In the next part we will see cases where this is still not robust enough to recognize glare/shininess in certain hair regions.

Viewing the Results

As mentioned before, this technique is not perfect — it is merely meant to provide a starting point to pick out well-behaved images and form an intuition with which to work on more complicated models afterwards. Below I have provided some examples sorted into 3 categories, loosely assigned while sifting through some of the results. In order of appearance they represent the:

  1. Good
  2. Comme ci comme ça
  3. Just plain ugly

I plan to extend the analysis later to account for these discovered issues. However, to limit the scope of this writeup this is where the first part ends. The next part of this series will focus on applying ML techniques on top of what we have done instead of simply filtering our results. So stay tuned for part 2!!!

Hair clustering results bad(ish)

Hair clustering results ugly


I hope this project can serve as an example of the potential of OpenCV, NumPy and scikit-learn to those interested. While far from perfect, in certain well-separable cases it seems to work well, while more complex palettes cause failure. The next post will look into ML techniques to deal with this breakdown while improving upon the model.

If you enjoyed this post please consider

  • Sharing the link with others
  • Subscribing on Facebook/Twitter for fun articles/updates
  • Leaving a comment below (I love feedback!)

Thanks for reading and I hope you enjoyed it as much as James Franco enjoys his waifus.
