Internship report: Solving a game of QWIRKLE with Custom Vision & OpenCV

In this blogpost, we’ll explain how Evi Leroy, an Intern from VIVES Hogeschool, attempted (and succeeded) to use photos and Custom Vision and OpenCV to determine the best move for the game of “Qwirkle”. It is the result of
a 50-day internship and was the final project to obtain a bachelor in applied informatics.

An introduction to Qwirkle

Qwirkle is a strategy board game released in 2006 by MindWare. The game is similar to Scrabble and Rummikub and can be played with two to four players. It is made of black blocks, on which a figure is depicted. These figures vary in shape and Color, there are six colors and six shapes. That way there are 36 different types of cubes
and of each of these, there are three in the game, for a total of 108 cubes.

Each player draws six blocks blindly and replenishes them to six throughout the game. During the game, each player takes turns placing cubes on the table. Some rules apply here:

the placed cubes must always adjoin at least one tile on the
table and the blocks must always form a series.
A series is formed with two or more blocks that adjoin each other and lie in a straight line.
This series may also only consist of blocks of the same shape or color, but can never contain two exactly identical blocks.

The player gets points for each series of which the placed blocks belong: one point per block in the series. If a player completes a run of six, he gets another six extra points. The game is over when all the cubes have been drawn and one player has placed all his blocks. The player with the most points wins.

Goal

The purpose of the internship was to investigate whether it is possible to use AI and Custom
Vision to determine the best move in a game. Using images of the playing field
and the cubes of one of the players, the application should make suggestions for the best move, to
eventually win the game.

How do we extract logic from the images that are taken? How can we convert relevant information from images
to useful data? This goes beyond mere recognition and image classification. The blocks must not only be recognized, there must also be intelligence to determine the relationship between these blocks. How do these relate to each other
and how is the game board structured logically? These were only a few of the research questions that sparked the project.

The choice for a board game such as Qwirkle was eventually only a way to set up an interesting demoproject around these research questions. The board game is not so much an end in itself, but a means to explore the possibilities and challenges of Computer Vision.

Methodology

The project can roughly be divided into three parts: the preliminary investigation, the
training phase and the realization phase.

The first part introduces the methodologies and technologies: Custom
Vision, Python, AI, computer vision and Azure. The project is analyzed and
we performed research for similar projects (e.g. conversion of game board to digital versions,
recognition of grids, sudoku and chessboards) and possible methods and approaches.

In the second part, photos are taken, processed and tagged. It is examined which
photos and tags are needed. The models are trained and published in Custom Vision.
To calculate the best move, a well-trained neural network is required. Faults
in recognition inevitably lead to mistakes in the best move. It is also investigated
how OpenCV can be useful here.

In the third part of the project, the trained model is used in an application. Before the photos are analyzed in Custom Vision , they are placed in the correct position using OpenCV. The perspective of the photos is changed so that the lines of the game board run parallel to the edges of the photo. After this, the Custom Vision API can be addressed. Using the results returned, the game board becomes digitally drawn, as well as the player’s cubes. The coordinates of the bounding boxes are crucial here. The best move is lastly calculated using a brute force algorithm.

Design

A classic approach to gameboard recognition is to create a grid through feature extraction and splitting it into individual cells (segmentation) (Underwood, 2020). Situations where the game board is square or rectangular, with clear lines between the different cells and a fixed size, lend themselves very well to this. Examples are e.g. chessboards and sudokus (Sevgen, Arslan, & Samli, 2015). The computer vision model is only trained on individual cells, classification is applied to each cell separately. For sudokus, the model is e.g. trained to recognize a handwritten number, at chess on recognizing the specific chess pieces. OpenCV even has a specific function for finding the corners of a chessboard (findChessboardCorners()).

This approach is not possible in this project for two reasons. Firstly, the game board of
Qwirkle is not rectangular. It’s tricky, but not impossible, to determine the farthest corners of the
game board. However, the game board also has a variable size, in contrast
to a chessboard, which is always eight squares wide, and a sudoku grid, which is always nine
squares wide. As a result, it is not possible, once the angles have been determined, to determine the width to
divide the game board by a fixed number to arrive at the width of a cell. In addition, due to the black color, the edges of the blocks are difficult to recognize in the photos.

Determining the separate cells via feature extraction quickly becomes difficult. A different method will therefore be needed for this project. We actually used an object detection project in Custom Vision and using the coordinates of the
bounding boxes of the predictions, the grid of the game board was determined. For this to work, it is important that the game board is “straight”, meaning the placed cubes are parallel to the edges of the photo. If this is not the case, the determination of the grid will be much more difficult. Also, the distinction between the diamonds and the squares on the blocks is not recognized well by Custom Vision, so we needed to look into a solution for that as well. Every photo will therefore always be stored in a folder, and with the help of OpenCV, correctly positionned before being forwarded to Custom Vision. To keep things simple, the photos in the training set were placed in the
correct position by hand. Next, the shape and color of all blocks are determined and the game board is digitized
(in a 2D array). After that, the blocks of the player are recognized in a separate photo and the best move is determined using a brute force algorithm. With this algorithm, we simply go through all possible combinations and the algorithm selects the move that yields the most points.

Training

Because Custom Vision uses transfer learning, it is possible to train a high-performance model with only a few photos. To avoid overfitting, a good training set does have a number of requirements. For example, the data set must be sufficiently large. Custom Vision recommends providing at least 50 photos per label. With fewer photos there is a bigger chance of overfitting. The data set must also be balanced, i.e. there must be approximately the same number of
photos per label. If not, some labels will be recognized better than others. Custom Vision specifies that there must be at least a 1:2 ratio between the label with the fewest photos and the label with the most photos. A third important parameter is variation in the photos. Think of variation in background, lighting, size of the object, camera angle and different types of the object (e.g. not always the same bike, but different brands and models if you want Custom Vision to recognize a bicycle) (Microsoft, 2021).

Especially the first three of these possible variations are important for this project. By the transformations that will be performed on the photos using OpenCV, it isn’t required to include camera angle as a variable in the training set photos. Initially, Custom Vision was used in our project where there were no variations within the labels. Each label then corresponded to a block with one specific color and one certain shape. The blocks within one label are always identical. Later on in the project however, we switched towards an approach where only the shape of the block was determined using Custom Vision, meaning the color could vary. Then it was important that all variants (colors) were within one label in the dataset (e.g. not only yellow circles to recognize the circle shape, but also red, blue, …).

After getting a trained model to match the colors of the cubes with a circle shape, this model is copied to expand with blocks depicting a clover. Hereby enough pictures with clover cubes must be added to avoid creating an unbalanced
set that would lead to poor results. A ratio of 75 photos per clover-tag vs. 157 photos per circle-tag was considered unbalanced by Custom Vision for example, as the 1:2 ratio discussed above was not met. With a ratio of 118 to 157 photos, we eventually created a balanced dataset.

Although these results were not bad, they turned out to be largely insufficient for the rest of
project. Some blocks were recognized incorrectly, but there are also a lot of blocks that are still not recognized at all. Bad predictions inevitably lead to a bad calculation of the best move and in addition, the model also needed to be expanded with even more shapes, which would only complicate the predictions even more. It is clear that this approach did not work and we had to figure out a different way of recognizing the blocks correctly.

Using OpenCV for color recognition

A better method to determine the colors correctly is to determine the position (and shape) of the
blocks using Custom Vision, and using the middle pixels of the bounding boxes (that most certainly contains the colors), to predict the color. However, because the RGB values of these pixels are not ideal for segmenting colors, they are first transformed to HSV. It is important here that the centers of the bounding boxes are found, as many
may match the centers of the actual cubes. A color space is used to describe colors. One of the best known is the RGB color system. This is an additive system that is mainly used in electronics and where each color is expressed in terms of three primary colors (red, green and blue). Each parameter (primary color) is assigned a value from 0 to 255. A code (0,0,0) corresponds to black, (255, 255, 255) to white.

In OpenCV the standard for color recognition is BGR , which is the same as RGB, but where the red and blue channels are switched (OpenCV, sd). A color model that corresponds better with how we see colors as humans and that is also better suitable for segmenting colors is HSV. This color model is also sometimes called HSB. Here H stands for hue (the hue, which we usually call the color), S for saturation (saturation or brightness, the amount of gray present in the color) and V for value or B for brightness (intensity or lightness, from black to white). The advantage of this color model is that the color and exposure information are completely separated (Gad, 2018).

The hue is expressed in degrees on a circle, from 0 to 360. Where 0 corresponds to pure red, 120 with pure green, 240 with pure blue and 360 back with pure red. In OpenCV, this is divided by 2 for 8-bit images (OpenCV, sd). Based on these values, the color of the blocks can be correctly determined, regardless of the circumstances in which the photo was taken (with or without flash, exposure, …). Since the middle of each block in all shapes has the desired color, the center of the bounding box from Custom Vision used for this. There is also sufficient margin, since all fill the center of the cubes well. No wrong color determination was noticed with this approach anymore.

Shape determination with Custom Vision

Determining the color by selecting the middle pixel in the bounding box greatly simplifies Custom Vision’s classification task, as we can now limit ourselves to six classes: one per shape, regardless of color. With only 53 photos per tag, we already got back very good results, which are reliable enough to calculate the best move. It remains important that an approximately equal amount of blocks are used per tag (shape) in the six different colors. The photos should also vary in lighting and background.

The model was further finetuned including training samples with pictures taking from a variable distance of the playing field.

With only limited additional training we were able to improve the model, as seen in following comparison between the results from the model not trained for variable distance (left) and the model trained for variable distance (right).

This final model was published in Custom Vision, so that in a next phase, using an API, it could be invoked from another application. For this project, the game board is first put in the right position in a preprocessing phase, before it is sent to Custom Vision. The results returned by the prediction API are then used to create the digital game board and determine the best move.

Preprocessing the images

Positioning the photo correctly is done using feature detection in OpenCV and is crucial for the
conversion to the digital game board. Because the coordinates of the game board are calculated using the bounding boxes of the Custom Vision predictions, it is important to use pictures on which the game board is straight (lines of the
game board are roughly parallel to the edges of the photo) and taken from above. In Custom Vision, the bounding boxes are always parallel to the boundaries of the photo. With photos where the game board is not straight, this also leads to incorrect recognition of the squares and the diamonds on the cubes, which of course causes an incorrect determination of the best next move. When the straight lines of the game board are detected, it is possible
to rotate the game board or change the perspective. Then the photo can be sent to Custom Vision.

We experimented with different techniques in the preprocessing of this project: conversion to grayscale, adding Blur (average blurring, Gaussian Blur, MedianBlur, BilateralFilter), Tresholding (for image segmentation), (Canny) edge detection, Hough line detection (detecting straight lines in the images), contour detection, …

Adaptive threshold & Canny edge detection

In a next phase, it will be examined how photos that were not taken from above, can be transformed into a photograph in the correct perspective. Hereby the original photo is reconverted to grayscale and a threshold, Gaussian blur and Canny edge detection is applied.

The findContours() function finds the contours in the photo. The tuple returned by this function varies in structure according to the OpenCV version which is used. Therefore grab_contours() is needed to select the correct array. The contour with the longest arc length is the contour of the game board.

The Hough lines are determined from this contour. From this, four lines are selected: two lines that will form horizontal lines of the game board and two lines that will form vertical lines. Together, these will form the game board. Both horizontal and vertical lines must be sufficiently different from each other.

Four points are needed to change the perspective. For this, the intersections of the four Hough lines are used. The lines are combined with each other in pairs. The corner between the lines in each pair is checked, making the combinations of both horizontal lines mutually and both vertical lines mutually are not used. These lines are not perfectly parallel, so they will eventually intersect. After that, the intersection of the remaining pairs that should be within the frame of the image is determined each time.

The 4 selected Hough lines and their intersections

In a next step, the perspective matrix is determined on the basis of the intersections. The destination points used are [0.0], [0.100], [100.0] and [100.100] and form a square. This is not the case at the intersections. As a result, this could cause distortion after the transformation. Often this is limited and does not pose a problem for the rest of the project.
The shapes on the blocks are still recognized correctly by Custom Vision.

Determining the best (winning) moves

The digital game board is an array and is filled with the results from the Custom Vision API. The position of a cube is calculated from the top left corner of the game board, the upper left corner of the bounding box and the dimensions of an average cell. We check whether the digital game board is sufficiently large and whether the cell is still empty. Afterwards the cell is filled with the recognized block. The predictions are returned by Custom Vision in order of descending probability. Sometimes the same block is recognized more than once, that’s why we use the result with the highest probability, because in our tests this has always been the correct one. The cell is therefore always filled with the prediction with the highest probability. Once all the results have been placed on the digital game board, an empty border is added. This is necessary to determine the positions where a new block can be placed and where the “next best move” should be calculated.

The player’s cubes are determined in the same way as the cubes in the game board. This uses the same Custom Vision project, but the location of the bounding boxes is not important here. However, these blocks must also be “straight” so that there are no errors in the recognition of the diamond- and square shapes. The class in which these blocks are located however must be iterable, unlike the game board. This because when determining the best move, the list of blocks and potential moves is run through several times. For every block, we look at which moves are possible, how many points they yield and which other blocks are left that can be paired with what the player has in his hand.

The best move is determined using a brute force algorithm. First, the positions where a block may be placed are determined. All cells of the game board are looked at and for each cell it is checked whether it is empty and borders to at least one block (horizontal or vertical).

For each cube of the player, every valid position is checked and it is checked whether this is a valid move. We first check whether this move forms a valid row or column. This is done by first picking up the row or column from the game board. By direction (horizontal or vertically) it checks for adjacent cubes and as long as there are, it is hypothetically added to that series. For each series, it is then checked whether the blocks either all are the same color or all have the same shape. When this is the case, it is also checked whether the other attribute (shape or color) each time does’nt occur no more than once in the series, to ensure the validity of the move.

When only one cube may be placed per turn, the code is relatively simple. The score is calculated for the row and/or column that is formed: per formed series, one point per block. This is split into a horizontal and vertical score. When a series consists of only one block (the constructed block), the score of this series is zero. When a series consists of six blocks, six extra points are added to the score. It becomes slightly more difficult if more than one block is allowed to be placed. When the first cube constitutes a valid move, it is placed on the game board and removed from the
cubes of the player. Then the adjacent positions of the placed cube are determined. These positions are run over for each remaining cube from the player’s cubes. It is again checked whether these constitute a valid move. If this is the case, it is checked whether the first and second block together form a row or a column. This has
namely influence on the scoring and on the possible positions for a third block. After that, it is checked whether more blocks can be added. Since the second block decides whether the constructed blocks form a row or a column, this can be done with a while loop (while move_possible). When the blocks form a row, only the horizontal adjacent positions. When the blocks form a column, only the vertically adjacent positions. When calculating the score, it is checked whether the new block forms an additional row or column with the game board that is already on the table. Also, it is
checked whether this forms a row or column consisting of six blocks, because that entitles you to an additional six points. Of all these possible moves, the sequence (which blocks are placed where) and the score are saved. These are all eventually run over for the move with the highest score.

Visualizing the best move

After determining the best move, it is visualized on the photo, along with the score. Also, all results from Custom Vision are displayed on the photo (with the changed perspective). We developed an app in Blazor to enable interaction from a user friendly interface.

Any questions about this project?
Feel free to reach out!