Counting grapes with machine learning
Bunches of grapes in a full canopy of leaves and stems

As the grapes swell in the late summer sun and rain, vignerons start thinking about the harvest. What is the yield going to be? How much sugar will there be in the grapes? And, finally, how many bottles of wine can be made?

It would be really useful to have a way to predict the yield of each vine, sector and vineyard. Reliability is important because securing the wrong amount of vineyard labour, vat capacity or bottling can lead to higher costs or lost production. Hand counting is reliable but extremely time-consuming if a properly representative portion of the vineyard is to be counted by visual inspection. Wouldn’t it be great if you could automate the inspection by mounting a camera on a drone or quad-bike and then using AI (artificial intelligence) to pick out the grape bunches from the images of leaves, stems and bunches as the camera races up and down the rows of vines?

Well, this has been done by a number of developers and Corbeau reviewed one clever solution recently. The big challenge is to overcome the confusion of bunches partially obscured by leaves and stems.

Machine learning (ML) examples often use images of salads, with bright red tomato slices, green lettuce and orange carrot sticks. These can easily be sorted or counted using colour filters. In the vineyard you encounter green leaves, green grapes, stems and tendrils, each growing at a different angle and in a range of different shades of green. Counting grape bunches is a much harder problem than counting tomatoes.

Undaunted, Corbeau decided it was time to investigate Open Source AI as a way to count grape bunches in vine images. There is a wide range of Open Source tools available, some of which require programming expertise and some of which do not. The main drawback of using Open Source tools is that the documentation is typically either very limited or written in a strange English dialect spoken only by Microsoft employees.

Example of Microsoft English dialect

Corbeau started with a set of 30 colour vine images taken with a smartphone. Each image was taken in full sun and contained a random combination of grapes, leaves, stems and tendrils. Images typically covered a vine area of around 1 square metre. The first task was to create sets of annotations identifying the features of interest. This was done manually with the Microsoft API, drawing labelled bounding boxes around grape bunches and stems in the images. An example annotation is shown below.

Bunch label (yellow) and stem labels (green)
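
For readers who like to see what an annotation actually contains, here is a minimal sketch of how one image’s bounding boxes could be represented once exported from a labelling tool. The field names, normalised coordinates and file name are illustrative assumptions, not the exact schema used in this project.

```python
# Hypothetical export of one image's annotations (illustrative schema only,
# not the labelling tool's real format). Coordinates are normalised to 0..1
# relative to the image width and height.
annotations = {
    "image": "vine_012.jpg",
    "regions": [
        {"tag": "bunch", "left": 0.62, "top": 0.41, "width": 0.18, "height": 0.22},
        {"tag": "stem",  "left": 0.30, "top": 0.10, "width": 0.05, "height": 0.35},
    ],
}

# Sanity check: every labelled box must lie inside the image.
for region in annotations["regions"]:
    assert 0.0 <= region["left"] and region["left"] + region["width"] <= 1.0
    assert 0.0 <= region["top"] and region["top"] + region["height"] <= 1.0
```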

A CNN (convolutional neural network) was chosen to create the machine learning model. Convolution is a useful way to emphasise shapes in images: a small pattern, for example parallel vertical lines or circles, is slid across the image, and the result records how strongly each region resembles that pattern. After convolving a labelled reference box with a pattern, the simplified new image can be assessed: is it more ‘like’ the pattern than something else? When many different patterns are convolved with an image, a particular set of patterns will tend to characterise one kind of image more than another. Each of these pattern ‘like’-ness tests can become a decision point in classifying an image, and in this sense the network of decision points is an intelligent classifier. The CNN needs some ground truth to learn from, supplied by the human-drawn labels that define what bunches and stems look like.
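
As a small illustration of the ‘like’-ness idea, here is a toy example in Python (using NumPy and SciPy, which are our choice of tools for the sketch rather than anything used in the project). A patch containing vertical lines responds strongly to a vertical-line kernel, while a patch containing a horizontal line does not.

```python
import numpy as np
from scipy.signal import convolve2d

# Toy 8x8 greyscale patch with two bright vertical lines (pixel values 0..1).
vertical_patch = np.zeros((8, 8))
vertical_patch[:, 2] = 1.0
vertical_patch[:, 5] = 1.0

# The same size patch with a single horizontal line.
horizontal_patch = np.zeros((8, 8))
horizontal_patch[3, :] = 1.0

# A small kernel that responds strongly to vertical lines.
vertical_kernel = np.array([[-1.0, 2.0, -1.0],
                            [-1.0, 2.0, -1.0],
                            [-1.0, 2.0, -1.0]])

# Convolution slides the kernel across each patch; large values mark regions
# that look like the pattern encoded in the kernel.
print(convolve2d(vertical_patch, vertical_kernel, mode="valid").max())    # 6.0: strong match
print(convolve2d(horizontal_patch, vertical_kernel, mode="valid").max())  # 0.0: no match
```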

The sets of convolutions that characterise a bunch and a stem are different, and they require verification and testing before they can be used as a machine learning classification model. New vine images were taken to verify the performance of the model. Bunch classification was more successful than stem classification because the training images contained a much higher proportion of annotated bunches than stems. After a few hours of refining the model, it produced a 0.77 confidence level for bunch identification but only 0.12 for stem identification. When presented with similar images (size, resolution, range) the model should be capable of identifying grape bunches and counting them. It was deployed as a TensorFlow classification task, ready for counting grape bunches in additional grapevine images.
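
The exact way the trained model was exported and called is not shown here, but the sketch below gives a flavour of how a TensorFlow/Keras classifier could be run over a new image crop. The model path, input size and class list are assumptions for illustration, not details of the actual deployment.

```python
import numpy as np
import tensorflow as tf

# Assumed setup: the trained classifier is saved at "bunch_classifier/" and
# takes a 224x224 RGB crop, returning probabilities for three classes.
model = tf.keras.models.load_model("bunch_classifier")
class_names = ["bunch", "stem", "other"]  # assumed class order

def classify_crop(image_path: str):
    """Return the most likely class and its confidence for one image crop."""
    img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    batch = np.expand_dims(tf.keras.utils.img_to_array(img) / 255.0, axis=0)
    probs = model.predict(batch, verbose=0)[0]
    best = int(np.argmax(probs))
    return class_names[best], float(probs[best])

label, confidence = classify_crop("new_vine_crop.jpg")
print(f"{label}: {confidence:.2f}")
```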

To test how this ML approach could be used in the vineyard, a burst series of smartphone images was recorded as the phone was carried parallel to a test vine. Running the ML model produced a list of candidate image boxes found by the classifier, shown below.

Smartphone burst image of vine with annotation labels

Two bunches were identified with confidence levels of 1.000 (right-hand bunch) and 0.928 (top-left bunch). The next candidate box (at the bottom left of the picture above) was not a bunch and scored only 0.019.
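
Turning those per-box confidence levels into a bunch count is then a matter of keeping only the boxes the model is reasonably sure about. The sketch below uses the three confidence values reported above; the 0.5 threshold is an illustrative choice rather than a value from the project.

```python
# (label, confidence) pairs for the candidate boxes in the burst frame above.
detections = [
    ("bunch", 1.000),   # right-hand bunch
    ("bunch", 0.928),   # top-left bunch
    ("bunch", 0.019),   # bottom-left box that was not a bunch
]

CONFIDENCE_THRESHOLD = 0.5  # illustrative cut-off

bunch_count = sum(
    1 for label, confidence in detections
    if label == "bunch" and confidence >= CONFIDENCE_THRESHOLD
)
print(f"Bunches counted in this frame: {bunch_count}")  # -> 2
```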

These results are an encouraging start to the machine learning bunch counting project. If you would like to take part in the project, please get in touch with Pierre Graves by completing the Contact form!