Counting grapes with machine learning
Bunches of grapes in a full canopy of leaves and stems

As the grapes swell in the late summer sun and rain, vignerons start thinking about the harvest. What is the yield going to be? How much sugar will there be in the grapes? And, finally, how many bottles of wine can be made?

It would be really useful to have a way to predict the yield of each vine, sector and vineyard. Reliability is important because securing the wrong amount of vineyard labour, vat capacity or bottling can lead to either higher costs or loss of production. Hand counting is reliable but extremely time-consuming if a good representative portion of the vineyard is counted by visual inspection. Wouldn’t it be great if you could automate the inspection process by mounting a camera on a drone or quad-bike and then use AI (artificial intelligence) to pick out the grape bunches from the images of leaves, stems and bunches as the camera races up and down the rows of vines?

Well, this has been done by a number of developers and Corbeau reviewed one clever solution recently. The big challenge is to overcome the confusion of bunches partially obscured by leaves and stems.

Machine learning (ML) examples often use images of salads, with bright red tomato slices, green lettuce and orange carrot sticks. These can be easily sorted or counted using colour filters. In the vineyard you encounter green leaves, grapes, stems and tendrils, each growing at different angles and in a range of different shades of green. Counting grape bunches is a much more difficult problem to solve than counting tomatoes.

Undaunted, Corbeau decided it was time to investigate Open Source AI as a way to count grape bunches in vine images. There is a wide range of Open Source tools available, some of which require programming expertise and some of which do not. The main drawback of using Open Source tools is that the documentation is typically either very limited or written in a strange English dialect spoken only by Microsoft employees.

Example of Microsoft English dialect

Corbeau started with a set of 30 colour vine images taken with a smartphone. Each image was taken in full sun and contained random combinations of grapes, leaves, stems and tendrils. Images typically covered a vine area of around 1 square metre. The first task was to create sets of annotations to identify features of interest. This was done manually with the Microsoft API, drawing labelling boxes around grape bunches and stems in the images. An example annotation is shown below.

Bunch label (yellow) and stem labels (green)

A CNN (convolutional neural network) was chosen to create a machine learning model. Convolution methods are useful ways to emphasise shapes in images. They work by trying to make the image look like a particular pattern, for example parallel vertical lines or circles. After convolving a reference labelled box with a pattern, the simplified new image is assessed: is it more ‘like’ the pattern than something else? When many different patterns are convolved with an image, a set of patterns will tend to characterise one image more than another one. Each of the pattern ‘like’-ness tests can be made a decision point in making an image classification. In this sense the decision-point network is an intelligent classifier. The CNN requires some ground truth, defined by human definitions of what different bunches of grapes look like. It becomes intelligent by learning from human definitions of different bunches and stems.
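To make the idea of convolving an image with a pattern more concrete, here is a minimal sketch in plain Python. The tiny 4 x 4 image patch, the vertical-edge pattern and the function name `convolve2d` are illustrative assumptions, not part of Corbeau's model. A real CNN learns its pattern values during training rather than having them typed in.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products.

    Strictly this is cross-correlation (the kernel is not flipped),
    which is what CNN libraries compute in their 'convolution' layers.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical greyscale patch: dark on the left, bright on the right.
patch = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hypothetical pattern that responds to vertical dark-to-bright edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

response = convolve2d(patch, vertical_edge)
```

In this sketch every row of `response` is `[0, 18, 0]`: the pattern 'lights up' only where the dark-to-bright edge sits and stays at zero over the flat areas. That strong or weak response is the 'like'-ness test described above.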

The sets of convolutions that classify a bunch and a stem are different, and they require verification and testing before they can be used as a machine learning classification model. New vine images were taken to verify the performance of the classification model. Bunch classification was more successful than stem classification because a much higher proportion of bunches than stems were annotated in the learning set of images. After a few hours spent developing better classification models, the model produced a 0.77 confidence level for bunch identification and a 0.12 confidence level for stem identification. When presented with similar images (size, resolution, range) the model should be capable of identifying grape bunches and counting them. The model was deployed as a TensorFlow classification task, ready for counting grape bunches in additional grapevine images.

To test how this ML approach could be used in the vineyard, a burst series of smartphone images was recorded as the phone was carried parallel to a test vine. Running the ML model produced a list of potential image boxes found by the classifier, shown below.

Smartphone burst image of vine with annotation labels

Two bunches were identified with 1.000 (right hand bunch) and 0.928 (top left bunch) confidence levels. The next image box (at the bottom left side of the picture above) was not a bunch and had a 0.019 confidence level.
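Turning a list of candidate boxes into a bunch count comes down to choosing a confidence cut-off. The sketch below uses the three confidence levels reported above. The 0.5 threshold, the dictionary layout and the function name `count_bunches` are assumptions made for illustration, as the study does not state the cut-off used.

```python
# Confidence levels reported for the three candidate boxes in the test image.
detections = [
    {"label": "bunch", "confidence": 1.000},  # right-hand bunch
    {"label": "bunch", "confidence": 0.928},  # top-left bunch
    {"label": "bunch", "confidence": 0.019},  # bottom-left box, not a bunch
]

# Assumed cut-off; not taken from the study.
CONFIDENCE_THRESHOLD = 0.5


def count_bunches(boxes, threshold=CONFIDENCE_THRESHOLD):
    """Count boxes labelled 'bunch' whose confidence meets the threshold."""
    return sum(
        1
        for box in boxes
        if box["label"] == "bunch" and box["confidence"] >= threshold
    )


bunch_count = count_bunches(detections)
```

With this cut-off the two real bunches are counted and the low-confidence box is rejected. A lower threshold would catch more partially hidden bunches at the risk of counting leaves and stems.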

These results are an encouraging start to the machine learning bunch counting project. If you would like to take part in the project, please get in touch with Pierre Graves by completing the Contact form!

Smarts or knowledge – which one wins at precision agriculture?
Wheat field in Hungary (courtesy Wikimedia Commons)

Imagine a competition to produce the highest yields of winter wheat between Sheldon Cooper and a winner of the Apprentice. Who would win? It’s tempting to choose the Big Bang brain-box but what if Lord Sugar’s apprentice had spent 10 years working on arable farms in the UK, Australia and the USA before joining the house of hopefuls?

Precision agriculture poses similar questions. Is it better to have the deepest understanding of plant biology, soil chemistry and metrology or the widest? Is it better to have the most detailed mathematical model of plant growth or the most robust?

These questions got an interesting airing in a recent paper by scientists at CSIRO (Commonwealth Scientific and Industrial Research Organisation) published in Field Crops Research journal. Andre Colaco and colleagues considered the question of how to optimise the harvest of winter wheat by supplying nitrogen. They compared the performance of detailed advisory models that used single field sensors with other, less detailed models that used multiple sensor inputs from the field. What does this mean? One advice system, for example, might be based on measuring nitrate concentrations in crop leaves and include a whole set of equations describing how nitrate ions move from fertiliser pellets on the soil surface into the roots, up the stems and into the leaves. Another might also use sensor measurements of temperature, humidity, rainfall, wind speed and hours of sunshine, but use only basic assumptions about the transport of macronutrients.

Which approach is more successful – the deep one or the wide one?

Wheat leaf (courtesy Wikimedia Commons)

Colaco and colleagues proposed on-farm experimentation and machine learning with multiple sensor inputs as a better way to apply artificial intelligence to crop management. They took 20 years of publicly available winter wheat data from Oklahoma State University (OSU) and used it to test different deep and wide approaches to advising how much nitrogen should be applied in mid-season to realise the potential yield of a crop of wheat. Four different approaches were tested, using half the historical test data as learning-sets and half as test-sets.


Their first approach was based on predicting a Yield Potential (in kg of wheat per hectare) and then assessing the difference between the nitrogen content of that yield and the available nitrogen in the soil. The difference is the recommended nitrogen that must be applied to the field (in kg N per hectare). How were these two numbers calculated? Yield potential for wheat was measured by OSU using the so-called GreenSeeker sensor model. GreenSeeker is a handheld multispectral device that measures the NDVI (normalised difference vegetation index) of field crops. By comparing the NDVI response of field samples against a look-up table, an in-season estimated yield was obtained. Basically, the greener the field test-strip, the bigger the expected yield. Farmers have been using simpler metrics like crop height to predict yield in a similar fashion for some years.

On-farm experimentation data showing the Optimal Nitrogen Rate and the Optimal Nitrogen Recommended as the difference between the predicted yield of the current crop and the optimal yield (courtesy Field Crops Research Journal)
GreenSeeker multispectral sensor (courtesy Trimble Agriculture)
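NDVI is calculated from the reflectance of red and near-infrared (NIR) light as (NIR - Red) / (NIR + Red). The sketch below shows that calculation followed by a look-up step. The reflectance values, the look-up table and the function names are invented for illustration and are not the OSU calibration.

```python
import bisect


def ndvi(nir, red):
    """Normalised difference vegetation index from two reflectances."""
    return (nir - red) / (nir + red)


# Hypothetical look-up table: (minimum NDVI, estimated yield in kg per hectare).
YIELD_TABLE = [(0.2, 1000), (0.4, 2500), (0.6, 4000), (0.8, 6000)]


def estimate_yield(ndvi_value, table=YIELD_TABLE):
    """Return the yield of the highest table row that the reading reaches.

    Readings below the first row are clamped to the first row.
    """
    thresholds = [row[0] for row in table]
    index = bisect.bisect_right(thresholds, ndvi_value) - 1
    return table[max(index, 0)][1]


# Hypothetical sensor reading from a healthy, green test strip.
reading = ndvi(nir=0.50, red=0.10)
expected_yield = estimate_yield(reading)
```

Healthy leaves reflect a lot of near-infrared light and absorb red light, so a greener strip gives a higher NDVI and lands further down the table.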

Nitrogen demand for such a yield was calculated using standard assumptions: the nitrogen content of wheat is typically 2.4% by weight, and the crop typically takes up only 44% of the nitrogen applied to a field. Because more than half of the applied nitrogen never reaches the crop, the shortfall has to be scaled up by dividing by the uptake efficiency. The nitrogen recommendation for Approach 1 was therefore equal to:

[(expected nitrogen in predicted yield) – (available nitrogen in the field)] / uptake efficiency
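A worked example may help here. The sketch below uses the 2.4% nitrogen content and 44% uptake efficiency quoted above. The predicted yield of 4000 kg per hectare and the 60 kg N per hectare already available are hypothetical inputs. It also assumes that the shortfall is divided by the uptake efficiency, on the reasoning that only that fraction of applied nitrogen reaches the crop.

```python
GRAIN_N_FRACTION = 0.024   # nitrogen content of wheat, 2.4% by weight
UPTAKE_EFFICIENCY = 0.44   # share of applied nitrogen taken up by the crop


def nitrogen_recommendation(predicted_yield_kg_ha, available_n_kg_ha):
    """Mass-balance nitrogen recommendation in kg N per hectare."""
    # Nitrogen that the predicted crop would contain.
    demand = predicted_yield_kg_ha * GRAIN_N_FRACTION
    # Shortfall still to be supplied; never negative.
    deficit = max(demand - available_n_kg_ha, 0.0)
    # Assumption: scale up because only part of what is applied is taken up.
    return deficit / UPTAKE_EFFICIENCY


# Hypothetical field: 4000 kg/ha predicted yield, 60 kg N/ha available.
rate = nitrogen_recommendation(4000, 60)
```

Here the crop would contain 96 kg N per hectare, leaving a shortfall of 36 kg N per hectare, so roughly 82 kg N per hectare would need to be applied.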


The second approach was probably more appealing to chemists, using assumptions about the nitrogen response rate (i.e. the concentration of nitrogen multiplied by some rate constant driving the complex growth reaction) rather than a nitrogen mass-balance. With Approach 2, the NDVI values of wheat in different test strips were used directly as parameters for plant growth rate. The in-field experiments required mid-season measurements of wheat test strips with different levels of applied nitrogen at the start of the season and aimed to find the plateau NDVI value that corresponded to the maximum level of nitrogen that the plants could take up given the soil and climate conditions. NDVI measurements in this case were made with a Crop Circle sensor and converted to a recommended nitrogen application rate using a look-up table directly.

Crop Circle multispectral sensor (courtesy of Holland Scientific)
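The plateau idea in Approach 2 can be sketched in a few lines. The test-strip readings, the 0.01 tolerance and the function name `find_plateau` below are assumptions for illustration only. The paper converts NDVI to a nitrogen rate with a look-up table, which is not reproduced here.

```python
# Hypothetical test strips: (nitrogen applied in kg N/ha, mid-season NDVI).
strips = [
    (0, 0.45),
    (30, 0.58),
    (60, 0.68),
    (90, 0.72),
    (120, 0.725),
    (150, 0.724),
]

# Assumed: an NDVI within 0.01 of the maximum counts as being on the plateau.
TOLERANCE = 0.01


def find_plateau(measurements, tolerance=TOLERANCE):
    """Return (lowest nitrogen rate on the plateau, plateau NDVI)."""
    plateau_ndvi = max(value for _, value in measurements)
    for rate, value in sorted(measurements):
        if value >= plateau_ndvi - tolerance:
            return rate, plateau_ndvi


plateau_rate, plateau_ndvi = find_plateau(strips)
```

In this invented data set the NDVI stops rising at about 90 kg N per hectare, which suggests that the extra nitrogen on the two richest strips was not taken up by the plants.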


The third approach introduced Machine Learning (ML) to Approach 1. First, a whole load of seasonal variables were introduced into the yield prediction calculation as possible explanations for the natural year-to-year variation in crop yield. The table below shows the type of variables taken into account.

Additional seasonal variables considered in Machine Learning (courtesy Field Crops Research Journal)

A simple regression analysis was made to identify the most influential seasonal variables. Next a Machine Learning method known as Random Forest (RF) was used to investigate various decision-tree models (combinations of the most influential seasonal variables) that could possibly lead to an applied nitrogen recommendation at mid-season. There are some useful video links at the bottom of this Insight article that explain decision-trees and RF. It turned out that the most influential variables for Approach 3 were: NDVI, RI (response index), soil moisture and rainfall. The Random Forest trees were derived using half of the historic OSU winter wheat data and refined so as to create a Machine Learning model that could be used to predict the recommended nitrogen application for the remaining 50% of the historic OSU data.

Selection of seasonal variables based on minimising the RootMeanSquareError (courtesy Field Crops Research Journal)
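For readers who prefer code to videos, here is a much-simplified stand-in for Random Forest in plain Python. Each 'tree' is reduced to a single split, trained on a random resample of the learning set and a randomly chosen variable, and the forest averages the answers. The response index and rainfall figures, the nitrogen rates and all function names are hypothetical. This is not the authors' model, and a real analysis would more likely use a library such as scikit-learn.

```python
import random


def best_split(rows, targets, feature):
    """Find the threshold on one feature that minimises squared error."""
    values = sorted(set(row[feature] for row in rows))
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best  # None when every sampled value is identical


def train_forest(rows, targets, n_trees=25, seed=0):
    """Train single-split trees on bootstrap samples and random features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        sample_rows = [rows[i] for i in picks]
        sample_targets = [targets[i] for i in picks]
        feature = rng.randrange(len(rows[0]))  # random variable for this tree
        split = best_split(sample_rows, sample_targets, feature)
        if split is None:  # no split possible: always predict the sample mean
            mean = sum(sample_targets) / len(sample_targets)
            forest.append((feature, float("inf"), mean, mean))
        else:
            _, threshold, left_mean, right_mean = split
            forest.append((feature, threshold, left_mean, right_mean))
    return forest


def predict(forest, row):
    """Average the predictions of all trees for one field."""
    votes = [left if row[feature] <= threshold else right
             for feature, threshold, left, right in forest]
    return sum(votes) / len(votes)


# Hypothetical data: (response index, seasonal rainfall in mm) per field,
# with the optimal mid-season nitrogen rate in kg N/ha as the target.
rows = [(1.0, 120), (1.1, 150), (1.2, 180), (1.3, 210),
        (1.4, 240), (1.5, 270), (1.6, 300), (1.7, 330)]
targets = [0, 15, 30, 45, 60, 75, 90, 110]

# Half the data as the learning set, the other half held back for testing.
train_rows, train_targets = rows[::2], targets[::2]
test_rows = rows[1::2]

forest = train_forest(train_rows, train_targets)
held_out_predictions = [predict(forest, row) for row in test_rows]
```

The point of the sketch is the workflow: learn from one half of the history, then predict the nitrogen rate for fields the model has never seen.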


The fourth and final approach was to apply Machine Learning to Approach 2 and produce a model Colaco called Data Driven. For their Data Driven approach, all the available sensor data was added to Approach 2 so that a very wide range of information was used to find the most influential seasonal variables. This time all 12 of the variables in the above table were used for Approach 4. Again the Machine Learning Random Forest method was used to find a set of decision-trees that best represented the 50% learning set. This set of decision-trees was then used to predict the recommended nitrogen application for the remaining 50% of the historic OSU winter wheat data.

So after all this modelling and number crunching what was the result?

The performance of the four Approaches was evaluated by plotting the recommended nitrogen rate against the actual optimal nitrogen rate. An R² value of 1.0 would indicate a perfect fit, and the RMSE (root mean square error) values were used as an indication of the accuracy of each Approach. Based on these criteria, Approach 4 is the clear winner.

  • Approach 1 R² = 0.42 and RMSE = 31.3 kg N ha⁻¹
  • Approach 2 R² = 0.63 and RMSE = 21.9 kg N ha⁻¹
  • Approach 3 R² = 0.51 and RMSE = 26.0 kg N ha⁻¹
  • Approach 4 R² = 0.79 and RMSE = 16.5 kg N ha⁻¹
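The two scores in the list above can be calculated as in the following sketch. The four pairs of optimal and recommended nitrogen rates are invented for illustration. R² is computed here as the coefficient of determination, which is an assumption, since the paper may instead derive it from a fitted regression line.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the same units as the inputs."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1.0 means a perfect match."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical nitrogen rates in kg N per hectare for four fields.
optimal = [0, 40, 80, 110]
recommended = [10, 30, 90, 100]

error = rmse(optimal, recommended)
fit = r_squared(optimal, recommended)
```

In this invented example every recommendation is 10 kg N per hectare out, so the RMSE is 10 kg N per hectare and the R² is about 0.94.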

Oklahoma State University winter wheat yields varied from 1 tonne per hectare to 7 tonnes per hectare. Across all the fields in the database over the 20 years, the optimal mid-season nitrogen application based on actual yields varied between zero and 110 kg N per hectare. Reducing the error in nitrogen application from 31.3 to 16.5 kg N per hectare by using Approach 4 rather than Approach 1 is therefore a significant optimisation of nitrogen supplementation. This would be expected to result in lower costs (when the recommended nitrogen rate is too high) and higher yields (when the recommended nitrogen rate is too low).

Applying machine learning can improve the use of either direct or indirect parameters in precision agriculture. Using multiple variables that farmers encounter from year to year and from field to field can produce more robust advice, even when the variables are used directly, without knowing exactly how they affect the yield of a crop.

Smarts or knowledge? Precision agriculture gains from the use of both better understanding and knowledge. When machine learning is added to a method, on-farm experiments and local variables like microclimate can produce the very best results.

Corbeau, for one, can’t wait to see the results of applying this approach in UK vineyards as well as winter wheat in the USA.

  • Find the whole article from Colaco in Field Crops Research
  • Read a short related article on a similar study in Australia
  • Watch a related video featuring one of the CSIRO team
  • Watch a FUN explanation of Random Forest machine learning, yes really!
  • Subscribe to receive our own monthly precision viticulture pilot study newsletter
Vineyard yield estimation with smartphone imaging and AI
Vine shoot and flower buds

There is so much potential in those tightly closed flower buds. Over the course of the summer the flowers on vines bloom, turn into tiny green spheres and ultimately heavy bunches of grapes. Or at least that is the hope of the vineyard owner and winery. Accurately estimating the size of the harvest well in advance has a number of advantages. Early yield estimation allows the right number of pickers to be hired at a reasonable rate and the right amount of tank space, bottling and packaging to be secured.

Yield estimation by manual visual inspection is the method recommended by the Grape and Wine Research and Development Corporation (GWRDC, Australian Government). In a 2010 guidance sheet, Professor Gregory Dunn (University of Melbourne) recommends randomly counting grape clusters across entire vineyard parcels. There is good correlation between the number of grape clusters per vine and the ultimate yield.

Correlation between vineyard production and grape cluster count (courtesy of GWRDC)

It is not always easy to make an accurate random sample count of bunches and previous yields can vary from year to year. Counting in vineyards in cool climates like the UK has both these difficulties because seasons tend to be more variable than further south and more vigorous vines tend to be planted. Vigorous vines like Reichensteiner produce thick leaf canopies that obscure developing fruit.

Researchers at Cornell University recently reported a novel, cheap and effective method of early yield estimation based on smartphone video footage of a whole vineyard and artificial intelligence (AI) analysis of the recorded images.

All terrain vehicle with smartphone on gimbal and LED lighting panels (courtesy of Frontiers in Agronomy and Cornell University)

Stereo-imaging and LIDAR measuring devices have been around for a while now but they are expensive: think thousands to tens of thousands of pounds to equip a vineyard with a system. The Cornell system is essentially a smartphone on a gimbal with a lighting boom that can be driven or walked up and down the rows of a vineyard at night.

In addition to being a low cost solution, it is also effective. They report a cluster count error rate of only 4.9%, almost half that of traditional manual cluster counting. The improved cluster counting, and therefore better yield estimation, is mainly due to better random sampling of the vineyard and better identification of clusters. Over two growing seasons the Cornell team found that early video imaging gave the best results because small clusters and shoots were not obscured by large leaves and a dense canopy.

So how did they turn a rather long video into an accurate and precise cluster count?

Firstly the different objects (leaves, shoots, clusters, posts etc.) in the video needed to be classified. Building the classifier is the major task in a machine learning implementation. There are a number of Open Source tools readily available to do this. They chose a Convolutional Neural Network (CNN) to identify objects in the video images. CNNs apply digital filters to simplify and exaggerate whole images, making them look more round, jaggy, linear etc. These filters are applied to the image under test to find the combination whose result best fits a set of defined examples of a particular object. But how does the CNN know what an object is? Who defines the objects? The Cornell answer is student interns. They were given sets of training images and a copy of the Open Source Python app LabelImg, and tasked with drawing a box around each object of interest and giving it the label ‘cluster’. The other useful source of information to train the CNN was the so-called Microsoft COCO (Common Objects in COntext) dataset. COCO is essentially a large set of sorted images that are not grape clusters. The image below shows how clusters are identified from video footage.

Vine clusters identified by trained CNN (courtesy of Frontiers in Agronomy and Cornell University)
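LabelImg can save each annotated image as a Pascal VOC style XML file, and the boxes drawn by the interns can be read back with a few lines of Python. The annotation below is a made-up example: the file name, the box coordinates and the function name `cluster_boxes` are all hypothetical, and it is an assumption that the Cornell team used this particular output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout that LabelImg can write.
ANNOTATION_XML = """
<annotation>
  <filename>vine_row_frame.jpg</filename>
  <object>
    <name>cluster</name>
    <bndbox><xmin>120</xmin><ymin>85</ymin><xmax>210</xmax><ymax>190</ymax></bndbox>
  </object>
  <object>
    <name>cluster</name>
    <bndbox><xmin>340</xmin><ymin>60</ymin><xmax>410</xmax><ymax>150</ymax></bndbox>
  </object>
  <object>
    <name>post</name>
    <bndbox><xmin>500</xmin><ymin>0</ymin><xmax>530</xmax><ymax>480</ymax></bndbox>
  </object>
</annotation>
"""


def cluster_boxes(xml_text, label="cluster"):
    """Return (xmin, ymin, xmax, ymax) for every box with the given label."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        if obj.findtext("name") == label:
            box = obj.find("bndbox")
            boxes.append(tuple(
                int(box.findtext(tag))
                for tag in ("xmin", "ymin", "xmax", "ymax")
            ))
    return boxes


boxes = cluster_boxes(ANNOTATION_XML)
```

Here `boxes` holds the two labelled clusters and ignores the post. Lists like this, collected over many images, are the examples the CNN learns from.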

TensorFlow, a user-friendly Open Source platform, was used to train the neural network and apply it to the video footage.

It would be fascinating to apply this cost-effective and early yield technology in the UK, where the climate is warming but seasons are still variable.

Read the whole Frontiers in Agronomy paper here.

Watch a brilliant explanation of AI from Microsoft’s Laurence Moroney here.

Find out more about TensorFlow here.

Download LabelImg here.

Find out more about the COCO Project here.