Vineyard yield estimation with smartphone imaging and AI

**Vine shoot and flower buds**

There is so much potential in those tightly closed flower buds. Over the course of the summer the flowers on the vines bloom, turn into tiny green spheres and ultimately become heavy bunches of grapes. Or at least that is the hope of the vineyard owner and winery. Accurately estimating the size of the harvest well in advance has a number of advantages. Early yield estimation allows the right number of pickers to be hired at a reasonable rate and the right amount of tank space, bottling and packaging to be arranged.

Yield estimation by manual visual inspection is the method recommended by the Grape and Wine Research and Development Corporation (GWRDC, Australian Government). In a 2010 guidance sheet, Professor Gregory Dunn (University of Melbourne) recommends counting grape clusters at random across entire vineyard parcels. There is a good correlation between the number of grape clusters per vine and the ultimate yield.

**Correlation between vineyard production and grape cluster count (courtesy of GWRDC)**

It is not always easy to make an accurate random sample count of bunches, and previous yields are an unreliable guide because they vary from year to year. Vineyards in cool climates like the UK face both difficulties, because seasons tend to be more variable than further south and more vigorous vines tend to be planted. Vigorous varieties like Reichensteiner produce thick leaf canopies that obscure developing fruit.
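The sampling approach described above can be sketched in a few lines of Python. This is a minimal illustration rather than the GWRDC procedure: it assumes yield is roughly clusters per vine times number of vines times average cluster weight, and the function name `estimate_yield`, the sample counts, the vine total and the cluster weight are all made-up values chosen for the example.

```python
import math
import statistics


def estimate_yield(cluster_counts, total_vines, avg_cluster_weight_kg):
    """Estimate parcel yield (kg) from cluster counts on randomly sampled vines.

    Returns (estimate, approximate 95% margin), assuming
    yield = clusters per vine x number of vines x average cluster weight.
    """
    n = len(cluster_counts)
    mean_clusters = statistics.mean(cluster_counts)
    # Standard error of the mean cluster count per vine
    std_err = statistics.stdev(cluster_counts) / math.sqrt(n)
    scale = total_vines * avg_cluster_weight_kg
    return mean_clusters * scale, 1.96 * std_err * scale


# Hypothetical cluster counts from 10 randomly chosen vines
sample = [22, 18, 25, 30, 19, 21, 27, 24, 20, 24]
estimate, margin = estimate_yield(sample, total_vines=2000, avg_cluster_weight_kg=0.1)
print(f"Estimated yield: {estimate:.0f} kg +/- {margin:.0f} kg")
```

A wide margin relative to the estimate is a sign that more vines need to be counted, which is exactly where manual sampling becomes laborious.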

Researchers at Cornell University recently reported a novel, cheap and effective method of early yield estimation based on smartphone video footage of a whole vineyard and artificial intelligence (AI) analysis of the recorded images.

**All terrain vehicle with smartphone on gimbal and LED lighting panels (courtesy of Frontiers in Agronomy and Cornell University)**

Stereo-imaging and LIDAR measuring devices have been around for a while now, but they are expensive: think thousands to tens of thousands of pounds to equip a vineyard with a system. The Cornell system is essentially a smartphone on a gimbal with a lighting boom that can be driven or walked up and down the rows of a vineyard at night.

In addition to being a low-cost solution, it is also effective. The researchers report a cluster count error rate of only 4.9%, almost half that of traditional manual cluster counting. The improvement in cluster counting, and therefore in yield estimation, is mainly due to better random sampling of the vineyard and better identification of clusters. Over two growing seasons the Cornell team found that imaging early in the season gave the best results because small clusters and shoots were not yet obscured by large leaves and a dense canopy.
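To make the error figure concrete, the sketch below shows one common way a count error rate can be calculated: the absolute difference between the predicted count and a hand-verified count, expressed as a percentage of the verified count. Whether the Cornell team used exactly this definition is an assumption, and the counts are invented for illustration.

```python
def count_error_rate(predicted, actual):
    """Absolute counting error as a percentage of the verified count."""
    if actual <= 0:
        raise ValueError("actual count must be positive")
    return abs(predicted - actual) / actual * 100


# Hypothetical counts for one row of vines
rate = count_error_rate(predicted=390, actual=410)
print(f"Error rate: {rate:.1f}%")
```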

So how did they turn a rather long video into an accurate and precise cluster count?

Firstly, the different objects (leaves, shoots, clusters, posts etc.) in the video needed to be classified. Building the classifier is the major task in a machine learning implementation, and there are a number of Open Source tools readily available to do this. They chose a Convolutional Neural Network (CNN) to identify objects in the video images. CNNs apply digital filters that simplify and exaggerate features of an image, emphasising shapes that are round, jagged, linear and so on. During training, the network searches for the combination of filters whose output best matches a set of labelled examples of a particular object; that combination is then applied to each new image under test.

But how does the CNN know what an object is? Who defines the objects? The Cornell answer is student interns. They were given sets of training images and a copy of the Open Source Python app LabelImg, and tasked with drawing a box around each object of interest and giving it the label ‘cluster’. The other useful source of information to train the CNN was the Microsoft COCO (Common Objects in COntext) dataset. COCO is essentially a large set of labelled images of everyday objects that are not grape clusters. The image below shows how clusters are identified from video footage.
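To give a feel for what the labelling step produces, here is a minimal sketch that reads bounding boxes from an annotation in the Pascal VOC XML layout, one of the formats LabelImg can save. The file name, box coordinates and the helper `read_boxes` are hypothetical, and it is an assumption that the Cornell interns saved their labels in this particular format.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout that LabelImg can save
SAMPLE_XML = """
<annotation>
  <filename>frame_0001.jpg</filename>
  <object>
    <name>cluster</name>
    <bndbox><xmin>120</xmin><ymin>340</ymin><xmax>180</xmax><ymax>420</ymax></bndbox>
  </object>
  <object>
    <name>cluster</name>
    <bndbox><xmin>610</xmin><ymin>295</ymin><xmax>665</xmax><ymax>372</ymax></bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text, label="cluster"):
    """Return (xmin, ymin, xmax, ymax) tuples for every object with the given label."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        if obj.findtext("name") != label:
            continue
        bndbox = obj.find("bndbox")
        boxes.append(tuple(int(bndbox.findtext(tag))
                           for tag in ("xmin", "ymin", "xmax", "ymax")))
    return boxes


boxes = read_boxes(SAMPLE_XML)
print(f"{len(boxes)} labelled clusters: {boxes}")
```

Each tuple is one hand-drawn box; thousands of these, paired with their images, form the training examples the network learns from.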

**Vine clusters identified by trained CNN (courtesy of Frontiers in Agronomy and Cornell University)**
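An object detector of this kind typically returns a box and a confidence score for each candidate cluster. The sketch below shows how such output might be turned into a count and checked against hand-drawn boxes using intersection over union (IoU), a standard overlap measure. The 0.5 thresholds, the box values and the function names are illustrative assumptions, not details taken from the Cornell study.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def count_matches(detections, labelled, min_score=0.5, min_iou=0.5):
    """Count confident detections and how many overlap a hand-labelled box."""
    confident = [box for box, score in detections if score >= min_score]
    unmatched = list(labelled)
    matched = 0
    for box in confident:
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= min_iou:
            matched += 1
            unmatched.remove(best)  # each labelled box can be matched only once
    return len(confident), matched


# Hypothetical detector output for one frame: (box, confidence score)
detections = [
    ((118, 338, 182, 424), 0.93),
    ((600, 300, 660, 370), 0.81),
    ((900, 100, 950, 160), 0.30),  # low confidence, ignored
]
labelled = [(120, 340, 180, 420), (610, 295, 665, 372)]

counted, matched = count_matches(detections, labelled)
print(f"{counted} clusters counted, {matched} confirmed by hand labels")
```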

TensorFlow, a user-friendly Open Source platform, was used to train the neural network and apply it to the video footage.

It would be fascinating to apply this cost-effective and early yield technology in the UK, where the climate is warming but seasons are still variable.