A COMPREHENSIVE REVIEW ON SUITABLE IMAGE PROCESSING AND MACHINE LEARNING TECHNIQUES FOR DISEASE IDENTIFICATION OF TOMATO AND POTATO PLANTS

Agriculture plays a vital role in the Sri Lankan economy. The cultivation of crops such as tomatoes and potatoes, which are used as fruits and vegetables, contributes significantly to farmers' earnings. However, tomato and potato crops face numerous challenges; disease infection in particular can significantly reduce yield. Early identification of these diseases is crucial for implementing timely interventions and minimizing potential damage. The current study analyzes existing methodologies and identifies the most effective approaches for disease detection in tomato and potato crops. Image processing techniques enable the extraction of relevant features from digital images of infected plants, aiding accurate disease identification. Additionally, machine learning algorithms have proven to be valuable tools for analyzing complex datasets and distinguishing between healthy and diseased plants. The review explores various image processing techniques, including image segmentation, feature extraction


INTRODUCTION
Agriculture is vital to the Sri Lankan economy. Its long-term development depends on agricultural production, which is hampered by crop diseases. Among the different types of vegetables cultivated in Sri Lanka, tomatoes and potatoes are two of the main export crops and are regarded as commercial products with significant export potential (Suresh et al., 2021). Their cultivation is constrained by diseases caused by fungi, viruses, and bacteria, which can cause significant crop damage and reduce productivity.
Before these diseases can be treated, they must be correctly identified so that the most appropriate solution can be applied to protect vegetable plots from destruction. The most problematic issue in cultivation is correctly identifying diseases in their early stages. In Sri Lanka, disease identification is performed by human experts: cultivators must inform agricultural instructors about their problem, and the instructors then give advice based on the cultivator's description of the symptoms (Wijekoon et al., 2022). This procedure cannot be completely trusted. Agricultural instructors may visit the field in some cases, but this can take several days, and because the recognition is manual and based on experience, it is time-consuming, labor-intensive, and error-prone. Accurate identification is critical for controlling pests and diseases effectively.
A survey was conducted to better understand farmers' problems in identifying the diseases affecting tomato and potato plants, as well as their requirements. Initially, the cultivators' problems were identified by interviewing several cultivators and reviewing existing research. The survey was conducted among 100 vegetable cultivators throughout the country, about 70% of whom grow tomatoes, potatoes, or both. When asked how often their vegetable crops become diseased, about 50% answered "monthly," whereas only 20% answered "yearly." They were then asked about the manual disease identification methods they currently use and how accurate those methods are. Almost every participant answered that the present method is time-consuming and error-prone because it is performed manually by a human expert. When asked whether they would like the process to be automated, 80% of the participants answered positively. Cultivators were also asked which diseases most affect their cultivation, and the results were mapped to show the most and least affected areas for both tomato and potato plants, as shown in Figure 1. Finally, they were asked which features they would expect an automated system to have. Their suggestions included disease identification, recommending a solution according to the type of disease, accurate results, and updating the results on a map so that every cultivator can be aware of diseases that spread easily in their cultivation area and take the necessary precautions. The main problems identified among the vegetable cultivators are a lack of knowledge in determining whether a plant is diseased, inadequate knowledge of the types of diseases that infect the leaves of vegetable plants, a lack of knowledge of what should be done to protect the crops from disease, and the time consumed in manually identifying the type of disease affecting a crop.
According to the reviewed research papers, there are over 20 tomato leaf diseases, including Tomato Bacterial Spot, Tomato Early Blight, Tomato Late Blight, Tomato Leaf Mold, Tomato Septoria Leaf Spot, Tomato Two-Spotted Spider Mite, Tomato Target Spot, Tomato Mosaic Virus, and Tomato Yellow Leaf Curl Virus (Gadade and Kirange, 2020). Furthermore, over 15 potato leaf diseases were identified in the reviewed papers, including potato early blight, late blight, Septoria blight, anthracnose, and leaf blight (Liu et al., 2020). This review focuses on identifying the best machine learning and image processing methods for addressing these issues through an automated system that can identify tomato and potato leaf diseases and suggest a solution based on the disease detected.

METHODOLOGY
The aim of this research is to find the most suitable image processing and machine learning techniques for identifying destructive diseases in vegetable crops and suggesting a remedy according to the type of disease. This section describes in detail the methods used to conduct the review and produce this paper. The main problem was identified through interviews, discussions, and surveys with local tomato and potato cultivators. These covered the existing identification methods and their drawbacks, whether cultivators know which solution to apply once a disease has been identified, the problems they face when manually identifying diseases, and the precautions needed to protect their crops. To arrive at a suitable solution, a literature review was conducted focusing on the machine learning applications and image processing methods used for disease identification in vegetable plants.
About 100 relevant research papers were found, and after reading the abstract and introduction of each, the 50 most suitable papers were selected. After these were read and analyzed, the 15 most suitable papers were selected and thoroughly reviewed. The most important factors analyzed for each paper were the accuracy, output, diseases monitored, machine learning techniques, and image processing methods used. The potato diseases detected by the reviewed systems include early blight, late blight, Septoria blight, anthracnose, and leaf blight (Liu et al., 2020). The tomato diseases detected by the reviewed systems were bacterial spot, early blight, late blight, leaf mold, Septoria leaf spot, two-spotted spider mite, target spot, tomato mosaic virus, and tomato yellow leaf curl virus (Gadade and Kirange, 2020). From the 15 papers, data were gathered on the machine learning and image processing methods used to identify the diseases affecting vegetable plants. In particular, the algorithms and models used in each paper were analyzed, and the procedure for taking inputs and producing outputs, as well as the accuracy of each machine learning algorithm, was recorded (Figure 2).
Convolutional neural networks achieved an accuracy of 97% (Rahman et al., 2023); the Random Forest classifier, 97% (Iqbal and Talukder, 2020); deep features from VGG19 combined with a logistic regression classifier, 97.8% (Tiwari et al., 2020); one-dimensional convolutional neural networks, 97.72% (Liu et al., 2020); the Inception-v3 convolutional neural network architecture, 93.3% (Adhikari, 2023); and the Gabor wavelet transform combined with a Support Vector Machine, 91% (Sholihati et al., 2020).
Similarly, the image processing methods in each paper were analyzed; the main leaf features extracted included whitening and browning of leaves, enlarged nodes, leaf curling, and distinctive spiral patterns on the leaves. The image processing techniques and accuracies recorded included adaptive thresholding for segmenting disease-affected areas, with 92.2% accuracy; image acquisition, ROI adjustment (Qasrawi et al., 2021), and feature extraction; and the YOLO and HSV techniques, which achieved 99.7% accuracy in detecting the type of disease on the leaves (Adhikari, 2023).

RESULTS
Gadade and Kirange (2020) developed a system for identifying tomato leaf diseases that uses a segmentation-based approach to isolate the infected areas of the leaf. The system was trained on 3000 images of diseased tomato leaves. The process consists of five phases. The first is preprocessing, in which the R, G, and B color components are separated and leaf noise is removed. The second is segmentation, in which the RGB image is converted to the HSV color space. The third is feature extraction, which uses LBP features, mean pixel color values, Haar features, color moments, the Prewitt operator for edge features, histogram features, HOG features, and Gabor features. The fourth is classification, in which decision tree, linear regression, SVM, Naive Bayes, and KNN classifiers were evaluated. The last phase is disease severity measurement, in which a threshold image is obtained after converting the HSV image to grayscale. According to the results, SVM classification combined with HOG feature extraction is the most suitable approach for identifying and categorizing the various tomato diseases.
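Gadade and Kirange (2020) report HOG features with an SVM as the best combination. To illustrate what a HOG descriptor computes, the following is a minimal NumPy sketch of a simplified HOG: gradient magnitudes are binned by orientation within each cell, and the concatenated histograms are normalised once. The function name `simple_hog`, the 8-pixel cells, and the 9 orientation bins are assumptions for illustration, not parameters reported in the reviewed paper, and the standard block-wise normalisation of full HOG is deliberately omitted.

```python
import numpy as np

def simple_hog(gray, cell=8, bins=9):
    """Simplified HOG descriptor for a 2-D grayscale image (hypothetical helper).

    Gradient magnitudes are accumulated into per-cell histograms of unsigned
    orientation (0-180 degrees); the concatenated histograms are then
    L2-normalised once, globally. Standard HOG instead normalises over
    overlapping blocks of cells and interpolates between bins.
    """
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)                      # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)

    n_rows, n_cols = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((n_rows, n_cols, bins))
    for i in range(n_rows):
        for j in range(n_cols):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            # Magnitude-weighted orientation vote of every pixel in the cell
            hist[i, j] = np.bincount(bin_idx[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(),
                                     minlength=bins)
    feat = hist.ravel()
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat
```

For a 64 × 64 leaf patch this yields 8 × 8 cells × 9 bins = 576 features per image, which would then form the input vector to an SVM classifier.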
Liu and the team developed a system that uses a 1D-CNN algorithm to identify the lesions of various potato diseases (Liu et al., 2020). The labels of the measurement results are required to verify the validity of the algorithm, and the paper uses a technique for calibrating coarse measurement data using fine measurements as a guide. The dataset consists of 126 images of rot, bacterial leaf disease, early blight, and combined infections with varying degrees of severity; nine were used for training and 117 for testing. Based on analysis of the hyperspectral images, four categories were classified, including leaves, disease patches, and background, with disease clusters being the main objects of study. The 1D-CNN was built using the TensorFlow framework and consists of three alternately stacked convolution and pooling layers followed by two fully connected layers. The findings were as follows: (1) the SVM model takes a very long time to classify a single set of data, whereas the 1D-CNN takes about 15 seconds, which significantly improves recognition speed; (2) compared with the SVM classifier, the 1D-CNN misclassifies fewer pixels in small areas of potato leaves with various diseases, increasing the mean accuracy by 2% and the total accuracy by 2.1%.
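To make the layer arrangement described above concrete, the following NumPy sketch runs the forward pass of a 1D-CNN with three alternating convolution/pooling stages and two fully connected layers over a single pixel's spectrum. It is not the authors' TensorFlow model: the number of spectral bands (120), kernel size, channel counts, the four output classes, and the random, untrained weights are all assumptions chosen only to show how the signal shrinks through each stage.

```python
import numpy as np

def conv1d(x, w, b):
    """'Valid' 1-D convolution (cross-correlation, as in deep learning libraries).
    x: (C_in, L) input, w: (C_out, C_in, K) kernels, b: (C_out,) biases."""
    k = w.shape[2]
    length_out = x.shape[1] - k + 1
    # windows[c, l, i] = x[c, l + i]
    windows = np.stack([x[:, i:i + length_out] for i in range(k)], axis=-1)
    return np.einsum("clk,ock->ol", windows, w) + b[:, None]

def relu(x):
    return np.maximum(x, 0.0)

def maxpool1d(x, size=2):
    usable = x.shape[1] // size * size                 # drop any leftover samples
    return x[:, :usable].reshape(x.shape[0], -1, size).max(axis=2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def init_params(rng, n_bands=120, n_classes=4, k=5, channels=(8, 16, 32), hidden=64):
    """Random (untrained, hypothetical) weights: 3 conv/pool stages + 2 dense layers."""
    conv, c_in, length = [], 1, n_bands
    for c_out in channels:
        conv.append((rng.normal(0, 0.1, (c_out, c_in, k)), np.zeros(c_out)))
        c_in, length = c_out, (length - k + 1) // 2    # conv shrinks, pool halves
    dense = [(rng.normal(0, 0.1, (c_in * length, hidden)), np.zeros(hidden)),
             (rng.normal(0, 0.1, (hidden, n_classes)), np.zeros(n_classes))]
    return conv, dense

def forward(spectrum, conv, dense):
    """Class probabilities for one pixel's reflectance spectrum."""
    x = spectrum[None, :]                               # a single input channel
    for w, b in conv:                                   # conv -> ReLU -> max-pool, x3
        x = maxpool1d(relu(conv1d(x, w, b)))
    (w1, b1), (w2, b2) = dense
    hidden = relu(x.ravel() @ w1 + b1)                  # fully connected layer 1
    return softmax(hidden @ w2 + b2)                    # fully connected layer 2

rng = np.random.default_rng(0)
conv, dense = init_params(rng)
probs = forward(rng.random(120), conv, dense)
```

With 120 bands the sequence lengths run 120 → 58 → 27 → 11, so the first dense layer receives 32 × 11 = 352 values. Training (for example by backpropagation in a framework such as TensorFlow, as the authors did) is outside the scope of this sketch.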
Tiwari and the team developed a system to identify potato leaf diseases that uses a pre-trained VGG19 model to extract features from the dataset (Tiwari et al., 2020). The dataset consisted of 1000 early blight images, 1000 late blight images, and 152 healthy leaf images. ReLU activation was used in the hidden layers and softmax in the output layer. The features extracted by the pre-trained models were passed as input to different classifiers, including a neural network, logistic regression, SVM, and KNN. Of all the approaches evaluated, VGG19 combined with logistic regression gave the highest classification accuracy, 97.8%.
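The classifier stage of this pipeline can be sketched as a multinomial (softmax) logistic regression trained by gradient descent. The sketch below does not reproduce the VGG19 feature extraction: synthetic feature vectors stand in for the pre-trained network's outputs, and the feature dimension, learning rate, epoch count, and the three class labels are assumptions for illustration.

```python
import numpy as np

def softmax_rows(z):
    z = z - z.max(axis=1, keepdims=True)           # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_logreg(X, y, n_classes, lr=0.1, epochs=300, l2=1e-3):
    """Multinomial logistic regression fitted by full-batch gradient descent
    on the mean cross-entropy loss with a small L2 penalty."""
    n, d = X.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                        # one-hot targets
    for _ in range(epochs):
        grad = (softmax_rows(X @ W + b) - Y) / n    # d(loss)/d(logits)
        W -= lr * (X.T @ grad + l2 * W)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)

# Synthetic stand-ins for pre-extracted CNN features (hypothetical data):
# class 0 = early blight, 1 = late blight, 2 = healthy.
rng = np.random.default_rng(42)
centers = rng.normal(0, 1, (3, 64))
y = rng.integers(0, 3, 300)
X = centers[y] + rng.normal(0, 1, (300, 64))
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardise each feature
W, b = train_logreg(X, y, n_classes=3)
accuracy = np.mean(predict(X, W, b) == y)
```

Keeping the feature extractor frozen and training only a light linear classifier on top is what makes this kind of hybrid cheap to train on a dataset of a few thousand images.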
Shijie and others (Shijie et al., 2017) created a system for identifying tomato leaf diseases based on the VGG-16 CNN model and the transfer learning technique. The dataset consisted of 7040 images divided into 11 categories. Two approaches were used. The first (VGG16 + SVM) used VGG-16 to extract features from the images and an SVM to identify the tomato diseases on the leaves. The second was fine-tuning, in which a classification algorithm was built on the original VGG-16 model. VGG16 + SVM achieved 100% training accuracy and 88% testing accuracy, whereas the fine-tuned model achieved 98% training accuracy and 89% testing accuracy. Transfer learning provided a classification accuracy of 89%. According to the analysis, the fine-tuned model performs better and gives higher accuracy than the VGG16 + SVM model.
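In the VGG16 + SVM setup, a pre-trained network supplies fixed feature vectors and an SVM performs the classification. The NumPy sketch below illustrates only that second stage, using a linear SVM trained with the Pegasos stochastic sub-gradient method and extended to several disease classes by one-vs-rest. The review does not state which solver or kernel Shijie et al. used, so Pegasos, the linear kernel, the hyperparameters, and the synthetic features standing in for VGG16 outputs are all assumptions.

```python
import numpy as np

def add_bias(X):
    """Append a constant 1 feature so the bias is learned as an ordinary weight."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Binary linear SVM (labels -1/+1) trained with the Pegasos stochastic
    sub-gradient method on the L2-regularised hinge loss."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                  # Pegasos step size
            violates = y[i] * (Xb[i] @ w) < 1      # inside the margin or misclassified?
            w *= 1.0 - eta * lam                   # shrink (regulariser gradient)
            if violates:
                w += eta * y[i] * Xb[i]            # hinge-loss sub-gradient step
    return w

def train_one_vs_rest(X, labels, n_classes, **kwargs):
    """One binary SVM per class: that class (+1) against all others (-1)."""
    return np.stack([train_linear_svm(X, np.where(labels == c, 1.0, -1.0), **kwargs)
                     for c in range(n_classes)])

def predict_one_vs_rest(X, weights):
    return np.argmax(add_bias(X) @ weights.T, axis=1)   # highest-scoring class wins

# Hypothetical stand-ins for VGG16 feature vectors of three disease classes
rng = np.random.default_rng(7)
centers = rng.normal(0, 1, (3, 64))
labels = rng.integers(0, 3, 300)
X = centers[labels] + rng.normal(0, 1, (300, 64))
X = (X - X.mean(axis=0)) / X.std(axis=0)
weights = train_one_vs_rest(X, labels, n_classes=3)
accuracy = np.mean(predict_one_vs_rest(X, weights) == labels)
```

Fine-tuning, the approach the authors found slightly better, differs in that the convolutional weights themselves are updated rather than kept fixed, which requires a deep learning framework rather than a standalone classifier like this one.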
Asif and others developed a system that uses a convolutional neural network built with a sequential model to detect potato diseases such as late blight and early blight (Asif et al., 2020). Around 3000 images of both diseased and healthy leaves were used, split into 70% for training and 30% for testing. The dataset is first augmented, and the CNN model is then used for both training and testing; once these processes are complete, the model displays the results. The CNN model classifies the leaves according to features such as color, spots, and shape. The model was developed in Keras using Flatten and Dense layers with ReLU activation, and Adam was used as the optimizer for training. The model achieved an accuracy of 97%.
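A 70/30 split like the one used by Asif et al. can be made per class, so that early blight, late blight, and healthy leaves appear in both subsets in the same proportion. The source does not say whether the split was stratified; the sketch below assumes it, and the function name `stratified_split` and the equal class counts are hypothetical.

```python
import numpy as np

def stratified_split(labels, train_frac=0.7, seed=0):
    """Shuffle each class separately and cut it at train_frac, so every class
    appears in both subsets in roughly the same 70/30 proportion."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        cut = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)

# Hypothetical label list: 1000 early blight, 1000 late blight, 1000 healthy.
labels = np.repeat([0, 1, 2], 1000)
train_idx, test_idx = stratified_split(labels)
```

Augmentation should be applied only to the training indices after the split; augmenting before splitting lets near-duplicate images leak into the test set and inflates the reported accuracy.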
Jasim and Tuwaijari designed a system to detect and identify leaf diseases in tomatoes, potatoes, and peppers. The dataset consists of 20,636 images in 15 classes: 12 classes of diseased leaves and the remaining classes of healthy leaves (Jasim and Tuwaijari, 2020). The diseases detected by the system are tomato mosaic virus, tomato yellow leaf curl virus, tomato target spot, tomato spider mites, tomato Septoria leaf spot, tomato leaf mold, tomato late blight, tomato early blight, potato late blight, potato early blight, and pepper bell bacterial spot (Jasim and Tuwaijari, 2020). The system relies heavily on deep learning, specifically CNNs. First, image acquisition is performed and each image is saved in JPG format. Next, in image preprocessing, the images are resized to a fixed resolution. In the CNN design step, the architecture is defined as an input layer, a convolution layer, a pooling layer, a nonlinear layer, a fully connected layer, a normalization layer, and a softmax layer. In training, 70% of the data is used to learn the features of the leaves and distinguish between classes. In testing, the trained network and its learned features are evaluated. The final step is detecting the plant leaf disease and obtaining the results. The system achieved testing and training accuracies of 98.29% and 98.029%, respectively, the highest of all systems reviewed by the authors.
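The layer sequence listed above (input, convolution, pooling, nonlinearity, fully connected, normalization, softmax) can be traced with a small NumPy forward pass. This is a sketch under stated assumptions, not the authors' network: the 3 × 32 × 32 input size, the eight 3 × 3 filters, the random untrained weights, and the use of a simple standardisation as the normalization layer are all hypothetical; only the 15 output classes come from the paper.

```python
import numpy as np

def conv2d(x, w, b):
    """'Valid' 2-D convolution. x: (C_in, H, W), w: (C_out, C_in, K, K), b: (C_out,)."""
    k = w.shape[2]
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((w.shape[0], h_out, w_out))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel tap (i, j) at every output position
            out += np.einsum("chw,oc->ohw", x[:, i:i + h_out, j:j + w_out], w[:, :, i, j])
    return out + b[:, None, None]

def maxpool2d(x, s=2):
    c, h, w = x.shape
    h, w = h // s * s, w // s * s
    return x[:, :h, :w].reshape(c, h // s, s, w // s, s).max(axis=(2, 4))

def relu(x):
    return np.maximum(x, 0.0)

def normalize(v, eps=1e-5):
    """Zero-mean, unit-variance rescaling (a simple stand-in for a normalization layer)."""
    return (v - v.mean()) / np.sqrt(v.var() + eps)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cnn_forward(img, params):
    """Input -> convolution -> pooling -> nonlinearity -> fully connected
    -> normalization -> softmax. (Max-pooling and ReLU commute, so applying
    ReLU after pooling gives the same result as applying it before.)"""
    x = relu(maxpool2d(conv2d(img, *params["conv"])))
    logits = x.ravel() @ params["fc_w"] + params["fc_b"]
    return softmax(normalize(logits))

# Hypothetical sizes: a 3x32x32 resized leaf image, 8 filters of 3x3, 15 classes.
rng = np.random.default_rng(0)
params = {"conv": (rng.normal(0, 0.1, (8, 3, 3, 3)), np.zeros(8)),
          "fc_w": rng.normal(0, 0.01, (8 * 15 * 15, 15)),
          "fc_b": np.zeros(15)}
probs = cnn_forward(rng.random((3, 32, 32)), params)
```

The resizing step in preprocessing matters because the fully connected layer's weight matrix has a fixed input size (here 8 × 15 × 15 = 1800), so every image must reach it with the same spatial dimensions.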
Qasrawi and others developed a system to identify tomato plant diseases based on 3000 images of high to low quality taken with a smartphone; the diseases and pests detected by the system are Alternaria solani, Botrytis cinerea, Panonychus citri, Phytophthora infestans, and Tuta absoluta (Qasrawi et al., 2021). Machine learning techniques such as ANNs, decision trees, and support vector machines were used and validated for recognizing and classifying diseases from the images in the dataset. 70% of the dataset was allocated for training and 30% for testing, and each model was assessed using accuracy, recall, and precision. According to the results, the highest recall was obtained by the logistic regression and neural network models. The neural network achieved an AUC of 92.3% and a precision of 70.1%, whereas logistic regression achieved an AUC of 93.1% and a precision of 68.9%.
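The evaluation measures mentioned above can be computed per disease class by treating that class as positive and all others as negative. The sketch below shows precision, recall, and ROC AUC (computed as the probability that a randomly chosen positive image is scored above a randomly chosen negative one) on a small set of hypothetical classifier scores; the data and the 0.5 decision threshold are assumptions for illustration.

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall for one disease class (True = that class)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)       # diseased and flagged as diseased
    fp = np.sum(~y_true & y_pred)      # not this disease, but flagged
    fn = np.sum(y_true & ~y_pred)      # diseased but missed
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(precision), float(recall)

def roc_auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count one half)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    wins = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

# Hypothetical scores from a classifier for one disease class
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.1, 0.7]
y_pred = [s >= 0.5 for s in scores]
print(precision_recall(y_true, y_pred))   # (0.75, 0.75)
print(roc_auc(y_true, scores))            # 0.9375
```

Because AUC depends only on how the scores rank images and not on a chosen threshold, a model can have a high AUC (above 90%, as reported) while its precision at the default threshold is much lower (around 70%).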
Iqbal and others developed a system to detect the potato diseases early blight and late blight. The dataset consists of 450 images, and the proposed method consists of image processing, image segmentation, feature extraction, training, and classification (Iqbal and Talukder, 2020). The image is first converted to RGB format and then to the HSV color space. A global feature descriptor (GFD) was built from Hu moments, Haralick texture features, and a color histogram. Random Forest, Support Vector Machine, Logistic Regression, Decision Tree, Naive Bayes, k-Nearest Neighbors, and Linear Discriminant Analysis models were trained. According to the results, the random forest model achieved the highest accuracy, 97% over all the test data.
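A global feature descriptor of this kind is simply a concatenation of several hand-crafted feature vectors. The NumPy sketch below builds two of the three components named above: a joint HSV color histogram and the first two Hu moment invariants. Haralick texture features are omitted for brevity, and the 8 bins per channel, computing the Hu moments on the mean intensity image, and the function names are assumptions rather than details taken from the paper.

```python
import numpy as np

def rgb_to_hsv(img):
    """Vectorised RGB -> HSV for a float image (H, W, 3) in [0, 1];
    returns hue, saturation, value arrays, each in [0, 1]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    v = img.max(axis=-1)
    chroma = v - img.min(axis=-1)
    s = np.where(v > 0, chroma / np.where(v > 0, v, 1.0), 0.0)
    c = np.where(chroma > 0, chroma, 1.0)          # avoid division by zero
    h = np.where(v == r, ((g - b) / c) % 6.0,
        np.where(v == g, (b - r) / c + 2.0, (r - g) / c + 4.0))
    h = np.where(chroma > 0, h / 6.0, 0.0)          # grey pixels get hue 0
    return h, s, v

def hsv_histogram(img, bins=8):
    """Joint 3-D HSV histogram, flattened and normalised to sum to 1."""
    h, s, v = rgb_to_hsv(img)
    samples = np.stack([h.ravel(), s.ravel(), v.ravel()], axis=1)
    hist, _ = np.histogramdd(samples, bins=bins, range=[(0, 1)] * 3)
    return hist.ravel() / hist.sum()

def hu_phi1_phi2(gray):
    """First two Hu moment invariants of a 2-D intensity image."""
    gray = np.asarray(gray, dtype=float)
    y, x = np.mgrid[:gray.shape[0], :gray.shape[1]]
    m00 = gray.sum()
    xc, yc = (x * gray).sum() / m00, (y * gray).sum() / m00
    def eta(p, q):                                  # scale-normalised central moment
        mu = ((x - xc) ** p * (y - yc) ** q * gray).sum()
        return mu / m00 ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([phi1, phi2])

def global_descriptor(img):
    """Concatenate colour and shape features (Haralick texture omitted here)."""
    return np.concatenate([hsv_histogram(img), hu_phi1_phi2(img.mean(axis=-1))])

rng = np.random.default_rng(1)
leaf = rng.random((64, 64, 3))        # stand-in for an RGB leaf image scaled to [0, 1]
features = global_descriptor(leaf)    # 8*8*8 histogram bins + 2 Hu moments = 514 values
```

The Hu moments are invariant to translation, scale, and rotation of the leaf in the frame, while the HSV histogram captures the color shifts (yellowing, browning) that distinguish blight lesions; a classifier such as a random forest then operates on the concatenated vector.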
Pushpa and others developed a system to identify tomato plants affected by Fusarium oxysporum (Pushpa et al., 2021). The proposed technique uses the Naive Bayes machine learning technique, and to achieve greater accuracy, classification and detection are performed twice. The database consists of 87,000 images, and with this large amount of data the hybrid algorithm identifies the disease with an accuracy of 96%. The model first preprocesses the captured image to remove noise; the image is then registered and classified, and an alarm is triggered if the disease is present on the leaf. The next phase is the VTT phase, which was introduced to address erroneous results after classification; in this phase, training, testing, and validation are performed to achieve higher accuracy.
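As a reference point for the Naive Bayes technique used here, the following is a minimal NumPy sketch of a plain Gaussian Naive Bayes classifier, which assumes that each feature is normally distributed and independent of the others within a class. It does not reproduce the authors' hybrid two-pass scheme or the VTT phase; the class name `SimpleGaussianNB` and the two-feature synthetic data are hypothetical.

```python
import numpy as np

class SimpleGaussianNB:
    """Gaussian Naive Bayes: within each class, every feature is modelled as an
    independent normal distribution."""

    def fit(self, X, y, var_smoothing=1e-9):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small variance floor keeps the Gaussian densities finite
        floor = var_smoothing * X.var(axis=0).max()
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + floor
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        # Log-likelihood of each sample under each class: shape (n_samples, n_classes)
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_) + diff ** 2 / self.var_).sum(axis=2)
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]

# Hypothetical 2-feature data: 0 = healthy, 1 = Fusarium-affected
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, (200, 2)),
               rng.normal([4.0, 4.0], 1.0, (200, 2))])
y = np.repeat([0, 1], 200)
model = SimpleGaussianNB().fit(X, y)
accuracy = np.mean(model.predict(X) == y)
```

Naive Bayes trains in a single pass over the data (one mean and variance per class and feature), which is one reason it remains practical even on a database as large as 87,000 images once features have been extracted.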
Aparajita and others developed a system to identify potato late blight, using adaptive thresholding to segment the disease-affected areas of the leaf (Aparajita et al., 2017). Before segmentation, the background and shadows were removed from the image; for background removal, the RGB image was transformed into the HSI color model, whose components are hue, saturation, and intensity. The proposed method achieved an accuracy of 96%, with 100% sensitivity and around 92% specificity.
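The review does not specify which adaptive thresholding variant Aparajita et al. used. A common form compares each pixel with the mean of its local neighbourhood, which copes with uneven lighting across a leaf better than a single global threshold. The sketch below implements that local-mean variant with an integral image; the 15-pixel block, the offset `c`, and the assumption that lesions are darker than healthy tissue are all hypothetical choices.

```python
import numpy as np

def adaptive_mean_threshold(gray, block=15, c=0.02):
    """Mark a pixel as lesion when it is darker than the mean of its
    block x block neighbourhood minus c (local-mean adaptive thresholding)."""
    gray = np.asarray(gray, dtype=float)
    r = block // 2
    padded = np.pad(gray, r, mode="edge")          # replicate borders
    # Integral image with a leading row and column of zeros
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    k = 2 * r + 1
    # Sum of every k x k window in O(1) per pixel using four integral-image lookups
    window_sum = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    local_mean = window_sum / (k * k)
    return gray < local_mean - c

leaf = np.full((40, 40), 0.8)     # hypothetical uniform leaf intensity
leaf[18:23, 18:23] = 0.2          # a dark 5x5 lesion
mask = adaptive_mean_threshold(leaf)
print(mask.sum())                 # 25
```

Running this on the intensity channel after background removal (for example, the I component of the HSI model) would restrict the comparison to leaf pixels, which is consistent with the order of steps described above.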
Sholihati and others developed a deep learning-based system for classifying four types of disease in potato leaves, building the classification model on the VGG-19 and VGG-16 convolutional neural networks (Sholihati et al., 2020). The system achieved an average accuracy of 91%, indicating that the deep neural network approach is feasible. The methodology is carried out in the following steps. Data acquisition used a dataset of approximately 5100 images of various sizes and resolutions. Data preprocessing reduces noise by removing the portion of the image that does not belong to the area of interest; images containing excessive noise are not used. Data augmentation manipulates the data without losing its essence, and it was used because 5100 images are insufficient for maximum effectiveness. The augmentations are generated automatically using basic transformations such as rotations, intensity changes, translations, shearing, and vertical and horizontal flips. Finally, image classification is performed. The experimental results presented in the paper demonstrate the effectiveness of the deep learning approach for potato leaf disease classification: the CNN model distinguishes accurately between healthy leaves and different disease types, providing a valuable tool for farmers to detect and manage diseases in their potato crops. The system was very successful owing to its high accuracy and the large dataset used to train it, and leaf shapes and patterns were identified. This approach achieved a full-color model accuracy of 99.84% and a grayscale accuracy of 95.54% (Table 1).
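The augmentation operations listed above can be sketched in NumPy as a function that applies a random combination of flips, rotation, intensity scaling, translation, and shear to each training image. This is an illustration under assumptions, not the authors' pipeline: rotations are restricted to multiples of 90 degrees to avoid interpolation, and the intensity range, shift range, and shear range are hypothetical.

```python
import numpy as np

def translate(img, dy, dx):
    """Shift an (H, W, C) image by (dy, dx) pixels (|dy| < H, |dx| < W),
    filling the uncovered border with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    src_r, dst_r = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    src_c, dst_c = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[dst_r, dst_c] = img[src_r, src_c]
    return out

def shear_x(img, factor):
    """Horizontal shear: each row is shifted in proportion to its distance from the centre row."""
    h = img.shape[0]
    out = np.zeros_like(img)
    for y in range(h):
        shift = int(round(factor * (y - h / 2)))
        out[y] = translate(img[y:y + 1], 0, shift)[0]
    return out

def augment(img, rng):
    """Random flips, 90-degree rotation, intensity scaling, translation and shear
    of an (H, W, C) float image with values in [0, 1]."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                                # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]                                # vertical flip
    img = np.rot90(img, k=int(rng.integers(4)))           # rotate by 0/90/180/270 degrees
    img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)  # intensity change
    dy, dx = rng.integers(-4, 5, size=2)                  # shift by up to 4 pixels
    img = translate(img, int(dy), int(dx))
    return shear_x(img, rng.uniform(-0.1, 0.1))

rng = np.random.default_rng(5)
leaf = rng.random((64, 64, 3))    # stand-in for a square RGB leaf image
augmented = augment(leaf, rng)
```

Calling `augment` several times on each training image produces distinct variants that preserve the leaf's disease symptoms, which is how a dataset of around 5100 images can be enlarged without collecting new photographs.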

DISCUSSION
Based on the machine learning techniques and image processing methods used in the reviewed systems, development initially relied on image processing methods alone, but as technology advanced, image processing and machine learning techniques were combined. Among the image processing techniques used, the most common steps are preprocessing, segmentation, feature extraction, and classification. In preprocessing, the R, G, and B color components are separated and leaf noise is removed; in segmentation, the RGB image is converted into an HSV image (Gadade and Kirange, 2020). Some systems used image acquisition, ROI adjustment, and feature extraction. Adaptive thresholding (Aparajita et al., 2017) was also used in some systems, and other systems used restoration, segmentation, and translation (Ashqar and Abu-Naser, 2018). The machine learning techniques used include the 1D-CNN algorithm, deep learning, SVM, Naive Bayes, CNN, Random Forest, Logistic Regression, Decision Trees, Artificial Neural Networks, k-Nearest Neighbors, and Linear Discriminant Analysis. The best algorithm can be chosen by analyzing several important factors such as accuracy, sensitivity, and efficiency.

CONCLUSION
The destruction of vegetable crops by disease is one of the major issues faced by Sri Lankan vegetable cultivators. The type of disease affecting the crops must be identified in order to apply a suitable solution and protect the vegetable plots from destruction. It is therefore critical to provide farmers with a comprehensive system for identifying the diseases affecting vegetables and the appropriate solutions. By evaluating several existing systems described in research papers, the machine learning algorithms and image processing methods used in each system were analyzed. The most popular, accurate, and efficient image processing steps used by many of the evaluated systems were preprocessing, segmentation, feature extraction, and classification. These methods provided an accuracy of 96% along with 100% sensitivity and 92% specificity (Aparajita et al., 2017). According to the evaluated research papers, CNN-based algorithms are the most popular and achieve higher accuracy, efficiency, and sensitivity. CNNs are especially valuable because they detect the major characteristics of an image automatically and accurately, with no human involvement. Among the reviewed CNN architectures, VGG-19 and VGG-16 are the most suitable for identifying plant diseases; the model achieved a full-color accuracy of 99.84% and a grayscale accuracy of 95.54% (Ashqar and Abu-Naser, 2018). VGG19 and VGG16 are among the most sophisticated CNN models: having been pre-trained on large collections of distinct images, their layers already capture what characterizes an image in terms of color, shape, and structure.