An algorithm for creating semi-synthetic datasets for regional classification of cotton varieties

A K Nishanov; O K Akhmedov; S U Aktamov

doi:10.14719/pst.12737

Research Articles

Early Access

Nishanov Akhram Khasanovich
Akhmedov Oybek Kamarbekovich
Aktamov Shokhrukhbek Ulug’bek o’g’li

DOI: https://doi.org/10.14719/pst.12737
Submitted: 14 November 2025
Published: 27 April 2026

Abstract

This study develops an algorithm for the synthetic expansion of training datasets and applies it to the regional classification of cotton varieties recommended for cultivation in the Republic of Uzbekistan. The algorithm relies on heuristic logic and restructures the objects of each class according to similarity criteria. First, a space of textual, nominal, and quantitative features was formed from real data in the state register. All features were then converted to nominal form, and the degree of similarity between objects was determined through scaling. A proximity function and decision rules were developed, and the contribution of each object to its class was evaluated. Artificial objects were generated according to heuristic criteria, increasing both the number of classes and the number of elements in them. This approach improved the stability and accuracy of the classification model. In the experiments, the proposed algorithm achieved a precision of 95.8 %, which is higher than that of the decision tree (87 %) and KNN (84.4 %) algorithms. These results make it possible to classify cotton varieties accurately by region.
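The abstract does not give the proximity function, the decision rules, or the heuristic generation criteria, so the following is only a minimal sketch of the general pipeline under stated assumptions, not the authors' method. It assumes equal-interval style binning for the conversion to nominal form, a simple matching coefficient as the similarity measure, mean similarity to class members as the proximity function, and a generation heuristic that recombines feature values observed inside one class and keeps a candidate only if the decision rule assigns it back to that class. All function names, feature values, and class labels are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Obj = Tuple[str, ...]  # one object = a tuple of nominal feature values


def to_nominal(value: float, cuts: Sequence[float]) -> str:
    """Map a quantitative value to an interval label (assumed binning scheme)."""
    idx = sum(value >= c for c in cuts)  # number of cut points at or below value
    return f"bin{idx}"


def similarity(a: Obj, b: Obj) -> float:
    """Simple matching coefficient: share of features on which a and b agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def proximity(obj: Obj, members: List[Obj]) -> float:
    """Assumed proximity function: mean similarity to the members of a class."""
    return sum(similarity(obj, m) for m in members) / len(members)


def classify(obj: Obj, classes: Dict[str, List[Obj]]) -> str:
    """Assumed decision rule: choose the class with the highest proximity."""
    return max(classes, key=lambda name: proximity(obj, classes[name]))


def synthesize(classes: Dict[str, List[Obj]], label: str, n: int,
               seed: int = 0, max_tries: int = 1000) -> List[Obj]:
    """Generate up to n artificial objects for one class.

    Each candidate takes every feature value from a randomly chosen member
    of the class; it is kept only if it is new and is classified back into
    the same class. May return fewer than n objects.
    """
    rng = random.Random(seed)
    members = classes[label]
    n_features = len(members[0])
    out: List[Obj] = []
    for _ in range(max_tries):
        if len(out) == n:
            break
        cand = tuple(rng.choice(members)[j] for j in range(n_features))
        is_new = cand not in members and cand not in out
        if is_new and classify(cand, classes) == label:
            out.append(cand)
    return out


# Hypothetical toy data: (maturity group, fibre colour, binned fibre length).
classes = {
    "region_A": [("early", "white", "bin2"),
                 ("early", "white", "bin1"),
                 ("mid", "white", "bin2")],
    "region_B": [("late", "cream", "bin0"),
                 ("late", "white", "bin0"),
                 ("mid", "cream", "bin0")],
}
new_objects = synthesize(classes, "region_A", n=1)
```

Filtering candidates through the decision rule is one way to keep generated objects consistent with their class; the acceptance criteria used in the paper may differ.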

References

1. Fränti P, Sieranoja S. K-means properties on six clustering benchmark datasets. Appl Intell. 2018;48(12):4743–759. https://doi.org/10.1007/s10489-018-1238-7
2. Qi J, Yu Y, Wang L, Liu J. K*-Means: An effective and efficient K-means clustering algorithm. 2016 IEEE international conferences on big data and cloud computing (BDCloud), social computing and networking (SocialCom), sustainable computing and communications (SustainCom). 2016. p. 242–49. https://doi.org/10.1109/BDCloud-SocialCom-SustainCom.2016.47
3. Geeks for Geeks. DBSCAN clustering in ML: Density-based clustering. 2025. https://www.geeksforgeeks.org/dbscan-clustering-in-ml-density-based-clustering/
4. Zhang R, Peng H, Dou Y, Wu J, Sun Q, Zhang J, et al. Automating DBSCAN via deep reinforcement learning. Proceedings of the 31st ACM international conference on information and knowledge management (CIKM 2022). Association for computing machinery (ACM). 2022. https://doi.org/10.1145/3511808.3557220
5. Ala’raj M, Majdalawieh M, Abbod MF. Improving binary classification using filtering based on k-NN proximity graphs. J Big Data. 2020;7(1):1–18. https://doi.org/10.1186/s40537-020-00332-9
6. Ignatyev NA. Structure choice for relations between objects in metric classification algorithms. Pattern recognition and image analysis. Adv Math Theory Appl. 2018;28(4):695–702. https://doi.org/10.1134/S1054661818040097
7. Nishanov A, Ruzibaev O, Tran N. Modification of decision rules “Ball Apolonia” for the problem of classification. 2016 international conference on information science and communications technologies (ICISCT 2016). 2016. https://doi.org/10.1109/ICISCT.2016.7777382
8. Nishanov A, Akbarova M, Tursunov A, Ollamberganov F, Rashidova D. Clustering algorithm based on object similarity. J Math Mech Comput Sci. 2024;123(3):108–20. https://doi.org/10.26577/JMMCS2024-v123-i3-4
9. Nishanov A, Tursunov A, Ollamberganov F, Rashidova D. Algorithm for clustering different types of drugs affecting blood pressure. J Math Mech Comput Sci. 2025;125(1).
10. Abdukarimov DT, Lukov MQ. Cotton breeding and seed production. Textbook. Tashkent. 2015. p. 331.
11. Avliyoqulov AE, Akhmedov J, Nuriddinov A. Agrotechnical measures for cultivating cotton varieties. Tashkent. 2016. p. 4–56.
12. State register of agricultural crops of the Republic of Uzbekistan. Varieties and hybrids recommended for cultivation. 2007–2017.
13. State register of agricultural crops of the Republic of Uzbekistan. Varieties and hybrids recommended for cultivation. 2008–2018.
14. State register of agricultural crops of the Republic of Uzbekistan. Varieties and hybrids recommended for cultivation. 2022.


Keywords

automatic classification
classification
cotton varieties
degree of similarity
nominal and quantitative
objects
regionalisation
synthetic training samples
text

How to Cite

Nishanov AK, Akhmedov OK, Aktamov SU. An algorithm for creating semi-synthetic datasets for regional classification of cotton varieties. Plant Sci. Today [Internet]. 2026 Apr. 27 [cited 2026 Apr. 27]. Available from: https://horizonepublishing.com/journals/index.php/PST/article/view/12737

This work is licensed under a Creative Commons Attribution 4.0 International License.
