Text Line Segmentation on Myanmar Handwritten Document using Average Linkage Clustering Algorithm

Volume 10, Issue 1, Page No 48-59, 2025

Authors: Nilar Phyo Wai ¹,a), Nu War ²

¹ Image and Signal Processing Lab, University of Computer Studies, Mandalay, Mandalay, 05071, Myanmar
² Faculty of Computer Systems and Technologies, Myanmar Institute of Information Technology, Mandalay, 05071, Myanmar

a) To whom correspondence should be addressed. E-mail: nilarphyowai@ucsm.edu.mm

Adv. Sci. Technol. Eng. Syst. J. 10(1), 48-58 (2025); DOI: 10.25046/aj100106

Keywords: Myanmar Handwritten Document, Text Line Extraction, Text Line Segmentation, Connected Component Analysis, Average Linkage Clustering

Abstract

Text line segmentation from document images is a significant challenge in the field of document image analysis. For Myanmar handwritten documents, it involves extracting individual text lines from the document image to enable text recognition. The task is particularly difficult for irregular or cursive writing styles because of variations in line spacing and because of touching and overlapping characters. This paper proposes a text line extraction method based on an average linkage clustering algorithm for handwritten document images. The method addresses segmentation errors caused by inconsistent character spacing, different writing styles, and line overlaps due to ascenders and descenders. First, Connected Components (CCs) are extracted using Connected Component Analysis (CCA) and an anisotropic Gaussian multiscale technique. Next, convex-hull computation based on the divide-and-conquer method is used to re-segment irregular touching components. Finally, the text lines are extracted with an average linkage clustering algorithm that considers both the smaller and larger within-cluster variance. The performance of the proposed method is evaluated using Pixel and Line Intersection over Union (IU). It achieves 93.27% Pixel IU and 95.09% Line IU on dataset I, and 92.61% Pixel IU and 89.90% Line IU on dataset II. According to experimental results on an existing dataset and the authors' own dataset, the proposed system gives better results than the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm.
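The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each connected component has already been reduced to the vertical coordinate of its centroid, it omits the within-cluster variance criterion the abstract mentions, and the function names and the `max_gap` stopping threshold are hypothetical choices made for illustration.

```python
from itertools import combinations


def average_linkage(a, b):
    """Mean absolute distance over all cross-cluster pairs of y-centroids."""
    return sum(abs(p - q) for p in a for q in b) / (len(a) * len(b))


def cluster_text_lines(y_centroids, max_gap):
    """Agglomerative clustering of component centroids into text lines.

    Repeatedly merges the two clusters with the smallest average-linkage
    distance and stops once that distance exceeds max_gap (a hypothetical
    threshold, not a parameter taken from the paper).
    """
    clusters = [[y] for y in y_centroids]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if average_linkage(clusters[i], clusters[j]) > max_gap:
            break
        # j > i always holds here, so popping j leaves index i valid.
        clusters[i].extend(clusters.pop(j))
    # Order the lines top to bottom by mean vertical position.
    return sorted((sorted(c) for c in clusters), key=lambda c: sum(c) / len(c))


# Toy y-centroids of connected components from three handwritten lines.
ys = [12, 15, 11, 58, 61, 60, 104, 108]
lines = cluster_text_lines(ys, max_gap=20)
print(lines)
```

Average linkage scores a candidate merge by the mean distance over all cross-cluster pairs, so a single stray ascender or descender affects the decision less than it would under single linkage, which uses only the closest pair.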

Received: 06 January 2025 Revised: 22 January 2025 Accepted: 23 January 2025 Online: 09 February 2025

