Abstract
Graphical page object detection classifies and localizes objects such as tables and figures in a document. As deep learning techniques for object detection have become increasingly successful, many supervised deep neural network-based methods have been introduced to recognize graphical objects in documents. However, these models require a substantial amount of labeled data for training. To address this limitation, this paper presents an end-to-end semi-supervised framework for graphical object detection in scanned document images. Our method builds on the recently proposed Soft Teacher mechanism and examines the effect of small fractions of labeled data on the classification and localization of graphical objects. On both the PubLayNet and IIIT-AR-13K datasets, the proposed approach outperforms the supervised models by a significant margin at all labeling ratios (1%, 5%, and 10%). Furthermore, the 10% PubLayNet Soft Teacher model improves the average precision of Table, Figure, and List by +5.4, +1.2, and +3.2 points, respectively, while achieving a total mAP similar to the Faster R-CNN baseline. Moreover, our model trained on 10% of the IIIT-AR-13K labeled data beats the previous fully supervised method by +4.5 points.