A handwritten Chinese text recognizer applying multi-level multimodal fusion network
- Publisher: IEEE
- Publication Type: Conference Proceeding
- Citation: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2020, pp. 1464-1469
- Issue Date: 2020
Closed Access
Filename | Description | Size
---|---|---
08978158.pdf | Published Version | 1.37 MB
This item is closed access and not available.
© 2019 IEEE. Handwritten Chinese text recognition (HCTR) has received extensive attention from the pattern recognition community over the past decades. Most existing deep learning methods consist of two stages, i.e., training a text recognition network based on visual information, followed by incorporating language constraints with various language models. As a consequence, the inherent linguistic semantic information is often neglected when designing the recognition network. To tackle this problem, in this work, we propose a novel multi-level multimodal fusion network and embed it into an attention-based LSTM so that both the visual information and the linguistic semantic information can be fully leveraged when predicting sequential outputs from the feature vectors. Experimental results on the ICDAR-2013 competition dataset demonstrate results comparable to state-of-the-art approaches.
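To make the fusion idea in the abstract concrete, below is a minimal sketch of one decoding step that combines an attention-derived visual context vector with an embedding of the previously predicted character before feeding an LSTM cell. This is not the authors' implementation: the module names, dimensions, additive attention, and the sigmoid gating scheme are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class FusionDecoderStep(nn.Module):
    """Hypothetical sketch: fuse visual and linguistic information at each
    step of an attention-based LSTM decoder (not the paper's exact model)."""

    def __init__(self, feat_dim, embed_dim, hidden_dim, vocab_size):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, embed_dim)        # linguistic branch
        self.attn_score = nn.Linear(feat_dim + hidden_dim, 1)        # additive attention
        self.fuse_gate = nn.Linear(feat_dim + embed_dim, feat_dim + embed_dim)
        self.lstm_cell = nn.LSTMCell(feat_dim + embed_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, prev_char, state):
        # feats: (B, T, feat_dim) visual feature vectors from the encoder
        # prev_char: (B,) indices of the previously predicted characters
        h, c = state
        # Attention over visual features, conditioned on the decoder state.
        query = h.unsqueeze(1).expand(-1, feats.size(1), -1)
        scores = self.attn_score(torch.cat([feats, query], dim=-1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)                        # (B, T)
        visual_ctx = (alpha.unsqueeze(-1) * feats).sum(dim=1)        # (B, feat_dim)
        # Linguistic context taken from the previous prediction.
        ling_ctx = self.char_embed(prev_char)                        # (B, embed_dim)
        # Gated fusion of the two modalities (one plausible fusion scheme).
        fused = torch.cat([visual_ctx, ling_ctx], dim=-1)
        fused = torch.sigmoid(self.fuse_gate(fused)) * fused
        h, c = self.lstm_cell(fused, (h, c))
        return self.classifier(h), (h, c)
```

In this sketch the gate lets the decoder weight visual against linguistic evidence at every step; the paper's multi-level fusion presumably applies such combinations at more than one stage of the network.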