Word2Cluster: A new multi-label text clustering algorithm with an adaptive clusters number

Mao, K; Niu, J; Liu, X; Yu, S; Zhao, L

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Proceedings, 2019, 00, pp. 1-6
Issue Date: 2019-12-01

Closed Access

Filename	Description	Size	Format
Word2Cluster.pdf	Published version	403.18 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Mao, K
dc.contributor.author	Niu, J
dc.contributor.author	Liu, X
dc.contributor.author	Yu, S https://orcid.org/0000-0003-4485-6743
dc.contributor.author	Zhao, L
dc.date	2019-12-09
dc.date.accessioned	2021-04-28T23:59:04Z
dc.date.available	2021-04-28T23:59:04Z
dc.date.issued	2019-12-01
dc.identifier.citation	2019 IEEE Global Communications Conference, GLOBECOM 2019 - Proceedings, 2019, 00, pp. 1-6
dc.identifier.isbn	9781728109626
dc.identifier.uri	http://hdl.handle.net/10453/148516
dc.description.abstract	Text clustering is widely used in Natural Language Processing (NLP) applications such as text summarization and news recommendation. However, most current algorithms require the number of clusters to be predefined, and this number is difficult to obtain. Moreover, multi-label clustering is useful in many applications, but related work is scarce. Although several studies have attempted to solve these two problems, there is a need for methods that solve both simultaneously. Therefore, we propose a new text clustering algorithm called Word2Cluster. Word2Cluster automatically generates an adaptive number of clusters and supports multi-label clustering. To test the performance of Word2Cluster, we build a Chinese text dataset, Hotline, based on real-world applications. To better evaluate the clustering results, we propose an improved evaluation method for multi-label text clustering based on basic accuracy, precision, and recall. Experimental results on the Chinese text dataset (Hotline) and a public English text dataset (Reuters) demonstrate that our algorithm achieves a better F1-measure and runs faster than state-of-the-art baselines.
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	2019 IEEE Global Communications Conference, GLOBECOM 2019 - Proceedings
dc.relation.ispartof	GLOBECOM 2019 - 2019 IEEE Global Communications Conference
dc.relation.isbasedon	10.1109/GLOBECOM38437.2019.9013266
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Word2Cluster: A new multi-label text clustering algorithm with an adaptive clusters number
dc.type	Conference Proceeding
utslib.citation.volume	00
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
dc.date.updated	2021-04-28T23:59:03Z
pubs.finish-date	2019-12-13
pubs.publication-status	Published
pubs.start-date	2019-12-09
pubs.volume	00

Abstract:

Text clustering is widely used in Natural Language Processing (NLP) applications such as text summarization and news recommendation. However, most current algorithms require the number of clusters to be predefined, and this number is difficult to obtain. Moreover, multi-label clustering is useful in many applications, but related work is scarce. Although several studies have attempted to solve these two problems, there is a need for methods that solve both simultaneously. Therefore, we propose a new text clustering algorithm called Word2Cluster. Word2Cluster automatically generates an adaptive number of clusters and supports multi-label clustering. To test the performance of Word2Cluster, we build a Chinese text dataset, Hotline, based on real-world applications. To better evaluate the clustering results, we propose an improved evaluation method for multi-label text clustering based on basic accuracy, precision, and recall. Experimental results on the Chinese text dataset (Hotline) and a public English text dataset (Reuters) demonstrate that our algorithm achieves a better F1-measure and runs faster than state-of-the-art baselines.
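
The abstract refers to basic accuracy, precision, and recall for multi-label clustering without defining them, and this record does not describe the improved evaluation method itself. As a rough illustration only, the sketch below computes the commonly used example-based versions of these measures, averaged over documents, plus an F1 value taken as the harmonic mean of the averaged precision and recall. It is not the paper's method. The function name multilabel_scores, the sample labels, and the assumption that each predicted cluster has already been mapped to a label are all hypothetical.

```python
from typing import Dict, List, Set


def multilabel_scores(true_labels: List[Set[str]],
                      pred_labels: List[Set[str]]) -> Dict[str, float]:
    """Example-based multi-label scores averaged over documents.

    Assumes (hypothetically) that every predicted cluster has already
    been mapped to a label name, so both inputs are sets of labels.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("inputs must have the same number of documents")
    if not true_labels:
        raise ValueError("at least one document is required")

    accuracy = precision = recall = 0.0
    for truth, pred in zip(true_labels, pred_labels):
        overlap = len(truth & pred)
        union = len(truth | pred)
        # Accuracy: overlap relative to all labels involved (Jaccard index).
        # Two empty label sets count as a perfect match.
        accuracy += overlap / union if union else 1.0
        # Precision: fraction of predicted labels that are correct.
        precision += overlap / len(pred) if pred else 0.0
        # Recall: fraction of true labels that were predicted.
        recall += overlap / len(truth) if truth else 0.0

    n = len(true_labels)
    accuracy, precision, recall = accuracy / n, precision / n, recall / n
    denom = precision + recall
    # F1 here is the harmonic mean of the averaged precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for two documents, purely for illustration.
truth = [{"billing", "outage"}, {"outage"}]
predicted = [{"billing"}, {"outage", "noise"}]
scores = multilabel_scores(truth, predicted)
```

Averaging per document, rather than pooling all label decisions, keeps documents with many labels from dominating the score. That is one possible design choice among several, and the paper may aggregate differently.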

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/148516