How does disagreement help generalization against label corruption?
- Publication Type: Conference Proceeding
- Citation: 36th International Conference on Machine Learning (ICML 2019), June 2019, pp. 12407-12417
- Issue Date: 2019-01-01
This item is open access.
Copyright © 2019 ASME

Learning with noisy labels is one of the hottest problems in weakly-supervised learning. Based on the memorization effects of deep neural networks, training on small-loss instances becomes very promising for handling noisy labels. This fosters the state-of-the-art approach "Co-teaching", which cross-trains two deep neural networks using the small-loss trick. However, as the number of epochs increases, the two networks converge to a consensus and Co-teaching reduces to the self-training MentorNet. To tackle this issue, we propose a robust learning paradigm called Co-teaching+, which bridges the "Update by Disagreement" strategy with the original Co-teaching. First, the two networks feed forward and predict on all data, but keep only the data on which their predictions disagree. Then, among such disagreement data, each network selects its small-loss data, but back-propagates the small-loss data selected by its peer network to update its own parameters. Empirical results on benchmark datasets demonstrate that Co-teaching+ is much superior to many state-of-the-art methods in the robustness of trained models.
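The selection step described in the abstract (keep disagreement data, rank by small loss, swap selections between peers) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `coteaching_plus_select` and the `forget_rate` parameter (the assumed fraction of high-loss samples to discard) are choices made here for clarity.

```python
import numpy as np

def coteaching_plus_select(preds_a, preds_b, losses_a, losses_b, forget_rate):
    """One Co-teaching+ selection step (sketch, not the official code).

    1. Keep only the samples on which the two networks' predictions disagree.
    2. Within that disagreement set, each network ranks samples by its own loss
       and keeps the small-loss fraction (1 - forget_rate).
    3. Each network is updated on the samples chosen by its *peer*.
    Returns (indices to train network A on, indices to train network B on).
    """
    disagree = np.flatnonzero(preds_a != preds_b)
    if disagree.size == 0:
        return np.array([], dtype=int), np.array([], dtype=int)
    n_keep = max(1, int((1.0 - forget_rate) * disagree.size))
    # Each network ranks the disagreement samples by its own loss...
    small_a = disagree[np.argsort(losses_a[disagree])[:n_keep]]
    small_b = disagree[np.argsort(losses_b[disagree])[:n_keep]]
    # ...but the gradient step for each network uses the peer's selection.
    return small_b, small_a
```

A full training loop would call this once per mini-batch, back-propagating each network's loss only over the returned index sets; the swap in the final line is what keeps the two networks diverged and prevents the collapse to self-training noted in the abstract.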