Effect of spatial resolution and data splitting on landslide susceptibility mapping using different machine learning algorithms

Abraham, MT; Satyam, N; Jain, P; Pradhan, B; Alamri, A

Publisher: Taylor & Francis Open Access
Publication Type: Journal Article
Citation: Geomatics, Natural Hazards and Risk, 2021, 12, (1), pp. 3381-3408
Issue Date: 2021-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Abraham, MT
dc.contributor.author	Satyam, N
dc.contributor.author	Jain, P
dc.contributor.author	Pradhan, B https://orcid.org/0000-0001-9863-2054
dc.contributor.author	Alamri, A
dc.date.accessioned	2022-02-18T02:57:12Z
dc.date.available	2022-02-18T02:57:12Z
dc.date.issued	2021-01-01
dc.identifier.citation	Geomatics, Natural Hazards and Risk, 2021, 12, (1), pp. 3381-3408
dc.identifier.issn	1947-5705
dc.identifier.issn	1947-5713
dc.identifier.uri	http://hdl.handle.net/10453/154680
dc.description.abstract	With increasing computational facilities and data availability, machine learning (ML) models are gaining wide attention in landslide modeling. This study evaluates the effect of spatial resolution and data splitting on landslide susceptibility mapping, using five different ML algorithms: naïve Bayes (NB), k-nearest neighbors (KNN), logistic regression (LR), random forest (RF) and support vector machines (SVM). The maps were developed using twelve landslide conditioning factors at two different resolutions, 12.5 m and 30 m. To identify the effect of data splitting on model performance, 2162 landslide points and an equal number of non-landslide points were used for training and testing the models using k-fold cross-validation, varying the number of folds from two to ten. Results indicated that the spatial resolution of the dataset affects the performance of all the algorithms considered, while the effect of data splitting is significant for the KNN and RF algorithms. All the algorithms performed better with the 12.5 m resolution dataset for the same number of folds. It was also observed that, for the RF algorithm, the accuracy and area-under-the-curve values of 7-, 8-, 9- and 10-fold cross-validation at 30 m resolution were better than those of 2- and 3-fold cross-validation at 12.5 m resolution.
dc.language	English
dc.publisher	Taylor & Francis Open Access
dc.relation.ispartof	Geomatics, Natural Hazards and Risk
dc.relation.isbasedon	10.1080/19475705.2021.2011791
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	0406 Physical Geography and Environmental Geoscience
dc.title	Effect of spatial resolution and data splitting on landslide susceptibility mapping using different machine learning algorithms
dc.type	Journal Article
utslib.citation.volume	12
utslib.for	0406 Physical Geography and Environmental Geoscience
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Civil and Environmental Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - CAMGIS - Centre for Advanced Modelling and Geospatial Information Systems
utslib.copyright.status	open_access	*
pubs.consider-herdc	false
dc.date.updated	2022-02-18T02:57:10Z
pubs.issue	1
pubs.publication-status	Published
pubs.volume	12
utslib.citation.issue	1

Abstract:

With increasing computational facilities and data availability, machine learning (ML) models are gaining wide attention in landslide modeling. This study evaluates the effect of spatial resolution and data splitting on landslide susceptibility mapping, using five different ML algorithms: naïve Bayes (NB), k-nearest neighbors (KNN), logistic regression (LR), random forest (RF) and support vector machines (SVM). The maps were developed using twelve landslide conditioning factors at two different resolutions, 12.5 m and 30 m. To identify the effect of data splitting on model performance, 2162 landslide points and an equal number of non-landslide points were used for training and testing the models using k-fold cross-validation, varying the number of folds from two to ten. Results indicated that the spatial resolution of the dataset affects the performance of all the algorithms considered, while the effect of data splitting is significant for the KNN and RF algorithms. All the algorithms performed better with the 12.5 m resolution dataset for the same number of folds. It was also observed that, for the RF algorithm, the accuracy and area-under-the-curve values of 7-, 8-, 9- and 10-fold cross-validation at 30 m resolution were better than those of 2- and 3-fold cross-validation at 12.5 m resolution.
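
To make the data-splitting setup concrete, the following minimal Python sketch shows how the number of folds changes the training and test set sizes for the 2162 landslide and 2162 non-landslide points mentioned in the abstract. It is an illustration only: this record does not describe the authors' implementation, so the shuffled split, the seed, and the helper names `k_fold_indices` and `split_summary` are assumptions introduced here.

```python
import random

N_LANDSLIDE = 2162          # landslide points reported in the abstract
N_TOTAL = 2 * N_LANDSLIDE   # plus an equal number of non-landslide points


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split.

    Hypothetical helper: a plain shuffled split, not the authors' code.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx


def split_summary(n_samples, k):
    """Return (smallest training set size, largest test fold size)."""
    folds = list(k_fold_indices(n_samples, k))
    return min(len(tr) for tr, _ in folds), max(len(te) for _, te in folds)


if __name__ == "__main__":
    # Vary the number of folds from two to ten, as in the abstract.
    for k in range(2, 11):
        train_size, test_size = split_summary(N_TOTAL, k)
        print(f"k={k:2d}  train>={train_size}  test<={test_size}  "
              f"train share ~{(k - 1) / k:.0%}")
```

With two folds each model is trained on half of the points, while with ten folds it is trained on about nine tenths of them. This difference in training set size is one plausible reason why the number of folds can matter for some algorithms.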

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/154680