Dealing with invalid geoms
invalids.Rmd
Preamble
One often gets unclean messy spatial lines or polygon data, even though simple features follow a standard, this sometimes resulting in problems with further pocessing downstream. This even in cases where the source is LMÍ.
Here we show how one could possibly fix such cases. The libraries we use are:
Strandlínur
Lets get the strandlínur from source:
iceland <-
gl_lmi_strandlina(make_valid = FALSE) |>
# we do not need most of the supplied variables
select(gml_id, eyjarsker)
We can obtain a plot of e.g. Breiðafjörður by:
iceland |>
filter(eyjarsker == 1) |>
ggplot() +
theme_bw() +
geom_sf(fill = "white", alpha = 0.0, lwd = 0.1) +
coord_sf(xlim = c(-24.4, -21.6), ylim = c(64.9, 65.6)) +
scale_x_continuous(breaks = seq(-26, -20, by = 1)) +
scale_y_continuous(breaks = seq(63, 66, by = 0.5))
Now we may want to exclude scerries and the smaller islands from this plot. One could filter out skerries because that variable is in the dataframe:
iceland |>
filter(eyjarsker == 1) |>
ggplot() +
theme_bw() +
geom_sf(fill = "white", alpha = 0.0, lwd = 0.1) +
coord_sf(xlim = c(-24.4, -21.6), ylim = c(64.9, 65.6)) +
scale_x_continuous(breaks = seq(-26, -20, by = 1)) +
scale_y_continuous(breaks = seq(63, 66, by = 0.5))
But what if we want to exclude certain islands below a certain area Because that is not a variable in the dataframe from LMÍ we could compute them:
iceland |>
mutate(area = st_area(geom))
#> Error in `stopifnot()`:
#> ℹ In argument: `area = st_area(geom)`.
#> Caused by error in `wk_handle.wk_wkb()`:
#> ! Loop 0 is not valid: Edge 0 is degenerate (duplicate vertex)
We get an error because some of the polygons are not strictly valid. Lets check how many of the objects are not valid:
iceland |>
st_is_valid() |>
table()
#>
#> FALSE TRUE
#> 1416 4828
Lets get the reason for them not being valid:
# little helper functions, just get the "class" of the reason, not the details
lh_reasons <- function(x) {
case_when(x == "Valid Geometry" ~ "valid",
str_detect(x, "degenerate") ~ "degenerate",
str_detect(x, "crosses edge") ~ "crosses edge",
str_detect(x, "duplicate vertex") ~ "duplicate vertex",
TRUE ~ NA_character_)
}
iceland |>
mutate(reason = st_is_valid(geom , reason = TRUE),
reason = lh_reasons(reason)) |>
st_drop_geometry() |>
count(reason)
#> # A tibble: 4 × 2
#> reason n
#> <chr> <int>
#> 1 crosses edge 23
#> 2 degenerate 1387
#> 3 duplicate vertex 6
#> 4 valid 4828
If you want to understand what “degenerate” means, check this out.
Fixing invalid simple features
Here we just want to try to fix things so that one can calculate the area. We could try to fix this by:
iceland |>
st_make_valid() |>
mutate(reason = st_is_valid(geom, reason = TRUE),
reason = lh_reasons(reason)) |>
st_drop_geometry() |>
count(reason)
#> # A tibble: 2 × 2
#> reason n
#> <chr> <int>
#> 1 crosses edge 5
#> 2 valid 6239
OK, so there as still a few invalids. Lets give it a one more go by trying the lwgeom_make_valid-function :
iceland |>
st_make_valid() |>
mutate(geom = lwgeom_make_valid(geom)) |>
mutate(reason = st_is_valid(geom, reason = TRUE),
reason = lh_reasons(reason)) |>
st_drop_geometry() |>
count(reason)
#> # A tibble: 1 × 2
#> reason n
#> <chr> <int>
#> 1 valid 6244
Ok, so by this process all simple features are valid. Lets retain this object:
iceland <-
iceland |>
st_make_valid() |>
mutate(geom = lwgeom_make_valid(geom))
Now we can calculate the area:
Let’s just retain the 50 largest islands and then plot Breiðafjörður:
iceland |>
mutate(area = area / 1e6) |>
arrange(-area) |>
slice(1:50) |>
ggplot() +
theme_bw() +
geom_sf(fill = "white", alpha = 0.0, lwd = 0.1) +
coord_sf(xlim = c(-24.4, -21.6), ylim = c(64.9, 65.6)) +
scale_x_continuous(breaks = seq(-26, -20, by = 1)) +
scale_y_continuous(breaks = seq(63, 66, by = 0.5))