Skip to contents

Preamble

One often gets unclean messy spatial lines or polygon data, even though simple features follow a standard, this sometimes resulting in problems with further pocessing downstream. This even in cases where the source is LMÍ.

Here we show how one could possibly fix such cases. The libraries we use are:

Strandlínur

Lets get the strandlínur from source:

iceland <- 
  gl_lmi_strandlina(make_valid = FALSE) |> 
  # we do not need most of the supplied variables
  select(gml_id, eyjarsker)

We can obtain a plot of e.g. Breiðafjörður by:

iceland |> 
  filter(eyjarsker == 1) |> 
  ggplot() +
  theme_bw() +
  geom_sf(fill = "white", alpha = 0.0, lwd = 0.1) +
  coord_sf(xlim = c(-24.4, -21.6), ylim = c(64.9, 65.6)) +
  scale_x_continuous(breaks = seq(-26, -20, by = 1)) +
  scale_y_continuous(breaks = seq(63, 66, by = 0.5))

Now we may want to exclude scerries and the smaller islands from this plot. One could filter out skerries because that variable is in the dataframe:

iceland |> 
  filter(eyjarsker == 1) |> 
  ggplot() +
  theme_bw() +
  geom_sf(fill = "white", alpha = 0.0, lwd = 0.1) +
  coord_sf(xlim = c(-24.4, -21.6), ylim = c(64.9, 65.6)) +
  scale_x_continuous(breaks = seq(-26, -20, by = 1)) +
  scale_y_continuous(breaks = seq(63, 66, by = 0.5))

But what if we want to exclude certain islands below a certain area Because that is not a variable in the dataframe from LMÍ we could compute them:

iceland |> 
  mutate(area = st_area(geom))
#> Error in `stopifnot()`:
#>  In argument: `area = st_area(geom)`.
#> Caused by error in `wk_handle.wk_wkb()`:
#> ! Loop 0 is not valid: Edge 0 is degenerate (duplicate vertex)

We get an error because some of the polygons are not strictly valid. Lets check how many of the objects are not valid:

iceland |> 
  st_is_valid() |> 
  table()
#> 
#> FALSE  TRUE 
#>  1416  4828

Lets get the reason for them not being valid:

# little helper functions, just get the "class" of the reason, not the details
lh_reasons <- function(x) {
  case_when(x == "Valid Geometry" ~ "valid",
            str_detect(x, "degenerate") ~ "degenerate",
            str_detect(x, "crosses edge") ~ "crosses edge",
            str_detect(x, "duplicate vertex") ~ "duplicate vertex",
            TRUE ~ NA_character_)
}

iceland |> 
  mutate(reason = st_is_valid(geom , reason = TRUE),
         reason = lh_reasons(reason)) |> 
  st_drop_geometry() |> 
  count(reason)
#> # A tibble: 4 × 2
#>   reason               n
#>   <chr>            <int>
#> 1 crosses edge        23
#> 2 degenerate        1387
#> 3 duplicate vertex     6
#> 4 valid             4828

If you want to understand what “degenerate” means, check this out.

Fixing invalid simple features

Here we just want to try to fix things so that one can calculate the area. We could try to fix this by:

iceland |> 
  st_make_valid() |> 
  mutate(reason = st_is_valid(geom, reason = TRUE),
         reason = lh_reasons(reason)) |> 
  st_drop_geometry() |> 
  count(reason)
#> # A tibble: 2 × 2
#>   reason           n
#>   <chr>        <int>
#> 1 crosses edge     5
#> 2 valid         6239

OK, so there as still a few invalids. Lets give it a one more go by trying the lwgeom_make_valid-function :

iceland |> 
  st_make_valid() |> 
  mutate(geom = lwgeom_make_valid(geom)) |> 
  mutate(reason = st_is_valid(geom, reason = TRUE),
         reason = lh_reasons(reason)) |> 
  st_drop_geometry() |> 
  count(reason)
#> # A tibble: 1 × 2
#>   reason     n
#>   <chr>  <int>
#> 1 valid   6244

Ok, so by this process all simple features are valid. Lets retain this object:

iceland <- 
  iceland |>
  st_make_valid() |> 
  mutate(geom = lwgeom_make_valid(geom))

Now we can calculate the area:

iceland <- 
  iceland |> 
  mutate(area = st_area(geom)) |> 
  select(gml_id, eyjarsker, area)

Let’s just retain the 50 largest islands and then plot Breiðafjörður:

iceland |> 
  mutate(area = area / 1e6) |> 
  arrange(-area) |> 
  slice(1:50) |> 
  ggplot() +
  theme_bw() +
  geom_sf(fill = "white", alpha = 0.0, lwd = 0.1) +
  coord_sf(xlim = c(-24.4, -21.6), ylim = c(64.9, 65.6)) +
  scale_x_continuous(breaks = seq(-26, -20, by = 1)) +
  scale_y_continuous(breaks = seq(63, 66, by = 0.5))