Efficiently clean a string with .NET

Strings are among the most commonly used types in .NET applications, and very often a source of inefficient code. Cleaning up a string, for example removing invalid or non-visible characters from user input, is a very common task. Unfortunately, it is usually implemented in the most convenient rather than the most efficient way: with LINQ.

See full sample here: Sustainable Code by BEN ABT on GitHub

String Manipulation

The least efficient option, as already mentioned, is direct string manipulation with LINQ. Unfortunately, it is by far the variant I see most often.

string.Concat(source.Where(c => char.IsControl(c) is false))

| Method                 | Runtime  | Mean       | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|----------------------- |--------- |-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| Linq                   | .NET 8.0 | 1,646.5 ns |  3.47 |    0.03 | 0.0782 |      - |    1.3 KB |        1.07 |
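
To make the one-liner above easier to try out, here is a minimal sketch that wraps it in a complete method. The class name `LinqCleaner` and the method name `Clean` are made up for illustration and do not come from the benchmark project.

```csharp
using System;
using System.Linq;

public static class LinqCleaner
{
    // Removes all control characters (for example '\t', '\n' or '\0') from the input.
    // Hypothetical helper for illustration; not taken from the benchmark project.
    public static string Clean(string source)
    {
        ArgumentNullException.ThrowIfNull(source);

        // Where() hands the surviving characters to string.Concat one by one
        // through an IEnumerable<char>, which costs an enumerator step and a
        // delegate call per character.
        return string.Concat(source.Where(c => char.IsControl(c) is false));
    }
}
```

Calling `LinqCleaner.Clean("Hello\tWorld\n")` returns `"HelloWorld"`, because both the tab and the line feed are control characters.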

String Builder

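The StringBuilder is significantly faster: it appends each character to its own internal, mutable character buffer and only creates the final string once, in ToString(), without an enumerator step and a delegate call per character.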

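// Requires: using System.Text;
public static string UsingStringBuilder(string source)
{
    // a new builder (and its internal buffer) is allocated on every call
    StringBuilder sb = new();

    // copy only the characters that are not control characters
    foreach (char c in source)
    {
        if (char.IsControl(c) is false)
        {
            sb.Append(c);
        }
    }

    return sb.ToString();
}
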
| Method                 | Runtime  | Mean       | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|----------------------- |--------- |-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| StringBuilder_Instance | .NET 8.0 |   860.0 ns |  1.81 |    0.02 | 0.2270 | 0.0019 |   3.71 KB |        3.04 |

The disadvantage is that a new StringBuilder, including its internal buffers, has to be allocated on every call in addition to the actual operation. This is why this variant allocates about three times as much memory (3.71 KB instead of 1.22 KB) and is the only one that also triggers Gen1 collections. Pooling the StringBuilder removes this overhead: on .NET 8 the pooled variant takes about 18% less time than the instance variant and about 57% less time than the LINQ variant.

| Method                 | Runtime  | Mean       | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|----------------------- |--------- |-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| StringBuilder_Pool     | .NET 8.0 |   706.2 ns |  1.49 |    0.02 | 0.0744 |      - |   1.22 KB |        1.00 |
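
The pooled variant is not shown here, so the following is only a minimal sketch of the idea. It assumes a hand-rolled pool based on `ConcurrentBag<StringBuilder>`; the benchmark project may well use a different pool (for example one from a library), and the names `PooledStringBuilderCleaner` and `MaxRetainedCapacity` are invented for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text;

public static class PooledStringBuilderCleaner
{
    // Hypothetical minimal pool; not the pool used in the original benchmark.
    private static readonly ConcurrentBag<StringBuilder> Pool = new();

    // Builders that grew beyond this capacity are dropped instead of pooled,
    // so a single huge input does not keep a large buffer alive forever.
    private const int MaxRetainedCapacity = 4096;

    public static string Clean(string source)
    {
        ArgumentNullException.ThrowIfNull(source);

        // Reuse a pooled builder if one is available, otherwise create a new one.
        StringBuilder sb = Pool.TryTake(out var pooled)
            ? pooled
            : new StringBuilder(source.Length);

        try
        {
            foreach (char c in source)
            {
                if (char.IsControl(c) is false)
                {
                    sb.Append(c);
                }
            }

            return sb.ToString();
        }
        finally
        {
            // Reset the builder and hand it back for the next caller.
            if (sb.Capacity <= MaxRetainedCapacity)
            {
                sb.Clear();
                Pool.Add(sb);
            }
        }
    }
}
```

The point of the design is that the builder's buffer survives between calls, so after a warm-up phase each call mainly allocates the result string.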

Span

Span<T> has been available in .NET for almost seven years, but it is still often overlooked. Admittedly, the learning curve and the somewhat "low-level" approach do not make it easy to get started, but the results speak for themselves.

All four span implementations are faster than the other variants by a large margin and allocate no more than the pooled StringBuilder (1.22 KB). The fastest one is an unsafe implementation, which cannot be used everywhere, for example where unsafe code blocks are not possible or not allowed. Span1 is the benchmark baseline (Ratio 1.00).

| Method                 | Runtime  | Mean       | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|----------------------- |--------- |-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| Span                   | .NET 8.0 |   394.4 ns |  0.83 |    0.01 | 0.0744 |      - |   1.22 KB |        1.00 |
| Span1                  | .NET 8.0 |   474.7 ns |  1.00 |    0.00 | 0.0744 |      - |   1.22 KB |        1.00 |
| Span2                  | .NET 8.0 |   411.8 ns |  0.87 |    0.01 | 0.0744 |      - |   1.22 KB |        1.00 |
| Span2Unsafe            | .NET 8.0 |   375.1 ns |  0.79 |    0.01 | 0.0744 |      - |   1.22 KB |        1.00 |

// Requires: using System; using System.Buffers;
public static string UsingSpan(string source)
{
    int length = source.Length;
    char[]? rentedFromPool = null;

    // allocate
    Span<char> buffer = length > 512 ?
        (rentedFromPool = ArrayPool<char>.Shared.Rent(length)) : (stackalloc char[512]);

    // filter
    int index = 0;
    foreach (char c in source)
    {
        if (char.IsControl(c) is false)
        {
            buffer[index] = c;
            index++;
        }
    }

    // only return the data that was written
    string data = buffer.Slice(0, index).ToString();

    // cleanup
    if (rentedFromPool is not null)
    {
        ArrayPool<char>.Shared.Return(rentedFromPool, clearArray: true);
    }

    return data;
}
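
As a possible hardening of the method above, the following sketch wraps the filter loop in try/finally so that a rented array is handed back to the pool even if something inside the loop throws. This is a variation for illustration, not one of the benchmarked implementations; the names `SpanCleaner` and `StackLimit` are assumptions.

```csharp
using System;
using System.Buffers;

public static class SpanCleaner
{
    // Inputs up to this many chars are filtered in a stack buffer (512 chars = 1 KB).
    private const int StackLimit = 512;

    public static string Clean(string source)
    {
        ArgumentNullException.ThrowIfNull(source);

        // Small inputs use the stack, larger ones rent an array from the shared pool.
        char[]? rented = null;
        Span<char> buffer = source.Length > StackLimit
            ? (rented = ArrayPool<char>.Shared.Rent(source.Length))
            : stackalloc char[StackLimit];

        try
        {
            // Copy every character that is not a control character.
            int written = 0;
            foreach (char c in source)
            {
                if (char.IsControl(c) is false)
                {
                    buffer[written++] = c;
                }
            }

            // Only the part of the buffer that was written becomes the result.
            return buffer[..written].ToString();
        }
        finally
        {
            // Runs on success and on failure; stack memory needs no cleanup.
            if (rented is not null)
            {
                ArrayPool<char>.Shared.Return(rented, clearArray: true);
            }
        }
    }
}
```

An input of up to 512 characters stays on the stack path, while something like `new string('a', 600) + "\r\n"` goes through the ArrayPool path; both return the input without its control characters.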

Sustainable Code

Sustainable code is becoming increasingly important and should not be neglected, especially in migration projects, which are the best opportunity to implement sustainable code from the ground up.

You can find the complete example of how to clean up strings efficiently, along with more examples of sustainable implementations of everyday C# and .NET code, at https://github.com/BenjaminAbt/SustainableCode.

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4412/22H2/2022Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host]   : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  .NET 9.0 : .NET 9.0.0 (9.0.24.17209), X64 RyuJIT AVX2


| Method                 | Runtime  | Mean       | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|----------------------- |--------- |-----------:|------:|--------:|-------:|-------:|----------:|------------:|
| StringBuilder_Pool     | .NET 8.0 |   706.2 ns |  1.49 |    0.02 | 0.0744 |      - |   1.22 KB |        1.00 |
| StringBuilder_Instance | .NET 8.0 |   860.0 ns |  1.81 |    0.02 | 0.2270 | 0.0019 |   3.71 KB |        3.04 |
| Linq                   | .NET 8.0 | 1,646.5 ns |  3.47 |    0.03 | 0.0782 |      - |    1.3 KB |        1.07 |
| Span                   | .NET 8.0 |   394.4 ns |  0.83 |    0.01 | 0.0744 |      - |   1.22 KB |        1.00 |
| Span1                  | .NET 8.0 |   474.7 ns |  1.00 |    0.00 | 0.0744 |      - |   1.22 KB |        1.00 |
| Span2                  | .NET 8.0 |   411.8 ns |  0.87 |    0.01 | 0.0744 |      - |   1.22 KB |        1.00 |
| Span2Unsafe            | .NET 8.0 |   375.1 ns |  0.79 |    0.01 | 0.0744 |      - |   1.22 KB |        1.00 |
|                        |          |            |       |         |        |        |           |             |
| StringBuilder_Pool     | .NET 9.0 |   616.3 ns |  1.19 |    0.01 | 0.0744 |      - |   1.22 KB |        1.00 |
| StringBuilder_Instance | .NET 9.0 |   715.9 ns |  1.39 |    0.01 | 0.2270 | 0.0019 |   3.71 KB |        3.04 |
| Linq                   | .NET 9.0 | 1,663.5 ns |  3.22 |    0.02 | 0.0782 |      - |    1.3 KB |        1.07 |
| Span                   | .NET 9.0 |   404.9 ns |  0.78 |    0.01 | 0.0744 |      - |   1.22 KB |        1.00 |
| Span1                  | .NET 9.0 |   516.1 ns |  1.00 |    0.00 | 0.0744 |      - |   1.22 KB |        1.00 |
| Span2                  | .NET 9.0 |   398.9 ns |  0.77 |    0.01 | 0.0744 |      - |   1.22 KB |        1.00 |
| Span2Unsafe            | .NET 9.0 |   389.4 ns |  0.75 |    0.01 | 0.0744 |      - |   1.22 KB |        1.00 |