Replace and remove whitespace in strings - performant and sustainable
There are many ways to remove spaces or other characters from a string, but they differ widely in execution time and memory allocation.
Benchmark
As part of my Sustainable Code repository, I created several code snippets that simulate everyday coding situations. I used regex, string operations and the relatively new Span API. I deliberately left out vectorization, for example with Vector128, which would be the most performant solution. Vector128 is highly optimized, but it is not an everyday solution.
Code
I wrote a benchmark class that removes whitespace from a fixed input in several ways.
// Made by Benjamin Abt - https://github.com/BenjaminAbt
using System;
using System.Buffers;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
BenchmarkRunner.Run<Benchmark>();
[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.Net80)]
[SimpleJob(RuntimeMoniker.Net90, baseline: true)]
[HideColumns(Column.Job)]
public class Benchmark
{
    // Verbatim string: the doubled quotes are literal '"' characters, and
    // \u0001 stays as six literal characters (no escape processing).
    public const string Input = @"""
Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
""";

    // Source-generated regex: removes every run of whitespace, including line breaks.
    [Benchmark]
    public string Regex()
    {
        return RegexSample.WhiteSpaceRegex().Replace(Input, "");
    }

    // Removes only the space character; line breaks stay in the result.
    [Benchmark]
    public string String()
    {
        string data = Input;
        return data.Replace(" ", "");
    }

    // Copies every char except ' ' into a stack buffer, then builds the result string.
    [Benchmark]
    public string Span()
    {
        ReadOnlySpan<char> inputSpan = Input.AsSpan();
        Span<char> resultSpan = stackalloc char[Input.Length];
        int resultIndex = 0;
        foreach (char c in inputSpan)
        {
            if (c is not ' ')
            {
                resultSpan[resultIndex++] = c;
            }
        }
        return new string(resultSpan.Slice(0, resultIndex));
    }

    // Removes only the space character, using a StringBuilder copy of the input.
    [Benchmark]
    public string StringBuilder()
    {
        StringBuilder stringBuilder = new(Input);
        stringBuilder.Replace(" ", "");
        return stringBuilder.ToString();
    }

    // A null separator splits on all whitespace; empty entries are dropped before joining.
    [Benchmark]
    public string JoinSplit()
    {
        return string.Join("", Input.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
    }

    // A null separator splits on all whitespace; empty entries concatenate to nothing.
    [Benchmark]
    public string ConcatSplit()
    {
        return string.Concat(Input.Split(null));
    }

    // Removes all whitespace, using a scratch buffer rented from the shared array pool.
    [Benchmark]
    public string SpanArrayPool()
    {
        char[] pooledArray = ArrayPool<char>.Shared.Rent(Input.Length);
        try
        {
            Span<char> destination = pooledArray.AsSpan(0, Input.Length);
            int pos = 0;
            foreach (char c in Input)
            {
                if (!char.IsWhiteSpace(c))
                {
                    destination[pos++] = c;
                }
            }
            return Input.Length == pos ? Input : new string(destination[..pos]);
        }
        finally
        {
            ArrayPool<char>.Shared.Return(pooledArray);
        }
    }

    // Removes all whitespace, using a stack-allocated scratch buffer.
    [Benchmark]
    public string SpanStackPool()
    {
        // stackalloc avoids a heap-allocated buffer, but it is only safe for small inputs
        // (for example fewer than 256 chars); large or unbounded lengths risk a stack overflow
        Span<char> destination = stackalloc char[Input.Length];
        int pos = 0;
        foreach (char c in Input)
        {
            if (!char.IsWhiteSpace(c))
            {
                destination[pos++] = c;
            }
        }
        return Input.Length == pos ? Input : new string(destination[..pos]);
    }
}
public static partial class RegexSample
{
    [GeneratedRegex(@"\s+")]
    public static partial Regex WhiteSpaceRegex();
}
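The comment in SpanStackPool hints that stackalloc is only appropriate for small inputs. Outside a benchmark with a fixed input, a common way to deal with that is to use the stack for short strings and fall back to ArrayPool for longer ones. The following is a minimal sketch of that idea, not part of the measured benchmark; the names WhitespaceRemover, RemoveWhitespace and StackLimit, as well as the threshold of 256 chars, are assumptions chosen for illustration.

```csharp
using System;
using System.Buffers;

Console.WriteLine(WhitespaceRemover.RemoveWhitespace("Hello World\tagain\n")); // prints "HelloWorldagain"

public static class WhitespaceRemover
{
    // Hypothetical threshold: 256 chars are 512 bytes of stack. Tune it for your workload.
    private const int StackLimit = 256;

    public static string RemoveWhitespace(string input)
    {
        char[]? rented = null;

        // Short inputs use the stack, longer inputs rent a buffer from the shared pool.
        Span<char> buffer = input.Length <= StackLimit
            ? stackalloc char[StackLimit]
            : (rented = ArrayPool<char>.Shared.Rent(input.Length));

        try
        {
            int pos = 0;
            foreach (char c in input)
            {
                if (!char.IsWhiteSpace(c))
                {
                    buffer[pos++] = c;
                }
            }

            // Nothing removed: return the original instance and skip the allocation.
            return pos == input.Length ? input : new string(buffer[..pos]);
        }
        finally
        {
            if (rented is not null)
            {
                ArrayPool<char>.Shared.Return(rented);
            }
        }
    }
}
```

The only heap allocation on the short path is the result string itself, and the long path cannot overflow the stack because its buffer comes from the pool.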
Benchmark
I then measured this code with BenchmarkDotNet.
BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5131/22H2/2022Update)
AMD Ryzen 9 9950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100
[Host] : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
.NET 8.0 : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
.NET 9.0 : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
| Method | Runtime | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|-------------- |--------- |-----------:|---------:|---------:|------:|--------:|-------:|-------:|----------:|------------:|
| Regex | .NET 8.0 | 656.2 ns | 7.17 ns | 6.36 ns | 1.06 | 0.01 | 0.0629 | - | 1.03 KB | 1.00 |
| Regex | .NET 9.0 | 617.5 ns | 3.67 ns | 3.44 ns | 1.00 | 0.01 | 0.0629 | - | 1.03 KB | 1.00 |
| | | | | | | | | | | |
| String | .NET 8.0 | 1,074.5 ns | 9.94 ns | 9.29 ns | 1.21 | 0.01 | 0.0629 | - | 1.05 KB | 1.00 |
| String | .NET 9.0 | 887.9 ns | 2.59 ns | 2.29 ns | 1.00 | 0.00 | 0.0639 | - | 1.05 KB | 1.00 |
| | | | | | | | | | | |
| Span | .NET 8.0 | 288.8 ns | 5.67 ns | 6.31 ns | 1.53 | 0.04 | 0.0639 | - | 1.05 KB | 1.00 |
| Span | .NET 9.0 | 188.4 ns | 2.61 ns | 2.45 ns | 1.00 | 0.02 | 0.0639 | - | 1.05 KB | 1.00 |
| | | | | | | | | | | |
| StringBuilder | .NET 8.0 | 1,042.8 ns | 8.32 ns | 7.78 ns | 1.20 | 0.02 | 0.1411 | - | 2.33 KB | 1.00 |
| StringBuilder | .NET 9.0 | 871.6 ns | 13.50 ns | 12.63 ns | 1.00 | 0.02 | 0.1421 | - | 2.33 KB | 1.00 |
| | | | | | | | | | | |
| JoinSplit | .NET 8.0 | 688.6 ns | 4.47 ns | 4.18 ns | 1.06 | 0.01 | 0.2422 | 0.0010 | 3.97 KB | 1.00 |
| JoinSplit | .NET 9.0 | 650.9 ns | 7.26 ns | 6.79 ns | 1.00 | 0.01 | 0.2422 | 0.0010 | 3.97 KB | 1.00 |
| | | | | | | | | | | |
| ConcatSplit | .NET 8.0 | 680.3 ns | 10.91 ns | 10.20 ns | 1.07 | 0.02 | 0.2251 | 0.0010 | 3.68 KB | 1.00 |
| ConcatSplit | .NET 9.0 | 634.2 ns | 11.93 ns | 11.15 ns | 1.00 | 0.02 | 0.2251 | 0.0010 | 3.68 KB | 1.00 |
| | | | | | | | | | | |
| SpanArrayPool | .NET 8.0 | 312.7 ns | 3.35 ns | 3.13 ns | 0.98 | 0.01 | 0.0629 | - | 1.03 KB | 1.00 |
| SpanArrayPool | .NET 9.0 | 318.8 ns | 3.48 ns | 3.26 ns | 1.00 | 0.01 | 0.0629 | - | 1.03 KB | 1.00 |
| | | | | | | | | | | |
| SpanStackPool | .NET 8.0 | 370.0 ns | 2.87 ns | 2.69 ns | 1.27 | 0.02 | 0.0629 | - | 1.03 KB | 1.00 |
| SpanStackPool | .NET 9.0 | 291.1 ns | 5.08 ns | 4.75 ns | 1.00 | 0.02 | 0.0629 | - | 1.03 KB | 1.00 |
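The differences are large. On .NET 9, the Span variants take roughly 190 to 320 ns, Regex and the Split-based variants about 620 to 650 ns, and string.Replace and StringBuilder.Replace about 870 to 890 ns. StringBuilder, which performs so well in many other situations, is among the slowest here and allocates more than twice as much memory as string.Replace (2.33 KB vs. 1.05 KB); only the Split-based variants allocate more (3.68 KB to 3.97 KB). Regex, on the other hand, is not as bad as you might think. As you would expect, the Span variants are all far ahead, so it's good that the Span API is easy to understand and use. Keep in mind that the variants are not strictly equivalent: String, Span and StringBuilder remove only the space character, while the others remove all whitespace, including line breaks.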
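When comparing the rows, it helps to remember that the benchmark methods do not all remove the same characters: some replace only " ", others match every whitespace character. A small sketch with a made-up sample string (not the benchmark input) shows the difference in output:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample: contains a space, a tab and a line feed.
string sample = "a b\tc\nd";

// Space-only removal, as in the String, Span and StringBuilder benchmarks.
string spacesOnly = sample.Replace(" ", "");

// All-whitespace removal, as in the Regex, Split and char.IsWhiteSpace benchmarks.
string allWhitespace = Regex.Replace(sample, @"\s+", "");

Console.WriteLine(spacesOnly.Length); // prints 6: tab and line feed are still present
Console.WriteLine(allWhitespace);     // prints "abcd"
```

For a like-for-like comparison in your own measurements, pick one definition of "whitespace" and use it in every variant.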
If you need the very best performance, you won't be able to avoid vectorization (for example with AVX2); see meziantou's "Replace characters in a string using Vectorization" implementation.
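Short of writing vector code by hand, there is a middle ground: the span search methods such as IndexOf are vectorized inside the runtime where the hardware supports it, so a loop can jump from space to space and copy the chunks in between. The sketch below illustrates that idea under stated assumptions. It is not meziantou's implementation and was not part of the benchmark above, so whether it beats the simple char loop depends on how densely the spaces are distributed. SpaceRemover and RemoveSpaces are hypothetical names.

```csharp
using System;
using System.Buffers;

Console.WriteLine(SpaceRemover.RemoveSpaces("Hello World Hello World")); // prints "HelloWorldHelloWorld"

public static class SpaceRemover
{
    public static string RemoveSpaces(string input)
    {
        ReadOnlySpan<char> remaining = input;

        // IndexOf on spans is vectorized by the runtime.
        int idx = remaining.IndexOf(' ');
        if (idx < 0)
        {
            return input; // nothing to remove, no allocation
        }

        char[] rented = ArrayPool<char>.Shared.Rent(input.Length);
        try
        {
            Span<char> destination = rented;
            int pos = 0;

            while (idx >= 0)
            {
                // Copy the chunk in front of the space, then skip the space itself.
                remaining[..idx].CopyTo(destination[pos..]);
                pos += idx;
                remaining = remaining[(idx + 1)..];
                idx = remaining.IndexOf(' ');
            }

            // Copy whatever follows the last space.
            remaining.CopyTo(destination[pos..]);
            pos += remaining.Length;

            return new string(destination[..pos]);
        }
        finally
        {
            ArrayPool<char>.Shared.Return(rented);
        }
    }
}
```

With long runs of non-space characters, most of the work happens in the bulk search and copy calls; with a space every few characters, the per-chunk overhead may outweigh that benefit.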
Sustainable Code
You can find this and many more examples on my GitHub under Sustainable Code.