Load Google Search Crawler IP Address Ranges with Refit

Rate limiting is a great way to protect your website and content from misuse. In some cases, however, it works against you: for example, when you want Google Search to index your content in order to increase your visibility.

According to Google's documentation, there are several ways to recognize the search crawler; the most secure is to identify it by its IP address, since a source IP address is virtually impossible to spoof, or can only be spoofed with a great deal of effort.

Googlebot JSON

Google offers a static JSON file for this purpose, in which it regularly publishes the IP address ranges used by the search crawler bots. The JSON has a basic structure that looks like this:

{
    "creationTime": "2024-11-26T15:46:03.000000",
    "prefixes": [
        {
            "ipv6Prefix": "2001:4860:4801:10::/64"
        },
        ...
    ]
}

Besides ipv6Prefix, an entry can also define ipv4Prefix.

This JSON maps easily onto C# record classes:

public sealed record class GoogleSearchCrawlerIPAddressResult(
    [property: JsonPropertyName("creationTime")] DateTimeOffset CreationTime,
    [property: JsonPropertyName("prefixes")] List<GoogleSearchCrawlerIPAddressRangeItem> IPRanges);

public sealed record class GoogleSearchCrawlerIPAddressRangeItem(
    [property: JsonPropertyName("ipv6Prefix")] string? IPv6,
    [property: JsonPropertyName("ipv4Prefix")] string? IPv4);
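
To check that the records really match the JSON structure, the excerpt can be deserialized locally with System.Text.Json before any HTTP call is involved. This is only a minimal sketch: the inline JSON string is the excerpt from above closed off into a valid document, and the records are repeated so that the snippet runs on its own.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Excerpt of the published file, closed off so that it is valid JSON.
string json = """
    {
        "creationTime": "2024-11-26T15:46:03.000000",
        "prefixes": [
            { "ipv6Prefix": "2001:4860:4801:10::/64" }
        ]
    }
    """;

GoogleSearchCrawlerIPAddressResult? result =
    JsonSerializer.Deserialize<GoogleSearchCrawlerIPAddressResult>(json);

if (result is null)
{
    throw new InvalidOperationException("The JSON document was empty.");
}

// A prefix that is missing in the JSON ends up as null on the record.
GoogleSearchCrawlerIPAddressRangeItem first = result.IPRanges[0];
string ipv6 = first.IPv6 ?? "(none)";
string ipv4 = first.IPv4 ?? "(none)";

Console.WriteLine($"Entries: {result.IPRanges.Count}"); // Entries: 1
Console.WriteLine($"IPv6: {ipv6}");                     // IPv6: 2001:4860:4801:10::/64
Console.WriteLine($"IPv4: {ipv4}");                     // IPv4: (none)

public sealed record class GoogleSearchCrawlerIPAddressResult(
    [property: JsonPropertyName("creationTime")] DateTimeOffset CreationTime,
    [property: JsonPropertyName("prefixes")] List<GoogleSearchCrawlerIPAddressRangeItem> IPRanges);

public sealed record class GoogleSearchCrawlerIPAddressRangeItem(
    [property: JsonPropertyName("ipv6Prefix")] string? IPv6,
    [property: JsonPropertyName("ipv4Prefix")] string? IPv4);
```

Because each entry carries only one of the two prefixes, both properties are declared as nullable strings.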

Refit

Refit is a good option for consuming this file from C# with as little code as possible.

To do this, add Refit as a NuGet package reference in the project file.

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net9.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Refit" Version="8.0.0" />
  </ItemGroup>

</Project>

A Refit interface can then be defined for the endpoint:

public interface IGoogleSearchCrawlerIPAddressRangesHttpClient
{
    [Get("/static/search/apis/ipranges/googlebot.json")]
    Task<GoogleSearchCrawlerIPAddressResult> GetRanges(CancellationToken cancellationToken);
}

We can now use the interface to create an instance with Refit and load data:

IGoogleSearchCrawlerIPAddressRangesHttpClient googleSearchBotIPAddressClient = RestService
    .For<IGoogleSearchCrawlerIPAddressRangesHttpClient>("https://developers.google.com");

GoogleSearchCrawlerIPAddressResult result = await googleSearchBotIPAddressClient.GetRanges(CancellationToken.None);

You can now work with the data; for example, simply output it:

foreach (GoogleSearchCrawlerIPAddressRangeItem entry in result.IPRanges)
{
    if (entry.IPv4 is not null)
    {
        Console.WriteLine($"> Found IPv4 range: {entry.IPv4}");
    }
    if (entry.IPv6 is not null)
    {
        Console.WriteLine($"> Found IPv6 range: {entry.IPv6}");
    }
}
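
Since the goal is to recognize the crawler by its IP address, the loaded prefixes can also be matched against an incoming address. The following is a minimal sketch, assuming .NET 8 or later, where System.Net.IPNetwork is available. The helper class CrawlerIPAddressMatcher is a hypothetical name and not part of the example above. The IPv6 prefix is taken from the JSON excerpt; the IPv4 prefix is a documentation range used purely for illustration, not a real Google range.

```csharp
using System.Net;

// Sample prefixes: the IPv6 one comes from the JSON excerpt, the IPv4 one is
// a documentation range (assumption, for illustration only).
string?[] prefixes = { "2001:4860:4801:10::/64", "192.0.2.0/24", null };

Console.WriteLine(CrawlerIPAddressMatcher.IsInAnyRange(
    IPAddress.Parse("2001:4860:4801:10::1"), prefixes)); // True
Console.WriteLine(CrawlerIPAddressMatcher.IsInAnyRange(
    IPAddress.Parse("192.0.2.55"), prefixes));           // True
Console.WriteLine(CrawlerIPAddressMatcher.IsInAnyRange(
    IPAddress.Parse("198.51.100.7"), prefixes));         // False

// Hypothetical helper: checks an address against a list of CIDR prefixes.
public static class CrawlerIPAddressMatcher
{
    public static bool IsInAnyRange(IPAddress address, IEnumerable<string?> prefixes)
    {
        foreach (string? prefix in prefixes)
        {
            // Null or unparsable prefixes are skipped instead of throwing.
            if (prefix is not null
                && IPNetwork.TryParse(prefix, out IPNetwork network)
                && network.Contains(address))
            {
                return true;
            }
        }

        return false;
    }
}
```

With the loaded data, the prefixes could be passed in as result.IPRanges.Select(range => range.IPv4 ?? range.IPv6). Note that an IPv4-mapped IPv6 address does not match an IPv4 prefix in this sketch, so it may need to be converted with MapToIPv4() first.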

Full example:

using System.Text.Json.Serialization;
using Refit;

Console.WriteLine("Loading IP Addresses from Google Search Bot Json");

IGoogleSearchCrawlerIPAddressRangesHttpClient googleSearchBotIPAddressClient = RestService
    .For<IGoogleSearchCrawlerIPAddressRangesHttpClient>("https://developers.google.com");

GoogleSearchCrawlerIPAddressResult result = await googleSearchBotIPAddressClient.GetRanges(CancellationToken.None);

Console.WriteLine($"Found {result.IPRanges.Count} entries from {result.CreationTime:o}.");

foreach (GoogleSearchCrawlerIPAddressRangeItem entry in result.IPRanges)
{
    if (entry.IPv4 is not null)
    {
        Console.WriteLine($"> Found IPv4 range: {entry.IPv4}");
    }
    if (entry.IPv6 is not null)
    {
        Console.WriteLine($"> Found IPv6 range: {entry.IPv6}");
    }
}

Console.WriteLine("Finished.");


// Refit definition

public interface IGoogleSearchCrawlerIPAddressRangesHttpClient
{
    [Get("/static/search/apis/ipranges/googlebot.json")]
    Task<GoogleSearchCrawlerIPAddressResult> GetRanges(CancellationToken cancellationToken);
}

public sealed record class GoogleSearchCrawlerIPAddressResult(
    [property: JsonPropertyName("creationTime")] DateTimeOffset CreationTime,
    [property: JsonPropertyName("prefixes")] List<GoogleSearchCrawlerIPAddressRangeItem> IPRanges);

public sealed record class GoogleSearchCrawlerIPAddressRangeItem(
    [property: JsonPropertyName("ipv6Prefix")] string? IPv6,
    [property: JsonPropertyName("ipv4Prefix")] string? IPv4);