How Many .com Domain Names Are Unused?

Time: 2019-02-21 18:30:36 Author: NiceNIC.NET

When looking for .com names, I've been frustrated by how many are already taken but appear to be unused. It can feel like people are registering every pronounceable combination of letters in every major language, and even the unpronounceable short ones. Is there rampant domain speculation, or do I just think of the same names as everyone else? Let's look at the data...

 

There are currently 137 million .com domain names registered [1]. Of these, roughly 1/3 are in use (businesses, personal websites, email, etc.), another 1/3 appear to be unused, and the last 1/3 are used for a variety of speculative purposes.


 

How I Determined These Numbers

I started by crawling a random sample of the domains from the top-level .com DNS zone file, until reaching 100,000 valid domains.
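
The zone file is far too large to shuffle in memory, so a random sample is usually drawn in one pass. The following is a minimal sketch of reservoir sampling (Algorithm R), not the crawler's actual code. The function name `sample_domains` is hypothetical, and it assumes the input yields each domain name exactly once.

```python
import random
from typing import Iterable, List, Optional


def sample_domains(names: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k names chosen uniformly at random from a stream."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, name in enumerate(names):
        if i < k:
            # Fill the reservoir with the first k names.
            reservoir.append(name)
        else:
            # Keep the new name with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = name
    return reservoir
```

Because names that turn out to be invalid are discarded later, a real run would oversample or keep drawing until the target of 100,000 valid domains is reached.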

 

For each domain, I collected the following:

 

the WHOIS record

all DNS records for the bare domain (e.g. example.com) and the www subdomain

HTTP and HTTPS responses (status code, headers, and bodies) for the root page of the bare domain and the www subdomain

screenshots of the root page as viewed by Mozilla Firefox 64.0 on Linux
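
The HTTP part of the list above amounts to four requests per domain. Here is a rough sketch using only the Python standard library. The helper names `probe_urls` and `fetch` and the 10-second timeout are assumptions for illustration; the original crawler is not published in this article.

```python
import http.client
import urllib.error
import urllib.request
from typing import Dict, List, Optional


def probe_urls(domain: str) -> List[str]:
    """The four root pages to fetch: HTTP and HTTPS, with and without www."""
    hosts = [domain, "www." + domain]
    return [f"{scheme}://{host}/" for host in hosts for scheme in ("http", "https")]


def fetch(url: str, timeout: float = 10.0) -> Optional[Dict[str, object]]:
    """Fetch one URL; return status, headers and body, or None if no valid response."""
    try:
        # urlopen follows redirects, so this is the status of the final response.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return {"status": resp.status, "headers": dict(resp.headers), "body": resp.read()}
    except urllib.error.HTTPError as err:
        # 4xx and 5xx replies still prove that a web server answered.
        return {"status": err.code, "headers": dict(err.headers), "body": err.read()}
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, TLS error, or malformed URL.
        return None
```

WHOIS lookups, DNS queries, and screenshots need additional tooling and are left out of this sketch.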

The crawl took a little over 48 hours from a single server located in a Singapore data center. I ran a follow-up crawl for any domains that failed to connect over HTTP or HTTPS, in case the failures were transient. Finally, among the 2,188 domains to be categorized, I manually checked any that had failed, in case the crawler had timed out or had DOM events blocked by JavaScript.

 

Then, I wrote a script to help me categorize websites based on their screenshot and body. The categorization script presents the possible categories as a list of buttons, with Content being the default.
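
The "default to Content" behavior can be sketched as a small selection function. This is a terminal-style illustration, not the button-based script described above. The name `choose_category` is hypothetical, and the category list contains only the names that appear in this article.

```python
from typing import Sequence

# Category names taken from this article; the original script may have had others.
CATEGORIES = (
    "Content", "No Web Server", "Empty", "For Sale", "Error", "Parked",
    "Gambling", "Mail", "Redirect", "Private", "Porn", "Ads",
)


def choose_category(answer: str,
                    categories: Sequence[str] = CATEGORIES,
                    default: str = "Content") -> str:
    """Map a reviewer's typed number (1-based) to a category name.

    Blank input selects the default; anything else invalid raises ValueError.
    """
    answer = answer.strip()
    if not answer:
        return default
    if answer.isdecimal() and 1 <= int(answer) <= len(categories):
        return categories[int(answer) - 1]
    raise ValueError(f"not a valid choice: {answer!r}")
```

Making the most common category the default means the reviewer only has to act on the exceptions, which matters when labeling a couple of thousand screenshots by hand.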

 

I used the script to categorize domains over the next 2 days. In some cases the screenshot and body were not sufficient, so I manually opened the domain in a web browser for inspection.

 

Domain Categories

 

These categories evolved as I worked. For example, I hadn't anticipated the high number of gambling domains (aliases).

 

For most categories I've included a random sample of screenshots from that category, excluding redundant ones.

 

Content (31% or ~43 million)

Content is the category of any domain with a website displaying unique content. It doesn't matter what the content is, as long as it appears to be unique for the domain and publicly accessible. When I was unsure, I placed domains in this category by default.
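
Each category heading pairs a sample share with a count extrapolated to all 137 million .com domains. The arithmetic can be sketched as follows, together with a rough 95% margin of error from the normal approximation. The sample size of 2,188 is an assumption here (it is the number of domains the article says were to be categorized), and `extrapolate` is a hypothetical helper.

```python
import math
from typing import Tuple

TOTAL_COM = 137_000_000  # registered .com domains, from the article
SAMPLE_SIZE = 2_188      # assumption: shares were measured on the categorized domains


def extrapolate(share: float, total: int = TOTAL_COM, n: int = SAMPLE_SIZE) -> Tuple[float, float]:
    """Return (estimate, margin) in domains for a sample share between 0 and 1."""
    estimate = share * total
    # Standard error of a proportion, scaled to the population, times 1.96 for ~95%.
    margin = 1.96 * math.sqrt(share * (1 - share) / n) * total
    return estimate, margin


estimate, margin = extrapolate(0.31)
print(f"Content: about {estimate / 1e6:.1f} million, give or take {margin / 1e6:.1f} million")
```

Under these assumptions, the 31% share corresponds to roughly 42.5 million domains, with an uncertainty of a few million in either direction.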

 

No Web Server (11% or ~16 million)

I placed a domain in this category if it had no MX records and I could not connect to, or receive a valid response from, port 80 or 443 on either the bare domain or the www subdomain. Some of these domains likely have some non-web use, such as an FTP or video game server, but I expect them to be a small fraction. Additionally, the crawling server was only configured for IPv4, so any IPv6-only websites would have been grouped here.

 

Empty (9.2% or ~13 million)

An Empty domain is one for which a web server is answering requests, but returning empty pages, 404s, or unfilled templates (such as default WordPress installs).


 

The difference between an Empty domain and a Parked domain is that the Empty domain has presumably been configured by the user, but no content has been added yet.

 

For Sale (7.1% or ~9.8 million)

Many domains are listed For Sale, usually by domain investors, through various brokers and marketplaces. Nearly half of this category appears to be domains sold by HugeDomains, although their website lists only "over 200,000" domains available for purchase (a fraction of their ~4 million domains if the sample is representative). I only included domains from recognizable marketplaces or when the contact details were not part of an ad placement, as ad networks and domain brokers will often falsely claim that they represent a domain owner (I categorized all such domains as Ads instead).

 


Error (5.7% or ~7.9 million)

If a domain returned any type of error, whether an HTTP error or an in-page error, it belongs to this category.

 


Note that I might have miscategorized some Private domains as Errors if they used basic authentication, as I did not distinguish authentication-related responses (401 Unauthorized, which is what a server normally returns when basic auth credentials are missing, or 403 Forbidden) from other errors.
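
A server protected by HTTP basic authentication normally answers an unauthenticated request with 401 Unauthorized and a WWW-Authenticate header. A follow-up crawl could use that to separate login-protected sites from genuine errors. The check below is a hypothetical sketch, not something the original crawl did.

```python
from typing import Mapping


def looks_auth_protected(status: int, headers: Mapping[str, str]) -> bool:
    """True if the response looks like an HTTP authentication challenge."""
    # Header names are case-insensitive, so compare them in lower case.
    names = {name.lower() for name in headers}
    return status == 401 and "www-authenticate" in names
```

A response that passes this check would be a candidate for the Private category rather than Error.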

 

Parked (4.8% or ~6.5 million)

Parked domains are those that display a page from the registrar or host explaining that the domain has not been set up yet. To qualify as Parked, a domain had to serve a page without any external ads. It could advertise its own services, but it couldn't place ads from an ad network.


 

Gambling (3.0% or ~4 million)

All websites in this category are in Chinese and are operating under aliases, often short strings of numbers or consonants (e.g. 17770012 or tdwhtr). They follow common templates and contain similar images, often with automatically-generated logos. I assume their purpose is to attract people who think the names are lucky.
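
The naming pattern described above (short names made only of digits or only of consonants) is easy to test for mechanically. The following is a hypothetical heuristic for flagging such names, not the method used in this study, where categories were assigned from screenshots and page bodies. The length cutoff of 8 is an assumption.

```python
import re

_DIGITS = re.compile(r"[0-9]+")
_CONSONANTS = re.compile(r"[b-df-hj-np-tv-z]+")


def looks_like_alias(domain: str, max_len: int = 8) -> bool:
    """True if the first label is short and all digits or all consonants."""
    label = domain.lower().split(".")[0]
    if not label or len(label) > max_len:
        return False
    return bool(_DIGITS.fullmatch(label) or _CONSONANTS.fullmatch(label))
```

A name-based filter like this could only shortlist candidates for review; plenty of legitimate abbreviations would match it too.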


 

Mail (2.6% or ~3.5 million)

Any domain not in any other category, but with MX DNS records (for email), I categorized as Mail. I did not attempt to see if the mail server was working or if delivery was possible. It's possible that many of these domains are not actually used for email, but I've given them the benefit of the doubt.
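
Taken together, the No Web Server and Mail definitions form a simple fallback rule for domains where no web server answered. A minimal sketch, with the hypothetical name `fallback_category`, assuming the probe results and MX records were collected beforehand:

```python
from typing import Optional, Sequence


def fallback_category(probe_results: Sequence[Optional[int]],
                      mx_records: Sequence[str]) -> Optional[str]:
    """Categorize a domain that may have no website.

    probe_results holds one HTTP status per probed URL, or None where there
    was no valid response. Returns None when a web server answered, meaning
    the category has to be decided from the page itself.
    """
    if any(status is not None for status in probe_results):
        return None
    # No web response at all: MX records decide between Mail and No Web Server.
    return "Mail" if mx_records else "No Web Server"
```

For example, `fallback_category([None, None, None, None], ["mx1.example.com"])` gives "Mail", while the same call with an empty MX list gives "No Web Server".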

 

Redirect (1.1% or ~1.6 million)

Redirects include vanity domains pointing to Facebook pages, alternative names for businesses, etc.
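
One way to recognize this kind of redirect is to compare the redirect target's host with the domain that was requested. This is an illustrative sketch only; the name `redirects_off_domain` is hypothetical and the article does not say how redirects were detected.

```python
from urllib.parse import urlsplit


def redirects_off_domain(domain: str, location: str) -> bool:
    """True if a redirect target points at a host outside the given domain."""
    host = (urlsplit(location).hostname or "").lower()
    domain = domain.lower()
    if not host:
        # A relative Location such as "/home" stays on the same site.
        return False
    return host != domain and not host.endswith("." + domain)
```

A redirect from the bare domain to its own www subdomain is not counted, since it stays within the same registered name.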

 

Private (0.64% or ~0.9 million)

Private domains did not appear to have any content accessible without first logging in (or in some cases registering).

 


Porn (0.59% or ~0.8 million)

Similar to gambling websites, a number of pornographic websites operate under various aliases. The websites were predominantly in Chinese and the domains followed similar naming patterns. As many of the sites display pornographic material directly (not after a warning), I've not included the screenshots here.


By Christopher Forno at singapore data company
