Is Smogon Dying? Data Scraping and Analysis

By Bhushan Thumsi • 2024-09-26
Using Python to scrape Smogon's monthly usage stats for the past 10 years and analyzing the data.

Smogon is a competitive Pokemon website founded in 2004, and it has been the hub for competitive Pokemon ever since. It's a community-driven site with many different sections. I tracked the number of battles played on Smogon's affiliate, Pokemon Showdown, to compare the growth and decline of battle formats on the site. Namely, I wanted to answer the question:

Is Smogon's usage as a whole declining?

Smogon publishes their monthly battle data, including number of battles played and Pokemon used in those battles. However, they do not expose this data via an API. Instead, I had to scrape it from the HTML tables on their website.

The first step was to write a scraper to automate the extraction of monthly data from 2014 (the first year they started tracking and publishing data) to 2024.
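As a rough illustration of what scraping a listing page involves, here is a minimal sketch that pulls every link out of a page using only the standard library. The sample markup is a hypothetical stand-in, and `LinkCollector` and `extract_links` are names invented for this example; the real pages may be laid out differently.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical snippet standing in for a listing page (assumed markup).
sample_html = """
<pre>
<a href="../">../</a>
<a href="2014-11/">2014-11/</a>
<a href="2014-12/">2014-12/</a>
</pre>
"""

# Keep only links that start with a four-digit year, i.e. month folders.
month_links = [link for link in extract_links(sample_html) if link[:4].isdigit()]
print(month_links)
```

The same collector can be pointed at a month's page to list its data files, which gives the scraper the full set of URLs to visit.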

The Extraction

Smogon data is stored as:

smogon.com/stats/year-month/chaos/format-rating.json
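Since the year, month, format, and rating are all encoded in the path, they can be recovered from the folder and file names alone. A minimal sketch, assuming names such as `2014-11/` and `gen6ou-1500.json`; `parse_stats_path` is a hypothetical helper, not part of the scraper shown later.

```python
def parse_stats_path(month_link, json_file):
    """Split a month folder name and a file name into their parts."""
    year, month = month_link.strip("/").split("-")[:2]
    stem = json_file.rsplit(".", 1)[0]  # drop the .json extension
    tier, rating = stem.rsplit("-", 1)  # the rating follows the last hyphen
    return {
        "year": int(year),
        "month": int(month),
        "tier": tier,
        "rating": int(rating),
    }


print(parse_stats_path("2014-11/", "gen6ou-1500.json"))
```

Splitting on the last hyphen keeps the tier name intact whatever the rating cutoff is.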

With over 100 formats per month, that scales quickly over a time horizon of 10 years. The main bottleneck isn't the processing of tens of thousands of files, which is trivial, but the I/O of opening webpages and reading the data. The obvious solution is to parallelize the scraping and read multiple files at once. However, that runs into rate limits: when you open that many pages at once, at some point the website will block you as a form of DDoS protection. There are workarounds, such as automatic retries and proxies, but ultimately I decided to scrape the data with a single thread.
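The single-threaded approach can be sketched as a plain loop that fetches one URL at a time, pauses between requests, and backs off after a failure. This is an illustration under my own assumptions, not the exact loop used for this article: `scrape_sequentially`, the `fetch` callable, and the delay values are all hypothetical.

```python
import time


def scrape_sequentially(urls, fetch, delay_seconds=1.0, retries=3):
    """Fetch each URL one at a time, pausing between requests.

    `fetch` is any callable that takes a URL and returns its content,
    raising an exception on failure.
    """
    results = {}
    for url in urls:
        for attempt in range(retries):
            try:
                results[url] = fetch(url)
                break
            except Exception:
                # Wait longer after each failure before trying again.
                time.sleep(delay_seconds * (2 ** attempt))
        else:
            results[url] = None  # every attempt failed
        time.sleep(delay_seconds)  # pause before moving to the next URL
    return results


# Stand-in fetcher so the sketch runs without network access.
def fake_fetch(url):
    return f"contents of {url}"


pages = scrape_sequentially(["a.json", "b.json"], fake_fetch, delay_seconds=0)
print(pages)
```

Passing the fetcher in as an argument keeps the pacing logic separate from the HTTP call, so the loop can be tried out offline before it is pointed at a real site.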

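import time

import requests


def process_json(json_info, retries=3):
    """Fetch one JSON stats file and return its battle count as a row dict."""
    chaos_url, json_file, month_link = json_info
    tier = json_file.split('-0')[0]  # e.g. 'gen6ou-0.json' -> 'gen6ou'
    year, month = month_link.strip('/').split('-')[:2]  # Get only year and month

    json_url = chaos_url + json_file

    for attempt in range(retries):
        try:
            json_response = requests.get(json_url, timeout=10)
            json_response.raise_for_status()
            json_data = json_response.json()
        except (requests.RequestException, ValueError):
            # Network error, bad status, or invalid JSON: back off and retry
            time.sleep(2 ** attempt)
            continue
        # Extract the number of battles
        num_battles = json_data.get('info', {}).get('number of battles', 0)
        return {
            "Tier": tier,
            "Month": month,
            "Year": year,
            "#_of_battles": num_battles,
        }
    return None  # every attempt failed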

The data was extracted into a Pandas dataframe, a data structure well suited to this kind of tabular data, using a Python Jupyter notebook. This included some trivial data cleaning, such as dropping rows with missing values and renaming columns to be more descriptive. Additionally, I added an IsCurrentGen boolean column to identify which formats belonged to the current generation at the time. For example, a format called gen6ou was a "Current Gen" format in 2016, but not in 2019.

ID    Tier               Date     # of Battles  IsCurrentGen
2521  gen6smogondoubles  2014-11  68,988        True
1103  gen5ou             2014-11  868           False
1673  gen6cap            2014-11  1,628         True
2269  gen6pu             2014-11  16,730        True
2116  gen6ou             2014-11  737,681       True
Simplified dataframe to illustrate the relevant data columns
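One way the IsCurrentGen flag could be derived is by comparing the generation number in the tier name against the newest generation released by that month. The sketch below is an assumption about the logic rather than the notebook's actual code: the release months in `GEN_RELEASE` are cutoffs I chose from the games' launch dates, and both function names are hypothetical.

```python
import re

# Month in which each generation's first games were released.
# Assumed cutoffs for this sketch, not taken from the original notebook.
GEN_RELEASE = {6: "2013-10", 7: "2016-11", 8: "2019-11", 9: "2022-11"}


def current_gen_on(date):
    """Return the newest generation released on or before a YYYY-MM date."""
    released = [gen for gen, start in GEN_RELEASE.items() if start <= date]
    return max(released) if released else None


def is_current_gen(tier, date):
    """True if the tier's genN prefix matches the current generation."""
    match = re.match(r"gen(\d+)", tier)
    if match is None:
        return False  # tier name carries no generation prefix
    return int(match.group(1)) == current_gen_on(date)


print(is_current_gen("gen6ou", "2016-05"))  # True
print(is_current_gen("gen6ou", "2019-05"))  # False
print(is_current_gen("gen5ou", "2014-11"))  # False
```

Because the dates are zero-padded YYYY-MM strings, plain string comparison orders them correctly. On a dataframe, the same function can be applied row by row to the Tier and Date columns.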

Results

Full results are available in this YouTube video outlining my thoughts, but the analysis is summarized here as well. Graphs and charts were made using the matplotlib library in Python.

VGC vs OU Battle Counts Over Time

Comparing the number of battles in VGC vs OU, we can see that in recent years VGC has overcome a deficit of millions of games a month. As a caveat, there are some fundamental reasons why VGC performs better when measured by number of battles (battles are shorter), but the fact that the difference is now in the millions is a testament to drastic changes in the landscape. There are a number of reasons why VGC has grown over the last few years. WolfeyVGC, easier mechanics, and the funding of The Pokemon Company are all major contributors to the astronomical growth. A grassroots movement like Smogon would realistically never compare to an industry-funded project. It was only a matter of when, not if. That being said, the question isn't whether VGC grew, but whether Smogon declined. There is some crossover, like people moving from Smogon to VGC, but it's important to dig just a little bit deeper.

Current Gen OU VGC

Current generation OU, the flagship format of Smogon, is in verifiable decline. I'm not particularly persuaded by claims of the "meta being bad". There are certain fundamental truths: the game is not built for Smogon, which means there is no force balancing the metagame like there is for VGC. However, unlike VGC, Smogon acknowledges this truth and places bans to make the metagame more balanced. I've heard "Meta bad" every generation since I started playing, so hearing "Meta bad" in generation 9 as well is not especially persuasive. There is a more fundamental reason.

Generation 3 OU

As a tangent, Generation 3 OU is skyrocketing while current generation OU is declining. This is interesting, and it brings me to my core hypothesis. It's not that Smogon is dying, it's that the playerbase is being spread across more and more formats as more and more formats get released. Every generation, there is a new format that is the "meta", and enjoyers of old formats either move to the new one or stick with the old ones. Inherently, when there are more formats, there are going to be more fractured subcommunities. If a glass of water gets split into more and more smaller and smaller glasses, the total volume remains the same. However, because the water is spread across so many small glasses, it looks as if the water is disappearing.
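The glass-of-water argument can be made concrete with a small calculation. The battle counts below are made up purely for illustration and are not taken from the scraped data; `total_battles` and `format_share` are hypothetical helper names.

```python
# Made-up battle counts, chosen only to illustrate the argument.
battles = {
    "2016": {"current_gen_ou": 900, "old_gen_ou": 100},
    "2023": {"current_gen_ou": 500, "old_gen_ou": 200, "natdex_ou": 300},
}


def total_battles(by_format):
    """Sum battles across every format in one period."""
    return sum(by_format.values())


def format_share(by_format, name):
    """Fraction of the period's battles played in one format."""
    return by_format[name] / total_battles(by_format)


for year, by_format in battles.items():
    share = format_share(by_format, "current_gen_ou")
    print(year, total_battles(by_format), f"{share:.0%}")
```

In this toy example the total stays at 1000 battles while the flagship format's share falls from 90% to 50%. Looking at the flagship format alone would suggest a decline that the total does not show.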

National Dex OU

In particular, Dexit and the Natdex OU format are huge creators of this illusion. Natdex OU "steals" a significant portion of the OU playerbase from current generation OU. There is nothing wrong with this; in fact, it's a strength. People will play what they want to play.

Conclusion

Is Smogon Dying?

No, Smogon is not dying. Instead, as time goes on, the playerbase is being spread across more and more formats as more and more formats get released. It may give the appearance that the website is "dying", but it is simply a case of a playerbase being split across more and more avenues while the total remains the same. That said, it could be argued that the site staying the same size is a bad thing, not a good thing: it should have been growing with the playerbase, not standing still. That analysis is outside the scope of this article and may be the topic of a future one.