Colin Luoma's Blog

Tag

Showing blog posts sorted under the tag: R

Historical Linux Statistics from Steam's Hardware & Software Survey

By Colin Luoma, 2020-05-14 19:47:19,

Steam is a popular gaming storefront and platform on Windows, macOS, and Linux. Every month Valve publishes its Hardware & Software Survey with overall summary statistics about Steam users. The data can also be broken down by operating system.

I wanted to see how the share of Linux has changed over time on Steam. Unfortunately, the survey only ever shows data for the previous month. So I wrote a small R script (GitHub link) to scrape historic survey results from the Internet Archive's Wayback Machine and current data directly from Valve. A few months were missing from the Wayback Machine, which was a bummer, but enough data was available to get a feel for how the metrics have changed over time.
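As a rough illustration of the scraping idea (in Python, not the R script linked above), the Wayback Machine accepts URLs of the form `https://web.archive.org/web/<timestamp>/<page>` and serves the capture closest to that timestamp. The sketch below only builds one such URL per month. The survey address and its `platform` query parameter are assumptions, and `month_starts` and `wayback_url` are hypothetical helper names.

```python
from datetime import date

# Assumed survey page; the platform query parameter is an assumption too.
SURVEY_URL = "https://store.steampowered.com/hwsurvey/?platform=linux"


def month_starts(first: date, last: date) -> list[date]:
    """Return the first day of every month from `first` to `last`, inclusive."""
    months = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def wayback_url(target: str, day: date) -> str:
    """Build a Wayback Machine URL; the archive serves the nearest capture."""
    return f"https://web.archive.org/web/{day:%Y%m%d}/{target}"


urls = [wayback_url(SURVEY_URL, d) for d in month_starts(date(2017, 1, 1), date(2017, 4, 1))]
for url in urls:
    print(url)
```

Each URL would then be fetched and parsed. A month with no nearby capture simply returns a snapshot from a different month, so the capture date is worth checking before trusting the numbers.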

Linux User Percentage

The drop in Linux user share in late 2017 to early 2018 was due to a combination of factors. First was a counting error in the survey that Valve admitted to and later fixed. The error resulted in over-inflated user numbers from net cafes. Additionally, this time period was the peak player count for the hugely successful title PlayerUnknown's Battlegrounds. PUBG brought a lot of new players to the Steam platform from regions where net cafes are a popular way to play games. Both of these factors combined to substantially deflate the Linux user share on Steam.

Because of Steam's overall growth, however, a drop in Linux share does not necessarily mean an absolute drop in Linux players. In its 2019 Year in Review, Steam mentions a monthly active user count of almost 95 million. At a Linux share of roughly 0.9%, that equates to about 850k monthly Linux players during 2019.
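The estimate is simple arithmetic, sketched here in Python. The 95 million figure is from the Year in Review. The 0.9% share is an assumption: it is roughly what the 850k figure implies, and the exact share used is not stated.

```python
# Figure from the post: almost 95 million monthly active users in 2019.
monthly_active_users = 95_000_000

# Assumption: a Linux share of roughly 0.9%, which is about what the
# "850k" estimate implies; the exact share used is not stated.
linux_share = 0.009

linux_players = monthly_active_users * linux_share
print(f"{linux_players:,.0f} estimated monthly Linux players")
```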

Linux User Percentage by OS Language

Restricting to reported OS language shows some interesting results with regard to the Linux share on Steam. The percentage of Linux users, among those with an English-language OS, is around twice that of the general population. It's unclear whether this difference is due to more English speakers preferring Linux, or more Linux users preferring English. But it's a surprising difference nonetheless.

Processor Preference of Linux Users

Since the release of AMD's Ryzen CPU line in early 2017, more and more Linux users have been forgoing Intel processors in favour of AMD. However, Intel still has a clear market lead.

GPU Vendor Preference of Linux Users

AMD is gaining ground in the GPU space among Linux users on Steam. AMD's open-source driver initiative began to bear fruit in 2017/18 as game performance approached that of some of Nvidia's offerings.

Despite its closed-source video driver, Nvidia remains the main choice among Linux users.

Most Popular Linux Distros

Steam (unfortunately) does not report many different Linux distributions, preferring to group most in the 'Other' category.

Ubuntu remains the most popular Linux distribution on Steam, and many Linux games specifically target Ubuntu as a supported OS. As a result, it is generally the smoothest experience for new users.

Outside of Ubuntu, there is a great variety of Linux distributions, many of which will also have no issues running games on Steam.

In addition to the snapshot of data above, I've set up a page on my Linux gaming blog with the same charts, which an R script updates automatically whenever new data becomes available.

Tags: R gaming linux stats steam valve

Overlaying Frames-per-Second on a Benchmark Video Using R, ffmpeg, and Kdenlive

By Colin Luoma, 2019-11-16 18:49:57,

Feral Interactive is a UK-based porting house that specializes in bringing Windows games to other platforms like Linux and macOS. One of their most recent projects was bringing the game Shadow of the Tomb Raider to Linux. I wanted to compare the performance of their native Linux version of the game against running the Windows version on Linux using a popular compatibility layer called Wine. Running games on Linux with Wine often incurs some performance cost compared to Windows, so there is still a market for native Linux ports that can recover some of that lost performance.

Conveniently, Shadow of the Tomb Raider contains a built-in benchmark tool that writes its results to a text file, which can then be analyzed with R. The raw data looks a little like this:

  frame  time delta memory
  <int> <dbl> <dbl>  <dbl>
1     1   0     0     2341
2     2  14.4  14.4   4462
3     3  35.7  21.3   4462
4     4  53    17.3   4462
5     5  72.1  19.1   4462
6     6  91.6  19.5   4462

Frame is the id of the current frame, time is the milliseconds since the start of the benchmark, and delta is the amount of time (also in milliseconds) it took to draw the frame. Most gamers don't really care about these numbers though; the most relatable metric is frames-per-second, which is the number of frames that can be drawn in one second. To calculate this I just look at the time it took to draw the previous 50 frames; 50 divided by that time, in seconds, is the rolling FPS.
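The rolling calculation can be sketched in a few lines of Python (the original analysis was done in R, so this is an illustration of the idea, not the actual code). `rolling_fps` is a hypothetical name, and the `times` list is synthetic data, not benchmark output.

```python
def rolling_fps(times_ms, window=50):
    """Rolling frames-per-second over the previous `window` frames.

    times_ms[i] is the time in milliseconds at which frame i finished.
    Frames without a full window behind them get None.
    """
    fps = []
    for i, now in enumerate(times_ms):
        if i < window:
            fps.append(None)
            continue
        # Time taken by the last `window` frames, converted to seconds.
        elapsed_s = (now - times_ms[i - window]) / 1000.0
        fps.append(window / elapsed_s if elapsed_s > 0 else None)
    return fps


# Synthetic data: a steady 20 ms per frame corresponds to 50 FPS.
times = [20.0 * i for i in range(200)]
print(rolling_fps(times)[-1])
```

A wider window gives a smoother line that reacts more slowly to sudden drops, while a narrower one is noisier but more responsive.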

With FPS calculated, it's easy to use R and ggplot2 to make a nice graph showing the performance of the benchmark over time.

That's neat, but what I really wanted was to overlay the chart on footage of the actual benchmark so that people could see how different in-game scenes affect the frames-per-second. To do this I used a few tools: R again for the chart generation, ffmpeg to turn pictures into a video, and then Kdenlive to edit the video.

Generating Charts:

To embed a moving chart in a video, I used R and ggplot2 to generate one chart per video frame. That works out to 4000 individual charts, since the benchmark is 160 seconds long and I wanted 25 frames per second. Each chart shows a 10-second window of data, and each new frame shifts that window forward by 1/25th of a second.
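The bookkeeping for the sliding window might look something like this Python sketch. It assumes the window trails the current moment (the chart for time t shows the 10 seconds leading up to t), which the post does not spell out. `chart_windows` is a hypothetical name.

```python
def chart_windows(duration_s=160, video_fps=25, window_s=10):
    """Yield (frame_number, start_s, end_s) for every chart in the video.

    Assumption: the window trails the current moment, so the chart for
    time t shows the data from t - window_s up to t.
    """
    total_frames = duration_s * video_fps
    for frame in range(1, total_frames + 1):
        end_s = frame / video_fps
        start_s = max(0.0, end_s - window_s)
        yield frame, start_s, end_s


windows = list(chart_windows())
print(len(windows))   # number of charts: 160 seconds * 25 per second
print(windows[-1])    # window covered by the last chart
```

Numbering the frames from 1 matches the `fps_%d.png` file names and the `-start_number 1` option in the ffmpeg command used later.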

To make things look a bit nicer in the final video, the background of the charts had to be a colour that could easily be chroma keyed out. Chroma keying can remove a certain colour from a video layer, basically green screening. So all 4000 charts looked something like the following beautiful image.

Turning Charts into a Video:

Thankfully, turning a series of images into a video is a rather common problem and there are a lot of examples online of using ffmpeg to do this conversion. So I shamelessly borrowed the following command to turn all 4000 charts into a video. I won't pretend to know what all of the arguments do, but importantly it is set to 25 frames-per-second to match the timing of the generated charts. Without this, the scrolling chart would be too fast or too slow and would not line up with the benchmark footage.

ffmpeg -r 25 -f image2 -start_number 1 -i plots/fps_%d.png -vcodec libx264 -profile:v high444 -crf 0  -pix_fmt yuv420p sottr_fps.mp4
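For reference, here is the same command assembled as a Python argument list, with a comment on what each option does as far as the ffmpeg documentation describes it. `build_ffmpeg_command` is a hypothetical helper, and the comments are a best-effort reading, not an authoritative guide.

```python
def build_ffmpeg_command(pattern="plots/fps_%d.png", output="sottr_fps.mp4", fps=25):
    """Return the ffmpeg invocation above as an argument list."""
    return [
        "ffmpeg",
        "-r", str(fps),            # frame rate applied to the input image sequence
        "-f", "image2",            # read the input as a sequence of image files
        "-start_number", "1",      # the first file in the sequence is fps_1.png
        "-i", pattern,             # %d is replaced by the frame number
        "-vcodec", "libx264",      # encode the video as H.264 using x264
        "-profile:v", "high444",   # H.264 profile that permits lossless encoding
        "-crf", "0",               # constant rate factor 0: lossless for 8-bit x264
        "-pix_fmt", "yuv420p",     # 4:2:0 pixel format, widely supported by players
        output,
    ]


command = build_ffmpeg_command()
print(" ".join(command))
# To actually run it (needs ffmpeg on the PATH and the PNG files in place):
# import subprocess; subprocess.run(command, check=True)
```

Because `-r 25` comes before `-i`, it applies to the input, so each PNG is treated as exactly 1/25th of a second of video.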

Overlay FPS Video on top of Benchmark:

Kdenlive is an open-source video editor for Linux. Video editing is one of the areas of desktop Linux that is still a bit lacking, but Kdenlive crucially has a chroma key feature, which is the key component in this step. The video generated from the bright green charts is overlaid on the footage of the Tomb Raider benchmark, then the chroma key is applied.

In this screenshot you can see the chroma key effect being applied to the bright green of the chart video. It removes the background and turns it into a very nice looking overlay.
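Conceptually, a chroma key makes every pixel close to the key colour transparent. The Python sketch below shows that idea on a tiny made-up frame. It is a simplification, not how Kdenlive implements the effect, and the pure green key colour and the tolerance value are assumptions.

```python
KEY_COLOUR = (0, 255, 0)  # assumed pure green chart background


def chroma_key(pixels, key=KEY_COLOUR, tolerance=60):
    """Turn RGB pixels into RGBA, making pixels close to `key` transparent.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    Closeness is plain Euclidean distance in RGB space, a simplification
    of what a real video editor does.
    """
    keyed = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            distance = ((r - key[0]) ** 2 + (g - key[1]) ** 2 + (b - key[2]) ** 2) ** 0.5
            alpha = 0 if distance <= tolerance else 255
            new_row.append((r, g, b, alpha))
        keyed.append(new_row)
    return keyed


# One row: green background, a white chart line, and a slightly-off green.
frame = [[(0, 255, 0), (255, 255, 255), (10, 240, 5)]]
print(chroma_key(frame))
```

This is also why the chart background needs to be a colour that appears nowhere in the chart itself: anything near the key colour disappears along with the background.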

So that's it. I really enjoyed this little project because it was the combination of several tools (R, ffmpeg, and Kdenlive) that really made it possible. Each had a specific task and it all came together nicely.

Check out the final result on YouTube.

Tags: R benchmarking ffmpeg kdenlive video editting

Bitcoin Prices and Hidden Markov Models

By Colin Luoma, 2018-02-12 15:02:05,

Lately, there’s been a lot of interest in Bitcoin, probably sparked by its almost unbelievable growth in December 2017. However, this past week, we saw the price of Bitcoin drop to just above $6000, which was the lowest it has been since November 2017. So I wanted to take a closer look at Bitcoin prices through the lens of Hidden Markov Models (HMM) to see what conclusions, if any, can be drawn.

Hidden Markov Models are similar to a standard Markov chain model, but one where the current state is unknown. Instead of observing the actual state of the process, the only information available is the realization of some other output that is dependent on the current internal state. A somewhat contrived example would be trying to detect whether it is raining, or not, based on how many people you see with umbrellas. The hidden, unobservable state is the weather (raining or not) while the observable realization of that state is the proportion of people carrying umbrellas (more people carry umbrellas if it’s raining).
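To make the umbrella example concrete, here is a small Python sketch that recovers the most likely weather sequence from umbrella sightings using the Viterbi algorithm, a standard decoding method for HMMs. It is not the code behind this post. Every probability is made up for illustration, and the observation is simplified to "many" or "few" umbrellas instead of a proportion.

```python
import math

STATES = ("rain", "dry")
# All probabilities below are made up for illustration.
START = {"rain": 0.3, "dry": 0.7}
TRANSITION = {
    "rain": {"rain": 0.7, "dry": 0.3},
    "dry": {"rain": 0.2, "dry": 0.8},
}
EMISSION = {
    "rain": {"many": 0.9, "few": 0.1},
    "dry": {"many": 0.2, "few": 0.8},
}


def viterbi(observations):
    """Return the most likely sequence of hidden states for the observations."""
    # best[s] = (log-probability of the best path ending in s, that path)
    best = {
        s: (math.log(START[s]) + math.log(EMISSION[s][observations[0]]), [s])
        for s in STATES
    }
    for obs in observations[1:]:
        step = {}
        for s in STATES:
            # Pick the best previous state to arrive from.
            score, path = max(
                (best[p][0] + math.log(TRANSITION[p][s]), best[p][1]) for p in STATES
            )
            step[s] = (score + math.log(EMISSION[s][obs]), path + [s])
        best = step
    return max(best.values())[1]


print(viterbi(["few", "few", "many", "many", "few"]))
```

Working in log-probabilities avoids numerical underflow on long sequences, which matters once the observations span hundreds of days.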

Applying this concept to Bitcoin prices, there could be some internal state driving the change in price, with different states producing different expected price changes. I assumed that the daily change in price followed a log-normal distribution, which means that the log of the daily returns should be normally distributed. This made the model slightly easier to interpret. I also used 3 internal states in an attempt to capture bear and bull states with differing volatility.
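The log transform itself is a one-liner. This Python sketch uses hypothetical closing prices, not real Bitcoin data, and `log_returns` is a hypothetical name.

```python
import math


def log_returns(prices):
    """Daily log returns: log(price_today / price_yesterday)."""
    return [math.log(today / yesterday) for yesterday, today in zip(prices, prices[1:])]


# Hypothetical closing prices, not real Bitcoin data.
closes = [10000.0, 11000.0, 9900.0, 9900.0]
returns = log_returns(closes)
print([round(r, 4) for r in returns])
```

A handy property of log returns is that they add up: the sum of the daily values equals the log of the total price change over the whole period.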

Below is a chart showing the most likely states during 2017 and into 2018:

Here each of the three states is coloured. The blue state was characterized by positive average returns and low volatility. The red state also had generally positive returns but higher volatility. Finally, the green state had mostly negative returns and also high volatility.

I also ran a quick Shapiro-Wilk test on the log-valued daily returns, which was unable to reject the null hypothesis that they come from a normal distribution. This means that there wasn’t enough evidence to disprove the assumption that price changes follow a log-normal distribution.

This is all fine and good, but what would be really cool is if the fitted model could be used to predict the future price of Bitcoin. So I ran 10,000 30-day simulations to get an expected future price and a confidence interval. This is what it looks like:

This shows the predicted Bitcoin price, and the actual price change during the prediction interval. The shaded regions also represent the 95% and 80% confidence intervals, based on the 10,000 simulations. In this instance, the HMM was not exactly a great predictor. Bitcoin has been incredibly volatile and I think it’s extremely difficult to make any meaningful predictions using closing price alone.
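The simulation step can be sketched in Python as follows. Every parameter here is hypothetical (none comes from the fitted model), and the sketch only tracks the price on the final day, not the whole path. The starting price and starting state are also assumptions.

```python
import math
import random

# Hypothetical three-state model; none of these numbers come from the fitted HMM.
MEAN = [0.004, 0.010, -0.012]    # mean daily log return per state
SD = [0.02, 0.06, 0.07]          # standard deviation of daily log return per state
TRANSITION = [
    [0.90, 0.07, 0.03],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
]


def simulate_final_price(start_price, start_state, days, rng):
    """Simulate one price path and return the price after `days` days."""
    state, log_price = start_state, math.log(start_price)
    for _ in range(days):
        # Move to the next hidden state, then draw that state's log return.
        state = rng.choices(range(3), weights=TRANSITION[state])[0]
        log_price += rng.gauss(MEAN[state], SD[state])
    return math.exp(log_price)


def percentile(sorted_values, q):
    """Nearest-rank style percentile for q in [0, 1]."""
    return sorted_values[min(len(sorted_values) - 1, int(q * len(sorted_values)))]


rng = random.Random(42)
finals = sorted(simulate_final_price(8000.0, 0, 30, rng) for _ in range(10_000))
low95, high95 = percentile(finals, 0.025), percentile(finals, 0.975)
low80, high80 = percentile(finals, 0.10), percentile(finals, 0.90)
print(f"median {percentile(finals, 0.5):.0f}")
print(f"80% interval {low80:.0f} to {high80:.0f}")
print(f"95% interval {low95:.0f} to {high95:.0f}")
```

Taking percentiles of the simulated prices gives the interval bounds directly, without any assumption about the shape of the final price distribution.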

If you’re interested in taking a closer look at the R code used to fit the HMM and generate the charts, you can find it on my GitHub.

Tags: R bitcoin finance hidden markov models

Canadian 2016 Census - Population and Dwellings

By Colin Luoma, 2017-03-11 16:49:14,

View everything here

About a month ago, Statistics Canada finally started releasing summary statistics for the 2016 census. The long-form census was re-introduced last year so over the course of this year there should be lots of interesting data to look through. As of right now, only information on population and dwelling counts has been released with age and sex demographics scheduled for the beginning of May 2017.

I wanted to play around with a couple of new tools like leaflet and highcharts, and the census population data was the perfect test dataset. Leaflet is an awesome mapping library that feels really snappy in a browser, and the R wrapper is incredibly simple to use. I definitely recommend it for any type of geographic visualization. Flexdashboard was used to create a single-page HTML file that I then hosted on my webserver.

I don't have much to say about the data since I'm not really in a position to draw any conclusions. It's mostly just interesting to see how things have changed in Canada over the past 5 years.

Tags: R canada census

haloR - A New R Halo API Wrapper

By Colin Luoma, 2017-02-24 21:52:44,

Earlier this week Microsoft released Halo Wars 2, a follow-up to the original, which has something of a cult following. In contrast to the mainline Halo titles, Halo Wars is a real-time strategy game. I've played through the Halo Wars 2 campaign and dipped my toes into the multiplayer. It isn't really for me (I prefer FPS) but it was still an enjoyable experience.

Similar to Halo 5, Microsoft and 343i have decided to open up much of the game's data to the public through their Halo API. I really enjoyed digging through Halo 5 data, and it was a big part of what kept me interested in the game. Kudos to MS/343i for the work they do on this stuff.

Even though I don't plan to continue playing the game, I decided to update my Halo R API wrapper to include functions to easily get data from the Halo Wars 2 endpoints. Installation instructions and a tiny example can be found on the haloR GitHub.

Before using it, I suggest reading through the documentation provided by 343i since the documentation for my package is kind of sparse and the returned objects can be a little bit cryptic without a reference.

As an additional small example, I pulled some Halo Wars 2 data for the game mode 'Rumble'. This is a new mode where players have infinite resources and don't have to worry about their economy. I wanted to see which leader had the highest win rate, so I pulled a bunch of matches and graphed the win percentage for each leader (at the top of this post). It's interesting that the two main story characters, Cutter and Atriox, have the highest win rates.
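The win rate calculation is a simple group-and-count, sketched here in Python. It does not use haloR or the Halo API. The match records are invented for illustration, and the real API responses are shaped differently.

```python
from collections import defaultdict

# Hypothetical match records: (leader, won). Real data would come from the API.
matches = [
    ("Cutter", True), ("Cutter", True), ("Cutter", False),
    ("Atriox", True), ("Atriox", False),
    ("Anders", False), ("Anders", False), ("Anders", True), ("Anders", False),
]


def win_rates(records):
    """Return {leader: percentage of matches won}."""
    played = defaultdict(int)
    won = defaultdict(int)
    for leader, victory in records:
        played[leader] += 1
        won[leader] += int(victory)
    return {leader: 100.0 * won[leader] / played[leader] for leader in played}


# Print leaders from highest to lowest win rate.
for leader, rate in sorted(win_rates(matches).items(), key=lambda item: -item[1]):
    print(f"{leader}: {rate:.1f}%")
```

With only a handful of matches per leader the percentages are very noisy, so the number of matches behind each bar is worth keeping in mind when comparing leaders.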

Tags: R api halo