How to Query the StackOverflow Database

StackOverflow is a brilliant resource, I’ve seen people complaining recently about a snarky atmosphere on there which makes it difficult/unpleasant to get questions answered, but the sheer volume of information it holds makes it an incredibly useful tool in any programmers toolbelt.

I’ve been programming for quite some time now and it’s hard to imagine life without Stackoverflow, it certainly seems a long time ago that I was trying to find things in Deja News.

Recently I wanted to list a specific users answers sorted by the number of times the question they answered had been viewed.

There are a number of different ways to query Stackoverflow, they provide an API, you can also download snapshots of their database to query locally.

Probably the quickest way to get results is via a web interface that they provide to send SQL queries to their database, you can write a query by going here – https://data.stackexchange.com

Then choose the site you want to query from the drop-down

On the next screen, you’ll be able to enter your SQL query, the site shows you the database schema on the right of the screen so it’s reasonably easy to work out how to compose your query. Here’s my SQL for the query mentioned above

Let’s break this down a bit

  • The StackOverflow database stores question and answers in the same “Posts” table but only questions have view data, answers are linked to questions via their parent id (the answer’s parentid will be the question’s id).
  • In order to get the answer and the question data, we join the “Posts” table to itself using parentid and id columns, we give the two “Post” table instances aliases of “questions” and “answers”
  • We add a where clause to filter our results by a specific user id and a posttypeid of 2 (this corresponds to answers). Specifying a parameter prefixed and suffixed by two hashtags means the data StackExchange site will prompt you for a value  for the parameter before running the query
  • We SELECT out the title of the question, the answer’s text, and the question’s view count. Note that by selecting “questions.Id as [Post Link]” the Data StackExchange site will automagically return a link to the post in the result set.

You can read more about how to compose queries here.

If we now run our query we should see something similar to the below –

You can see how the question’s title has been nicely formatted as a link in the results. You can also see that the results can be downloaded as a CSV, so if you do need to parse out values/information from the question or answer text, whilst there is only so much you can do in SQL, it should be pretty easy to write a script that would parse through the values in the CSV file.

Saving Your Query for Future Use

Once you have your completed your query then StackExchange allows you to generate a permalink so you can easily access the query or share it with other people, here’s the link for my query –

https://data.stackexchange.com/stackoverflow/query/1046349?UserId=76564

The cool thing here is that StackExchange includes any query parameters as parameters on the URL, so if you wanted to run the query for another user all you would have to do is change the query string parameter. For example, the query below will return the answers and view counts for Jon Skeet’s StackOverflow answers rather than mine –

https://data.stackexchange.com/stackoverflow/query/1046349?UserId=22656

Seems that Jon’s answers have had a few more views than mine :).

Further Reading

As mentioned earlier in the post there is an excellent guide to writing queries on Data Stack Exchange site – https://data.stackexchange.com/help

This a good guide from the Stack Exchange Meta Site that goes through the table schemas in more detail – https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede

Leave a Comment