Case Study: SportSpyder

SportSpyder is powered by a complex web crawler and search engine that sit behind an easy-to-use interface.

Challenges

The technology behind the SportSpyder.com website posed some unique challenges. The goal was to create a real-time search engine that indexes sports articles as they are published across various sports websites.

SportSpyder consists of a public website and an administration interface. The vast majority of SportSpyder was written using Ruby on Rails.

Implementation

Custom Web Crawler

A custom web spider was written in Ruby that continuously crawls over 1,500 news sources to collect the daily sports news. The spider fetches and indexes over 17,000 articles each week.

Administration

The web spider administration interface was written to allow for easy detection of RSS feeds in source websites and includes customizable options to discover headline patterns in websites that don't offer RSS. The interface also tracks activity on each crawled website to help eliminate obsolete sites.
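
As an illustration of the RSS detection described above, here is a minimal sketch that looks for feeds advertised in a page's link tags. The detect_feeds helper and the FEED_TYPES list are hypothetical names chosen for this example, and the regular-expression approach is an assumption; it is not necessarily how SportSpyder's interface detects feeds.

```ruby
# Hypothetical helper, not SportSpyder's actual code.
# MIME types that conventionally mark a <link> tag as a feed.
FEED_TYPES = ["application/rss+xml", "application/atom+xml"].freeze

# Returns the href of every <link rel="alternate"> tag whose type is a feed type.
def detect_feeds(html)
  html.scan(/<link\b[^>]*>/i).filter_map do |tag|
    # Collect the tag's quoted attributes into a hash with lowercase keys.
    attrs = tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/).to_h { |k, v| [k.downcase, v] }
    next unless attrs["rel"].to_s.downcase == "alternate"
    next unless FEED_TYPES.include?(attrs["type"].to_s.downcase)
    attrs["href"]
  end
end
```

A site that returns an empty list here would be a candidate for the headline-pattern options instead.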

Headline Formats

Articles are displayed in a multitude of formats. They can be viewed by team, player, or a custom fantasy team of players. Each team's list of headlines can be filtered by the type of website they come from: mainstream, independent, or a specific website only. Articles are de-duplicated and grouped together by source for each day.
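
The de-duplication and grouping step can be sketched in a few lines of Ruby. The Article struct and the group_headlines method are hypothetical, and treating two articles with the same URL as duplicates is an assumption made for this example; the real criteria are not described here.

```ruby
require "date"

# Hypothetical article record, for illustration only.
Article = Struct.new(:title, :url, :source, :published_on)

# Drops articles with a repeated URL (an assumed duplicate rule),
# then groups the rest by [publication day, source].
def group_headlines(articles)
  articles
    .uniq { |article| article.url }
    .group_by { |article| [article.published_on, article.source] }
end
```

Each key of the returned hash is a day and source pair, so a view can render one block of headlines per source under each date.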

Amazon Product Team Stores

SportSpyder takes advantage of Amazon.com's REST web service API to pull down a full list of categorized items available for each team. It uses this data to present a custom storefront for each team to bring in revenue.

Lucene-Based Search

The search engine in SportSpyder is an extended implementation of the Apache Lucene full-text search engine. It is written in Java and uses XML-RPC to communicate with the Ruby on Rails application. Both the news articles and store items for each team are full-text searchable.

Comment System

SportSpyder has a unique comment system that combines the concepts of blog comments and forum posts. Comments can be added to each headline, but since headlines often move off the front page quickly, a forum-style view is available to keep track of recent articles with comments.

Fantasy Teams

SportSpyder provides an interface for users to create a custom team of players. The user can thereafter view a composite collection of the articles that mention any players on their team.
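
The composite view can be pictured as a filter over the article list. In the sketch below, the articles_for_team method and the :text key are hypothetical, and plain case-insensitive substring matching is an assumption; the production site may well use its search index for this instead.

```ruby
# Hypothetical sketch: select the articles that mention any player on a team.
# Each article is a hash with a :text key (an assumed shape).
def articles_for_team(articles, player_names)
  names = player_names.map(&:downcase)
  articles.select do |article|
    text = article[:text].downcase
    # Keep the article if at least one player's name appears in its text.
    names.any? { |name| text.include?(name) }
  end
end
```

An article that mentions several players on the same team still appears only once in the result.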
