
Jeremy Handcock, a CS Master’s student here at UofT, has sent me a paper that explains how full-text indexing of source code repositories, bug descriptions, documents, etc. could help developers find the rationale behind a piece of code. The way it works is pretty simple. The entire system is represented by a directed graph, with nodes representing artifacts such as a URL, an e-mail, a bug number, an e-mail address, etc., and arcs specifying where each artifact is mentioned. For example, a bug can be mentioned in a blog, a bug tracker, an e-mail, or anything else, so it will link to those artifacts.

The crawler scans data sources (repos, bug trackers, etc.) for artifacts and then updates the directed graph with links between nodes. For example, if a new e-mail mentions an existing bug, the crawler adds that e-mail to the graph and creates a link between the bug and the e-mail.
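To make the graph idea concrete, here is a minimal sketch in Python. It is my own illustration, not the paper's implementation: the class name `ArtifactGraph`, the `(kind, id)` tuples, and the sample identifiers are all made up for the example.

```python
from collections import defaultdict


class ArtifactGraph:
    """Directed graph: an arc goes from an artifact to each place that mentions it."""

    def __init__(self):
        # artifact -> set of artifacts in which it is mentioned (hypothetical layout)
        self.mentioned_in = defaultdict(set)

    def add_mention(self, artifact, location):
        """Record that `artifact` is mentioned in `location`; both become nodes."""
        self.mentioned_in[artifact].add(location)
        # Touch the location so it exists as a node even with no outgoing arcs.
        self.mentioned_in[location]

    def where_mentioned(self, artifact):
        """Return every place that mentions `artifact`, in sorted order."""
        return sorted(self.mentioned_in.get(artifact, set()))


# What the crawler would do when a new e-mail and a blog post mention bug 1234.
graph = ArtifactGraph()
graph.add_mention(("bug", "1234"), ("email", "msg-42"))
graph.add_mention(("bug", "1234"), ("blog", "post-7"))
```

Asking `graph.where_mentioned(("bug", "1234"))` then lists the blog post and the e-mail, which is the kind of lookup a developer hunting for a rationale would make.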

The concept seems very simple, but the details are complicated. For example, the scanner for bug artifacts would have to capture messages like “bug#1234”, “the SQL query has been fixed in ticket 1234”, “see 1234 for the updated code”, etc. That’s kind of what I’m thinking about right now, because my project needs something very similar. I need to find a way to look at an IRC log and say “messages 123 to 456 were about X and messages 789 to 1000 were about Y”. I have already looked at several other papers that talk about this, and it gets complicated really fast. For example, messages 123 to 456 could be about both X and Y, and Y could come up again in messages 789 to 1000. I’m going to have a chat with another grad student who has been doing natural language processing to find out how feasible topic extraction is for IM logs.
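As a rough illustration of why the bug scanner is harder than it looks, here is a small sketch using a regular expression. The pattern and the `find_bug_refs` helper are my own assumptions, not anything from the paper; it handles the first two phrasings above and, on purpose, not the third.

```python
import re

# Hypothetical pattern: the word "bug" or "ticket", an optional "#", then digits.
BUG_REF = re.compile(r"\b(?:bug|ticket)\s*#?\s*(\d+)\b", re.IGNORECASE)


def find_bug_refs(text):
    """Return the bug numbers that are explicitly labelled in `text`."""
    return [int(number) for number in BUG_REF.findall(text)]


samples = [
    "bug#1234",
    "the SQL query has been fixed in ticket 1234",
    "see 1234 for the updated code",  # bare number: no keyword to anchor on
]
results = [find_bug_refs(s) for s in samples]
```

The first two samples yield `[1234]`, while the bare “see 1234” yields nothing. Matching every bare number would catch it, but it would also catch line numbers, years, and port numbers, which is exactly where the details get complicated.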

I’m also working on the UI for viewing logs. I created a DB schema for IRC messages and events, a plugin for Supybot to snoop on channels and update the DB with messages and events, and a basic interface for viewing chat logs by date. The next step is to be able to search and navigate IRC messages using the DrProject search and event log, respectively, as I mentioned in an earlier post. I’m going to post screenshots with an explanation of the current UI this week, and any comments would be great. I’m planning to make browsing by event log as easy as possible for the user, and the search as powerful as possible, so maybe we won’t have to segment IRC messages by topic. The event log by itself is a pretty good way of segmenting messages. If A happened at time T (a check-in, a bug report, etc.), and B was discussed on IRC at around the same time, then most likely A and B are related.
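The “same time means related” heuristic can be sketched in a few lines of Python. This is only an illustration under my own assumptions: messages are plain `(timestamp, text)` tuples rather than rows from the real DB schema, and the 15-minute window is an arbitrary guess.

```python
from datetime import datetime, timedelta


def messages_near_event(event_time, messages, window=timedelta(minutes=15)):
    """Return the text of IRC messages sent within `window` of `event_time`."""
    return [text for sent_at, text in messages if abs(sent_at - event_time) <= window]


# Made-up sample data: one check-in and three IRC messages.
checkin_time = datetime(2008, 6, 10, 14, 0)
irc_log = [
    (datetime(2008, 6, 10, 13, 50), "about to commit the search fix"),
    (datetime(2008, 6, 10, 14, 5), "committed, can someone test it?"),
    (datetime(2008, 6, 10, 17, 30), "anyone up for coffee?"),
]
related = messages_near_event(checkin_time, irc_log)
```

Here `related` holds the two messages sent around the check-in and leaves out the coffee message from three hours later. A fixed window is crude, so how wide to make it is something I would have to tune against real logs.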

P.S. I rediscovered in practice that the words “Windows” and “open source” mentioned in the same sentence render that sentence meaningless. A piece of software as complicated as DrProject, with its many open source plugins, just can’t run 100% correctly on Windows. There are many reasons, but they all come down to the fact that it was coded with Unix/Linux in mind (although it does have checks for Windows).

P.P.S. I had never used VIM for coding before. I tried it earlier this week, and found that learning to use it effectively is about as useful as learning to drive a manual when most of the time you’re just using the car to get from A to B. Eclipse might not be perfect, but I don’t buy the argument that using Linux commands with VIM will make you a more effective programmer.


One Comment

    • Jeff Balogh
    • Posted June 12, 2008 at 1:03 pm

    > but I don’t buy the argument that using Linux commands with VIM will make you a more effective programmer.

