I spent most of last week adding an event log to the IRC log web page, along with the ability to select message blurbs. Currently the selection window is hardcoded to 30 minutes around the event, so when I click an item in the event log, the messages within 30 minutes of that event are highlighted.
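To make the highlighting rule concrete, here is a minimal Python sketch. It assumes the window extends 30 minutes on either side of the event and that each message is a (timestamp, nick, text) tuple; the function name and sample data are made up for illustration and are not the component's actual code.

```python
from datetime import datetime, timedelta

# Assumption: the hardcoded window is 30 minutes on either side of the event.
WINDOW = timedelta(minutes=30)

def messages_near_event(messages, event_time, window=WINDOW):
    """Return the (timestamp, nick, text) tuples that fall within `window` of the event."""
    return [m for m in messages if abs(m[0] - event_time) <= window]

# Made-up sample data for illustration only.
messages = [
    (datetime(2008, 6, 20, 13, 10), "alice", "starting on the parser"),
    (datetime(2008, 6, 20, 14, 5), "bob", "about to check in the fix"),
    (datetime(2008, 6, 20, 14, 40), "alice", "looks good"),
    (datetime(2008, 6, 20, 16, 0), "bob", "lunch?"),
]
event_time = datetime(2008, 6, 20, 14, 15)  # e.g. the time of a check-in

# Only the 14:05 and 14:40 messages are within 30 minutes of the event.
highlighted = messages_near_event(messages, event_time)
```

Keeping the window as a parameter rather than a constant would make it easy to let the user pick the value later.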
Monthly Archives: June 2008
Here is a summary of what’s been done so far and what needs to be done over the next few weeks:
TO DATE:
- Created backend Elixir code to save IRC messages and events to the database.
- Created a Supybot plugin to log messages to the database.
- Created a new DrProject component for viewing IRC logs.
So, now the functionality to view IRC logs by date is done.
TO DO:
- Add an event log to the component, so users can browse segments of IRC logs by event. First, I’ll create a screen mockup for this.
- Add search functionality for the logs.
- Add tags to conversations. The idea is to use words that don’t appear in an English dictionary (e.g. ComponentA) as tags. Users could then view conversation blobs about ComponentA by clicking on the tag.
- This Wednesday Greg and I are meeting with Gerald Penn, who does research in Natural Language Processing. Hopefully he’ll point us in the right direction on algorithms for dissecting IRC messages by subject or keywords.
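The tagging idea from the list above (treat words that are not in an English dictionary as tags) can be sketched roughly as follows. The tiny word set is a hypothetical stand-in for a real dictionary file, and the function name is an assumption made for this example.

```python
import re

# Hypothetical stand-in for a real English word list (e.g. a system dictionary file).
ENGLISH_WORDS = {"the", "is", "broken", "again", "i", "think", "needs", "a", "fix", "in"}

def candidate_tags(message, dictionary=ENGLISH_WORDS):
    """Return words in `message` that are absent from the dictionary, in order, without duplicates."""
    tags = []
    for word in re.findall(r"[A-Za-z][A-Za-z0-9_]*", message):
        if word.lower() not in dictionary and word not in tags:
            tags.append(word)
    return tags

tags = candidate_tags(
    "I think ComponentA is broken again, ComponentA needs a fix in IRCLogController"
)
# tags == ["ComponentA", "IRCLogController"]
```

With a real dictionary, nicknames, typos, and slang would also show up as candidates, so some filtering (for example by frequency) would probably still be needed.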
This is the first screencast made by yours truly about the progress of this project. The idea behind the screencast is to show you the current functionality without you having to check out the code and configure everything to see what’s happening. I’ve uploaded it to YouTube: Again, any comments are very welcome!
EDIT: here are screenshots with an updated interface:
P.S. That’s not real Greg
Jeremy Handcock, a CS Master’s student here at UofT, has sent me a paper that explains how full-text search indexing of source code repositories, bug descriptions, documents, etc. could help developers find the rationale behind a portion of code. The way it works is pretty simple. The entire system is represented by a directed graph, with nodes representing artifacts such as a URL, an e-mail, a bug number, an e-mail address, etc., and arcs specifying where each artifact is mentioned. For example, a bug can be mentioned in a blog, a bug tracker, an e-mail, or anything else, so it will link to those artifacts.
The crawler scans data sources (repos, bug trackers, etc.) for artifacts. It then updates the directed graph with links between nodes. For example, when a new e-mail mentions an existing bug, it adds that e-mail to the graph and creates a link between the bug and the e-mail.
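A small Python sketch of such an artifact graph is below. It is only an illustration of the data structure described above, not the paper's implementation: the class and method names are assumptions, nodes are modelled as (kind, identifier) tuples, and both arc directions are stored because the description does not pin down which way the arcs point.

```python
from collections import defaultdict

class ArtifactGraph:
    """Directed graph of artifacts; each node is a (kind, identifier) tuple."""

    def __init__(self):
        self.mentions = defaultdict(set)      # source -> artifacts it mentions
        self.mentioned_in = defaultdict(set)  # artifact -> sources that mention it

    def add_mention(self, source, target):
        """Record that `source` (e.g. an e-mail) mentions `target` (e.g. a bug)."""
        self.mentions[source].add(target)
        self.mentioned_in[target].add(source)

    def where_mentioned(self, artifact):
        """Return every artifact that mentions `artifact`, sorted for stable output."""
        return sorted(self.mentioned_in.get(artifact, ()))

graph = ArtifactGraph()
bug = ("bug", "1234")
# The crawler finds a new e-mail and a blog post that both mention the existing bug.
graph.add_mention(("email", "msg-42"), bug)
graph.add_mention(("blog", "post-7"), bug)

places = graph.where_mentioned(bug)
# places == [("blog", "post-7"), ("email", "msg-42")]
```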
The concept seems very simple, but the details are complicated. For example, the scanner for bug artifacts would have to capture messages like “bug#1234”, “the SQL query has been fixed in ticket 1234”, “see 1234 for the updated code”, etc. That’s the kind of problem I’m thinking about right now, because my project needs to do just that: I need to find a way to look at the IRC log and say “messages 123 to 456 were about X and messages 789 to 1000 were about Y”. I have already looked at several other papers that talk about this, and it gets complicated really fast. For example, messages 123 to 456 could be about subjects X and Y, and Y could come up again in messages 789 to 1000. I’m going to have a chat with another grad student who has been doing natural language processing to find out how feasible topic extraction is for IM logs.
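To show why the bug scanner gets tricky, here is a deliberately naive regular-expression sketch in Python that handles the three example phrasings above. The pattern and function name are assumptions for illustration, not the scanner from the paper.

```python
import re

# Naive assumption: a bug reference is a number preceded by "bug", "ticket" or "see",
# optionally followed by "#". Real logs would need many more phrasings than this.
BUG_REF = re.compile(r"\b(?:bug|ticket|see)\s*#?\s*(\d+)\b", re.IGNORECASE)

def find_bug_refs(text):
    """Return the bug numbers referenced in `text`, in order of appearance."""
    return [int(number) for number in BUG_REF.findall(text)]

refs = [
    find_bug_refs("bug#1234"),
    find_bug_refs("the SQL query has been fixed in ticket 1234"),
    find_bug_refs("see 1234 for the updated code"),
]
# Each of the three calls returns [1234].
```

The same pattern also matches a sentence like “I see 3 problems”, which is not a bug reference at all. False positives of this kind are exactly where the details get complicated.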
I’m also working on the UI for viewing logs. I created a DB schema for IRC messages and events, a plugin for Supybot to snoop on channels and update the DB with messages and events, and also a basic interface for viewing chat logs by date. The next step is to be able to search and navigate IRC messages using DrProject search and the event log, respectively, as I mentioned in the earlier post. I’m going to post screenshots with an explanation of the current UI this week, and any comments would be great. I’m planning to make browsing by event log as easy as possible for the user, and the search as powerful as possible, so maybe we won’t have to segment IRC messages by topic. The event log by itself is a pretty good way of segmenting messages. If A happened at time T (check-in, bug report, etc.), and B was discussed at the same time on IRC, then most likely A and B are related.
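As a rough illustration of using the event log to segment messages, the sketch below uses Python's built-in sqlite3 module in place of the project's real database layer. The irc_event table, all column names, the sample rows, and the 1800-second default window are hypothetical choices for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: timestamps are stored as integer seconds to keep the query simple.
conn.executescript("""
    CREATE TABLE irc_message (id INTEGER PRIMARY KEY, posted_at INTEGER, nick TEXT, body TEXT);
    CREATE TABLE irc_event (id INTEGER PRIMARY KEY, happened_at INTEGER, kind TEXT, summary TEXT);
""")
conn.executemany(
    "INSERT INTO irc_message (posted_at, nick, body) VALUES (?, ?, ?)",
    [
        (1000, "alice", "the query is slow"),
        (1500, "bob", "fixed, checking in now"),
        (9000, "alice", "anyone for coffee?"),
    ],
)
conn.execute(
    "INSERT INTO irc_event (happened_at, kind, summary) VALUES (?, ?, ?)",
    (1600, "check-in", "speed up SQL query"),
)

def messages_for_event(conn, event_id, window_seconds=1800):
    """Return (nick, body) rows posted within `window_seconds` of the given event."""
    return conn.execute(
        """
        SELECT m.nick, m.body
        FROM irc_message AS m
        JOIN irc_event AS e ON e.id = ?
        WHERE ABS(m.posted_at - e.happened_at) <= ?
        ORDER BY m.posted_at
        """,
        (event_id, window_seconds),
    ).fetchall()

related = messages_for_event(conn, 1)
# related holds the two messages posted near the check-in, but not the coffee message.
```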
P.S. I rediscovered in practice that the words “Windows” and “open source” mentioned in the same sentence render that sentence meaningless. A piece of software as complicated as DrProject, with its many open source plugins, just can’t run 100% correctly on Windows. There are many reasons, but they’re all explained by the fact that it was coded with Unix/Linux in mind (although it does have checks for Windows OS).
P.P.S. I had never used VIM for coding before. I tried it earlier this week, and found that learning how to use it effectively is about as useful as learning to drive a manual when most of the time you’re just using a car to get from A to B. Eclipse might not be perfect, but I don’t buy the argument that using Linux commands with VIM will make you a more effective programmer.
… proved to be much more tedious than I expected, but not too tedious *if* you know what you’re doing (i.e. documentation exists).
First things first. DrProject needs to know which modules it has to load when it starts up. These are specified in the entry_points dictionary in setup.py:
'irc_model = drproject.irclog.model'
'irc_webui = drproject.irclog.web_ui'
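For context, here is a hedged sketch of how those two lines might sit inside an entry_points dictionary. Only the two entry strings come from the post. The group name "drproject.plugins" and the helper function are assumptions made for illustration, so check the real setup.py for the actual group name.

```python
# Hypothetical excerpt; in setup.py this dictionary would be passed to
# setuptools as setup(..., entry_points=ENTRY_POINTS).
ENTRY_POINTS = {
    # Assumption: "drproject.plugins" is an illustrative group name.
    "drproject.plugins": [
        "irc_model = drproject.irclog.model",
        "irc_webui = drproject.irclog.web_ui",
    ],
}

def parse_entry_point(spec):
    """Split a 'name = module.path' string into (name, module_path)."""
    name, module_path = (part.strip() for part in spec.split("=", 1))
    return name, module_path

# Map each entry-point name to the module that gets loaded for it.
modules = dict(parse_entry_point(spec) for spec in ENTRY_POINTS["drproject.plugins"])
```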
I have to include both model and web_ui: model holds the Elixir code that creates a table for IRC messages, and web_ui holds the IRCLogController class that handles user requests. DrProject needs to know about both, and it loads these classes when it starts. In model.py, I have IRCMessage, which inherits from Elixir’s Entity class and creates a mapping for the irc_message table in the DB.
web.chrome provides a few useful interfaces. I had to implement ITemplateProvider in my IRCLogController class. This interface has only one method, get_templates_dirs. The dispatcher calls this method to add the directory in which the templates are stored to its list. Another way is to add a template to drproject/templates, but I think it’s better to keep the template in my component.
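The pattern can be sketched in plain Python as below. This is a self-contained stand-in, not DrProject code: the abstract base class only mimics the real ITemplateProvider interface, and the constructor argument, the "templates" subdirectory, and the collect_template_dirs helper are assumptions for illustration.

```python
import os
from abc import ABC, abstractmethod

class ITemplateProvider(ABC):
    """Stand-in for the real interface: one method returning template directories."""

    @abstractmethod
    def get_templates_dirs(self):
        """Return a list of directories containing this component's templates."""

class IRCLogController(ITemplateProvider):
    def __init__(self, component_dir):
        # Assumption: templates live in a "templates" folder inside the component.
        self.component_dir = component_dir

    def get_templates_dirs(self):
        return [os.path.join(self.component_dir, "templates")]

def collect_template_dirs(providers):
    """Mimic the dispatcher: gather template directories from every provider."""
    dirs = []
    for provider in providers:
        dirs.extend(provider.get_templates_dirs())
    return dirs

search_path = collect_template_dirs([IRCLogController("/srv/drproject/irclog")])
```

Keeping the templates next to the component this way means the component can be moved or removed without touching the shared drproject/templates directory.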
As I become more familiar with DrProject and its components, I’ll add more details and create a wiki page on DrProject listing the steps for adding new components. So far, I’ve created a schema for storing messages and a bare-bones template for the irclog component in DrProject. Next, I’ll be modifying the Supybot plugin to save messages to this table and adding tests for the backend. After this, I’m going to start working on the IRC logs template based on the screen mockup I posted earlier.


