$ A semantic tool on our chat logs

by mabynogy | 2018-10-16

I recently met Sébastien, the salesperson at Semdee, a small company building a tool that extracts insights from raw text files.


The tool works a bit like Lucene. It uses contextual word similarities to extract facets (groups of words in the same field), which Semdee calls clusters. Semdee is language-agnostic and can take any format as input.
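To give an idea of what "contextual word similarity" can mean, here is a minimal sketch. It is not Semdee's actual algorithm, which I don't know: it assumes a simple co-occurrence window and cosine similarity, and the chat lines and function names are made up for the example.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical chat lines, not taken from the real logs.
lines = [
    "i use irssi on irc",
    "i use weechat on irc",
    "python is a fun language",
]
vectors = context_vectors(lines)

# Words that appear in the same contexts score high, unrelated ones score low.
similarity_clients = cosine(vectors["irssi"], vectors["weechat"])
similarity_unrelated = cosine(vectors["irssi"], vectors["python"])
print(similarity_clients, similarity_unrelated)
```

Words that end up with similar context vectors could then be grouped into a facet. A real tool would presumably use something more robust than raw counts on whitespace-split tokens.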

The user can navigate the data by clicking words in tag clouds.

Sébastien ran Semdee on the IRC logs generated by itsumi (the first bot of the chan made by satou). We have around 6 months of log files in CSV format.

tag clouds

Here is the root tag cloud, showing the most frequent words in our logs:

If we select “IRC” only:

I often tell people an IRC client is “comfier”.

“lol” seems to be my favorite expression:

I’m also someone obsessed with “programming languages”.


I imagine the tool computes some kind of distance between words (a bit like Levenshtein, though based on context rather than spelling). The value shows up in the “Proximity” column.
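For reference, Levenshtein distance counts the single-character edits needed to turn one string into another. It compares spelling, not meaning, so the “Proximity” value is presumably computed differently. A minimal sketch of the classic dynamic-programming version (the function name and the sample words are my own choices):

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


distance_comfy = levenshtein("comfy", "comfier")
distance_classic = levenshtein("kitten", "sitting")
print(distance_comfy, distance_classic)
```

With such a measure “comfy” and “comfier” are close, but “irssi” and “weechat” are far apart even though they are used in the same way, which is why a semantic tool needs a context-based distance instead.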

Here we can see the facets in action for the query “Yeah, dpt is as fun as ever”. The tool even picks out the people who disagree.

