The tool works a bit like Lucene. It uses contextual word similarities to extract facets (groups of words in the same field), which semdee calls clusters. semdee is language-agnostic and can take any format as input.
The user can navigate the data by clicking words in tag clouds.
Here is the root tag cloud, showing the most frequent words in our logs:
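To give a rough idea of what "contextual word similarity" can mean, here is a minimal Python sketch. It is not semdee's actual algorithm, which is not described here. It assumes a simple approach: each word is represented by the words that appear near it, and two words are compared with cosine similarity. The function names, the window size, and the sample sentences are all made up for illustration.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical sample data, not taken from the real logs.
sentences = [
    "i use an irc client every day",
    "i use an xmpp client every day",
    "we ate pizza for lunch",
]
vecs = context_vectors(sentences)

# "irc" and "xmpp" share the same neighbours, "pizza" shares none with "irc".
print(cosine(vecs["irc"], vecs["xmpp"]))
print(cosine(vecs["irc"], vecs["pizza"]))
```

Words that end up with a high similarity to each other could then be grouped into a facet, which is presumably close in spirit to what the clusters are.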
If we select “IRC” only:
I often tell people that an IRC client is “comfier”.
“lol” seems to be my favorite expression:
I’m also someone obsessed with “programming languages”.
I imagine the tool computes a distance between words (like Levenshtein). We see the value in the “Proximity” column.
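For reference, Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one word into another. The sketch below is a standard dynamic-programming version in Python; whether semdee's “Proximity” is computed this way or with something more semantic is only a guess.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between strings a and b."""
    # previous[j] holds the distance between the prefix of a seen so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,               # delete ca
                    current[j - 1] + 1,            # insert cb
                    previous[j - 1] + (ca != cb),  # substitute, free if equal
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Note that this measures spelling, not meaning: two words with unrelated meanings can be one edit apart, while two close synonyms can be far apart.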
We can see the facets in action here for the query “Yeah, dpt is as fun as ever”. The tool understands people who disagree.
If you find that inspiring, you can join us on the chat.