On the Go: 3 Things I Think

Three things that don’t yet require a full post… all after the jump.
1. I found people on the Internet who can still spell.
Like many of you, I’m a big fan of Aardvark, the social media company that’s using instant messaging to match people who have questions with people who might have answers.
What’s fascinating, however, is how articulate (and complex) the questions people ask are. And what’s better: the questions (and answers) are grammatical, correctly spelled, and for the most part, make sense! Maybe this is an artifact of early adopters, but I’ll take it.
I am a linguist (and in some circles, could qualify as a grammar snob), but I’m much more likely to trust advice from people who can assemble well-formed sentences. Hear that, Yahoo! Answers?
2. When it comes to search, questions aren’t extinct — they’re just well-camouflaged.
Spent a lot of time this week analyzing about 100M queries from a traditional (read: keyword-based) search engine. Since I’ve been told “people don’t ask questions” when it comes to search engines 386 times in the past month, I was expecting the worst.
Sure, things that “look like” questions are rare: 2.6%. Here, I’m talking about queries that have a WH word along with a sequence of keywords in more or less sentence order. (A good example would be something like tom cruise marry nicole kidmon where.) About half of those are ‘how to’ questions, as well. (Incidentally, how-to questions are a market that’s just not being served by any search or question-answering company today.)
But that’s not the whole story. WH-words (who, what, etc.) show up as question words (not relativizers) in about 6% of queries. While many of these queries are reduced (think tom cruise what car), and it might not be clear exactly what single, specific kind of information they’re seeking, one thing’s for certain: people submitting these queries are looking for a particular bit of knowledge, and not the kind of background info that can be gleaned from search results.
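For the curious, here is a rough Python sketch of the crudest possible version of that count: flag any query that contains a WH word as a token. The word list and the function names (`wh_tokens`, `wh_share`) are my own assumptions for illustration, not the tooling behind the numbers above, and the sketch makes no attempt to separate question words from relativizers, which still takes a parser or a pair of human eyes.

```python
import re

# Assumed WH-word list for illustration; a real study would tune this.
WH_WORDS = {"who", "whom", "whose", "what", "which", "when", "where", "why", "how"}

def wh_tokens(query):
    """Return the WH words that appear as whole tokens in a query."""
    tokens = re.findall(r"[a-z0-9']+", query.lower())
    return [t for t in tokens if t in WH_WORDS]

def wh_share(queries):
    """Fraction of queries containing at least one WH token."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if wh_tokens(q))
    return hits / len(queries)

sample = ["tom cruise what car", "cruise partner ua"]
print(wh_tokens(sample[0]))
print(wh_share(sample))
```

Matching on whole tokens keeps a word like "whatever" from counting as "what", but it will still over-count relativizers, so treat the result as an upper bound.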
Taken on its own, this might not be particularly good news for Q&A providers (what, 6% of queries isn’t good enough for you?) trying to eke out market share in a competitive marketplace.
But if we consider that certain keyword queries could be “noisy” translations of well-formed questions, things perk up immediately. It’s not hard to imagine a question like what was the name of the former agent who is now tom cruise's partner at united artists? being transmogrified into a keyword query like name agent tom cruise partner united artist or even cruise partner ua.
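To make the “noisy translation” idea concrete, here is a toy Python sketch that goes in the easy direction: it takes a well-formed question and drops function words and possessive endings to get something keyword-shaped. The stopword list and the name `to_keyword_query` are assumptions for illustration only. Real searchers compress much more aggressively (and much less predictably) than this.

```python
import re

# Hypothetical stopword list: function words a searcher would likely drop.
STOPWORDS = {
    "a", "an", "the", "of", "at", "in", "on", "to", "for",
    "is", "was", "are", "were", "be", "now", "do", "does", "did",
    "who", "whom", "whose", "what", "which", "when", "where", "why", "how",
}

def to_keyword_query(question):
    """Reduce a question to a keyword-style query by dropping function words."""
    tokens = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", question.lower())
    keywords = []
    for tok in tokens:
        if tok.endswith("'s"):
            tok = tok[:-2]  # strip the possessive: cruise's -> cruise
        if tok and tok not in STOPWORDS:
            keywords.append(tok)
    return " ".join(keywords)

question = ("what was the name of the former agent who is now "
            "tom cruise's partner at united artists?")
print(to_keyword_query(question))
# prints: name former agent tom cruise partner united artists
```

The interesting problem is the reverse one: given only the keyword string, decide whether a question like this was hiding behind it. That direction has no simple rule, which is why the analysis is being done by hand.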
I’m now doing the analysis (by hand) to see how many queries of that sort are out there. Initial estimates are promising: 28% of the first 50K queries could be construed as a “noisy Q”.
3. Ontological types that I discovered that I needed access to this week:
Just some of the latest custom entity extractors that I’ve had to create this week for some stuff I’m doing for Swingly.
- star wars fighters
- hollywood agent
- over-the-counter drugs
- word processing programs
- web service protocols
- female ufc fighters
- austin suburbs
- artisan chocolatiers
- quiche ingredients
- characters from star wars books that aren’t in the movies
August 17th, 2009 at 6:00 pm
Andy, glad to hear that you’re enjoying the Aardvark community! We have a great user base and we’re doing our best to make sure it stays that way as we grow. I’d love to hear any other feedback you or your readers have about Aardvark.
- Alison @ Aardvark
alison@aardvarkteam.com