Recap: Future of Semantic Search Panel @ Web 3.0

January 31st, 2010 Posted in Blog

I had the good fortune on Thursday to be a part of a panel on semantic search at the Web 3.0 Conference. The panel was organized by Mark Johnson (Bing/Powerset) and featured the likes of Connie Kenneally (TextWise), Will Hunsinger (Evri), Tim Musgrove (TextDigger), and yours truly (LCC, Swingly, Extractiv, etc.).

Mark put on an absolutely great panel. In addition to being one of the most knowledgeable people in our industry, he’s a natural-born moderator and a talented discussion leader. He’s got great journalistic chops too: definitely not one to shy away from asking the tough questions.

Since I wasn’t able to capture video of the panel, I thought I’d try to recreate my side of the discussion. Here are some of the questions that Mark asked — and the gist of the answers I gave. (Or would have given.)

Mark Johnson: So, semantic search. A few years ago, this panel was made up of companies like Powerset and Hakia — companies with the stated goal of taking market share from Google/Yahoo/Bing. Now, it’s hard to find anyone who would even claim that they’re doing “search” anymore. Is search even the right word anymore? Would anyone consider what they’re doing to be search?

I think we’re seeing the diversification of semantic search. What did “semantic search” mean a few years ago? Beating Google/Yahoo!/Bing at their own game, using some as-yet untapped “semantic” technology. But heck, while we knew what the app looked like (pan-galactic web search), we had no idea which semantic tech would actually make a difference. (Or what “semantic” meant, for that matter.)

Startups are now exploring how semantic search can be used to improve other kinds of apps, ones that are much more micro-scale than traditional search. That’s not to say that the current generation of semantic search startups has less ambition than the Powersets and Hakias of past years. We’re just as hungry, probably more so.

However, it does have a lot to do with the fact that traditional search (or retrieval) tech works just so darn well most of the time. If you’re interested in figuring out the name of the song that’s going through your head (as Google’s Johanna Wright was doing at Web 3.0), there’s nothing in particular about semantic search that’s going to help match the lyrics you know to a page with the rest of the song on it. And furthermore, while traditional search is by no means perfect, it’s generally at least mediocre all the time. Need to know how big labrador retrievers get? While a question-answering engine (like Swingly or WolframAlpha) might be able to interpret your question using completely snazzy semantic technology, it doesn’t matter how sophisticated their approach is if they don’t get you the right answer. In most cases, people will settle for mediocre and reliable over totally sexy but occasionally flaky.

So, it’s incumbent on us semantically-oriented startups to find the right set of use cases. (I hesitate to call them “markets” as of yet.) Ones where the sexiness is totally worth any potential flakiness. Ones where you can do things that you weren’t ever able to do before. Want to interact with lots and lots of structured data using natural language? There’s a semantic app for that: WolframAlpha. Need to find people who talk about the same things that you do on Twitter? There’s gonna be a semantic app for that.

Is semantic search dead? No, not in the least. However, we’ve realized that it’s time for us to show what we (in particular) do best, and that may not be pan-galactic gargle-blasting search (in the way we know it now).

MJ: What scares you most?

Me, I’m most scared of Mechanical Turk. Yeah, that’s right: I’m afraid of people.

Here’s why. Any of us who invest in semantic technologies have a deep, unshakeable belief that we can build machines which can get meaning from text faster and better than any human ever could. And we’ve made a heckuva lot of progress these days: we’re beginning to talk about machines being able to “read” texts, take AP exams, translate a text in any language into any other language, etc. And that’s largely without the contributions of the Semantic Web community. Without using linked data. Without taking advantage of semantic interchange formats and standards, like RDF.

However, here’s the catch. Our algorithms aren’t perfect. In fact, they’re far from it. We still need humans to “train” our algorithms — that is, to give them cookies when they do well, and to hit them with a rolled up newspaper when they mess up the living room. And that costs money. And takes plenty of time for experimentation and analysis to get things right. And of course, that costs money, too.

Companies like mine continue to invest in R&D because we’re looking to minimize (or ultimately to get rid of) this kind of human input to our systems. R&D is expensive, sure, but it pales in comparison to the costs we’d incur if we had to go out and pay humans to perform the same task without any automation.

Turk is really disruptive because it makes it possible for humans to “fight back”. It’s cheap. It’s fast. It’s got the quality benefits that come from crowds checking (and re-checking) each other’s input. Does that mean that we’re going to see humans replace NLP systems? Well, no. But if it’s more cost-effective to let humans do an NLP task — like a name annotation task, say — that’s going to potentially jeopardize future investment in automation.

MJ: What tech do you use? Where does it come from? Do you use 3rd party software tools?

My two start-ups, Swingly and Extractiv, use technology that’s been developed by their parent company, Language Computer Corporation. We don’t use any 3rd party tools — largely due to licensing issues. Using GPL components can make it tricky if you ultimately want to license software yourself.

MJ: Who are your customers?

Swingly’s definitely designed for the web user. Our goal is to provide access to that 1% of knowledge that’s already out there — and really hard to get to through traditional search techniques. We’re also attracting some serious attention from folks with lots of domain-specific data: call centers, customer support centers, any service that has to maintain an FAQ, etc.

With Extractiv, we’re looking to become an “authoritative” provider of semantic content. Not just semantic annotations, mind you, although we definitely will do that, too. We want to establish that we are the definitive source for high-quality data (that no one else can get their hands on).

MJ: Is the popularity of “free-mium” causing companies to monopolize each others’ revenue streams?

To an extent. It’s also important to recognize that “free-mium” services (of which OpenCalais is probably the best example) have done a tremendous amount to set the market for semantic apps. It’s probably safe to say that without the success of OpenCalais’s more-or-less free service, we wouldn’t be having as many mainstream discussions about the value of semantic apps.

I think free-mium models will begin to make a lot more sense in the not-so-distant future. Consumers’ appetites for content are only going to grow. And while we don’t see that many “power users” who need more capacity than they can get from a free service now, things are going to change. Whether we’re gonna see free-mium providers expand what they offer for free is the real question, however…

MJ: How do you measure how good you are? How do you communicate about measurements to your customers?

I’m a big fan of “open”, impartial, community-wide evaluations. I’ve participated in a bunch during my time at Language Computer: TREC (for question-answering), DUC (for summarization), ACE (for information extraction), and TAC (for textual inference). Yes, participating in these evals requires significant investment. But it’s tremendously satisfying to be able to point to a real benchmark, especially in a space as competitive as ours. Frankly, I think there should be more opportunities for tech companies to show off what they can do.

We have to realize, however, that precision and recall aren’t enough. While benchmarks attract customers, they also can set up unrealistic expectations. If you don’t frame the discussion in terms of the real impact of your technology, it really doesn’t matter if your system can correctly answer 80% of questions users ask — they’ll only focus on the 20% where you left them high-and-dry.
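
For readers who haven’t run into precision and recall before, here’s a rough sketch of how the two numbers are usually computed. This isn’t the scoring code from any of the evaluations mentioned above; the function name and the toy name-annotation data are made up purely for illustration.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for predicted items scored against a gold set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    # Precision: what fraction of the system's output was correct?
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: what fraction of the correct answers did the system find?
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: names a system extracted vs. names a human annotator marked.
system_names = {"Mark Johnson", "Connie Kenneally", "Web 3.0", "Evri"}
gold_names = {"Mark Johnson", "Connie Kenneally", "Will Hunsinger",
              "Tim Musgrove", "Evri"}

precision, recall = precision_recall(system_names, gold_names)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Both numbers summarize a test set as a whole. Neither says anything about which particular questions a given user will see go wrong, which is the gap described above.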

MJ: Where do you see your business in 5 years?

I have two hopes for my companies. First, I’d like them to be contributors to the major search providers. There might be other viable ways forward, but aggregating search tools together into a single portal seems to be the way we’re all headed these days. Second, I’d count us as successful if we’re also actively shaping the discussion about how semantic apps should evolve. There are a lot of open questions out there. And I’d like us to have a crack at answering them.
