<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AndyHickl.com &#187; semantic search</title>
	<atom:link href="http://andyhickl.com/tag/semantic-search/feed/" rel="self" type="application/rss+xml" />
	<link>http://andyhickl.com</link>
	<description>building the next big thing down in big d</description>
	<lastBuildDate>Tue, 09 Mar 2010 20:36:04 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Quick Comment: 5 Sites Better than Google</title>
		<link>http://andyhickl.com/2010/02/13/quick-comment-5-sites-better-than-google/</link>
		<comments>http://andyhickl.com/2010/02/13/quick-comment-5-sites-better-than-google/#comments</comments>
		<pubDate>Sat, 13 Feb 2010 18:49:42 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[q&a]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[siri]]></category>
		<category><![CDATA[Swingly]]></category>
		<category><![CDATA[wolframalpha]]></category>

		<guid isPermaLink="false">http://andyhickl.com/2010/02/13/quick-comment-5-sites-better-than-google/</guid>
		<description><![CDATA[Just a quick comment on Harry McCracken's provocatively-titled post "Five Sites That are Better than Google". His picks? Bing, Aardvark (now Google Aardvark), Wolfram&#124;Alpha, Twitter, and Siri.]]></description>
			<content:encoded><![CDATA[<p>Just a quick comment on <strong><a href="http://technologizer.com/about/">Harry McCracken</a></strong>&#8217;s provocatively-titled post &#8220;<strong><a href="http://www.foxnews.com/scitech/2010/02/09/sites-better-google/">Five Sites That are Better than Google</a></strong>&#8221;. His picks? <a href="http://bing.com"><strong>Bing</strong></a>, <a href="http://vark.com"><strong>Aardvark</strong></a> (now <a href="http://www.googlelabs.com/"><strong>Google Aardvark</strong></a>), <a href="http://wolframalpha.com"><strong>Wolfram|Alpha</strong></a>, <a href="http://twitter.com"><strong>Twitter</strong></a>, and <a href="http://siri.com"><strong>Siri</strong></a>.</p>
<p>For the moment, let&#8217;s leave aside the question of what makes a site &#8220;better&#8221; than <a href="http://google.com"><strong>Google</strong></a> (other than as a conceit to attract clicks like mine). And while we&#8217;re at it, the elevation of Bing or Twitter (or heck, any of these five sites) to the rarefied stratum that Google occupies. (It is <a href="http://foxnews.com"><strong>Fox News</strong></a>, after all.)</p>
<p>What got me blogging was the fact that he singled out three Q&amp;A services: <strong>Aardvark</strong>, <strong>Wolfram</strong>, and <strong>Siri</strong>. Tell us, Harry, what makes these sites better than Google?</p>
<blockquote style="margin-right: 0px;" dir="ltr"><p>[<strong>Aardvark</strong>] works well when you&#8217;d rather get quick advice from a few real knowledgeable people than scour Google results for relevant links on a question such as &#8220;Should I buy a mountain bike, a road bike, or a hybrid to ride around San Francisco?&#8221;</p>
<p><strong>Wolfram|Alpha</strong> calls <a href="http://www.wolfframalpha.com/">itself</a> a &#8220;computational knowledge engine,&#8221; but I think of it as a 21st-century equivalent of a thick, fact-packed paperback almanac. It&#8217;s a vast repository of knowledge skewing towards the mathematical and scientific that you can explore by entering questions.</p>
<p>[<strong>Siri</strong>]&#8217;s a &#8220;virtual personal assistant&#8221; that uses voice recognition, your GPS location, and links to local information and services to respond to requests you speak into an iPhone 3GS.</p></blockquote>
<p>The answer? Their ability to deliver precise bits of information &#8212; without having to &#8220;scour&#8221; Google results. That&#8217;s also where he sees value in Bing:</p>
<blockquote style="margin-right: 0px;" dir="ltr"><p>[With <strong>Bing</strong> Travel], You can enter dates and locations for plane tickets or hotel stays, then get a grid of results that you can further refine &#8212; to direct flights only, for instance, or to hotels with swimming pools.</p></blockquote>
<p>Despite the attention-grabby headline, I love articles like these because they suggest that consciousness is changing regarding Web search. While traditional search engines aren&#8217;t going away anytime soon, consumers are beginning to see value in services which can get you to the content you&#8217;re really looking for &#8212; or, at the very least, can hook you up with experts who can help you get on the right track.</p>
<p>What&#8217;s particularly interesting about Aardvark, Wolfram, and Siri is that they&#8217;re all providing access to information that has been created &#8212; and vetted &#8212; by humans. Vark is brilliant because it lets you ask questions to your social network <em>en masse</em> &#8212; without having to wait for any one friend to pick up and tell you &#8220;I have no clue.&#8221; Siri makes sure you don&#8217;t have to search <strong>Yelp</strong> or <strong>Gayot</strong> or yes, Google, to be able to find if <a href="http://boomnoodle.com"><strong>Boom Noodle</strong></a> closes at 8 or 10 pm on Sundays. Wolfram saves you the trouble of trying to cram the latest <strong>Information Please</strong> almanac in your pocket.</p>
<p>Yes, all of this information is great. And yes, I use all three of these services every day. But truth be told, I find all of them a little unsatisfying: I&#8217;ve got so many more questions that these services can&#8217;t answer. Here are just a few that I thought of this morning while watching the Winter Olympics:</p>
<ul>
<li>What&#8217;s Sidney Crosby&#8217;s number?</li>
<li>What country did the Winter Olympic sport of skeleton originate in?</li>
<li>Who is the fastest female luger? (And where can I meet her?)</li>
</ul>
<p>Like I said, I&#8217;m an enthusiastic user of Q&amp;A services. (And yes, I&#8217;m building <a href="http://www.swingly.com/">one of my own</a>.) But I want these services to be transformative: to do things that I could never do myself (even if I had the time and energy). Don&#8217;t just improve access to data sources that I could likely manipulate myself. Give me access to knowledge that I could never have.</p>
<p>I know, I&#8217;m impatient. And lazy: I could probably spend time poring over search results to find my answer. Or go to Wikipedia. But I want what Harry wants: content at my fingertips.  I may be more greedy than Harry, though:  I want all of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyhickl.com/2010/02/13/quick-comment-5-sites-better-than-google/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Recap: Future of Semantic Search Panel @ Web 3.0</title>
		<link>http://andyhickl.com/2010/01/31/recap-future-of-semantic-search-panel-web-3-0/</link>
		<comments>http://andyhickl.com/2010/01/31/recap-future-of-semantic-search-panel-web-3-0/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 21:51:48 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[panel]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[Swingly]]></category>
		<category><![CDATA[web 3.0]]></category>

		<guid isPermaLink="false">http://andyhickl.com/2010/01/31/recap-future-of-semantic-search-panel-web-3-0/</guid>
		<description><![CDATA[
I had the good fortune on Thursday to be a part of a panel on semantic search at the Web 3.0 Conference. The panel was organized by Mark Johnson (Bing/Powerset) and featured the likes of Connie Kenneally (TextWise), Will Hunsinger (Evri), Tim Musgrove (TextDigger), and yours truly (LCC, Swingly, Extractiv, etc.).
Mark put on an absolutely great panel. [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://andyhickl.com/wp-content/uploads/2010/01/markJpic.jpg"><img class="size-full wp-image-288  aligncenter" title="markJpic" src="http://andyhickl.com/wp-content/uploads/2010/01/markJpic.jpg" alt="" width="336" height="280" /></a></p>
<p>I had the good fortune on Thursday to be a part of a panel on semantic search at the <strong><a href="http://www.mediabistro.com/web3/">Web 3.0 Conference</a></strong>. The panel was organized by <strong>Mark Johnson</strong> (<a href="http://www.bing.com/">Bing/Powerset</a>) and featured the likes of <strong>Connie Kenneally</strong> (<a href="http://www.textwise.com/">TextWise</a>), <strong>Will Hunsinger</strong> (<a href="http://www.evri.com/">Evri</a>), <strong>Tim Musgrove</strong> (<a href="http://www.textdigger.com/">TextDigger</a>), and yours truly (<a href="http://www.languagecomputer.com/">LCC</a>, <a href="http://www.swingly.com/">Swingly</a>, <a href="http://www.extractiv.com/">Extractiv</a>, etc.).</p>
<p>Mark put on an absolutely great panel. In addition to being one of the most knowledgeable people in our industry, he&#8217;s a natural-born moderator and a talented discussion leader. He&#8217;s got great journalistic chops too: definitely not one to shy away from asking the tough questions.</p>
<p>Since I wasn&#8217;t able to capture video of the panel, I thought I&#8217;d try to recreate my side of the discussion. Here are some of the questions that Mark asked &#8212; and the gist of the answers I gave. (Or would have given.)</p>
<p><em>More after the jump&#8230;</em></p>
<p><span id="more-287"></span></p>
<p><strong>Mark Johnson: So, semantic search. A few years ago, this panel was made up of companies like Powerset and Hakia &#8212; companies with the stated goal of taking market share from Google/Yahoo/Bing. Now, it&#8217;s hard to find anyone who would even claim that they&#8217;re doing &#8220;search&#8221; anymore.</strong> <strong>Is search even the right word anymore? Would anyone consider what they&#8217;re doing to be search?</strong></p>
<p>I think we&#8217;re seeing the diversification of semantic search. What did &#8220;semantic search&#8221; mean a few years ago? Beating Google/Yahoo!/Bing at their own game, using some as-of-yet untapped &#8220;semantic&#8221; technology. But heck, while we knew what the app looked like &#8212; pan-galactic web search &#8212; we had no idea which semantic tech would actually make a difference. (Or what &#8220;semantic&#8221; meant, for that matter.)</p>
<p>Startups are now exploring how semantic search can be used to improve other kinds of apps, ones that are much more micro-scale than traditional search. That&#8217;s not to say that the current generation of semantic search startups have less ambition than the <a href="http://www.powerset.com/">Powersets</a> and <a href="http://www.hakia.com/">Hakias</a> of past years. We&#8217;re just as hungry &#8212; probably more so.</p>
<p>However, it does have a lot to do with the fact that traditional search (or retrieval) tech works just so darn well most of the time. If you&#8217;re interested in figuring out the name of the song that&#8217;s going through your head (as Google&#8217;s <strong>Johanna Wright</strong> was doing at <a href="http://www.mediabistro.com/web3">Web 3.0</a>), there&#8217;s nothing in particular about semantic search that&#8217;s going to help match the lyrics you know to a page with the rest of the song on it. And furthermore, while traditional search is by no means perfect, it&#8217;s generally at least mediocre all the time. Need to know how big labrador retrievers get? While a question-answering engine (like <strong>Swingly</strong> or <strong><a href="http://www.wolframalpha.com/">WolframAlpha</a></strong>) might be able to interpret your question using completely snazzy semantic technology, it doesn&#8217;t matter how sophisticated their approach is if they don&#8217;t get you the right answer. In most cases, people will settle for mediocre and reliable over totally sexy but occasionally flaky.</p>
<p>So, it&#8217;s incumbent on us semantically-oriented startups to find the right set of use cases. (I hesitate to call them &#8220;markets&#8221; as of yet.) Ones where the sexiness is totally worth any potential flakiness. Ones where you can do things that you weren&#8217;t ever able to do before. Want to interact with lots and lots of structured data using natural language? There&#8217;s a semantic app for that: <a href="http://wolframalpha.com/">WolframAlpha</a>. Need to find people who talk about the same things that you do on Twitter? There&#8217;s gonna be a semantic app for that.</p>
<p>Is semantic search dead? No, not in the least. However, we&#8217;ve realized that it&#8217;s time for us to show what we (in particular) do best &#8212; and that may not be pan-galactic gargleblasting search (in the way we know it now).</p>
<p><strong>MJ: What scares you most?</strong></p>
<p>Me, I&#8217;m most scared of <a href="https://www.mturk.com/mturk/welcome">Mechanical Turk</a>. Yeah, that&#8217;s right: I&#8217;m afraid of people.</p>
<p>Here&#8217;s why. Any of us who invest in semantic technologies have a deep, unshakeable belief that we can build machines which can get meaning from text faster and better than any human ever could. And we&#8217;ve made a heckuva lot of progress these days: we&#8217;re beginning to talk about machines being able to &#8220;read&#8221; texts, take AP exams, translate a text in any language into any other language, etc. And that&#8217;s largely <em>without</em> the contributions of the Semantic Web community. Without using linked data. Without taking advantage of semantic interchange formats and standards, like RDF.</p>
<p>However, here&#8217;s the catch. Our algorithms aren&#8217;t perfect. In fact, they&#8217;re far from it. We still need humans to &#8220;train&#8221; our algorithms &#8212; that is, to give them cookies when they do well, and to hit them with a rolled up newspaper when they mess up the living room. And that costs money. And takes plenty of time for experimentation and analysis to get things right. And of course, that costs money, too.</p>
<p>Companies like mine continue to invest in R&amp;D because we&#8217;re looking to minimize &#8212; or ultimately to get rid of &#8212; this kind of human input to our systems. R&amp;D is expensive, sure &#8212; but it pales in comparison to the costs we&#8217;d have to incur if we had to go out and pay humans to perform the same task without any automation.</p>
<p>Turk is really disruptive because it makes it possible for humans to &#8220;fight back&#8221;. It&#8217;s cheap. It&#8217;s fast. It&#8217;s got the quality benefits that come from crowds checking (and re-checking) each other&#8217;s input. Does that mean that we&#8217;re going to see humans replace NLP systems? Well, no. But if it&#8217;s more cost-effective to let humans do an NLP task &#8212; like a name annotation task, say &#8212; that&#8217;s going to potentially jeopardize future investment in automation.</p>
<p><strong>MJ: What tech do you use? Where does it come from? Do you use 3rd party software tools?</strong></p>
<p>My two start-ups, <strong>Swingly</strong> and <strong>Extractiv</strong>, use technology that&#8217;s been developed by their parent company, <strong>Language Computer Corporation</strong>. We don&#8217;t use any 3rd party tools &#8212; largely due to licensing issues. Using GPL components can make it tricky if you ultimately want to license software yourself.</p>
<p><strong>MJ: Who are your customers?</strong></p>
<p>Swingly&#8217;s definitely designed for the web user. Our goal is to provide access to that 1% of knowledge that&#8217;s already out there &#8212; and really hard to get to through traditional search techniques. We&#8217;re also attracting some serious attention from folks with lots of domain-specific data: call centers, customer support centers, any service that has to maintain an FAQ, etc.</p>
<p>With Extractiv, we&#8217;re looking to become an &#8220;authoritative&#8221; provider of semantic content. Not just semantic annotations, mind you, although we definitely will do that, too. We want to establish that we are the definitive source for high-quality data (that no one else can get their hands on).</p>
<p><strong>MJ: Is the popularity of &#8220;free-mium&#8221; causing companies to monopolize each others&#8217; revenue streams?</strong></p>
<p>To an extent. It&#8217;s also important to recognize that &#8220;free-mium&#8221; services (of which <strong>OpenCalais</strong> is probably the best example) have done a tremendous amount to set the market for semantic apps. It&#8217;s probably safe to say that without the success of OpenCalais&#8217;s more-or-less free service, we wouldn&#8217;t be having as many mainstream discussions about the value of semantic apps.</p>
<p>I think free-mium models will begin to make a lot more sense in the not-so-distant future. Consumers&#8217; appetites for content are only going to grow. And while we don&#8217;t see that many &#8220;power users&#8221; who need more capacity than they can get from a free service now, things are going to change. Whether we&#8217;re gonna see free-mium providers expand what they offer for free is the real question, however&#8230;</p>
<p><strong>MJ: How do you measure how good you are? How do you communicate about measurements to your customers?</strong></p>
<p>I&#8217;m a big fan of &#8220;open&#8221;, impartial, community-wide evaluations. I&#8217;ve participated in a bunch during my time at <strong>Language Computer</strong>: <a href="http://www.trec.nist.gov/">TREC</a> (for question-answering), <a href="http://duc.nist.gov/">DUC</a> (for summarization), <a href="http://www.itl.nist.gov/iad/mig/tests/ace/">ACE</a> (for information extraction), and <a href="http://tac.nist.gov/">TAC</a> (for textual inference). Yes, participating in these evals requires significant investment. But it&#8217;s tremendously satisfying to be able to point to a real benchmark, especially in a space as competitive as ours. Frankly, I think there should be more opportunities for tech companies to show off what they can do.</p>
<p>We have to realize, however, that precision and recall aren&#8217;t enough. While benchmarks attract customers, they also can set up unrealistic expectations. If you don&#8217;t frame the discussion in terms of the real impact of your technology, it really doesn&#8217;t matter if your system can correctly answer 80% of questions users ask &#8212; they&#8217;ll only focus on the 20% where you left them high-and-dry.</p>
<p><strong>MJ: Where do you see your business in 5 years?</strong></p>
<p>I have two hopes for my companies. First, I&#8217;d like them to be contributors to the major search providers. There might be other viable ways forward, but aggregating search tools together into a single portal seems to be the way we&#8217;re all headed these days. Second, I&#8217;d count us as successful if we&#8217;re also actively shaping the discussion about how semantic apps should evolve. There are a lot of open questions out there. And I&#8217;d like us to have a crack at answering them.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyhickl.com/2010/01/31/recap-future-of-semantic-search-panel-web-3-0/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t Miss:  &#8220;The Evolution of Semantic Search&#8221; @ Web 3.0</title>
		<link>http://andyhickl.com/2010/01/27/dont-miss-the-evolution-of-semantic-search-web-3-0/</link>
		<comments>http://andyhickl.com/2010/01/27/dont-miss-the-evolution-of-semantic-search-web-3-0/#comments</comments>
		<pubDate>Wed, 27 Jan 2010 15:34:25 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[Swingly]]></category>
		<category><![CDATA[web30]]></category>

		<guid isPermaLink="false">http://andyhickl.com/2010/01/27/dont-miss-the-evolution-of-semantic-search-web-3-0/</guid>
		<description><![CDATA[I'm part of an excellent panel (organized by Mark Johnson of Powerset/Bing fame) this morning at the Web 3.0 Conference in Santa Clara. ]]></description>
			<content:encoded><![CDATA[<div class="posterous_autopost">
<div class="posterous_bookmarklet_entry">
<blockquote class="posterous_long_quote"><p><span class="session-title"><strong>The Evolution of Semantic Search</strong></span>The potential for semantic search to take on the role of an all-purpose engine is dead. Building a search engine is just too expensive: a massive capital expenditure, a huge team, and a marketing campaign to hook users are beyond the reach for most companies, let alone a startup. And, the big players are already integrating more and more semantic technology, such as Microsoft’s acquisition of Bing and Yahoo’s SearchMonkey initiative. That being said, there are still many ways for semantic technology to provide value to smaller domains in search. It’s time we refined our notion of semantic search and discuss what’s next for semantic search startups.<span class="session-title"><strong><br />
</strong></span></p>
<table border="0">
<tbody>
<tr>
<td><a href="http://www.mediabistro.com/web3/speakers.asp#andyhickl"><img src="http://www.mediabistro.com/web3/images/speaker_andyhickl_100x100.jpg" border="0" alt="Andy Hickl" width="100" height="100" /></a></td>
<td><a href="http://www.mediabistro.com/web3/speakers.asp#willhunsiger"><img src="http://www.mediabistro.com/web3/images/speaker_willhunsiger_100x100.jpg" border="0" alt="Will Hunsinger" width="100" height="100" /></a></td>
<td><a href="http://www.mediabistro.com/web3/speakers.asp#markjohnson"><img src="http://www.mediabistro.com/web3/images/speaker_markjohnson_100x100.jpg" border="0" alt="Mark Johnson" width="100" height="100" /></a></td>
<td><a href="http://www.mediabistro.com/web3/speakers.asp#conniekenneally"><img src="http://www.mediabistro.com/web3/images/speaker_conniekenneally_100x100.jpg" border="0" alt="Connie Kenneally" width="100" height="100" /></a></td>
</tr>
<tr>
<td valign="top"><span class="smalltext"><strong><a href="http://www.mediabistro.com/web3/speakers.asp#andyhickl">ANDY HICKL</a></strong><br />
CEO<br />
<a href="http://www.swingly.com/">Swingly</a></span></td>
<td valign="top"><span class="smalltext"><strong><a href="http://www.mediabistro.com/web3/speakers.asp#willhunsiger">WILL HUNSINGER</a></strong><br />
CEO<br />
<a href="http://www.evri.com/">Evri</a></span></td>
<td valign="top"><span class="smalltext"><strong>Moderator<a href="http://www.mediabistro.com/web3/speakers.asp#markjohnson"><br />
MARK JOHNSON</a></strong><br />
Senior Program Manager<br />
<a href="http://www.bing.com/">Bing at Microsoft</a> </span></td>
<td valign="top"><span class="smalltext"><strong><a href="http://www.mediabistro.com/web3/speakers.asp#conniekenneally">CONNIE KENNEALLY</a></strong><br />
CEO<br />
<a href="http://www.textwise.com/">Textwise</a></span></td>
</tr>
</tbody>
</table>
</blockquote>
</div>
<div class="posterous_bookmarklet_entry"></div>
<div class="posterous_bookmarklet_entry">I&#8217;m part of an <a href="http://www.mediabistro.com/web3/program.asp">excellent panel</a> (organized by <a href="http://twitter.com/philosophygeek"><strong>Mark Johnson</strong></a> of <a href="http://www.bing.com"><strong>Powerset/Bing</strong></a> fame) this morning at the <a href="http://www.mediabistro.com/web3/"><strong>Web 3.0 Conference</strong></a> in Santa Clara.</div>
<div class="posterous_bookmarklet_entry">
<p>We&#8217;re slated to tackle the question of &#8220;what&#8217;s next&#8221; for semantic search &#8212; a worthy topic, indeed!</p>
<p>But, I have the feeling that we&#8217;ll all be circling back to the more vexing problem of exactly how companies who have invested in semantic technologies can create real (sustainable, sexy, growing) markets for their products.</p>
<p>There&#8217;s no live feed, but I&#8217;ll get shakycam video up later this afternoon.</p>
</div>
<p style="font-size: 10px;"><a href="http://posterous.com">Posted via web</a> from <a href="http://andyhickl.posterous.com/dont-miss-the-evolution-of-semantic-search-we">andyhickl&#8217;s posterous</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://andyhickl.com/2010/01/27/dont-miss-the-evolution-of-semantic-search-web-3-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tracking Sentiment from Day 1 of Defrag</title>
		<link>http://andyhickl.com/2009/11/12/tracking-sentiment-from-day-1-of-defrag/</link>
		<comments>http://andyhickl.com/2009/11/12/tracking-sentiment-from-day-1-of-defrag/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 16:37:16 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Defrag09]]></category>
		<category><![CDATA[defragcon]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[Sentiment]]></category>

		<guid isPermaLink="false">http://andyhickl.com/?p=191</guid>
		<description><![CDATA[Language Computer has been tracking what Twitter users are saying about the first day of Defrag 2009 using Positively, our sentiment extraction tool.  ]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.languagecomputer.com"><strong>Language Computer</strong></a> has  been tracking what Twitter users are saying about the first day of <strong>Defrag 2009 </strong>using <strong>Positively</strong>, our sentiment extraction tool.</p>
<p><strong>The Good</strong></p>
<p>Here are the top 8 topics people are <strong>positive </strong>about (ranked in order of how strongly people are feeling).</p>
<p><span style="color: #008000;"><strong>1. People: </strong></span>So happy,  favorite,  fun,  great,  incredibly interesting,  interesting,  really nice,  smart,  super smart,  way more interesting</p>
<p><strong><span style="color: #008000;">2. Day:</span> </strong>good,  great,  lovely,  really fun</p>
<p><span style="color: #008000;"><strong>3. Dinner:</strong></span> Awesome,  Excellent,  absolutely delightful,  lovely</p>
<p><span style="color: #008000;"><strong>4. Topics: </strong></span>good,  positive</p>
<p><span style="color: #008000;"><strong>5. Discussions:</strong></span> Liking,  interesting,  lively</p>
<p><strong><span style="color: #008000;">6. Talks:</span></strong> interesting,  really interesting,  visually engaging</p>
<p><span style="color: #008000;"><strong>7. Wifi:</strong></span> as good as the weather,  enjoying,  hella sweet</p>
<p><strong><span style="color: #008000;">8.  EventVue, livestream, backchannel: </span></strong> Really enjoying, nice</p>
<p>(Ranking was based on the inherent expected <strong>strength </strong>of the sentiment (e.g. <em>hella sweet </em>&gt;&gt;<em> okay</em>) and the <strong>number </strong>of tweets we found which expressed the same sentiment.  We&#8217;ve listed only the unique sentiments above.)</p>
<p><strong>The Bad</strong></p>
<p>Far fewer <strong>negative </strong>sentiments.  In most cases, we couldn&#8217;t find something that three people were griping about.  Here are the two that reached that threshold:</p>
<p><strong><span style="color: #ff0000;">1.  Kessler&#8217;s talk</span></strong>:  dangerous, doesn&#8217;t care, stupid, bad, provocative, pointless, useless</p>
<p><strong><span style="color: #ff0000;">2.  language on stage: </span></strong>offended, unnecessary, pointless</p>
<p><strong>The Day 1 &#8220;Winners&#8221;</strong></p>
<p>After Day 1, here are the most positively regarded Twitterers using the #defrag or #defragcon hashtags:  <a href="http://www.twitter.com/bpm140">@bpm140</a>, <a href="http://www.twitter.com/stoweboyd">@stoweboyd</a>, <a href="http://www.twitter.com/sacca">@sacca</a>, <a href="http://www.twitter.com/benkepes">@benkepes</a></p>
<p><strong>The Overall Score</strong></p>
<p>So far, tweets are running <strong>88% </strong>positive overall for <strong>Defrag 2009</strong> after day 1.</p>
<p style="text-align: center;"><a href="http://andyhickl.com/wp-content/uploads/2009/11/defragSenti.png"><img class="aligncenter size-full wp-image-193" title="defragSenti" src="http://andyhickl.com/wp-content/uploads/2009/11/defragSenti.png" alt="defragSenti" width="361" height="66" /></a></p>
<p>Want more info on sentiment tracking on Twitter using <strong>Positively</strong>?  Contact me at <a href="mailto:andy@languagecomputer.com">andy@languagecomputer.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyhickl.com/2009/11/12/tracking-sentiment-from-day-1-of-defrag/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Avoiding Search Overload</title>
		<link>http://andyhickl.com/2009/08/05/avoiding-search-overload/</link>
		<comments>http://andyhickl.com/2009/08/05/avoiding-search-overload/#comments</comments>
		<pubDate>Wed, 05 Aug 2009 16:31:03 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[relevance]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[Swingly]]></category>

		<guid isPermaLink="false">http://andyhickl.com/?p=40</guid>
		<description><![CDATA[Like you, we've heard a lot this summer about the challenges facing America:  the financial crisis, healthcare reform, and worst of all:  search overload.

Well, here at Swingly HQ, we've been doing our part.  We've been trying to find new ways to figure out what kinds of information are most relevant to a particular search topic.

While relevance modeling isn't exactly new, it's becoming an increasingly important problem for semantic search applications.   Information Extraction apps are rapidly increasing the amount of factual information that's available from the Internet.  That's good.  Unfortunately, instead of being buried under mountains of irrelevant information, we're now being overwhelmed with gigabytes of factual information which may (or may not) be exactly what we're looking for.  That's bad.

So, what's a new semantic search app to do?  ]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><img class="size-medium wp-image-48 aligncenter" title="3642650246_707852816a" src="http://andyhickl.com/wp-content/uploads/2009/08/3642650246_707852816a1-300x200.jpg" alt="3642650246_707852816a" width="300" height="200" /></p>
<p style="text-align: left;">Like you, we&#8217;ve heard a lot this summer about the challenges facing America:  the financial crisis, healthcare reform, and worst of all:  <strong>search overload</strong>.</p>
<p style="text-align: left;">Well, here at <a href="http://www.swingly.com">Swingly</a> HQ, we&#8217;ve been doing our part.  We&#8217;ve been trying to find new ways to figure out what kinds of information are most relevant to a particular search topic.</p>
<p style="text-align: left;">While relevance modeling isn&#8217;t exactly new, it&#8217;s becoming an increasingly important problem for semantic search applications.   Information Extraction apps are rapidly increasing the amount of factual information that&#8217;s available from the Internet.  That&#8217;s good.  Unfortunately, instead of being buried under mountains of irrelevant information, we&#8217;re now being overwhelmed with gigabytes of factual information which may (or may not) be exactly what we&#8217;re looking for.  That&#8217;s bad.</p>
<p style="text-align: left;">So, what&#8217;s a new  semantic search app to do?  Full details after the jump.</p>
<p style="text-align: left;"><span id="more-40"></span></p>
<p style="text-align: left;">Let&#8217;s imagine you&#8217;re interested in learning more about <strong>Jesse &#8220;The Body&#8221; Ventura</strong>.  Well, if you&#8217;ve got access to a named entity recognizer (like the one we use with Swingly), you might be able to infer that he&#8217;s a:</p>
<ul style="text-align: left;">
<li>person<em> </em></li>
<li>Navy SEAL</li>
<li>professional wrestler</li>
<li>actor</li>
<li>politician</li>
<li>mayor</li>
<li>governor</li>
<li>talk show host</li>
</ul>
<p style="text-align: left;">While these classes might not tell you anything you don&#8217;t already know, they (in theory) can provide semantic search apps with some of the intelligence needed to provide better, more informative results.</p>
<p style="text-align: left;">Since we know he&#8217;s a talk show host (among other things), we could have our app focus on finding information that&#8217;s relevant to any talk show host, such as:</p>
<ul style="text-align: left;">
<li>the station he&#8217;s on</li>
<li>the format / length / medium of his show</li>
<li>when his show started / ended</li>
<li>how big his audience is</li>
<li>&#8230;</li>
</ul>
<p style="text-align: left;">But since he&#8217;s a <em>former politician</em>-turned-talk show host, we might want to go further and find other talk show host-related facts that may only be relevant for talk show hosts with this kind of background, such as:</p>
<ul style="text-align: left;">
<li>his political affiliation</li>
<li>his endorsements</li>
<li>the blogs that cover him</li>
<li>&#8230;</li>
</ul>
<p style="text-align: left;">These kinds of facts are most likely not relevant for other kinds of talk show hosts:  e.g. your local sports-talk radio jock, the CarTalk guys, etc.</p>
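<p style="text-align: left;">The idea that some facts only matter for a <em>combination</em> of classes can be sketched in a few lines of Python. This is an illustration only: the <em>RELEVANT_FACTS</em> table and the <em>facts_for</em> helper are hypothetical, hand-written stand-ins, not part of Swingly.</p>

```python
# Hypothetical table: each key is the set of classes an entity must have
# before the listed fact types are considered relevant.
RELEVANT_FACTS = {
    frozenset({"talk show host"}): [
        "station", "format", "start date", "audience size",
    ],
    frozenset({"talk show host", "politician"}): [
        "political affiliation", "endorsements", "blogs",
    ],
}


def facts_for(classes):
    """Collect fact types whose required classes are all present."""
    classes = set(classes)
    facts = []
    for required, fact_types in RELEVANT_FACTS.items():
        if required.issubset(classes):
            facts.extend(fact_types)
    return facts


# A sports-talk host only gets the generic talk show facts.
print(facts_for(["person", "talk show host"]))
# A politician-turned-host also gets the politics-specific ones.
print(facts_for(["person", "politician", "talk show host"]))
```

<p style="text-align: left;">Note that somebody still has to write that table by hand, and that becomes a problem quickly.</p>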
<p style="text-align: left;">But we&#8217;re not out of the woods just yet.</p>
<p style="text-align: left;">While information extraction apps are now able to capture lots and lots of different types of facts (including some of the ones I&#8217;ve listed above), they still require a human to tell them which classes of facts are relevant (e.g. <em>political affiliation</em> for a former politician-turned-talk show host) and which ones aren&#8217;t (like <em>home runs hit</em> for a hockey player).</p>
<p style="text-align: left;">What&#8217;s worse?  Even though extractor ontologies are growing, most of the facts that we&#8217;ll need coverage for won&#8217;t be covered by an extractor.  Despite lots of efforts to reduce the cost of creating (and maintaining) extractors, ensuring adequate coverage across multiple domains requires serious investment in time and money.</p>
<p style="text-align: left;">So, what&#8217;s a semantic search app to do?  Start small &#8212; and work your way up.</p>
<p style="text-align: left;">At LCC, <a href="http://www.jplehmann.com" target="_blank">John Lehmann</a> (and his team) recently developed a new algorithm which determines the most relevant predicates for each of the individual names (or classes of names) mentioned in Swingly&#8217;s index.</p>
<p style="text-align: left;">For example, if you&#8217;re interested in information on NASCAR great <a href="http://en.wikipedia.org/wiki/Richard_Petty" target="_blank">Richard Petty</a>, we think you might be most interested in sentences which contain any of the following predicates.  (I&#8217;ve <strong>bolded </strong>some of the ones that I think are particularly interesting.)</p>
<blockquote style="text-align: left;"><p>{ <strong>win</strong>=46, appear=13, take=11, <strong>lead</strong>=10, <strong>drive</strong>=6, hold=6, <strong>race</strong>=6, <strong>finish</strong>=6, make=6, step=5, <strong>qualify</strong>=4, begin=4, <strong>compete</strong>=3, announce=3, mark=3, <strong>managed to qualify</strong>=3, fill=3, tie=2, visit=2, own=2, <strong>run</strong>=2, <strong>leave</strong>=2, <strong>return</strong>=2, come=2, suffer=2, remain=2, be part of=2, provide=2, drop=2, crash=2, follow=2, participate=2, spend=2, wear=2, miss=2, is currently=1, was back=1, serve=1, host=1, duplicate=1, form=1, log=1, tangle=1, record=1, <strong>start</strong>=1, discuss=1, running eleventh=1, put=1, give=1, use=1, bump=1, express=1, recognize=1, donate=1, unveil=1, stay=1, edge=1, slam=1, produce=1, send=1, remark=1, pull=1, become=1, snap=1, overcome=1, rebound=1, rivaled only=1, get=1, enter=1, raced alongside=1, collaborate=1, sign=1, failed to qualify=1, wave=1, match=1, release=1, was formerly=1, collect=1, trying to pass=1, allow=1, voice=1, set=1, retire=1, achieve=1, circle=1, established to honor=1, supply=1, develop=1, dominate=1, chose to run=1, begrudge=1, feel=1, tried to pass=1, outlast=1, sliding sideways=1, pit=1, claim=1, &#8230;}</p></blockquote>
<p style="text-align: left;">Or, if you&#8217;re interested in <strong>wizards</strong> (yes, LCC provides access to a named entity type #<strong>wizard</strong>), you might be preferentially interested in answers which talk about:</p>
<blockquote style="text-align: left;"><p>{ appear=29, use=24, tell=22, take=21, have=18, find=17, give=15, make=14, become=13, ask=11, create=10, leave=8, visit=7, turn=7, reveal=7, send=7, place=6, choose=6, discover=6, hide=5, cast=5, put=5, hear=5, arrive=5, summon=5, defeat=5, recruit=4, meet=4, possess=4, provide=4, narrate=4, was once=4, destroy=4, confront=4, see=4, help=4, allow=4, order=4, offer=4, begin=4, air=3, go=3, agree=3, entrust=3, learn=3, lives backwards=3, inform=3, argue=3, hold=3, imprison=3, return=3, fall=3, kill=3, instruct=3, live=3, warn=3, appoint=3, die=3, conjure=3, begging to die=2, associate=2, search=2, drink=2, serve=2, convince=2, face=2, subdue=2, witness=2, transform=2, look=2, seek=2, prophesy=2, change=2, lead=2, watch=2, battle=2, advertise=2, catch=2, interpret=2, revive=2, throw=2, returned to punish=2, ally=2, locate=2, rush=2, mature=2, approach=2, involve=2, avoid=2, sacrifice=2, flee=2, announce=2, acquire=2, remind=2, save=2, travel=2, request=2, hire=2, &#8230;}</p></blockquote>
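<p style="text-align: left;">The lists above look like per-entity predicate counts. As a rough sketch only (the actual algorithm isn&#8217;t described here, and plain frequency counting is my assumption), tallies like these could be built in Python from subject/predicate pairs. The <em>observations</em> data and the <em>predicate_counts</em> helper are hypothetical.</p>

```python
from collections import Counter


def predicate_counts(observations, target):
    """Count how often each predicate occurs with `target` as its subject.

    `observations` is a list of (subject, predicate) pairs, assumed to come
    from some upstream parser that is not shown here.
    """
    return Counter(pred for subject, pred in observations if subject == target)


# Invented toy data standing in for parsed sentences.
observations = [
    ("Richard Petty", "win"),
    ("Richard Petty", "win"),
    ("Richard Petty", "drive"),
    ("Merlin", "cast"),
]
counts = predicate_counts(observations, "Richard Petty")
print(counts.most_common())
```

<p style="text-align: left;">The same function works for a class of names if the subjects are first replaced by their entity type.</p>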
<p style="text-align: left;">We then use LCC&#8217;s dependency parsers to expand each of these predicates into a set of semantically-typed triplets.  We then submit these triplets to a modeling framework to learn the triplets which are most relevant for the individual entity, the entity type, or the user&#8217;s query:</p>
<p style="text-align: left;"><em>Richard Petty:<br />
</em></p>
<ul style="text-align: left;">
<li>#driver &#8211; win &#8211; #race, #person &#8211; win &#8211; #raceEvent</li>
<li>#driver &#8211; take &#8211; #ordinalNumber</li>
<li>#driver &#8211; passed &#8211; #driver</li>
<li>&#8230;</li>
</ul>
<p style="text-align: left;"><em>#Wizards:</em></p>
<ul style="text-align: left;">
<li>#wizard &#8211; cast &#8211; #spell</li>
<li>#wizard &#8211; killed &#8211; {#person, #monster}</li>
<li>#wizard &#8211; suffered &#8211; #quantity</li>
<li>&#8230;</li>
</ul>
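<p style="text-align: left;">To make the triplet idea concrete, here is a minimal Python sketch that turns already-parsed (subject, predicate, object) tuples into typed triplets and counts them. Everything in it is assumed for illustration: the <em>ENTITY_TYPES</em> lookup stands in for a real named entity recognizer, the input tuples stand in for dependency parser output, and counting stands in for the modeling step.</p>

```python
from collections import Counter

# Hypothetical type lookup standing in for a named entity recognizer.
ENTITY_TYPES = {
    "Richard Petty": "#driver",
    "Daytona 500": "#race",
    "Merlin": "#wizard",
    "fireball": "#spell",
}


def typed_triplets(parsed):
    """Map (subject, predicate, object) tuples to counted typed triplets."""
    triplets = Counter()
    for subj, pred, obj in parsed:
        s_type = ENTITY_TYPES.get(subj)
        o_type = ENTITY_TYPES.get(obj)
        # Skip tuples where either argument has no known type.
        if s_type and o_type:
            triplets[(s_type, pred, o_type)] += 1
    return triplets


# Invented tuples standing in for dependency parser output.
parsed = [
    ("Richard Petty", "win", "Daytona 500"),
    ("Richard Petty", "win", "Daytona 500"),
    ("Merlin", "cast", "fireball"),
    ("Merlin", "cast", "something untyped"),
]
for triplet, count in typed_triplets(parsed).most_common():
    print(" - ".join(triplet), count)
```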
<p style="text-align: left;">We then use the output of this modeling to pick out Q&amp;As from Swingly&#8217;s index which are expected to be most relevant with respect to the query.  This means that for a generic query like <em>Richard Petty</em>, the top Q&amp;As that Swingly returns go from:</p>
<ul style="text-align: left;">
<li>Where can I buy Richard Petty die-cast replica cars?</li>
<li>Which number did NASCAR retire in honor of Richard Petty?</li>
<li>What type of car did Richard Petty race in 1964?</li>
</ul>
<p style="text-align: left;">to</p>
<ul style="text-align: left;">
<li>What racing championship did Richard Petty win 7 times?</li>
<li>What races did Richard Petty win more than once?</li>
<li>How many races did Richard Petty win over his career?</li>
</ul>
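<p style="text-align: left;">A toy version of this kind of reranking can be sketched in Python. It is far simpler than what Swingly does: it matches surface words rather than typed triplets, and the <em>score</em> and <em>rerank</em> helpers are hypothetical. The weights are a handful of the Richard Petty predicate counts listed earlier.</p>

```python
import re

# A few of the Richard Petty predicate counts from the list above.
WEIGHTS = {"win": 46, "lead": 10, "drive": 6, "race": 6, "retire": 1}


def score(question):
    """Sum the weights of any known predicates appearing in the question."""
    words = re.findall(r"[a-z]+", question.lower())
    return sum(WEIGHTS.get(word, 0) for word in words)


def rerank(questions):
    """Order questions from highest to lowest predicate score."""
    # sorted() is stable, so equal scores keep their original order.
    return sorted(questions, key=score, reverse=True)


questions = [
    "Where can I buy Richard Petty die-cast replica cars?",
    "Which number did NASCAR retire in honor of Richard Petty?",
    "What type of car did Richard Petty race in 1964?",
    "How many races did Richard Petty win over his career?",
]
for question in rerank(questions):
    print(score(question), question)
```

<p style="text-align: left;">Even with exact word matching, the question about wins moves to the top and the one about buying replica cars drops to the bottom.</p>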
<p style="text-align: left;">Have we eliminated the pernicious problem of search overload?  Nah. (Well, not yet.)  But we expect techniques like these will be able to push high-quality, relevant content to the top of search results &#8212; even when users don&#8217;t give us enough information to figure out what they&#8217;re really looking for.</p>
<p style="text-align: left;"><em>Author&#8217;s Note:  &#8220;Search Overload&#8221; is one of the taglines used in the Bing marketing campaign. I like Bing.  And anything that&#8217;s going to reduce the amount of crap I have to deal with from my search engine.<br />
</em></p>
]]></content:encoded>
			<wfw:commentRss>http://andyhickl.com/2009/08/05/avoiding-search-overload/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
