Problem with Enterprise Search - new version of GSA launched

Google has launched a new version of the GSA (Google Search Appliance), meant for deployment within the firewall to do enterprise searches. There are some improvements built inside the yellow box, which is how the standard GSA is shipped. Google claims that the GSA can now index many million more pages, can conduct searches faster than ever and can show role-based search results. So far so good. The problem is that enterprise search is not about searching millions of pages or spitting out query results a bit faster. The problem is that enterprise search engines don't work: it is not easy to find anything relevant. And the problem is not with a search engine like the GSA. The real problem is in the very nature of the enterprise and its content management and intranet systems.

I worked a bit with the GSA some time back. It is extremely easy to install and use. Mostly, you point it to the sites and sources you want to index, trigger its crawler, let it build its index, and there you are, searching with the familiar Google UI, all out of the box. Google has spent a decade perfecting its search algorithms. It uses complex rules to bubble the most relevant results for a user query up to the top of the result pages. The page rank, or relevancy, of a page is calculated based on how the page is cited by other pages, their interlinking, and the metadata which crawlers extract from page elements like the title, headers and other structural elements (typical SEO domain). And it works so well on the web because the web has huge scale. The web is a huge mass of interlinked pages where cross-referencing through links is the norm. This interlinking of pages is the most important factor in determining the relevancy of result pages for a query.
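
To make the link-citation idea concrete, here is a minimal sketch of a PageRank-style calculation in Python. It is a toy power iteration, not Google's or the GSA's actual algorithm; the page names, the damping value of 0.85 and the iteration count are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank.

    `links` maps each page name to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical web-like graph: three pages cite "trend-analysis".
web = {
    "opportunity-pov": ["trend-analysis"],
    "sales-deck": ["trend-analysis"],
    "newsletter": ["trend-analysis", "sales-deck"],
    "trend-analysis": [],
}
web_rank = pagerank(web)

# Hypothetical intranet dump: the same documents with no links at all.
intranet = {name: [] for name in web}
intranet_rank = pagerank(intranet)
```

With the linked graph, the page everyone cites comes out on top. With the links removed, every document ends up with exactly the same score, so link analysis has nothing to say about which one matters.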

When I say that enterprise search doesn’t work, what I mean is that the results which search engines throw up are hardly relevant. On the web, the first few results are almost always the most relevant to a query; in enterprise search, that is hardly ever true. You can get results which are nowhere near relevant to your query, and the most important and relevant documents might be buried somewhere in a huge pile of results, where you would never reach them. The reason is simple: enterprise systems and intranets don’t cross-link pages and sources. Cross-references are almost non-existent. For example, how many times would you find a "Banking Industry opportunity PoV" document or page cross-referencing another document such as "Banking Industry Trend Analysis"? In fact, many enterprise systems are like document storage systems, where all the documents, in the form of Excel sheets, PPTs and PDFs, are dumped as equals. The search engine has no way to find out which document is the most relevant for a query. On top of that, most enterprise systems and intranets are not optimized for search engines. How many times have you seen web pages on an intranet which have no title or no proper metadata? The content creators don’t follow even basic practices to make their content “findable”. So another vital source for the search engine to determine page relevancy is lost. And what we get in effect is a sputtering and struggling search engine trying hard to fish out that most important document for you.

In a way, it is not a problem with technology but with the very nature and realities of the enterprise. IMHO, an effective intranet search engine has to provide more than "out-of-the-box" features; it should be "tweak-able". It has to work with the understanding that:

  1. Enterprise content is not cross-linked and cross-referenced, so the relevancy logic that succeeds on the web wouldn't be of much use.
  2. The scale of content is limited. Unlike the web, where millions of pages cross-link, enterprise content is not so vast.
  3. Pages are not optimized for search engines. (But that should be fixed by company policy.)
  4. Some documents written by certain "experts" or “communities” in the enterprise naturally become more important or relevant. The relevancy logic has to account for that, but how? Engine administrators should be able to feed new relevancy rules into the engine.
  5. User ratings of documents and meta tags on web pages should be given more weight in calculating relevancy. On the web, these are mostly ignored because they are misused for search engine spamming, but that is not the case in the enterprise, where the problem is the opposite.
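
As a rough sketch of what "tweak-able" could mean, the Python below scores documents from keyword hits, titles, meta tags, user ratings and an administrator-maintained list of expert authors. Everything in it is hypothetical: the `Document` fields, the weight values and the `EXPERT_AUTHORS` set are assumptions for illustration, not features of the GSA or any real engine.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    body: str
    author: str = ""
    tags: list = field(default_factory=list)
    rating: float = 0.0  # average user rating on a 0-5 scale


# Administrator-editable rules (illustrative values only).
WEIGHTS = {"body": 1.0, "title": 3.0, "tags": 2.0, "rating": 0.5, "expert": 2.0}
EXPERT_AUTHORS = {"banking-practice"}


def score(doc, query, weights=WEIGHTS, experts=EXPERT_AUTHORS):
    """Blend text hits with ratings, meta tags and an expert-author boost."""
    terms = query.lower().split()
    body_words = doc.body.lower().split()
    title_words = doc.title.lower().split()
    tags = {t.lower() for t in doc.tags}

    body_hits = sum(body_words.count(t) for t in terms)
    title_hits = sum(1 for t in terms if t in title_words)
    tag_hits = sum(1 for t in terms if t in tags)
    if body_hits + title_hits + tag_hits == 0:
        return 0.0  # never surface a document that does not match at all

    total = (weights["body"] * body_hits
             + weights["title"] * title_hits
             + weights["tags"] * tag_hits)
    total += weights["rating"] * doc.rating
    if doc.author in experts:
        total += weights["expert"]
    return total


def search(docs, query):
    """Return the titles of matching documents, best score first."""
    scored = [(score(d, query), d) for d in docs]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d.title for _, d in hits]


docs = [
    Document("Untitled", "banking banking banking notes copied from a meeting",
             author="someone", rating=1.0),
    Document("Banking Industry Trend Analysis", "banking trends for the coming year",
             author="banking-practice", tags=["banking", "trends"], rating=4.5),
    Document("Cafeteria Menu", "lunch options this week"),
]
results = search(docs, "banking trends")
```

The untitled note repeats the keyword more often, yet the well-titled, tagged and highly rated document from the expert group ranks first, and an administrator can shift that balance simply by editing `WEIGHTS`.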


From another perspective, a human-edited search engine could be more useful and effective within the firewall. An automated search engine can still be used to find out what users are searching for most of the time (trends), and experts, knowledge managers or users can contribute to the index and the ranking of pages manually. The new version of the GSA also seems to have a similar feature for Do It Yourself (DIY) key-match, and some features for administrators to influence search results.
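
A minimal sketch of that human-edited idea in Python: count the most frequent queries from a log, then let an editor's hand-picked pages jump ahead of the automated results. The `KEY_MATCHES` table, the page paths and both function names are made up for illustration and are not the GSA's actual key-match interface.

```python
from collections import Counter

# Hypothetical editor-maintained table: query phrase -> promoted page.
KEY_MATCHES = {
    "leave policy": "hr/leave-policy.html",
    "expense": "finance/expense-claims.html",
}


def top_queries(query_log, n=3):
    """Find what users search for most, ignoring case and stray spaces."""
    normalized = (q.strip().lower() for q in query_log)
    return Counter(normalized).most_common(n)


def apply_key_matches(query, automated_results, key_matches=KEY_MATCHES):
    """Put hand-picked pages ahead of whatever the engine returned."""
    q = query.lower()
    promoted = [page for phrase, page in key_matches.items() if phrase in q]
    rest = [r for r in automated_results if r not in promoted]
    return promoted + rest


log = ["Leave policy", "leave policy ", "expense form", "cafeteria"]
trends = top_queries(log, n=1)
ranked = apply_key_matches(
    "expense form", ["misc/old-memo.html", "finance/expense-claims.html"]
)
```

Knowledge managers would look at `trends` to decide which queries deserve a curated entry, and then add one line to the table for each.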

Prayer for K2 Climbers

Ever since I went on a trek to ABC (Annapurna Base Camp) in the Nepalese Himalayas, I have been fascinated with mountain climbing. It is perhaps the toughest challenge any man can go through, both physically and mentally. Those tall, irreverent Himalayan mountains allow only the fittest and strongest of men to climb them, and I have nothing but respect for men who indulge in this dangerous activity. The mountain climbing community is very small, and plugging into it gives a different perspective on life. It is a different world altogether. Ever since I read Maurice Herzog's classic book Annapurna, detailing the Frenchmen's first successful ascent of an 8000-meter peak, I have peeked into the fascinating world of Louis Lachenal, Lionel Terray, Gaston Rebuffat and many such legendary climbers. The community has its own hierarchy, with the toughest high-altitude climbers on top. There were climbers like Anatoli Boukreev, who was a daredevil to the extent of being mad, climbing without acclimatization or without oxygen. He was also climbing during 1996's Everest tragedy, when 8 climbers were killed in one day. The expedition created a lot of controversy, with Jon Krakauer, a journalist and climber, partly blaming Boukreev for not saving some of the climbers trapped in the snowstorm. An Indian climber who died on the same tragic day on Everest was still lying dead just beneath the peak until recently. For almost 10 years his body was used by climbers as a milestone (called Green Boots) on the way to the summit. In recent times, many controversies and debates have arisen, with some people within the community decrying the attitude of climbers who try to reach the summit at any cost and do not help or rescue other climbers who might be in distress. In 2006, David Sharp, a British climber, lay dying on the way to the Everest peak, even while 40 other climbers passed him without offering any help or rescue.
There is something in us men which is tested only when faced with extreme adversity, as in the case of Joe Simpson and Simon Yates's disastrous climb on a Peruvian mountain, so brilliantly captured in the book and movie Touching the Void.


Even though men in their folly often term their successful climbs and summits conquests, these mountains invariably keep reminding them who the real lord is. Freak weather, falling stones, sudden avalanches, anything can kill even the most experienced and trained climber, as happened on K2 yesterday. In the deadliest day in K2's history, 11 climbers reportedly died while scaling the mountain. K2 is said to be the most difficult of mountains to climb, with its razor-sharp ridges and unpredictable weather. The deadly mountain is also said to be cursed for women: of the only 5 women who reached the top, 3 died on the descent and the other 2 died later on other mountains. Regardless of the obvious danger, K2 keeps attracting climbers like moths to a flame. It is said to be a handsome and mighty mountain which is also equally unforgiving. Here is a little silent prayer for those who died scaling it yesterday.
