Spider Webs, Bow Ties, Scale-Free Networks, And The Deep Web

May 25, 2012

The World Wide Web conjures up images of a giant spider web where everything is connected to everything else in a random pattern, and you can go from one edge of the web to another just by following the right links. Theoretically, that's what makes the Web different from a typical index system: you can follow hyperlinks from one page to another. In the "small world" theory of the Web, every web page is thought to be separated from any other web page by an average of about 19 clicks. In 1968, sociologist Stanley Milgram invented small-world theory for social networks by noting that every human was separated from any other human by only six degrees of separation. On the Web, the small-world theory was supported by early research on a small sampling of websites. But research conducted jointly by scientists at IBM, Compaq, and AltaVista found something entirely different. These scientists used a web crawler to identify 200 million web pages and follow 1.5 billion links on these pages.

The researchers discovered that the Web was not like a spider web at all, but rather like a bow tie. The bow-tie Web had a "strongly connected component" (SCC) composed of about 56 million web pages. On the right side of the bow tie was a set of 44 million OUT pages that you could get to from the center, but from which you could not return to the center. OUT pages tended to be corporate intranet and other website pages that are designed to trap you at the site when you land. On the left side of the bow tie was a set of 44 million IN pages from which you could get to the center, but that you could not travel to from the center. These were recently created pages that center pages had not yet linked to. In addition, 43 million pages were classified as "tendril" pages that did not link to the center and could not be linked to from the center. However, the tendril pages were sometimes linked to IN and/or OUT pages. Occasionally, tendrils linked to one another without passing through the center (these are called "tubes"). Finally, there were 16 million pages totally disconnected from everything.
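The bow-tie regions described above can be sketched on a toy link graph. This is only an illustration, not the method the researchers used: the page names and the `links` dictionary are made up, and the largest strongly connected set is found by brute force, which is practical only for tiny graphs.

```python
from collections import deque

def reachable(graph, start):
    """Return every page reachable from start by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Return a copy of the graph with every link pointing the other way."""
    rev = {page: set() for page in graph}
    for src, targets in graph.items():
        for dst in targets:
            rev.setdefault(dst, set()).add(src)
    return rev

def bow_tie(graph):
    """Split pages into (core, IN, OUT, other) relative to the largest SCC."""
    pages = set(graph) | {dst for targets in graph.values() for dst in targets}
    rev = reverse(graph)
    core = set()
    for page in pages:
        # Pages that page can reach AND that can reach page back.
        scc = reachable(graph, page) & reachable(rev, page)
        if len(scc) > len(core):
            core = scc
    seed = next(iter(core))
    forward = reachable(graph, seed)   # core plus OUT
    backward = reachable(rev, seed)    # core plus IN
    out_pages = forward - core
    in_pages = backward - core
    other = pages - forward - backward  # tendrils, tubes, disconnected pages
    return core, in_pages, out_pages, other

# Hypothetical toy web: a, b, c form the center of the bow tie.
links = {
    "a": ["b"], "b": ["c"], "c": ["a", "out1"],
    "in1": ["a", "t1"],          # links into the center and to a tendril
    "out1": [], "t1": [],
    "island": ["island2"], "island2": [],
}
core, in_pages, out_pages, other = bow_tie(links)
print(sorted(core), sorted(in_pages), sorted(out_pages), sorted(other))
```

On this toy graph the center is a, b, and c; in1 is an IN page, out1 is an OUT page, and the remaining pages fall outside the bow tie's main body.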

Further evidence for the non-random and structured nature of the Web is provided in research performed by Albert-László Barabási at the University of Notre Dame. Barabási's team found that far from being a random, exponentially exploding network of 50 billion web pages, activity on the Web was actually highly concentrated in "very-connected super nodes" that provided the connectivity to less well-connected nodes. Barabási dubbed this type of network a "scale-free" network and found parallels in the growth of cancers, disease transmission, and computer viruses. As it turns out, scale-free networks are highly vulnerable to destruction: destroy their super nodes and transmission of messages breaks down rapidly. On the upside, if you are a marketer trying to "spread the message" about your products, place your products on one of the super nodes and watch the news spread. Or build super nodes and attract a huge audience.
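The vulnerability of super nodes can be illustrated with a small simulation. The sketch below assumes a simplified preferential-attachment growth rule (each new node links to one existing node, chosen in proportion to its current number of links) as a stand-in for a scale-free network; the article does not say how the researchers modeled theirs, and the function names and the size of 200 nodes are illustrative choices.

```python
import random
from collections import deque

def preferential_attachment(n, seed=0):
    """Grow an undirected network in which well-linked nodes attract new links."""
    rng = random.Random(seed)
    neighbors = {0: {1}, 1: {0}}
    endpoints = [0, 1]  # each node appears once per link it has
    for new in range(2, n):
        target = rng.choice(endpoints)  # degree-proportional choice
        neighbors[new] = {target}
        neighbors[target].add(new)
        endpoints += [new, target]
    return neighbors

def largest_component(neighbors, removed=frozenset()):
    """Size of the largest connected group once the removed nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best

net = preferential_attachment(200)
hub = max(net, key=lambda node: len(net[node]))    # best-connected node
leaf = min(net, key=lambda node: len(net[node]))   # least-connected node
after_hub = largest_component(net, {hub})
after_leaf = largest_component(net, {leaf})
print("largest group after removing the hub:", after_hub)
print("largest group after removing a leaf:", after_leaf)
```

Removing a poorly connected node leaves the other 199 nodes in one piece, while removing the best-connected node splits the network into several smaller groups.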

Thus the picture of the Web that emerges from this research is quite different from earlier reports. The notion that most pairs of web pages are separated by a handful of links, almost always under 20, and that the number of connections would grow exponentially with the size of the Web, is not supported. In fact, there is a 75% chance that there is no path from one randomly chosen page to another. With this knowledge, it becomes clear why the most advanced web search engines index only a very small percentage of all web pages, and only about 2% of the overall population of internet hosts (about 400 million). Search engines cannot find most websites because their pages are not well connected or linked to the central core of the Web. Another important finding is the identification of a "deep web" composed of over 900 billion web pages that are not easily accessible to the web crawlers that most search engine companies use. Instead, these pages are either proprietary (not available to crawlers and non-subscribers), like the pages of the Wall Street Journal, or are not easily available from web pages. In the last few years, newer search engines (such as the medical search engine Mammahealth) and older ones such as Yahoo have been revised to search the deep web. Because e-commerce revenues in part depend on customers being able to find a website using search engines, website managers need to take steps to ensure their web pages are part of the connected central core, or "super nodes," of the Web. One way to do this is to make sure the site has as many links as possible to and from other relevant sites, especially to other sites within the SCC.
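The 75% figure can be roughly checked against the bow-tie page counts given earlier in the article. The back-of-the-envelope sketch below assumes that a path from one page to another exists only when the first page lies in IN or the SCC and the second lies in the SCC or OUT; it ignores paths that stay inside IN or OUT or run through tendrils, so it is an approximation rather than the researchers' own calculation.

```python
# Page counts in millions, taken from the bow-tie description in the article.
counts = {"SCC": 56, "IN": 44, "OUT": 44, "tendrils": 43, "disconnected": 16}
total = sum(counts.values())

# Assumption: the start page must be able to reach the center (IN or SCC) ...
p_source = (counts["IN"] + counts["SCC"]) / total
# ... and the destination must be reachable from the center (SCC or OUT).
p_target = (counts["SCC"] + counts["OUT"]) / total

p_path = p_source * p_target
print(f"chance a path exists: {p_path:.0%}")
print(f"chance no path exists: {1 - p_path:.0%}")
```

Under these assumptions the chance that a path exists comes out at about 24%, which is in line with the article's statement that roughly three out of four random page pairs have no path between them.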

Source: EzineArticles