Friday, August 10, 2012

from complicated to complex T@O

Internet nodes and routes mapping
Scientific apparatus offers a window to knowledge,
but as they grow more elaborate,
scientists spend ever more time washing the windows.
Isaac Asimov

For systems with a very high number of networked elements and many dynamic processes that interact with each other, the main distinction is between complicated and complex systems.
Generally a system may be very complicated but not complex, while a relatively simple system may exhibit high complexity. Furthermore, if just one complex element or process is introduced into a system, whether simple or complicated, the whole system becomes complex.

As an example of the emergence of complexity in a very complicated system, we can take the most complicated one existing today: the Internet.
  •  Complicated
Network structure
As a telecommunications network the Internet has a partially or fully meshed structure, in which the elements, the network nodes (about 100 million were known in 2006), are each directly connected to only some of the others. The networked connection and control structure is an evolution of the hierarchical structure of the type:

 which represents the maximum possible degree of complexity - in this case of complication:

from:Yaneer Bar-Yam, Introducing Complex Systems
    Fundamentals
    The Internet, like any communication system, has its basis in the information theory founded by Claude Shannon in 1948:

    In his study he lays the theoretical foundations for describing any communication channel. Shannon refers to the following quite general model:

    where a message is to be transmitted from the information source on the left to the destination on the right. The transmitter encodes the message into a signal, the signal travels through a physical medium, and the receiver decodes it and delivers the message to the destination. The received signal is generally not equal to the transmitted one, because of the distortion and random noise added by the channel. Shannon's theory defines the meaningful parameters and the capacity of the transmission channel, and it is completely general for any transmission system, from the smoke signals of Native Americans to voice/fax telephone networks, from Assyrian-Babylonian scripts to global data networks such as the Internet.
    The second basis is technological: the extensive research conducted for about two decades from the 1970s by telephone companies all over the world, together with several other research centres and universities, to develop optical fiber communication systems. This research involved investments of tens of billions of dollars and, perhaps uniquely, can be considered completed for good. The development was formally acknowledged when one of the pioneers, Charles K. Kao, was awarded the 2009 Nobel Prize in Physics "for groundbreaking achievements concerning the transmission of light in fibers for optical communication":
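As a hedged illustration of the channel capacity mentioned above, the sketch below evaluates the Shannon-Hartley formula C = B log2(1 + S/N) for a band-limited channel with additive white Gaussian noise. The function names and the sample bandwidth and signal-to-noise values are assumptions chosen for illustration; they are not taken from this post or from any specific system.

```python
import math

def db_to_linear(snr_db: float) -> float:
    """Convert a signal-to-noise ratio from decibels to a linear power ratio."""
    return 10 ** (snr_db / 10)

def channel_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley capacity, in bit/s, of a noisy band-limited channel."""
    if bandwidth_hz <= 0 or snr_linear < 0:
        raise ValueError("bandwidth must be positive and SNR non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Hypothetical voice-grade line: about 3 kHz of bandwidth and 30 dB of SNR.
capacity = channel_capacity_bps(3000.0, db_to_linear(30.0))
print(f"{capacity:.0f} bit/s")
```

With these assumed values the capacity comes out at roughly 30 kbit/s; a wider band or a cleaner channel raises the bound.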

    Proc. IEE, 1966
    Systems with transmission capacities up to tens of thousands of billions of bits per second (bit/s) have been demonstrated:

    and about 420,000 km of intercontinental submarine fiber cables had been installed and deployed by 2006:

    Communication
    Currently the Internet is a composition of public networks (such as the WWW - World Wide Web, the service that historically triggered the explosion of the Internet, conceived and designed in 1990-91 by Tim Berners-Lee and Robert Cailliau at CERN) and private networks (intranets), which by now has an almost unknowable number of network devices and subscribers, estimated to be of the order of billions; for example, the number of subscribers in 2011 was estimated at around 2 billion.

    A small portion of the Web surrounding the Wikipedia website
     In its most concise form the Internet may be represented as:
    where two users are shown linked through any fixed or mobile device, which the network treats as a host, and using two classical services: the first is the browsing of a webpage through a web server, in a classical client/server exchange; the second can be a chat, an email exchange or a peer-to-peer (P2P) link between the two users. The Internet is represented by a cloud, indicating that at this level one does not go into the details of how these services and links are implemented.
    The fundamental distinctions between a telecommunications network such as the Internet and traditional telephony lie in whether the information signal is transported in analog or digital form, and in the information transfer mode, which is either packet switching or circuit switching.
    In the traditional telephone network (POTS, plain old telephone service) the transmitted signal is analog, possibly sampled, that is converted to digital form, and the link between the two users is made by switching, based on the called telephone number, over a physical circuit set up - in historical progression - manually, mechanically, electro-mechanically or electronically. This type of switching is called circuit switching, since the two users are directly linked by a circuit.

    Hierarchical protocol stack
    In a data network such as the Internet the transmitted signal is digital (or discrete). A communication protocol divides and collects it, according to defined rules, into sets of digital data (sequences of "1" and "0" bits that encode information) of variable or fixed length called data packets. The link between the two users is made by reading specific fields of the data packet, the source and destination addresses, which allow the network to determine to whom the information should be delivered and returned. This mode of communication is called packet switching.
    In the case of the Internet the communication protocol that implements packet switching is known as the TCP/IP suite, a set of several tens of protocols used to establish connectivity and manage the network equipment; in particular the interconnecting protocol is the Internet Protocol (IP), and the suite is structured as a hierarchical stack of the type:
    The seven levels on the left are stacked according to a historical reference classification named the ISO/OSI model; those on the right are the ones actually operating in the network and define its hierarchical structure, from level 1 (physical) to level 4 (transport) to level 7 (application). The layering of the communication protocol, which is reflected directly in the hardware architecture, is one of the features - together with the network structure with its very high number of elements - that drive a system like the Internet to the border of complexity.
    The use of the TCP/IP protocol to connect the network hosts is shown schematically in the following two figures. The first:
    illustrates how data packets are organized across the different protocol levels. At the highest level the host requests some information, for example the loading of a webpage by a browser; this request is passed to the application level - in this case http - which prepends to the data a specific block called a header and passes the result to the transport level below, where TCP adds its own header in turn; the packet then goes to the network level, where the IP protocol adds a further header, and finally to the data link layer where, in the example, the layer 2 protocol Ethernet (eth) adds a final header. At this point the overall packet is passed to the physical level with a specific protocol, for example Ethernet again.
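The layering just described can be sketched in a few lines. This is a toy model under stated assumptions: the bracketed strings stand in for real headers (which are binary structures with many fields), and the function names are hypothetical. It only shows the order in which headers are prepended by the sender and stripped, in reverse, by the receiver.

```python
# Layers in the order the sending host applies them (top to bottom),
# following the example in the text: http, TCP, IP, Ethernet.
LAYERS = ["http", "tcp", "ip", "eth"]

def encapsulate(payload: bytes) -> bytes:
    """Prepend one placeholder header per layer; the last one ends up outermost."""
    packet = payload
    for layer in LAYERS:
        header = f"[{layer}]".encode("ascii")  # placeholder, not a real header format
        packet = header + packet
    return packet

def decapsulate(packet: bytes) -> bytes:
    """Strip the placeholder headers in reverse order, outermost first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode("ascii")
        if not packet.startswith(header):
            raise ValueError(f"missing {layer} header")
        packet = packet[len(header):]
    return packet

frame = encapsulate(b"GET /index.html")
print(frame)               # b'[eth][ip][tcp][http]GET /index.html'
print(decapsulate(frame))  # b'GET /index.html'
```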
    The use of the TCP/IP protocol stack for connectivity between two users, each using a network host, is as follows:
    where user A, through host A, requests a link with host B and its user B. Host A encapsulates the data packets as in the previous figure; these are transmitted through the network and cross several zones (for example net1 and net3 can be access areas and net2 a long-distance transport area), forwarded by equipment dedicated to routing packets through the different networks: switches at level 2 (data link) and routers at level 3 (IP network), which play the same role as telephone switches in circuit-switched networks such as traditional telephony. Once the overall packet is delivered to the network interface of host B, it is decapsulated with the inverse of the procedure followed by A, until it reaches the application used by the hosts. The transition from the host level - hardware - to the user level - humans - marks the border between the complicated network system and the complex, interactive social/virtual human system.
    Connectivity at the geographical level and the ways a packet is transported across the network are illustrated as:
    where host A belongs to an Ethernet local area network (LAN) - typically a business setting - and sends the eth packet to dedicated switches at this level; this part of the network is called the access network, and it converges on the transport network, where the packet is brought up to the IP level, routed and, with an inverse process, delivered to the destination host B.
    The protocol structure of the IP suite is rather complex, as illustrated in the following:
    where some of the main protocols and applications used over the 5 network levels are summarized. At the application layer the suite provides the basic applications for elementary services, such as http for browsing, smtp for email, ftp for file transfer, etc. At transport layer 4 there are the TCP and UDP protocols; TCP in particular handles tasks such as packet checking, flow control and congestion control. At network layer 3 the IP protocol works, dedicated specifically to interconnection inside the network. At this level there is also a set of routing protocols, such as BGP and OSPF, which let the routers exchange information with each other to build routing tables; these give the routers the knowledge of the network topology they need to route packets correctly toward their destination. At data link layer 2 there are protocols for the access network, such as Ethernet, ATM - a legacy protocol still in existence but no longer deployed - and the Point-to-Point Protocol (PPP), used to connect residential ADSL users through the conventional telephone line, as well as protocols used in the long-distance core/transport network, such as MPLS, which was added to IP to improve Quality of Service (QoS) and allow the traffic engineering that plain IP does not support, specifically for carrying voice as Voice over IP (VoIP), one of the fundamental services in modern IP networks, in which telephone traffic is digitized and transported as IP packets. At this level there are also protocols for long-distance transport, typically over optical fiber, such as SDH - in Europe - and SONET - in North America. Finally, at physical layer 1 there is all the hardware used as transmission media, from fixed or mobile telephone lines and high-frequency HFC cables for network access to optical fiber systems and satellite links for transport inside the network.
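To make the idea of a routing table built from topology knowledge more concrete, here is a minimal sketch using Dijkstra's shortest-path algorithm, the computation on which link-state protocols such as OSPF are based. The four-router topology, the link costs and the function name are hypothetical; a real OSPF implementation also handles link-state flooding, areas and much more.

```python
import heapq

def next_hop_table(graph: dict, source: str) -> dict:
    """Return {destination: (total_cost, next_hop)} as seen from `source`."""
    best = {source: (0, None)}
    heap = [(0, source, None)]  # (cost so far, node, first hop on the path)
    done = set()
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in done:
            continue  # stale entry: a cheaper path was already settled
        done.add(node)
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            hop = neighbor if node == source else first_hop
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbor, hop))
    del best[source]
    return best

# Hypothetical topology: router name -> {neighbor: link cost}
topology = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}

table = next_hop_table(topology, "A")
for destination, (cost, hop) in sorted(table.items()):
    print(f"to {destination}: cost {cost}, next hop {hop}")
```

In this toy network router A reaches every destination through B, even though it has a direct but more expensive link to C.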
    A more detailed schematic view of the access and transport/core architecture for several types of users is:
    where the access network is shown on the left and the long-distance transport/core network on the right; between the two there is a Point of Presence (POP), which interconnects the two networks and collects and distributes traffic at regional and national level. POPs are divided into primary POPs, connected by high-bandwidth optical links, and secondary POPs, which collect user areas and are connected to the primaries; for example in countries like Italy, France, Germany and the United Kingdom there may be a handful of primary POPs and some tens of secondary ones, each serving some millions of users. Users are commonly divided into residential mass-market users and business users. The first group includes traditional telephony, collected by the POTS network and converted into IP datagrams, residential users with an ADSL line, collected by DSLAM equipment, and mobile device users, collected by the mobile network. These types of traffic are then collected by a metro Ethernet (MAN) aggregation network connected to the access router (BNAS) of the metropolitan POP. Business users with their LANs are connected to the POP through two specific switches or routers, one on the user side, the Customer Edge (CE), and one on the side of the POP's Internet Service Provider, the Provider Edge (PE). Business traffic is then generally associated with a virtual private network (VPN), which gives institutions or organizations with many widely distributed locations or branches (typically government offices and banks) a private intranet whose nodes are linked through the public national network.
    Access traffic entering the POP (if not local) goes to the transport network, which uses high-end routers (Giga and Tera routers) with processing capabilities ranging from billions to hundreds of billions of bits per second, and is typically transmitted over optical fiber rings with the SDH/SONET protocol by ADM equipment. On high-traffic national links it is also possible to transmit several optical channels in the same fiber on different wavelengths using DWDM technologies; for example there are DWDM systems in which a single optical fiber can carry all the world's fixed-to-fixed calls. If the link required by the user is international - typically the browsing of a webpage or peering with a server farm - specific POPs of the national network connect to and exchange traffic with international networks, the union of which is globally named the Internet, or WWW for the public portion. Furthermore, the national ISP network operator must collect voice and data traffic coming from other fixed or mobile telecommunications operators (Other Licensed Operators - OLO) through an Internet Exchange Point (IX - IXP).
    The national network is divided by the routing protocols into several areas at national level (by OSPF) and into autonomous systems (AS), also called domains, at international level (by BGP). At the global level the Internet can also be represented as the union of the existing domains:
    Ref.: S. Boccaletti et al.: "Complex networks: Structure and dynamics", Physics Reports (2006) 
     Cactus Group, Catania, Italy
    Network evolutions, involutions and coevolutions
    The schematic evolution of the Internet from the 1970s to the medium- and long-term future can be shown as:

    The historical design of the network, particularly of the level 3 IP protocol, could not take into account, in the 1970s and 1980s, its enormous worldwide growth. This has resulted in several present critical points and in present and future developments, among which:
    The IP protocol was developed with small and medium network sizes in mind, typically for connecting some tens of hosts between metropolitan locations, and certainly not billions of machines. In particular, the two address fields (destination and source) of the protocol were coded on 32 bits, which allows an address space of 2^32, that is about 4.3 billion, a number considered at the time largely sufficient, if not excessive, for the expected developments, mainly in research. The addresses were further divided into classes, which further limited the number of available and assignable addresses. They are distributed by IANA (the Internet Assigned Numbers Authority, a function of ICANN), which in the first years assigned them easily and free of charge to the few entities that requested them. Starting from the early 1990s, with the development of the Web and the beginning of commercial applications, the explosion of global demand for blocks of IP addresses from ISPs and organizations of every kind has been leading to the saturation and exhaustion of the available addresses, as shown in the following figure, which reports the size of the global Internet routing table as aggregated routes announced by the BGP protocol, around 400,000 at present.
    Internet global routing table: BGP Routing Information Base entries to date 
    To remedy this situation, since 1998 an evolution of the protocol named IP version 6 (IPv6) has been proposed and developed as an updated release of the present version, named IP version 4 (IPv4). Besides introducing a number of improvements over IPv4, IPv6 codes the address field on 128 bits, giving an address space of 2^128, about 3.4 × 10^38. The typical example used to show the difference is that for every square meter of the Earth's surface there are 655,570,793,348,866,943,898,599 unique IPv6 addresses (that is about 655,571 billion billion), but only 0.000007 IPv4 addresses (that is only 7 IPv4 addresses for every million square meters). While IPv6 solves the problem of the number of assignable addresses for good, it raises another very serious one: IPv6 and IPv4 are not compatible with each other (if only because of the different size of the address field). The introduction of IPv6 therefore requires a migration between the two protocols, for which several solutions have been proposed; the choice is left to the ISPs and institutions which, as they grow and deplete their IPv4 addresses, will necessarily have to switch to IPv6.
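The orders of magnitude quoted above can be checked with a few lines of arithmetic. The sketch assumes a total surface area of the Earth of about 5.1 × 10^14 square meters, a value not given in the text; the exact per-square-meter figures depend on the area used and on whether reserved addresses are excluded, so only the orders of magnitude should be compared.

```python
IPV4_BITS = 32
IPV6_BITS = 128
EARTH_SURFACE_M2 = 5.1e14  # assumed approximate surface area of the Earth

ipv4_addresses = 2 ** IPV4_BITS   # about 4.3 billion
ipv6_addresses = 2 ** IPV6_BITS   # about 3.4e38

ipv4_per_m2 = ipv4_addresses / EARTH_SURFACE_M2
ipv6_per_m2 = ipv6_addresses / EARTH_SURFACE_M2

print(f"IPv4 addresses: {ipv4_addresses:,}")
print(f"IPv6 addresses: {ipv6_addresses:.3e}")
print(f"IPv4 per square meter: {ipv4_per_m2:.2e}")
print(f"IPv6 per square meter: {ipv6_per_m2:.2e}")
```

Under this assumption IPv4 gives a few addresses per million square meters, while IPv6 gives several hundred thousand billion billion per square meter.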

    The Internet was born and historically developed as an overlay of the telephone network: data was transmitted by a modem, which allowed digital data to be sent, in a limited way, over lines designed for analog voice. With the development of IP networks the situation has reversed, and telephony is increasingly becoming a "downlay" of the Internet, starting from the higher network stages down to the lower ones - international, national, extra-urban and urban. Carrying digitized analog voice as VoIP over a network designed for data raises a number of problems, first of all the need to introduce into the network a Quality of Service (QoS) that strongly favors real-time voice over other services, particularly in terms of delay and packet loss. As an example, if loading a webpage is delayed by a few seconds the user does not perceive any particular malfunction, whereas the same delay makes audio/video streaming such as YouTube difficult to follow and a phone conversation unintelligible. For this reason several technologies have been added to the IP protocol, which alone does not solve the problem since it was not originally designed for voice transport; among them are new protocols at the interface between level 3 (IP) and level 2 (data link), such as MPLS, designed from the start to favor the stringent requirements of voice over those of streaming and browsing.
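The principle of favoring voice over other traffic can be illustrated with a toy strict-priority scheduler. This is not how MPLS or any real router implements QoS; the traffic classes, their ranking and the class name are assumptions made only to show the idea that delay-sensitive packets are served first.

```python
import heapq
import itertools

# Assumed traffic classes: a lower number is served first.
PRIORITY = {"voice": 0, "video": 1, "web": 2}

class PriorityScheduler:
    """Toy strict-priority queue: always sends the most urgent packet next."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # keeps arrival order within a class

    def enqueue(self, traffic_class: str, packet: str) -> None:
        rank = PRIORITY[traffic_class]
        heapq.heappush(self._heap, (rank, next(self._counter), packet))

    def dequeue(self) -> str:
        _, _, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self) -> int:
        return len(self._heap)

scheduler = PriorityScheduler()
arrivals = [("web", "page-1"), ("voice", "call-1"), ("video", "clip-1"), ("voice", "call-2")]
for traffic_class, packet in arrivals:
    scheduler.enqueue(traffic_class, packet)

order = [scheduler.dequeue() for _ in range(len(scheduler))]
print(order)  # ['call-1', 'call-2', 'clip-1', 'page-1']
```

The voice packets leave first even though a web packet arrived before them, which is the behavior a delay-sensitive service needs.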

    Finally, the development of the Internet until around 2000 was mainly concerned with connectivity, that is with making all network resources always reachable, a target largely achieved today. Once connectivity is taken for granted and transparent to the user, the next step builds on the fact that the network is largely used for information search and retrieval, mainly of multimedia files: text, music, video and so on. Following the example of peer-to-peer and file sharing applications like eMule and BitTorrent - which are now an Internet overlay - the user searches for information on the basis of its content and of where this content resides. The development is therefore toward a distributed network intelligence that builds on connectivity to make a content-based network, a Content Delivery Network (CDN), in which the present P2P overlay follows the earlier example of the Internet over the telephone network and absorbs it.
    • from Complicated to Complex: Dynamical Networks
    The high number of nodes, understood as domains/autonomous systems (groups of routers), routers and hosts, the intrinsic presence of a protocol hierarchy in data transmission and the geographically distributed meshed network together make the Internet, taken globally, very close to a complex system.

    Ref.: S. Boccaletti et al.: "Complex networks: Structure and dynamics", Physics Reports (2006) 
     Cactus Group, Catania, Italy
    At this level the Internet and the WWW have been investigated for the statistical properties of their structure and connectivity. For example, a 2007 study determined that both the distribution of incoming and outgoing hyperlinks of web documents and the distribution of the number of links that a router, or a domain, has with the others follow a power law, which is characterized by scale invariance:

    Ref.: A. Barabási: "The Architecture of Complexity", IEEE Control System Magazine (2007)
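Scale invariance of a power law can be checked numerically: if P(k) is proportional to k^(-gamma), then rescaling k by a factor a multiplies P by a^(-gamma), whatever the value of k. The sketch below verifies this; the exponent 2.1 and the scale factor 10 are illustrative assumptions, not values taken from the cited study.

```python
import math

def power_law(k: float, gamma: float, c: float = 1.0) -> float:
    """Unnormalized power-law distribution P(k) = c * k^(-gamma)."""
    return c * k ** (-gamma)

GAMMA = 2.1   # assumed exponent, for illustration only
SCALE = 10.0  # rescaling factor applied to the degree k

# If the law is scale invariant, P(SCALE * k) / P(k) is the same for every k.
expected_ratio = SCALE ** (-GAMMA)
ratios = [power_law(SCALE * k, GAMMA) / power_law(k, GAMMA)
          for k in (1.0, 5.0, 50.0, 500.0)]

is_scale_invariant = all(math.isclose(r, expected_ratio, rel_tol=1e-9) for r in ratios)
print(is_scale_invariant)  # True
```

An exponential distribution would fail the same check, since its ratio depends on k; this is why power-law networks are called scale-free.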
    The dynamic network structure is shared by a large number of systems, from the relational structure of family systems (in the example, the Medici family in Renaissance Florence):

    Ref.: S. Boccaletti et al.: "Complex networks: Structure and dynamics", Physics Reports (2006) 
     Cactus Group, Catania, Italy.
    to the networks of proteins that interact during the cell cycle in molecular biology:

     to metabolic networks:
    Ref.: G. Caldarelli: "Lectures on Complex Networks", (2008)
    to drug users networks:

    Ref.: S. Boccaletti et al.: "Complex networks: Structure and dynamics", Physics Reports (2006) 
     Cactus Group, Catania, Italy.
    to the network of ingredients for a cooking recipe:

    Portion (full) of an ingredients network for a cooking recipe.
    From: L. Adamic et al., "Recipe recommendation using ingredient networks", WebSci 2011, ArXiv.org.
    to end with the most complex system of all: the human brain, a system with evident emergent phenomena, produced in the human cerebral cortex by about one hundred billion neurons with about one hundred thousand billion synaptic interconnections, and composed of at least two networks, one axonal, with medium- and long-range links, and the other dendritic, considered short-range.
    A typical cerebral interconnection among different areas is of the type:
    Ref.: S. Boccaletti et al.: "Complex networks: Structure and dynamics", Physics Reports (2006) 
     Cactus Group, Catania, Italy.
    with particular dynamics of interaction, for example the enhancement of synchronization in a local neuronal network due to the addition of long-range interneurons:
    Ref.: S. Boccaletti et al.: "Complex networks: Structure and dynamics", Physics Reports (2006) 
     Cactus Group, Catania, Italy.
    •  Complex: Social Networks and Virtual Communities
    Networks on networks on networks: virtual communities over users over Internet
    The full complexity of the Internet is reached when users are added, who interact in a virtual relationship equivalent to the "real" one except for the impossibility of physical contact. The study of social dynamics and of the diffusion of information and opinions through the Internet has been pursued particularly in the setting of virtual communities, for example on social sites such as Second Life, Facebook, Twitter and so on, which emulate social networks. The situation can be described as social networks emerging from virtual networks that in turn emerge from a global physical-logical network, the Internet.
    Ref.: S. Boccaletti et al.: "Complex networks: Structure and dynamics", Physics Reports (2006) 
     Cactus Group, Catania, Italy.
    A recent study of information diffusion in a virtual community, which also gives some results on the kind of relationships established, was carried out by studying the propagation on Facebook of a meme of the type:
    Do any of us really know everybody on our friend list?
    Here is a task for you. I want all my fb friends to comment
    on this status about how you met me. After you
    comment, copy this to your status so I can do the same.
    You will be amazed at the results you get in 12 hours.

    Its diffusion from July 6 to 9, 2010 was:
    A brief portion of the diffusion of the meme. Nodes are users who posted their meme as their Facebook status. Edges are drawn between nodes if one user commented on the meme post of another. The colors denote the time at which the status update was posted, starting on the 6th of July 2010 (red), and ending 9th July 2010 (blue).
    The meme continued propagating past this point, eventually reaching millions of users.
    Ref.: Adamic and FB: "How you met me", (2012).

    The study also provides various other data, such as popularity over time and the age and location of those who spread the meme, which help describe how the "friendship" relation typical of Facebook is established. Another study, from 2011, modeled the observed relation between the evolution of the virtual network structure and the information it carries, showing that the network structure alone can be extremely revealing about the diversity and novelty of the information and content being communicated. Networks with a higher conductance in their link structure exhibit higher information entropy, that is higher information disorder, while unexpected network configurations can be tied to information novelty; for example, an online user who announces a scoop - true or false - rapidly increases the size of his network of followers for a certain period of time.
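The notion of information entropy used here can be made concrete with Shannon's formula H = sum of p * log2(1/p) over the observed categories. The sketch below is not the model of the 2011 study; the two example feeds and the function name are hypothetical and only show that more diverse content yields a higher entropy.

```python
import math
from collections import Counter

def shannon_entropy_bits(items) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Hypothetical topics of the messages circulating in two small networks.
diverse_feed = ["news", "music", "sport", "film"]
repetitive_feed = ["news", "news", "news", "news"]

print(shannon_entropy_bits(diverse_feed))     # 2.0
print(shannon_entropy_bits(repetitive_feed))  # 0.0
```

Four equally likely topics give 2 bits of entropy, while a feed that always carries the same topic gives 0, the minimum possible disorder.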





     The Cooperative Association for Internet Data Analysis


    The Internet Mapping Project







    Guido Caldarelli