Information Retrieval from Social Network
Although interest in social networks in general is increasing, little attention has been paid to applying social network analysis to information retrieval systems. A social network is an online platform that many people use to build social relations for personal or professional purposes. It has become an integral part of life for sharing and propagating information instantly. The service allows users to share their interest in a particular field and to receive reviews and feedback from their colleagues in the form of comments, likes, dislikes, etc. As users become increasingly active participants, the nature of the information is also changing day by day, from static data to noisy, dynamic multimedia. Content-based information can be classified into two categories: textual content-based information and visual content-based information. Dynamic information is unstructured, user-defined information that no person can retain, given the limits of human memory and of physical storage such as documents or cloud space; administering this data is therefore a very complex task. In this paper, the goal is to analyze, organize, index, and retrieve information from dynamic unstructured sources in order to build sensible, relevant, and precise knowledge. We then discuss the major efforts made to enhance the IR process with information coming from social networks, a process called Social Information Retrieval (SIR).
Social information retrieval (SIR), User-defined information, Content, Noisy, Dynamic multimedia, Unstructured data, Social network analysis (SNA), Post
INTRODUCTION
With the emergence of social media, the Web has evolved from a static medium where users could only consume information into one where users can also produce it; in other words, social media lets the user become a co-creator. Social media makes it easy for users to interact with others who share similar tastes or resources. People around the globe are strongly drawn to social media because it provides real-time, event-based content. Collaborative tasks such as interaction, communication via messages, posts, and comments, and the sharing of data (photos and videos) make users more active in generating content. These factors are responsible for the steadily growing quantity of available data. In such a context, a crucial problem is to enable users to find information relevant to their interests and needs. This task is commonly referred to as Information Retrieval (IR). IR is performed every day as a matter of course over the Web. However, no proposed model directly allows the user to extract information from the social networks themselves. Thus, a number of different existing algorithms are used instead. As we all know, human activity is mostly goal-oriented. Goals offer an approach for modeling and organizing information in order to find relevant knowledge. This challenge requires a tool that can manage information in such a way that whenever a user needs some personal information, he or she can retrieve it with a simple query. This paper tries to answer the following research questions: i) What are the familiar social networking services? ii) What are the different kinds of social network based information? iii) Why is it necessary to manage and organize social network information? iv) What is the process for extracting the relevant information? v) What are the major challenges in social network based information extraction?
DIFFERENT SERVICES ON SOCIAL NETWORKING
The familiar social networks and their uses are described below:
Interaction-Based Networks- To begin with, social networking sites such as Facebook, LinkedIn, and Twitter have emerged as the leading platforms for connecting people. Here, like-minded people are able to share information and ideas to advance their knowledge.
Media Sharing Networks- Another type of network that has gained popularity in recent times is the media sharing network, such as Snapchat, Instagram, and YouTube. These are meant for users to share their photos, videos, and other media with fellow users.
Discussion Forums- Discussion forums such as Quora and Reddit bring together users with similar interests to discuss their opinions and share information.
Social Shopping Networks- Consumers join social shopping networks such as Amazon and Flipkart not only to buy products but also to follow their favorite brands, spot the latest trends, and share their finds with fellow members.
Blogging and Publishing Networks- WordPress, Medium, and Tumblr are some of the renowned blogging and publishing networks where users can discover and publish content.
CHALLENGES IN SOCIAL NETWORKING SITES
plenty of context (e.g., publication timestamp, relationships between users, user profiles, comments)
short posts (e.g., on Twitter), colloquial/cryptic language
spam (e.g., splogs, fake accounts, fake information)
up-to-date content: real-world events covered as they happen
high update rates pose severe engineering challenges (e.g., how to maintain indexes and collection statistics)
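The index-maintenance challenge in the last item can be illustrated with a minimal sketch, assuming a toy in-memory inverted index in which postings and collection statistics are updated as each post arrives rather than rebuilt in batch. The class name `IncrementalIndex` and the sample posts are hypothetical and serve only as illustration; a production system would also have to handle deletions, persistence, and concurrency.

```python
from collections import Counter, defaultdict


class IncrementalIndex:
    """Toy inverted index updated one post at a time (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of posts containing it
        self.doc_freq = Counter()         # term -> number of posts containing it
        self.num_posts = 0                # collection size, kept up to date

    def add_post(self, post_id, text):
        """Index a new post and update the collection statistics in place."""
        for term in set(text.lower().split()):
            self.postings[term].add(post_id)
            self.doc_freq[term] += 1
        self.num_posts += 1

    def search(self, query):
        """Return the ids of posts that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = IncrementalIndex()
index.add_post(1, "earthquake reported near the coast")
index.add_post(2, "coast road closed after earthquake")
index.add_post(3, "new cafe opens downtown")
hits = index.search("earthquake coast")
```

Here `hits` contains the two posts that mention both query terms, and `doc_freq` and `num_posts` are current after every insertion, which is the property that high update rates make hard to preserve at scale.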
SOCIAL NETWORK ANALYSIS (SNA)
A social network can be thought of in the same terms as a computer network. It can be expressed as a collection of nodes and edges, where the nodes represent people and the lines between nodes, called edges, represent social connections between them, such as friendship or working together on a project. To understand the underlying social structures and phenomena, a set of techniques and methods exists, known as social network analysis (SNA) techniques [9].
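As a minimal sketch of the node-and-edge representation, assuming undirected ties and hypothetical actor names, a network can be stored as an adjacency map, and one common SNA measure, degree centrality, can be computed from it. The function names are illustrative and not taken from any particular SNA library.

```python
from collections import defaultdict


def build_network(ties):
    """Build an undirected adjacency map from (person, person) pairs."""
    network = defaultdict(set)
    for a, b in ties:
        network[a].add(b)
        network[b].add(a)
    return network


def degree_centrality(network):
    """Fraction of the other nodes to which each node is directly connected."""
    n = len(network)
    if n < 2:
        return {node: 0.0 for node in network}
    return {node: len(neighbors) / (n - 1) for node, neighbors in network.items()}


# Hypothetical ties, e.g. friendships or joint projects.
ties = [("Ana", "Ben"), ("Ana", "Chen"), ("Ben", "Chen"), ("Chen", "Dee")]
network = build_network(ties)
centrality = degree_centrality(network)
```

In this example Chen is tied to every other actor and therefore has the highest degree centrality, while Dee, with a single tie, has the lowest.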
Social network analysis leads to the retrieval of user-defined information from the following areas:
Image and location-based information, including shared photos and check-in venues;
Topic-based information, such as tweets, discussion forums, and community question-and-answer forums;
Application-based information covering local mobile apps and associated information and discussions;
Structured information, including cultural and historical information.
Social network analysis supports the following tasks:
Post retrieval identifies posts relevant to a specific information need (e.g., how is life in Iceland?)
Opinion retrieval finds posts that are relevant to a specific named entity (e.g., a company or celebrity) and that express an opinion about it
Feed distillation identifies feeds relevant to a topic, so that the user can subscribe to their posts
Top-story identification leverages social media to determine the most important news stories (e.g., to display on a front page)
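The post retrieval task can be sketched as follows, assuming a simple TF-IDF scoring scheme with a smoothed inverse document frequency. This is one standard ranking approach chosen for illustration, not a method prescribed by the works surveyed here; the function names and sample posts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each token."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


def rank_posts(query, posts):
    """Return post indices ordered by the summed TF-IDF weight of query terms."""
    docs = [Counter(tokenize(p)) for p in posts]
    n = len(docs)
    query_terms = set(tokenize(query))
    scored = []
    for i, doc in enumerate(docs):
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in docs if term in d)
            if df:
                idf = math.log((n + 1) / (df + 1)) + 1.0  # smoothed IDF
                score += doc[term] * idf
        if score > 0:  # posts sharing no term with the query are dropped
            scored.append((score, i))
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


posts = [
    "Life in Iceland is quiet and the winters are long.",
    "Our company picnic was great!",
    "Moving to Iceland next year, any tips on daily life?",
]
ranking = rank_posts("life in Iceland", posts)
```

The first post matches all three query terms and is ranked first, the third post matches two of them, and the unrelated second post is omitted from the ranking.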
Classification of information sources for social network extraction
With the definition described above, a variety of different types of social connections between the nodes (actors) of this additional social layer, shown in Figure 4, are possible:
1: Explicit direct social connection:
Hyperlinks that are explicitly meant to reflect social relations. Examples are social network sites such as Facebook, LinkedIn, MySpace, or Orkut, with millions of users who maintain a personal profile with a friends list in order to interact and communicate with those friends. The “public display of connection” is perceived as an important identity signal and is used for impression management. Another example is the Friend-of-a-Friend (FOAF) protocol, which explicitly expresses relationships between persons.
2: Explicit indirect social connection:
Hyperlinks within any web page that link to other web sites. This connection is related to the above-mentioned hyperlink network, but differs, for instance, in that the emergence of blogs has led to a more personal or social meaning of links (e.g., the blogroll).
3: Implicit direct social connection:
Connections extracted from textual information found on a web page that clearly indicate a social relation between the different actors. An example of such a connection is co-authorship in scientific publications, which are becoming increasingly accessible through the World Wide Web. The co-authorship found in these documents can be used to extract a social network of authors.
4: Implicit indirect social connection:
Connections extracted at the content level of a web page, which might indicate a vague social relation between the actors. An example is the collaborative filtering used by Amazon. Krebs argued that, by choosing a book as a focal node and assuming that books indirectly represent the people who buy them, the resulting network can be seen as a social network. Citation networks within scientific publications are another example.
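The implicit direct case can be made concrete with a minimal sketch, assuming the author lists have already been extracted from the publications. Each pair of authors who share a paper becomes an edge, weighted by the number of shared papers. The author names and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthorship_network(papers):
    """Count, for each unordered author pair, the number of papers they share."""
    edges = Counter()
    for authors in papers:
        # Sorting gives each pair one canonical order; set() removes duplicates.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical author lists, one list per publication.
papers = [
    ["A. Rao", "B. Silva"],
    ["A. Rao", "B. Silva", "C. Okafor"],
    ["C. Okafor", "D. Lind"],
]
edges = coauthorship_network(papers)
```

The resulting edge weights distinguish repeated collaborators (here A. Rao and B. Silva, with two joint papers) from one-off co-authors. The sketch presupposes that each name identifies exactly one person, which is precisely the assumption that breaks down in practice.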
Problems related to social network extraction
Apart from distinguishing the information sources from which social networks can be extracted, several problems arise when extracting social networks from the sources available on the World Wide Web. First, a general problem is the identification of persons, because naming standards differ and different persons can share the same name. Second, the social context and the type of social interactions of the authors within these information sources need to be carefully analyzed in order to obtain a meaningful understanding of the underlying social network structure.
1: Author and relation identification:
The extraction of social networks crucially depends on the successful recognition of person names. Names are the most essential information for identifying an individual who acts as a network node, and they are consequently needed to identify the corresponding relations between network nodes. The problems of person extraction are manifold, as names are ambiguous in many ways. First, the name of a person can be written differently in various contexts due to abbreviations, misspellings, or pseudonyms. Second, a name can belong to more than one person. This ambiguity problem has been addressed in different research fields using different methods, such as record linkage, duplicate record detection and elimination, merge/purge, data association, database hardening, citation matching, name matching, and name authority work in library cataloguing practice (Hui et al. 2004). To address misspelling or abbreviation errors, probabilistic methods, hidden Markov models, and Support Vector Machines (SVM) have been suggested.
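The first kind of ambiguity (abbreviations and misspellings) can be illustrated with a minimal sketch, assuming a simple string-similarity heuristic built on Python's standard `difflib` module. This is a deliberately simple stand-in for the probabilistic, HMM, and SVM approaches mentioned above, and the threshold of 0.8 is an arbitrary assumption. It cannot resolve the second kind of ambiguity, where one name belongs to several persons.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and reorder 'Last, First' to first-last."""
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    return re.sub(r"[^a-z ]", "", name.lower()).split()


def similar(a, b, threshold):
    """True if two tokens are equal, an initial of one another, or close."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    if len(short) == 1:  # an initial such as "j" for "john"
        return long_.startswith(short)
    return SequenceMatcher(None, a, b).ratio() >= threshold


def same_person_candidate(name_a, name_b, threshold=0.8):
    """Heuristic: surnames must be close and given names compatible."""
    ta, tb = normalize(name_a), normalize(name_b)
    if not ta or not tb:
        return False
    if SequenceMatcher(None, ta[-1], tb[-1]).ratio() < threshold:
        return False
    if len(ta) == 1 or len(tb) == 1:  # only a surname is available
        return True
    return similar(ta[0], tb[0], threshold)


abbreviated = same_person_candidate("Smith, John", "J. Smith")
misspelled = same_person_candidate("Jon Smith", "John Smith")
different = same_person_candidate("John Smith", "Jane Doe")
```

The function returns candidate matches only; in a real extraction pipeline such candidates would still need to be confirmed with additional evidence such as affiliation or co-authors.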
2: Context and weighting of social interactions:
It is widely accepted that different social interactions can yield different social networks, and that a different network structure has different effects on the individuals involved (Burt et al. 1985). Kautz, Selman & Shah (1997) therefore suggest weighting the relations according to the types of information sources and the resulting connection types. In addition to the direct and indirect types of connection that may constitute a social network, these connections may have different meanings in terms of interpersonal relations. Friends, colleagues, family members, teammates, or participants are only some of the relation types that a connection between actors may exhibit. The Friend-of-a-Friend (FOAF) protocol offers more than 30 kinds of relationships that can be assigned to a connection. These different kinds of relationships lead to different social networks for an actor. A person may be central in the social network of a research community while not being central in the local community. Such overlapping social networks have been studied in SNA. Simmel was the first to discuss the theoretical implications of a person's various social networks (which he called social circles). He argued that the different social networks (circles) are fundamental in defining a social identity. Recent research on large-scale, complex networks has shown that overlaps are significant. Furthermore, the overlap of different sub-networks within a network is a relevant network property.
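The idea of weighting relations by connection type can be sketched as follows, assuming one weight per connection type from the classification given earlier. The numeric weights are hypothetical placeholders chosen for illustration; the source does not specify values, and the weighting scheme of Kautz, Selman & Shah is not reproduced here.

```python
from collections import defaultdict

# Hypothetical weights per connection type (assumed, not from the source).
TYPE_WEIGHTS = {
    "explicit_direct": 1.0,    # e.g., a friends-list link
    "implicit_direct": 0.7,    # e.g., a co-authored paper
    "explicit_indirect": 0.4,  # e.g., a blogroll hyperlink
    "implicit_indirect": 0.2,  # e.g., buying the same book
}


def weighted_network(observations):
    """Sum the connection-type weights for each unordered pair of actors."""
    weights = defaultdict(float)
    for a, b, kind in observations:
        weights[frozenset((a, b))] += TYPE_WEIGHTS[kind]
    return weights


observations = [
    ("Ana", "Ben", "explicit_direct"),
    ("Ben", "Ana", "implicit_direct"),
    ("Ana", "Chen", "implicit_indirect"),
]
weights = weighted_network(observations)
```

Using a `frozenset` as the key makes the tie undirected, so evidence recorded as (Ana, Ben) and (Ben, Ana) accumulates on the same edge. Keeping a separate weight map per relation type instead would yield the overlapping networks discussed above.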
A TAXONOMY FOR SOCIAL INFORMATION EXTRACTION AND RETRIEVAL
Researchers study various social media services to find new information by analyzing social media content. Social network users produce a variety of content on various social platforms, so processing the extracted content is a crucial task. Vianna et al. [23] developed a tool that extracts personal data from heterogeneous sources and puts it into MongoDB (a NoSQL database) for reuse. They further analyzed Facebook and Twitter data and concluded that the relationships among different services represent a rich source of knowledge for a personalized search tool. Habib and Keulen [25] described challenges such as the short, noisy, and ambiguous nature of tweets, which makes tweet content difficult to analyze. They proposed a new model that adds a noisy-text filter to the traditional information extraction (IE) framework.
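To give a sense of what a noisy-text filter might do before information extraction, the following is a minimal rule-based sketch. It is not the model of Habib and Keulen; the rules (removing URLs, retweet markers, and mentions, keeping hashtag words, and squeezing repeated characters) are assumptions chosen for illustration, and the sample tweet is invented.

```python
import re


def clean_tweet(text):
    """Apply simple, assumed normalization rules to a raw tweet."""
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"\bRT\b", " ", text)         # drop the retweet marker
    text = re.sub(r"@\w+:?", " ", text)         # drop user mentions
    text = re.sub(r"#(\w+)", r"\1", text)       # keep the hashtag word itself
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # squeeze runs of 3+ to 2
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


raw = "RT @travelbot: Sooooo cold in #Reykjavik today!!! https://example.com/x"
cleaned = clean_tweet(raw)
```

For the sample tweet, `cleaned` is "Soo cold in Reykjavik today!!". Rules of this kind reduce surface noise but do not address ambiguity, which requires the entity-level processing that an IE framework provides.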
Seo et al. [27] studied personal information that users expose on the Internet out of unawareness; such exposure may harm users financially or personally. They developed a system that retrieves personal (private) information from the Web using the Google PageRank [28] equation and removes that sensitive content from the Internet in order to maintain user privacy. Han et al. [19] retrieved personal information from multiple social networks using the FriendFeed dataset [20].
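For reference, the standard PageRank computation can be sketched by power iteration as below. This shows the textbook formulation with a damping factor of 0.85, which is an assumption; how Seo et al. adapted the equation to personal information retrieval is not described here and is not reproduced. The link graph is a hypothetical example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same baseline share from random jumps.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page "d" has no incoming links.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

The ranks sum to one, the page with the most incoming links ("c") scores highest, and the page with none ("d") receives only the baseline share.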
We have now studied various aspects of information representation. To handle these various types of data representation and the number of challenges involved, the process of Social IR consists of two phases:
Data collection using various data collection algorithms
Information retrieval from the collected data
DATA COLLECTION – There are numerous search platforms through which data can be extracted from a social network. The various platforms are shown as:
INFORMATION RETRIEVAL – After the data has been extracted, a data set is formed from which the desired information is retrieved in a precise format.
CONCLUSION AND FUTURE WORK
Today, social networks play a vital role in information broadcasting. However, managing such information at the user's end is a very tedious task because of platform heterogeneity, the fragmented nature of user content, language barriers, security issues, etc. In this paper, we surveyed the various types of social networks currently in use and the types of information users share on these platforms, and we categorized content-based information into textual content-based and visual content-based information. Specifically, we outlined a procedure for content-based information extraction, management, and retrieval.