Social-enriched data analysis and processing tools.
Details
  • Author: Liu, Xingjie
  • Degree: Ph.D.
  • Year: 2013
  • Advisor: Lee, Wang-Chien
  • Institution: The Pennsylvania State University
  • ISBN: 9781303451256
  • CBH: 3573794
  • Country: USA
  • Language: English
  • File size: 4591231
  • Pages: 168
Abstract
In recent years, the rapid development of online social services, such as Facebook, Twitter, LinkedIn and Foursquare, has created new opportunities and challenges for researchers. On the one hand, with huge amounts of comprehensive social network data and various types of user-generated content available for analysis, we are able to conduct in-depth studies at a scale never possible before. The data will help us better understand people's opinions and activities, capture trends in our society and improve social services. On the other hand, such data require novel techniques for modeling, extraction and processing to reveal their real value, because many existing solutions cannot handle new issues such as heterogeneous data types and demanding scalability and efficiency requirements. In this thesis, we introduce the concept of social-enriched data, defined as social connection graphs together with the user-created content distributed on those graphs, to represent the data collected by the aforementioned online social services. We identified several issues in handling social-enriched data and proposed a set of novel solutions to tackle them. First, as people tend to interact with others both through online services and in their offline lives, capturing the properties of heterogeneous social networks with both online and offline components becomes critical. Hence, we investigated a new type of social network, Event-based Social Networks (EBSNs), as a typical example of a heterogeneous graph. EBSNs contain both online social interactions, as in conventional online social networks, and offline social interactions captured in offline activities. Based on real data collected from Meetup, a social event organizing service, we analyzed EBSN properties and discovered many unique and interesting characteristics, such as heavy-tailed degree distributions and strong locality of social interactions.
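To make the online/offline structure of an EBSN concrete, here is a minimal sketch that builds the two edge layers separately and compares them. It assumes, for illustration only, that an online edge links two users who belong to the same group and an offline edge links two users who attended the same event; the function names and the toy data are hypothetical and are not taken from the Meetup dataset.

```python
from collections import defaultdict
from itertools import combinations


def build_layer(memberships):
    """Link every pair of users that share a container (a group or an event).

    memberships maps a container id to the set of user ids it contains.
    Returns an adjacency dict: user -> set of neighbouring users.
    """
    adj = defaultdict(set)
    for users in memberships.values():
        for u, v in combinations(sorted(users), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degrees(adj):
    """Degree of every user in one layer."""
    return {u: len(nbrs) for u, nbrs in adj.items()}


def edge_set(adj):
    """Undirected edges of one layer, each stored once as a sorted pair."""
    return {tuple(sorted((u, v))) for u, nbrs in adj.items() for v in nbrs}


# Hypothetical toy data: online groups and offline events.
groups = {"g1": {"alice", "bob", "carol"}, "g2": {"carol", "dave"}}
events = {"e1": {"alice", "bob"}, "e2": {"carol", "dave", "erin"}}

online = build_layer(groups)   # assumed online layer: shared group membership
offline = build_layer(events)  # assumed offline layer: co-attendance at an event

print(sorted(degrees(online).items()))   # [('alice', 2), ('bob', 2), ('carol', 3), ('dave', 1)]
print(sorted(degrees(offline).items()))  # [('alice', 1), ('bob', 1), ('carol', 2), ('dave', 2), ('erin', 2)]
print(sorted(edge_set(online) & edge_set(offline)))  # [('alice', 'bob'), ('carol', 'dave')]
```

Keeping the layers apart, rather than merging them into one graph, makes it easy to compare per-layer statistics such as degree distributions and to see which ties exist online only, offline only, or in both.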
In addition, we studied the heterogeneous nature of EBSNs (the co-existence of online and offline social interactions) on two challenging problems: community detection and information flow. We found that communities detected in EBSNs are more cohesive than those in other types of social networks (e.g., location-based social networks). In the context of information flow, we studied the event recommendation problem and significantly improved recommendation with a community-based diffusion model that incorporates both online and offline interactions. Second, as user-created content is an essential ingredient of many online social services, we chose to study it in a widely applied practice, namely recommendation. In particular, we focused on the problem of recommending content to a group of users by utilizing the social context. To extract group preference information from social-enriched data, we analyzed the decision-making process in user groups and proposed the personal impact topic (PIT) model, a probabilistic generative model. The PIT model effectively identifies the preference profile of a given group by mining the individual preferences and personal impacts of group members from the group recommendation history. Further, we integrated friendship connection information to obtain an extended personal impact topic (E-PIT) model. Through comprehensive data analysis and evaluations on three real datasets, we demonstrate that the socially based PIT and E-PIT approaches achieve good performance. Finally, to support efficient data analysis and address scalability issues, we proposed two data analysis tools for social-enriched data, namely distributed graph summary and the uncertain skyline query. The distributed graph summary algorithms summarize a large-scale graph into an abstract graph that preserves the topology of the original graph.
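The intuition behind the PIT model, that a group's choice reflects its members' individual preferences weighted by their personal impact, can be illustrated with a simple mixture. This is a hedged simplification, not the thesis's generative model or its inference procedure: it assumes each member has a non-negative impact weight, a topic distribution P(z | member), and that each topic z has an item distribution P(item | z), and it scores an item for the group as the impact-weighted sum over members of Σ_z P(z | member) P(item | z). All names and numbers below are hypothetical; in the PIT model such quantities are learned from group recommendation history rather than given.

```python
def group_item_scores(impact, member_topics, topic_items):
    """Score items for a group as an impact-weighted mixture of member preferences.

    impact:        member -> non-negative impact weight (normalised below)
    member_topics: member -> {topic: P(topic | member)}
    topic_items:   topic  -> {item: P(item | topic)}
    """
    total = sum(impact.values())
    scores = {}
    for member, weight in impact.items():
        share = weight / total  # assumed probability that this member drives the choice
        for topic, p_topic in member_topics[member].items():
            for item, p_item in topic_items[topic].items():
                scores[item] = scores.get(item, 0.0) + share * p_topic * p_item
    return scores


# Hypothetical example: two topics, three items, a two-person group.
topic_items = {
    "outdoor": {"hike": 0.7, "concert": 0.1, "museum": 0.2},
    "culture": {"hike": 0.1, "concert": 0.5, "museum": 0.4},
}
member_topics = {
    "ann": {"outdoor": 0.9, "culture": 0.1},
    "ben": {"outdoor": 0.2, "culture": 0.8},
}
impact = {"ann": 3.0, "ben": 1.0}  # ann is assumed to be the more influential member

scores = group_item_scores(impact, member_topics, topic_items)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['hike', 'museum', 'concert']
```

When every input distribution sums to one, the group scores also sum to one, so they can be read as the group's probability of choosing each item; raising a member's impact weight pulls that distribution toward the member's preferred topics.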
As online social networks can become extremely large and complex, graph summarization is crucial for uncovering useful insights into the patterns hidden in the underlying graphs. In our study, we introduce three distributed algorithms that enable parallel graph summarization; they produce good-quality summaries and scale well with increasing data sizes. The uncertain skyline operator is a data filtering operator that identifies the set of data items not dominated by any other item, where each item is represented as a multidimensional data tuple with probabilistic attribute values. The operator is particularly useful for multi-criteria analysis and filtering of user-created content. Specifically, the U-Skyline query searches for the set of tuples that has the highest probability (aggregated over all possible scenarios) of being the skyline answer. To answer U-Skyline queries efficiently, we propose a series of optimization techniques for query processing. Our performance evaluation shows that our algorithm is 10 to 100 times faster than state-of-the-art solutions. Social-enriched data analysis is attracting growing research interest. This thesis presents pioneering work on several challenging topics in this area, and we believe that our solutions will provide real value for the practical use of social-enriched data.
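The U-Skyline definition can be illustrated with a brute-force baseline under possible-worlds semantics. This sketch assumes that tuples are independent, that each tuple has a small discrete set of alternative attribute vectors whose probabilities sum to one, and that smaller values are better in every dimension; it is not the thesis's optimized algorithm, and the function names and data are hypothetical.

```python
from itertools import product


def dominates(a, b):
    """True if point a dominates b: no worse in every dimension, better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_ids(world):
    """Ids of the points in one possible world that no other point dominates."""
    return frozenset(
        tid for tid, p in world.items()
        if not any(dominates(q, p) for oid, q in world.items() if oid != tid)
    )


def u_skyline(tuples):
    """Brute-force U-Skyline over independent uncertain tuples.

    tuples maps a tuple id to a list of (point, probability) alternatives.
    Returns (answer set, probability): the skyline set whose probability,
    summed over all possible worlds that produce it, is highest.
    """
    ids = list(tuples)
    prob_of_set = {}
    for choice in product(*(tuples[t] for t in ids)):  # one possible world
        world = {t: point for t, (point, _) in zip(ids, choice)}
        p_world = 1.0
        for _, p in choice:
            p_world *= p  # independence assumption
        sky = skyline_ids(world)
        prob_of_set[sky] = prob_of_set.get(sky, 0.0) + p_world
    best = max(prob_of_set, key=prob_of_set.get)
    return best, prob_of_set[best]


# Hypothetical data, e.g. (price, distance) with uncertain values.
data = {
    "A": [((1, 5), 0.6), ((4, 4), 0.4)],
    "B": [((2, 2), 1.0)],
    "C": [((3, 1), 0.7), ((5, 5), 0.3)],
}
answer, prob = u_skyline(data)
print(sorted(answer), round(prob, 2))  # ['A', 'B', 'C'] 0.42
```

Enumerating every possible world grows exponentially with the number of tuples, which is why efficient U-Skyline processing depends on dedicated optimization techniques such as those the thesis proposes.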
