April 17, 2018

«Stance Classification through Proximity-based Community Detection»



Ophélie Fraisier, Guillaume Cabanac, Yoann Pitarch, Romaric Besançon and Mohand Boughanem (@IRIT_UMR5505)
«Stance Classification through Proximity-based Community Detection»


ACM Conference on Hypertext & Social Media (HT 2018), Baltimore, Maryland, ACM, July 2018 (in press).


Excerpt from sections on pages 2 and 4 of the PDF publication. See the references in the original publication of the text.




«Polarization on social media

»Several studies showed that online social media were highly polarized in some contexts, particularly for political topics [23, 25]. They revealed the presence of “echo chambers” on several platforms [35]. This term describes a phenomenon characterised by users preferring to interact with like-minded people. Conservative and liberal blogs tend to mainly reference blogs from their own ideological camp, as shown by their linking patterns and discussion topics [2].

»Similarly, Twitter’s retweet networks concerning the 2010 and 2014 US midterm elections and the 2014 Scottish independence referendum were highly polarized between left- and right-leaning profiles, while people interacted more freely in the mention networks [14, 20]. Even on Wikipedia, controversies occur mainly in neighbourhoods of related topics [17]. This suggests that some topics tend to be particularly attractive for users promoting diverse mindsets. It is important to note, however, that on non-political topics, polarization is usually more nuanced [5].



»Stance detection

»Text is often the main piece of information used to determine stance. Several researchers studied debate sites and argumentative essays [22]. [36] used a topic model to discover viewpoints, topics, and opinions in order to classify texts on the Israeli-Palestinian conflict according to their ideological leaning. Other studies focused on less structured platforms: Twitter, for instance, has been widely used due to its popularity and the ease of collecting data. [9] used a statistical model to determine the political stance of politicians from the Belgian Parliament on Twitter. [27] trained an SVM model including sentiments as features to detect whether profiles were “for” or “against” given targets. Forums are another exploitable information source: [39] used neural networks on a breast cancer forum to identify profiles’ stances on complementary and alternative medicine.

»Alternatively, some works rely on social interactions between profiles. [4] built a Bayesian model inferring the ideology of profiles from the political actors they follow. [33] identified pairs of profiles with differing opinions thanks to a retweet-based label propagation algorithm tied to a supervised classifier. [37] proposed an unsupervised topic model taking both text and social interactions into account to identify viewpoints. [24] used an SVM on textual content, retweets, and mentions to predict the future attitude of profiles in the aftermath of a major event; their results show that social features are of prime importance for this task. [38] also used a combination of text and retweets to quantify the political leaning of media outlets and prominent profiles. [16] considered users’ discussions and interactions to predict stances from few annotations on any social media platform. These works are the most relevant to our task, but they focus on Twitter datasets or are limited by requiring a large number of annotations or considering at most two stances (see Table 1). In contrast, we promote a generic approach which needs significantly less annotated data to perform well, as described in the following sections.
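
As a rough illustration of the interaction-based approaches surveyed above, and in particular of retweet-based label propagation in the spirit of [33], the following Python sketch spreads stance labels from a handful of annotated profiles to the rest of a retweet graph by iterative majority vote. The graph, the profile names, and the stance labels are invented for illustration and do not come from the paper, which relies on proximity-based community detection rather than on this exact procedure.

from collections import Counter, defaultdict

def propagate_stances(edges, seed_stances, n_iter=10):
    """Spread stance labels from annotated profiles to their neighbours.
    edges        -- iterable of (profile_a, profile_b) retweet links
    seed_stances -- dict {profile: stance} for the annotated profiles"""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    stances = dict(seed_stances)
    for _ in range(n_iter):
        updated = dict(stances)
        for node in neighbours:
            if node in seed_stances:          # annotated profiles stay fixed
                continue
            votes = Counter(stances[n] for n in neighbours[node] if n in stances)
            if votes:
                updated[node] = votes.most_common(1)[0][0]
        if updated == stances:                # converged
            break
        stances = updated
    return stances

# Toy usage with hypothetical profiles: u2/u3 inherit "pro", u5/u6 inherit "anti"
edges = [("u1", "u2"), ("u2", "u3"), ("u4", "u5"), ("u5", "u6")]
print(propagate_stances(edges, {"u1": "pro", "u4": "anti"}))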





»Implications for Stance Detection

»The results of these experiments demonstrate that communities detected on social media elements can be extremely homogeneous in terms of stance, and can therefore be an effective way to propagate stance from a set of known profiles. Moreover, as indicated by the moderate NMI values, the communities extracted from the different proximities considered differ from one another. This suggests that each one brings a specific piece of information about a profile’s entourage, allowing for a better characterization. Unsurprisingly, the reciprocal versions of the proximities are semantically close to their complete versions (we see high NMI scores between the pairs), but their higher purities may be of interest for our task.

»Even when extracted from the same platform, each dataset has its own particularities. Indeed, we can see that some proximities can be helpful or harmful depending on the dataset, and that the similarity between proximities varies across datasets. On Twitter, cite_all and asso_all seem particularly encouraging for our task: they yield very homogeneous communities and bring unique information compared to the other proximities (apart from their reciprocal versions). On CreateDebate, ref and asso_rec seem interesting for the same reasons. These measures could help us determine which proximity to discard in order to optimize our process, but for the time being we will consider all the defined proximities.»
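
To make the two measures discussed in this last excerpt concrete, here is a small Python sketch, not taken from the paper, that computes the purity of a community partition with respect to stance labels and the NMI between two partitions (via scikit-learn's normalized_mutual_info_score). The profiles, stance labels, and the two partitions are invented for illustration; only the measures themselves correspond to those mentioned in the text.

from collections import Counter, defaultdict
from sklearn.metrics import normalized_mutual_info_score

def purity(communities, stances):
    """Fraction of profiles sharing the majority stance of their community.
    Both arguments map profile -> label."""
    members = defaultdict(list)
    for profile, community in communities.items():
        members[community].append(stances[profile])
    majority_total = sum(Counter(labels).most_common(1)[0][1]
                         for labels in members.values())
    return majority_total / len(communities)

# Toy data: six profiles, two stances, two hypothetical proximity partitions
profiles = ["u1", "u2", "u3", "u4", "u5", "u6"]
stance = {"u1": "pro", "u2": "pro", "u3": "pro",
          "u4": "anti", "u5": "anti", "u6": "anti"}
part_a = {"u1": 0, "u2": 0, "u3": 0, "u4": 1, "u5": 1, "u6": 1}
part_b = {"u1": 0, "u2": 0, "u3": 1, "u4": 1, "u5": 1, "u6": 1}

print("purity of partition A w.r.t. stance:", purity(part_a, stance))   # 1.0
print("NMI between partitions A and B:",
      normalized_mutual_info_score([part_a[p] for p in profiles],
                                   [part_b[p] for p in profiles]))

A high purity means the communities are homogeneous enough to propagate the stance of a few annotated members to the rest, while a low NMI between two proximities suggests they capture complementary information, as argued in the excerpt.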





