Public Sphere 2.0

Targeted Commenting in Online News Media

DSLAB PR @ 2019.07.11

이준범 (jun@beomi.net)

Abstract

  • News consumption => maximize ad revenue
  • Reader engagement 👍 => *Comments*
  • Traditional view: comments respond to the full article
  • Present landscape:
    comments often respond to only particular sections of the article
  • Build a neural net to link each comment to the article section it addresses
 

1. Introduction

Background

  • Paradigm shift in how news is consumed 📰
  • Online media > offline media
  • Readers comment on and share ideas about articles
  • Comments = most effective tool for user engagement
  • Online news acts as a facilitator of public debate
 

User patterns

  • F-shaped reading pattern
  • Users' attention focuses on the opening text
  • News website = new public sphere
 

2. Dataset & Motivation

Dataset

  • 1,352 Guardian articles
  • 1,020 NYTimes articles
  • Each with its reader comments

Dataset

  • 60%+ of comments are on articles longer than 20 paragraphs
  • Longer articles = more comments

Dataset

  • Comments labeled with a relevance score from 1 (irrelevant) to 5 (relevant)
  • Judged by the presence or absence of
    - common words
    - common thoughts
  • 2 annotators => Cohen's kappa = 0.71
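The agreement figure above is Cohen's kappa: observed agreement corrected for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). A minimal sketch of the unweighted version follows; the slide does not say whether a weighted (ordinal) kappa was used, and the toy labels are illustrative, not the paper's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Toy relevance scores (1..5) from two hypothetical annotators.
annotator_1 = [5, 4, 1, 3, 5, 2]
annotator_2 = [5, 4, 2, 3, 5, 2]
kappa = cohens_kappa(annotator_1, annotator_2)  # 11/14, about 0.79
```

Unweighted kappa treats scores 1 to 5 as unrelated categories, so a 4-vs-5 disagreement costs as much as a 1-vs-5 one; a weighted kappa would account for the ordinal scale.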

Overview

  • 42.7% = relevant to the *whole* article
  • 48.9%/48.8% = relevant to 2-3 paragraphs
  • More relevant comments
    => on the beginning paragraphs

3. Linking Comments to Paragraphs

Approach

  • Baseline: traditional ML
    e.g. NB, DT, RF, k-NN, R-SVM, AdaBoost, LR
  • Deep learning methods:
    - LSTM
    - GRU 👈 Best score!

Bi-LSTM & GRU

  • Pretrained 300-dim Google News word2vec vectors
  • Text (article/comment) 👉 embedding vectors
    (OOV 👉 zero vector)
  • Text vector = avg(word vectors)
  • LSTM / GRU 👉 150-dim vector
  • Merge article vec + comment vec
    👉 FC layer 👉 Softmax
    👉 5 classes (relevance scores 1-5) 🔥
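A minimal NumPy sketch of the forward pass on this slide, under several assumptions not stated in the slides: a plain dict stands in for the Google News vectors; a single-layer unidirectional GRU (not the Bi-LSTM variant) encodes each text's word sequence; article and comment get separate encoders; "merge" means concatenation; class i maps to relevance score i + 1. The slides also list averaging word vectors, and it is unclear how that combines with the RNN, so it appears only as the helper `average_vector`. Weights are random and untrained, so outputs are meaningful only after training.

```python
import numpy as np

EMB_DIM = 300     # Google News word2vec vectors are 300-dimensional
HIDDEN = 150      # slide: LSTM / GRU -> 150-dim vector
NUM_CLASSES = 5   # relevance scores 1..5 (class i <-> score i + 1, assumed)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


def embed(tokens, vectors):
    """Map tokens to a (len, EMB_DIM) matrix; OOV tokens become zero vectors."""
    if not tokens:
        return np.zeros((0, EMB_DIM))
    zero = np.zeros(EMB_DIM)
    return np.stack([vectors.get(t, zero) for t in tokens])


def average_vector(tokens, vectors):
    """Text vector = mean of its word vectors (zero vector for empty text)."""
    seq = embed(tokens, vectors)
    return seq.mean(axis=0) if len(seq) else np.zeros(EMB_DIM)


class GRUEncoder:
    """Single-layer GRU; returns the final hidden state (HIDDEN dims)."""

    def __init__(self, rng):
        # Input matrix W, recurrent matrix U, bias b per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rng.normal(0.0, 0.1, (HIDDEN, EMB_DIM)) for g in "zrh"}
        self.U = {g: rng.normal(0.0, 0.1, (HIDDEN, HIDDEN)) for g in "zrh"}
        self.b = {g: np.zeros(HIDDEN) for g in "zrh"}

    def __call__(self, seq):
        h = np.zeros(HIDDEN)
        for x in seq:
            z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])
            r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])
            cand = np.tanh(self.W["h"] @ x + self.U["h"] @ (r * h) + self.b["h"])
            h = (1.0 - z) * h + z * cand
        return h


class RelevanceClassifier:
    """Article GRU + comment GRU -> concatenate -> FC -> softmax over 5 classes."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.article_enc = GRUEncoder(rng)
        self.comment_enc = GRUEncoder(rng)
        self.W_fc = rng.normal(0.0, 0.1, (NUM_CLASSES, 2 * HIDDEN))
        self.b_fc = np.zeros(NUM_CLASSES)

    def predict_proba(self, article_tokens, comment_tokens, vectors):
        a = self.article_enc(embed(article_tokens, vectors))
        c = self.comment_enc(embed(comment_tokens, vectors))
        merged = np.concatenate([a, c])  # "merge" assumed to mean concatenation
        return softmax(self.W_fc @ merged + self.b_fc)


# Usage with toy random vectors; "reform" is deliberately OOV.
rng = np.random.default_rng(1)
toy_vectors = {w: rng.normal(size=EMB_DIM) for w in ["tax", "policy", "vote"]}
model = RelevanceClassifier()
probs = model.predict_proba(["tax", "policy", "reform"], ["vote", "tax"], toy_vectors)
predicted_score = int(probs.argmax()) + 1  # back to the 1..5 relevance scale
```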

Eval

  • 10-fold cross-validation
    (DL models: validated after each epoch, 5 epochs total)
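A minimal sketch of this evaluation loop, assuming the held-out fold doubles as the per-epoch validation set and the last epoch's score is kept for each fold; `fit_epoch` and `evaluate` are hypothetical callbacks standing in for whichever model (baseline or DL) is being evaluated.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each item is in exactly one test fold."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    # Spread items evenly: the first n_items % k folds get one extra item.
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_validate(n_items, fit_epoch, evaluate, k=10, epochs=5):
    """Mean validation score over k folds.

    fit_epoch(state, train_idx) -> state and evaluate(state, test_idx) -> float
    are hypothetical callbacks. The model is validated after every epoch and
    the final epoch's score is kept for each fold (an assumption).
    """
    if epochs < 1:
        raise ValueError("epochs must be >= 1")
    fold_scores = []
    for train, test in k_fold_indices(n_items, k):
        state = None  # fresh model for each fold
        for _ in range(epochs):
            state = fit_epoch(state, train)
            score = evaluate(state, test)
        fold_scores.append(score)
    return sum(fold_scores) / len(fold_scores)
```

For the traditional ML baselines, `epochs=1` with a `fit_epoch` that trains the model fully reduces this to plain 10-fold cross-validation.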

Related works

  • Comment Ranking
  • Comment Recommendation
  • Comment Analysis

Conclusion

  • Traditional comment UI needs a revamp
  • Comments relate more to particular sections than to the whole article
  • Deep learning: GRU performs best
    (Could BERT/XLNet perform better?)