Understanding Web Spam and Propaganda: Techniques and Countermeasures in Cyberspace, Study notes of Computer Science

These notes cover web spam and propaganda in the context of the vast and omnipresent web. They discuss how anyone can be an author, the unreliability of search engines, and the techniques used by spammers and propagandists to manipulate search results and societal trust. They also introduce the history of search engines and their evolution in dealing with spam.

Typology: Study notes

Pre 2010

Uploaded on 08/18/2009


Trust and Propaganda in Cyberspace

Panagiotis Takis Metaxas

Computer Science Department

Wellesley College

Have you used the Web…

to get informed?

to help you make decisions?

 Financial

 Medical

 Political

 Religious

 Other?…

on your computer?

 Your cell phone?

 Your PDA?

 Your thermostat?

 Your toaster?

The Web is huge

 > 50 billion (! ?) static pages publicly available, …growing every day

 Much larger, if you count the “deep web”

 Infinite, if you count pages created on-the-fly

The Web is omnipresent

… but it can be unreliable!

Anyone can be an author on the web!

Email Spam anyone?

50% of emails received at Wellesley College are spam!

… whether you like it or not!

But Google is usually so good at finding info…

Why does it do that?

Why? How do they do it?

Web Spam:

 Attempt to modify the web (its structure and contents), and thus influence search engine results in ways beneficial to web spammers

The Web is a Graph

Directed Graph of Nodes and Arcs (directed edges)

 Nodes = web pages

 Arcs = hyperlinks from a page to another

A graph can be explored

A graph can be indexed
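
The graph view above can be sketched in a few lines of Python. This is only an illustration: the page names and links are made up, and the breadth-first traversal is one possible way to "explore" a graph, not necessarily how any real crawler works.

```python
from collections import deque

# Hypothetical toy web: each page (node) maps to the pages it links to (arcs).
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],  # D links out, but no page links to D
}

def explore(graph, start):
    """Breadth-first exploration: pages in the order they are first reached."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen

print(explore(web, "A"))
```

Starting from A, page D is never reached because the arcs are directed and nothing points to it. Exploration that only follows hyperlinks can miss pages for the same reason.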

URL: http://www.wellesley.edu/CS/pmetaxas/index.html

 Access method: http

 Server and domain: www.wellesley.edu

 Path: /CS/pmetaxas/

 Document: index.html
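
As a small sketch, the parts of the URL above can be pulled apart with Python's standard library. The variable names are chosen here for illustration and simply mirror the slide's labels.

```python
from urllib.parse import urlsplit
import posixpath

url = "http://www.wellesley.edu/CS/pmetaxas/index.html"
parts = urlsplit(url)

access_method = parts.scheme   # the protocol used to fetch the page
server = parts.netloc          # server and domain
# Split the URL path into the directory path and the final document name.
path, document = posixpath.split(parts.path)

print(access_method, server, path, document)
```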

How Google (and the other search engines) Work

 Crawl the web

 Create inverted index (search engine servers)

 User query → inverted index → document IDs

 Rank results
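
The indexing step can be illustrated with a minimal sketch. It assumes a tiny made-up document collection and whitespace tokenization, and it leaves out ranking entirely. Real search engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical crawled documents: document ID -> text.
docs = {
    1: "web spam and propaganda",
    2: "search engines rank web pages",
    3: "propaganda and trust",
}

def build_inverted_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_inverted_index(docs)
print(search(index, "web propaganda"))
```

The index is built once after crawling. Each user query then becomes a few set lookups instead of a scan over every document.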

1st Generation: How to Spam

“Keyword stuffing”: Add keywords, text, to increase content similarity

Searching for Jennifer Aniston?

SEX SEXY MONICA LEWINSKY JENNIFER LOPEZ CLAUDIA SCHIFFER CINDY CRAWFORD JENNIFER ANNISTON GILLIAN ANDERSON MADONNA NIKI TAYLOR ELLE MACPHERSON KATE MOSS CAROL ALT TYRA BANKS FREDERIQUE KATHY IRELAND PAM ANDERSON KAREN MULDER VALERIA MAZZA SHALOM HARLOW AMBER VALLETTA LAETITA CASTA BETTIE PAGE HEIDI KLUM PATRICIA FORD DAISY FUENTES KELLY BROOK (the slide repeats this keyword list four times)

2nd Generation: Add Popularity

A hyperlink from a page in site A to some page in site B is considered a popularity vote from site A to site B

Rank similar documents according to popularity

How To Spam?

[Figure: link diagram involving the sites www.aa.com, www.bb.com, www.cc.com, www.dd.com and www.zz.com]
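
A rough sketch of popularity as vote counting follows. The link directions are assumptions made for illustration, since the slide's diagram is not recoverable from the text, and the `popularity` helper is hypothetical.

```python
from collections import Counter

# Hypothetical (source site, target site) links; the directions are assumed.
links = [
    ("www.aa.com", "www.zz.com"),
    ("www.bb.com", "www.zz.com"),
    ("www.cc.com", "www.zz.com"),
    ("www.dd.com", "www.zz.com"),
    ("www.zz.com", "www.aa.com"),
    ("www.aa.com", "www.aa.com"),  # a site linking to itself earns no vote
]

def popularity(link_pairs):
    """Count one vote per distinct (source, target) pair, ignoring self-links."""
    votes = Counter()
    counted = set()
    for source, target in link_pairs:
        if source != target and (source, target) not in counted:
            counted.add((source, target))
            votes[target] += 1
    return votes

print(popularity(links).most_common())
```

Because any site can cast a vote, a batch of throwaway sites that all point at one target raises its count just as easily as genuine links do.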

2nd Generation: How to Spam

Create “Link Farms”: Heavily interconnected sites spam popularity

3rd Generation: Add Reputation

The reputation “PageRank” of a page Pi = the sum of a fraction of the reputations of all pages Pj that point to Pi

Idea similar to academic co-citations

Beautiful Math behind it

 PR = principal eigenvector of the web’s link matrix

 PR equivalent to the chance of randomly surfing to the page
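
The reputation idea can be sketched with a simple power iteration. This is a textbook-style approximation, not Google's actual implementation. The damping factor of 0.85, the iteration count and the toy graph are assumptions that do not appear in the slides.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration.

    graph maps each page to the list of pages it links to; every link
    target must also appear as a key. damping=0.85 is an assumed value.
    """
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline: the surfer's random jump.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page passes an equal fraction of its rank to each target.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical toy web graph.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The ranks sum to 1, which matches the slide's reading of PageRank as the chance of a random surfer being on the page.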

How To Spam?

Unanswered Spam Attacks

Business weapons

 “more evil than satan”

Political weapon in pre-election season

 “miserable failure”

 “waffles”

 “Clay Shaw” (+ 50 Republicans)

Misinformation

 Promote hGH

 Discredit AD/HD research

Activism / online protest

 “Egypt”

 “Jew”

Other uses we do not know?

 “views expressed by the sites in your results are not in any way endorsed by Google…”

Search Engines vs Web Spam

Search Engine’s Action: 1st Generation: Similarity (Content)

Web Spammers’ Reaction: Add keywords so as to increase content similarity

Search Engine’s Action: 2nd Generation: + Popularity (Content + Structure)

Web Spammers’ Reaction: + Create “link farms” of heavily interconnected sites

Search Engine’s Action: 3rd Generation: + Reputation (Content + Structure + Value)

Web Spammers’ Reaction: + Organize “mutual admiration societies” of irrelevant reputable sites

Search Engine’s Action: In the Works: Ranking based on “the need behind the query”

Is there a pattern on how to spam? Can you guess what they will do?

And Now For Something Completely(?) Different

Propaganda:

 Attempt to modify human behavior, and thus influence people’s actions in ways beneficial to propagandists

Theory of Propaganda

 Developed by the Institute for Propaganda Analysis 1938-

Propagandistic Techniques (and ways of detecting propaganda)

 Word games - associate good/bad concept with social entity

 Glittering Generalities; Name Calling

 Transfer - use special privileges (e.g., office) to breach trust

 Testimonial - famous non-experts’ claims

 Plain Folk - people like us think this way

 Bandwagon - everybody’s doing it, jump on the wagon

 Card Stacking - use of bad logic

Societal Trust is (also) a Graph

Weighted Directed Graph of Nodes and Weighted Arcs

 Nodes = Societal Entities (People, Ideas, …)

 Arcs = Trust recommendation from an entity to another

 Arc weight = Degree of entrustment

Then what is Propaganda?

 Attempt to modify the Societal Trust Graph in ways beneficial to propagandists
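
To make the trust-graph picture concrete, here is a minimal sketch of a weighted directed graph and of one way it might be modified. The entities, the weights on a 0 to 1 scale, and both helper functions are hypothetical choices for illustration.

```python
# Hypothetical societal trust graph:
# trust[a][b] = degree to which entity a trusts entity b (assumed 0..1 scale).
trust = {
    "Alice": {"Doctor": 0.9, "Blog": 0.2},
    "Bob": {"Doctor": 0.7},
    "Doctor": {},
    "Blog": {},
}

def trust_received(graph, entity):
    """Sum the weights of all arcs pointing at the given entity."""
    return sum(arcs.get(entity, 0.0) for arcs in graph.values())

def add_recommendation(graph, source, target, weight):
    """Return a copy of the graph with one arc added or re-weighted."""
    modified = {node: dict(arcs) for node, arcs in graph.items()}
    modified.setdefault(source, {})[target] = weight
    modified.setdefault(target, {})
    return modified

before = trust_received(trust, "Blog")
# A propagandist persuades Bob to trust the blog: one new weighted arc.
after = trust_received(add_recommendation(trust, "Bob", "Blog", 0.8), "Blog")
print(before, after)
```

In this toy model a single added arc changes how much trust an entity receives, which is the kind of graph modification the slide describes.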

How to modify the Trust Graph?

Cognitive Hacking and Cyber Trust

Gaining access or breaking into a computer system for the purpose of modifying certain behaviors of a human user in a way that violates the integrity of the overall system

Does not necessarily aim to fool a search engine

Famous examples:

 Pump & Dump stock schemes

 The Emulex case

Word Games

Transfer

How (not) To Solve The Problem

Living in Cyberspace

Critical Thinking, Education

 Realize how we know what we know

 “Of course it’s true; I saw it on the Internet!”

Cyber-social Structures that mimic Societal ones

 Know why to trust or distrust

 Who do you trust on a particular subject?

A Search Engine per Browser

 Easier to fool one search engine than to fool millions of readers

 Enable the reader to keep track of her trust network

 Tools of cyber trust

 How would you avoid the Emulex hoax?

Thank You!

PMetaxas@wellesley.edu