Abstract

1 Introduction
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 23rd VLDB Conference Athens, Greece, 1997
Section 9 evaluates the performance of our method. Finally, we offer our conclusions in Section 10.
2 Related Work

Many studies have been conducted in spatial data mining, such as generalization-based knowledge discovery [Kno96, Lu93], clustering-based methods [Est96, Ng94, Zha96], and so on. Those most relevant to our work are discussed briefly in this section, and we emphasize what we believe are limitations that are addressed by our approach.
2.1 Generalization-based Approach

[Lu93] proposed two generalization-based algorithms: spatial-data-dominant and non-spatial-data-dominant. Both require that a generalization hierarchy be given explicitly by experts or be generated automatically in some way. However, such a hierarchy may not exist, or the hierarchy given by the experts may not be entirely appropriate in some cases. The quality of the mined characteristics is highly dependent on the structure of the hierarchy. Moreover, the computational complexity is O(N log N), where N is the number of spatial objects. Given these disadvantages, there have been efforts to find algorithms that do not require a generalization hierarchy, that is, algorithms that can discover characteristics directly from data. This is the motivation for applying clustering analysis in spatial data mining, which is used to identify regions occupied by points satisfying specified conditions.
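As a deliberately simplified illustration of what a generalization hierarchy provides, and why its structure matters, the following Python sketch rolls point data up a hypothetical city-to-county-to-state hierarchy. The hierarchy table, place names, and attribute are illustrative assumptions, not material from [Lu93]:

    from collections import defaultdict

    # Hypothetical concept hierarchy: city -> county -> state (assumed, not from [Lu93]).
    hierarchy = {
        "Goleta": "Santa Barbara County", "Isla Vista": "Santa Barbara County",
        "Pasadena": "Los Angeles County",
        "Santa Barbara County": "California", "Los Angeles County": "California",
    }

    def generalize(location, levels=1):
        """Climb `levels` steps up the concept hierarchy."""
        for _ in range(levels):
            location = hierarchy.get(location, location)
        return location

    # Points with a non-spatial attribute (price); generalize, then summarize.
    points = [("Goleta", 250000), ("Isla Vista", 180000), ("Pasadena", 320000)]
    summary = defaultdict(list)
    for city, price in points:
        summary[generalize(city)].append(price)
    for region, prices in summary.items():
        print(region, sum(prices) / len(prices))  # mean price per generalized region

Note how the output depends entirely on how the hierarchy table was drawn: a different partition of cities into counties yields different mined characteristics, which is the dependence criticized above.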
2.2 Clustering-based Approach
2.2.1 CLARANS

[Ng94] presents a spatial data mining algorithm based on a clustering algorithm called CLARANS (Clustering Large Applications based upon RANdomized Search). This is the first paper to introduce clustering techniques into spatial data mining problems, and it represents a significant improvement over traditional clustering methods on large data sets. However, the computational complexity of CLARANS is still high. [Ng94] claims that CLARANS is linearly proportional to the number of points, but the algorithm is inherently at least quadratic. The reason is that CLARANS applies a random search-based method to find an "optimal" clustering. The time taken to calculate the cost differential between the current clustering and one of its neighbors (in which only one cluster medoid is different) is linear in N, and the number of neighbors that need to be examined for the current clustering is controlled by a parameter called maxneighbor, defined as max(250, 1.25% x K(N - K)), where K is the number of clusters. This means that the time consumed at each step of the search is O(KN^2). It is very difficult to estimate how many steps must be taken to reach the local optimum, but we can certainly say that the computational complexity of CLARANS is Omega(KN^2). This observation is consistent with the results of our experiments and those reported in [Est96], which show that the performance of CLARANS is close to quadratic in the number of points. Moreover, the quality of the results cannot be guaranteed when N is large, since randomized search is used in the algorithm. In addition, CLARANS assumes that all objects are stored in main memory, which clearly limits the size of the database to which it can be applied.
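As a rough illustration of why the per-step cost grows quadratically, the following back-of-the-envelope sketch (our construction, not code from [Ng94]) evaluates maxneighbor and the resulting per-step work for growing N:

    # Illustrative arithmetic only: per-step work of CLARANS under the
    # maxneighbor formula quoted above (K medoids, N points).
    def per_step_cost(n, k):
        maxneighbor = max(250, 0.0125 * k * (n - k))  # 1.25% x K(N - K)
        return maxneighbor * (n - k)  # each neighbor costs a linear cost-differential scan

    for n in (1_000, 10_000, 100_000):
        # Once past the 250 floor, cost grows ~100x per 10x in N, i.e., quadratically.
        print(n, per_step_cost(n, k=5))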
2.2.2 BIRCH

Another clustering algorithm for large data sets, called BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), is introduced in [Zha96]. The authors employ the concepts of Clustering Feature and CF tree. A clustering feature summarizes the information about a cluster; the CF tree is a balanced tree used to store the clustering features. The algorithm makes full use of the available memory and requires a single scan of the data set. This is done by combining close clusters and rebuilding the CF tree, which guarantees that the computational complexity of BIRCH is linearly proportional to the number of objects. We believe BIRCH still has one other drawback: the algorithm may not work well when clusters are not "spherical", because it uses the concept of radius or diameter to control the boundary of a cluster.(1)
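To make the clustering-feature idea concrete, here is our simplified one-dimensional rendering (BIRCH's actual CF-tree maintenance is considerably more involved): a CF can be kept as the triple (N, LS, SS), that is, count, linear sum, and square sum, which is additive under cluster merging and sufficient to derive the centroid and the radius used to bound a cluster:

    import math

    # Simplified Clustering Feature for 1-D points: (count, linear sum, square sum).
    # Real BIRCH stores vector-valued LS/SS inside a balanced CF tree.
    def cf(points):
        return (len(points), sum(points), sum(p * p for p in points))

    def merge(cf1, cf2):
        # CFs are additive: merging two clusters just adds the triples.
        return tuple(a + b for a, b in zip(cf1, cf2))

    def centroid_and_radius(cf_):
        n, ls, ss = cf_
        centroid = ls / n
        radius = math.sqrt(max(ss / n - centroid ** 2, 0.0))  # RMS distance to centroid
        return centroid, radius

    a, b = cf([1.0, 2.0, 3.0]), cf([10.0, 12.0])
    print(centroid_and_radius(merge(a, b)))

The radius here is a single scalar, which is exactly why a boundary controlled this way implicitly favors spherical clusters.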
2.2.3 DBSCAN

Recently, [Est96] proposed a density-based clustering algorithm (DBSCAN) for large spatial databases. Two parameters, Eps and MinPts, are used in the algorithm to control the density of normal clusters. DBSCAN is able to separate "noise" from clusters of points, where "noise" consists of points in low-density regions. DBSCAN makes use of an R*-tree to achieve good performance. The authors illustrate that DBSCAN can be used to detect clusters of any shape and can outperform CLARANS by a large margin (up to several orders of magnitude). However, the complexity of DBSCAN is O(N log N). Moreover, DBSCAN requires a human participant to determine the global parameter Eps. (The parameter MinPts is fixed to 4 in their algorithm to reduce the computational complexity.) Before determining Eps, DBSCAN has to calculate the distance between a point and its kth (k = 4) nearest neighbor for all points. It then sorts all points according to these distances and plots the sorted k-dist graph, from which the user chooses Eps.
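The Eps-selection step just described can be sketched as follows; this is a minimal brute-force illustration of the k-dist computation, whereas [Est96] uses an R*-tree rather than this O(N^2) scan:

    import math

    def sorted_k_dist(points, k=4):
        """For each point, the distance to its k-th nearest neighbor (brute force),
        returned in descending order: the values of the sorted 'k-dist graph'."""
        dists = []
        for i, p in enumerate(points):
            d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
            dists.append(d[k - 1])
        return sorted(dists, reverse=True)

    # A user would look for the first "valley" in this plot to choose Eps.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(sorted_k_dist(pts, k=4))

Even in this toy form, the cost of the pre-computation and the need for a human to read the plot are visible, which is the objection raised above.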
(1) We could not verify this since we do not have the BIRCH source code.
4 Query Types
An extension of SQL is used to describe such queries; the formal definition is given in [Wan97]. The following are several query examples.
Ex1. Select the maximal regions that have at least 100 houses per unit area and at least 70% of the house prices above $400K, with total area at least 100 units, at 90% confidence.

SELECT REGION
FROM house-map
WHERE DENSITY IN (100, ∞)
AND price RANGE (400000, ∞) WITH PERCENT (0.7, 1)
AND AREA (100, ∞)
AND WITH CONFIDENCE 0.9
Ex2. Select the range of age of the houses in those maximal regions where there are at least 100 houses per unit area and at least 70% of the houses have price between $150K and $300K, with area at least 100 units, in California.

SELECT RANGE(age)
FROM house-map
WHERE DENSITY IN (100, ∞)
AND price RANGE (150000, 300000) WITH PERCENT (0.7, 1)
AND AREA (100, ∞)
AND LOCATION California
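To ground the semantics of these predicates, here is a small Python sketch of how the Ex2 conditions might be checked against the contents of one grid cell. STING actually evaluates such conditions against per-cell summary statistics and confidence intervals rather than raw points, so the raw-point scan and all field names here are illustrative assumptions:

    def cell_matches(cell, min_density=100, price_range=(150_000, 300_000),
                     min_fraction=0.7):
        """Hypothetical predicate check for one grid cell against Ex2-style
        conditions: DENSITY, price RANGE ... WITH PERCENT."""
        lo, hi = price_range
        in_range = sum(lo <= p <= hi for p in cell["prices"])
        return (cell["count"] / cell["area"] >= min_density
                and in_range / cell["count"] >= min_fraction)

    print(cell_matches({"count": 120, "area": 1.0,
                        "prices": [200_000] * 90 + [500_000] * 30}))  # True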
5 Algorithm

With the hierarchical structure of grid cells on hand, we can use a top-down approach to answer spatial data mining queries. For each query, we begin by examining cells on a high-level layer. Note that it is not necessary to start with the root; we may begin from an intermediate layer (but we do not pursue this minor variation further due to lack of space). Starting with the root, we calculate the likelihood that this cell is relevant to the query at some confidence level using the parameters of this cell (exactly how this is computed is described later). This likelihood can be defined as the proportion of objects in this cell that satisfy the query conditions. (If the distribution type is NONE, we estimate the likelihood using distribution-free techniques instead.) After we obtain the confidence interval, we label this cell as relevant or not relevant at the specified confidence level. When we finish examining the current layer, we proceed to the next lower level of cells and repeat the same process. The only difference is that instead of going through all cells, we only look at those cells that are children of the relevant cells of the previous layer. This procedure continues until we finish examining the lowest layer (the bottom layer).

In most cases, these relevant cells and their associated statistical information are enough to give a satisfactory answer to the query; we then find all the regions formed by relevant cells and return them. However, in rare cases (people may want a very accurate result for special purposes, e.g., military applications), this information is not enough, and we need to retrieve from the database the data that fall into the relevant cells and do some further processing.

After we have labeled all cells as relevant or not relevant, we can easily find all regions that satisfy the specified density by a breadth-first search. For each relevant cell, we examine the cells within a certain distance (how to choose this distance is discussed below) from the center of the current cell to see if the average density within this small area is greater than the density specified. If so, this area is marked and all relevant cells we just examined are put into a queue. Each time we take one cell from the queue, we repeat the same procedure, except that only those relevant cells that were not examined before are enqueued. When the queue is empty, we have identified one region.
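The two phases just described (top-down relevance labeling, then breadth-first region formation) can be condensed as follows. This is our illustrative sketch: a caller-supplied is_relevant predicate stands in for the paper's confidence-interval test, and a neighbors function stands in for "cells within distance d":

    from collections import deque
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Cell:
        # Minimal stand-in for a STING grid cell; real cells also carry
        # statistical parameters (count, mean, std, min, max, distribution).
        id: int
        children: tuple = ()

    def label_relevant(root, is_relevant):
        """Top-down pass: descend only into children of relevant cells and
        return the relevant bottom-layer cells."""
        frontier, leaves = [root], []
        while frontier:
            nxt = []
            for cell in frontier:
                if not is_relevant(cell):
                    continue            # irrelevant: prune the whole subtree
                if cell.children:
                    nxt.extend(cell.children)
                else:
                    leaves.append(cell)
            frontier = nxt
        return leaves

    def form_regions(relevant, neighbors):
        """Breadth-first grouping of relevant bottom cells into regions."""
        relevant, seen, regions = set(relevant), set(), []
        for start in relevant:
            if start in seen:
                continue
            seen.add(start)
            region, queue = [], deque([start])
            while queue:
                c = queue.popleft()
                region.append(c)
                for n in neighbors(c):  # cells within distance d of c's center
                    if n in relevant and n not in seen:
                        seen.add(n)
                        queue.append(n)
            regions.append(region)
        return regions

    leaf = [Cell(i) for i in range(4)]
    root = Cell(99, children=tuple(leaf))
    rel = label_relevant(root, is_relevant=lambda c: c.id != 3)
    print(form_regions(rel, neighbors=lambda c: [x for x in leaf
                                                 if abs(x.id - c.id) == 1]))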
The distance d used in this search is calculated from the specified density and the granularity of the bottom-layer cell: d = max(l, sqrt(f / (pi * c))), where l, c, and f are the side length of a bottom-layer cell, the specified density, and a small constant set by STING (it does not vary from one query to another), respectively. (Intuitively, a circle of radius sqrt(f / (pi * c)) is the smallest area that can hold f points at density c.) Usually, l is the dominant term in max(l, sqrt(f / (pi * c))). As a result, this distance reaches only the neighboring cells; in this case, we just need to examine neighboring cells and find regions formed by connected cells. Only when the granularity is very small can this distance cover a number of cells; in that case, we need to examine every cell within this distance instead of only the neighboring cells.

For example, if the objects in our database are houses and price is one of the attributes, then one kind of query could be: "Find those regions with area at least A where the number of houses per unit area is at least c and at least p% of the houses have price between a and b, with (1 - α) confidence", where a < b. Here, a could be -∞ and b could be +∞. This query can be written as
SELECT REGION
FROM house-map
WHERE DENSITY IN [c, ∞)
AND price RANGE [a, b] WITH PERCENT [p%, 1]
AND AREA [A, ∞)
AND WITH CONFIDENCE 1 - α
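For completeness, the distance rule itself is a one-liner. In this sketch the value of the internal constant f is an arbitrary assumption, since the text only says it is a small constant fixed by STING:

    import math

    def search_distance(l, c, f):
        """d = max(l, sqrt(f / (pi * c))): the radius examined around each
        relevant cell. l: bottom-cell side length, c: specified density,
        f: small STING-internal constant (value assumed here)."""
        return max(l, math.sqrt(f / (math.pi * c)))

    print(search_distance(l=1.0, c=100, f=20))   # coarse granularity: l dominates
    print(search_distance(l=0.01, c=100, f=20))  # fine granularity: d spans ~25 cells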
7 Quality of STING
Sufficient Condition:
8 Limiting Behavior of STING is Equivalent to DBSCAN
For each point, we examine the area around it (within distance d) to see if the number of points (including itself) within this area is at least the count implied by the specified density.
9 Performance
We ran several tests to evaluate the performance of STING. The following tests were run on a SPARC 10 machine running Solaris 5.5 with 192 MB of memory.
9.1 Performance Comparison of Two Distributions
To obtain performance metrics for STING, we implemented the house-price example discussed in Section 5; Ex1 is the query that we posed. We generated two data sets, each with 100,000 data points (houses). The hierarchical structure has seven layers in this test.

First, we generated a data set (DS1) such that the price is normally distributed in each cell (with similar means). The hierarchical structure generation time is 9.8 seconds. (Generation needs to be done only once per data set; all queries on the same data set can use the same structure, so we do not need to regenerate it for each query.) It takes STING 0. second to answer the query, given that the STING structure exists. The expected result and the result returned by STING are shown in Figures 3a and 3b, respectively. From these figures, we can see that STING's result is very close to the expected one.

In the second data set (DS2), the prices in each bottom-layer cell follow a normal distribution (with different means), but they do not follow any known distribution at higher levels. The hierarchical structure generation time is 9.7 seconds. It takes STING 0.22 second to answer the query. The expected result and the result returned by STING are shown in Figures 4a and 4b, respectively.
Figure 3a: Expected result with DS1
Figure 3b: STING's result on DS1
Figure 4a: Expected result with DS2
Figure 4b: STING's result on DS2
References

[Dev91] J. L. Devore. Probability and Statistics for Engineering and the Sciences. Brooks/Cole Publishing Company, Pacific Grove, California, 1991.
[Est95] M. Ester, H. P. Kriegel, and X. Xu. Knowledge discovery in large spatial databases: Focusing techniques for efficient class identification. Proc. 4th Int. Symp. on Large Spatial Databases (SSD'95), pp. 67-82, Portland, Maine, USA, August 1995.

[Est96] M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD-96), pp. 226-231, Portland, OR, USA, August 1996.

[Fay96a] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery in databases. AI Magazine, 17(3):37-54, Fall 1996.

[Fay96b] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, Menlo Park, CA, 1996.

[Fot94] S. Fotheringham and P. Rogerson. Spatial Analysis and GIS. Taylor and Francis, 1994.

[Kno96] E. M. Knorr and R. Ng. Extraction of spatial proximity patterns by concept generalization. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD-96), pp. 347-350, Portland, OR, USA, August 1996.

[Kop96a] K. Koperski, J. Adhikary, and J. Han. Spatial data mining: progress and challenges. SIGMOD'96 Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), Montreal, Canada, June 1996.

[Kop96b] K. Koperski and J. Han. Data mining methods for the analysis of large geographic databases. Proc. 10th Annual Conf. on GIS, Vancouver, Canada, March 1996.

[Lu93] W. Lu, J. Han, and B. C. Ooi. Discovery of general knowledge in large spatial databases. Proc. Far East Workshop on Geographic Information Systems, pp. 275-289, Singapore, June 1993.

[Ng94] R. Ng and J. Han. Efficient and effective clustering methods for spatial data mining. Proc. 1994 Int. Conf. on Very Large Data Bases, pp. 144-155, Santiago, Chile, September 1994.

[Sam90] H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1990.

[Sto93] M. Stonebraker, J. Frew, K. Gardels, and J. Meredith. The SEQUOIA 2000 storage benchmark. Proc. 1993 ACM-SIGMOD Int. Conf. on Management of Data, pp. 2-11, Washington, DC, 1993.

[Wan97] W. Wang, J. Yang, and R. R. Muntz. STING: A statistical information grid approach to spatial data mining. Technical Report No. 970006, Computer Science Department, UCLA, February 1997.

[Zha96] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. Proc. 1996 ACM-SIGMOD Int. Conf. on Management of Data, pp. 103-114, Montreal, Canada, June 1996.