Lawrence D. Bergman, Vittorio Castelli, and Chung-Sheng
Li
Networked Data Systems Department
IBM T. J. Watson Research Center
Yorktown Heights, New York
{bergman,vittorio,csli}@watson.ibm.com
Over the last forty years, satellite imagery has been used for a wide variety of applications. Automatically locating and recognizing objects such as missiles, tanks, and ships in images taken by spy satellites and reconnaissance aircraft dates back to the late 1950s and early 1960s. Civilian and scientific use of satellites began to emerge during the late 1960s and early 1970s in a variety of space and earth science programs. Recently, there has been a significant increase in application areas for satellite imagery. For example, Landsat images have been used in global change studies, such as the analysis of deforestation in the Amazon River Basin over the past three decades. Other satellites (such as ERS-1 and Radarsat) equipped with Synthetic Aperture Radar (SAR) instruments have been used to determine the ice coverage of the Arctic and Antarctic Circles to monitor global warming. Weather satellites, such as the Geostationary Operational Environmental Satellites (GOES), have been tracking the movement of hurricanes and storms. In the coming months, several new satellites with much better spatial and spectral resolution will be launched into orbit. Their potential applications in areas such as agriculture, local government, environmental monitoring, utilities and transportation, disaster management, and real estate are almost unlimited.
As the volume of satellite imagery continues to grow exponentially, the efficient management of large collections of such data is becoming a serious challenge. In particular, the successful deployment of automated repositories depends crucially on efficient indexing and retrieval of the desired portion of an image using a simultaneous combination of low-level image features, semantics, and metadata. This is exemplified in the following emergency management scenario involving satellite imagery. Upon detection of a forest fire (either through satellite imagery or human observation), a fire manager from the forest service identifies bodies of water in the proximity of the fire, from which helicopters or planes can draw water to contain the flames. The same manager assesses the threat to neighboring towns, power lines, and other natural or artificial structures, using a fire model that relies on the most recent weather information, fuel status (that is, the type and condition of the forest, soil moisture, etc.), and terrain data (usually in the form of digital elevation models). The fire manager then chooses the appropriate course of action. Here, efficient indexing mechanisms are required to locate fires, water, escape routes for field personnel, and values at risk (such as houses) in satellite images, as well as additional information such as fuel types, fuel status, and soil moisture.
In another scenario, a global change study analyzes the sea ice of the Arctic region. Different types of sea ice are identified by means of texture features extracted from SAR images. Here, efficient feature-space indexing is used to search through 10 years' worth of imagery to analyze changes in the distribution of particular classes of sea ice.
Other forms of scientific data, including time series, medical images (such as MRI, CT, and PET), and seismic data, have archiving and search requirements similar to those for satellite imagery.
In this paper, we describe the architecture and initial implementation of a progressive framework for storage and retrieval of image and video data from a collection of heterogeneous archives. The extensibility of this framework is achieved by allowing search constraints to be specified at one or more abstraction levels. The scalability of the proposed architecture is achieved by adopting a progressive framework, in which a hierarchical scheme is used to decorrelate and reorganize the information contained in the images at all of the abstraction levels. Consequently, the search operators can be applied efficiently to small portions of the data and used to progressively refine the search results. This technique achieves a significant speedup as compared to more conventional implementations.
Several image and video database systems supporting content-based queries have been developed recently. These systems allow image or video indexing through the use of low-level image features such as shape, color histogram, and texture. Prominent examples for photographic images include the MIT Photobook [1], IBM QBIC [23], VisualSEEk and SaFe from Columbia University [2], and the Multimedia DataBlade from Informix/Virage [3]. Content-based search techniques have also been applied to medical images [4, 21], artwork [20], and video clips [15, 5, 6, 7, 17]. Despite the tremendous progress in this area, a number of issues remain outstanding. In particular, we identify the need for: (1) a replicable data model which is applicable both to photographic images and to domain-specific images such as satellite images, medical images, stock sequences, and seismic data; (2) an extensible framework which allows content-based queries using pre-extracted low-level features as well as user-defined features and semantics; (3) a storage and retrieval system that is scalable with both the amount of data and the number of users; and (4) a query interface which is both intuitive and expressive enough to specify constraints on multimedia objects. The most challenging task is to establish a framework which is both extensible and scalable. Existing systems can be scalable if retrieval is based only on pre-extracted features. However, these approaches do not usefully extend across different image types and domains.
Content-based retrieval in image and video databases usually involves comparing a query object with the objects stored in the data repository. The search is usually based on similarity rather than on exact match, and the retrieved results are ranked according to a similarity index. Objects can be extracted from an image at ingestion time, composed at query time, or retrieved using a combination of these strategies. In most cases, compromises must be made between generality and efficiency. Objects extracted at ingestion time can be indexed much more efficiently. However, it is usually very difficult to anticipate all the types of objects in which a user might be interested, and thus systems allowing only search on pre-extracted objects are severely limiting. On the other hand, recognizing objects entirely at query time limits the scalability of a system, owing to the high computational cost involved.
To alleviate both problems, we propose an object-oriented framework which allows flexible composition of queries relying on both types of objects. Within this framework, objects can be specified at multiple abstraction levels (Figure 1), namely, the raw data level (the original pixel data), the feature level (descriptors, such as texture, color histogram, and shape, extracted from the pixel data), and the semantic level (labeled objects and the relationships among them).
We distinguish between simple (or atomic) objects and composite objects. The definition of a simple object reflects the three levels of abstraction defined above: a simple object is a connected region of the raw data, together with its feature-level descriptors and, possibly, a semantic label.
A composite object consists of a set of simple objects and a set of spatial or temporal constraints. For example, given the simple objects o1, o2, and o3, the composite object c, shown in Figure 2, consists of {o1, o2, o3} and of three spatial rules constraining the relative positions of the three components.
An example of a composite object is a bridge, which consists of a segment of road with water on both sides. Similarly, airports can be defined as regions containing parallel or perpendicular intersecting runways.
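The following minimal sketch (in Python, with invented class and function names; the paper does not specify an implementation) illustrates how simple objects and a rule-based composite object such as the bridge above might be represented:

```python
# Minimal sketch of simple and composite objects; names are illustrative,
# not the system's actual API.
from dataclasses import dataclass
from math import hypot

@dataclass
class SimpleObject:
    label: str       # semantic level: e.g. "water", "road"
    features: dict   # feature level: e.g. {"texture": [...]}
    bbox: tuple      # raw-data level: (row, col, height, width)

    def center(self):
        r, c, h, w = self.bbox
        return (r + h / 2.0, c + w / 2.0)

# Spatial rules are predicates over simple objects.
def within_distance(a, b, max_pixels):
    (ra, ca), (rb, cb) = a.center(), b.center()
    return hypot(ra - rb, ca - cb) <= max_pixels

def north_of(a, b):  # smaller row index = closer to the top of the image
    return a.center()[0] < b.center()[0]

@dataclass
class CompositeObject:
    parts: list      # the simple objects
    rules: list      # zero-argument predicates closing over the parts

    def matches(self):
        return all(rule() for rule in self.rules)

# A toy "bridge": a road segment with water on both sides.
road = SimpleObject("road", {}, (40, 10, 4, 60))
water_n = SimpleObject("water", {}, (20, 10, 18, 60))
water_s = SimpleObject("water", {}, (46, 10, 18, 60))
bridge = CompositeObject(
    parts=[road, water_n, water_s],
    rules=[lambda: north_of(water_n, road),
           lambda: north_of(road, water_s),
           lambda: within_distance(road, water_n, 30)],
)
print(bridge.matches())  # True for this toy geometry
```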
In order to maintain extensibility and flexibility in a content-based image search facility, semantics and features must often be extracted from images in the archive at query time. These operations are computationally expensive, however, making it difficult to design a scalable archive search scheme. To address this issue, we propose a progressive framework that involves reorganizing the data representation and designing search schedules.
The data in the archive is transformed into a progressive representation with multiple resolutions. Although a wavelet-based transformation is adopted in the current system, this principle can be applied to other transformations such as the Discrete Cosine Transformation. The purpose of data reorganization is to concentrate the information contained in the original image data. Consequently, search algorithms such as template matching, histogram computation, texture matching, and classification can be adapted to search the archive hierarchically, resulting in a significant reduction in the amount of data that needs to be processed.
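The sketch below illustrates the coarse-to-fine idea: a template is first located by exhaustive scan at the coarsest level of a multiresolution pyramid, and the candidate location is then refined within a small window at each finer level. A 2x2 block average stands in for the low-pass band of the wavelet transform used by the system; all names are illustrative.

```python
import numpy as np

def coarsen(img):
    """Halve resolution by averaging 2x2 blocks (a Haar-style low-pass band)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

def sad(a, b):
    """Sum of absolute differences: a simple matching cost."""
    return float(np.abs(a - b).sum())

def coarse_to_fine_match(image, template, levels=2):
    pyr_img, pyr_tpl = [image], [template]
    for _ in range(levels):
        pyr_img.append(coarsen(pyr_img[-1]))
        pyr_tpl.append(coarsen(pyr_tpl[-1]))
    # Exhaustive scan only at the coarsest (smallest) level.
    img, tpl = pyr_img[-1], pyr_tpl[-1]
    th, tw = tpl.shape
    _, r, c = min((sad(img[r:r+th, c:c+tw], tpl), r, c)
                  for r in range(img.shape[0] - th + 1)
                  for c in range(img.shape[1] - tw + 1))
    # At each finer level, search a small window around the upsampled hit.
    for lvl in range(levels - 1, -1, -1):
        img, tpl = pyr_img[lvl], pyr_tpl[lvl]
        th, tw = tpl.shape
        r, c = 2 * r, 2 * c
        _, r, c = min((sad(img[rr:rr+th, cc:cc+tw], tpl), rr, cc)
                      for rr in range(max(0, r - 2),
                                      min(img.shape[0] - th, r + 2) + 1)
                      for cc in range(max(0, c - 2),
                                      min(img.shape[1] - tw, c + 2) + 1))
    return r, c  # best match location at full resolution
```

Only the coarsest level is scanned exhaustively; every subsequent level examines a constant-size neighborhood, which is the source of the reduction in processed data described above.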
We schedule query processing based on the estimated ability of each search operator to restrict the search space. Our initial implementation uses a simple heuristic -- we estimate the number of objects to be returned from each operator in a query, and at each stage invoke the operator that returns the smallest number. We are currently exploring more sophisticated scheduling schemes, including one that incorporates fuzzy logic.
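A minimal sketch of this greedy heuristic, with hypothetical operator and estimator interfaces:

```python
# Run the operator expected to return the fewest objects first, so that
# later operators scan a smaller candidate set. Names are hypothetical.
def schedule(operators, estimate_count, candidates):
    """operators: callables mapping a candidate set to a filtered set.
    estimate_count: callable (operator, candidates) -> estimated result size."""
    remaining = list(operators)
    while remaining:
        # Greedily pick the most selective operator at each stage.
        op = min(remaining, key=lambda o: estimate_count(o, candidates))
        remaining.remove(op)
        candidates = op(candidates)
    return candidates
```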
The architecture of the initial implementation of the proposed progressive framework, shown in Figure 3, consists of Java clients, an HTTP server, the search engine, a database management system (IBM DB2), an index and image archive, and a library containing various feature-extraction, template-matching, clustering, and classification modules.
In this paper, we assume that a database consists of those relational or object-relational tables that are semantically related.
The database schema of the proposed system is shown in Figure 4.
Queries are constructed and parsed syntactically at the client using a drag-and-drop interface. This interface also allows the definition of new objects and features. After being specified and syntactically analyzed, the query is then sent to the server. A query usually involves both metadata search and image content search. The server subsession parses the query string into a script consisting of a set of SQL statements and content-based search operators. Search on the metadata is performed by the database engine, while the image content search is performed by the content search operators. Usually the metadata search serves as a pruning mechanism, to achieve maximum reduction of the search space during the first stages of the query execution process. The content-based search engine can also access the feature descriptors as well as the raw pixels of an image region in order to compute the final results.
Search results are usually ranked based on a similarity measure. The user can specify the maximum number K of results to be rendered; the system ranks the results of the search process and returns the top K ones.
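The fragment below sketches this two-stage execution: a metadata predicate expressed in SQL prunes the candidate set (the architecture uses IBM DB2 as its database engine), after which a content operator scores the survivors and the top K are returned. The table, column, and function names are invented for illustration.

```python
import heapq

# Hypothetical metadata predicate; executed by the database engine.
METADATA_SQL = """
    SELECT region_id, feature_vector
    FROM image_regions
    WHERE instrument = ? AND acquisition_date >= ?
"""

def run_query(db, content_score, instrument, since, k=20):
    """db: a DB-API-style connection; content_score: a content-based search
    operator mapping a feature descriptor to a similarity value."""
    rows = db.execute(METADATA_SQL, (instrument, since)).fetchall()
    # Content-based ranking runs only on the pruned candidate set,
    # and only the K highest-similarity results are rendered.
    return heapq.nlargest(k, rows, key=lambda row: content_score(row[1]))
```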
The results of the search are rendered by the visualization engine. If necessary (i.e., if database entries have been used for search but not the raw imagery), images associated with the search results are retrieved and rendered appropriately. The system supports both monochrome and RGB composite renderings of images. Object descriptions are packaged with each image for display by the client, allowing for interactive display and manipulation of query results.
In this section we present several scenarios of content-based query at multiple abstraction levels and of query refinement using our system. The first scenario involves search on semantic-level objects. For this example and the two that follow, we search a single band (band 1) of a Landsat Multispectral Scanner (MSS) image of a portion of southern Montana. The query for the first scenario, composed using our Java-based drag-and-drop interface, is shown in Figure 5.
The query is translated into a set of nested function calls by the client and transmitted to the server, where it is parsed into a script consisting of both database calls and search operators. Objects that meet the search criteria are packaged with their associated images for transmittal to the client.
The results of this sample query, which searches on predefined semantic objects (agricultural areas, water bodies) and predefined relationships, are shown in Figure 6; agricultural areas that lie within 2 kilometers of water bodies are displayed by the client as red boxes.
The second example demonstrates feature-based search. A region of the image has been specified via a rubber-band box (the selected region is shown as a white rectangle in Figure 7) and supplied as an example for a texture-matching operator. The specification of the match is shown in Figure 8. Figure 7 shows the results of the simple query "Find all mixed_usage within the currently defined search region". Feature-based objects are stored as square regions of fixed size (in this case 32x32 pixels), which have been pre-extracted from the image. The results represent the best matches to the user-specified example.
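The following sketch illustrates this style of feature-level search: each pre-extracted tile is reduced to a small texture descriptor, and a query-by-example ranks tiles by descriptor distance. The simple mean/variance/gradient statistics below stand in for the system's actual texture features, which the paper does not enumerate.

```python
import numpy as np

def texture_descriptor(tile):
    """Reduce a 32x32 tile to a tiny texture descriptor (illustrative only)."""
    gy, gx = np.gradient(tile.astype(float))
    return np.array([tile.mean(), tile.std(),
                     np.abs(gx).mean(), np.abs(gy).mean()])

def rank_tiles(example_tile, tiles, k=10):
    """Return the k tiles closest to the example in descriptor space."""
    q = texture_descriptor(example_tile)
    scored = [(np.linalg.norm(texture_descriptor(t) - q), i)
              for i, t in enumerate(tiles)]
    return sorted(scored)[:k]  # (distance, tile index) pairs, best first
```

In the actual system the descriptors are computed once, at ingestion time, so query-by-example reduces to a nearest-neighbor search in feature space.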
Figure 9 shows how the results of the previous query can be used to refine the specification. Several of the regions from the query were selected and used to modify the object specification. Submitting this new specification to the search engine will result in a modified set of results.
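One plausible realization of this refinement step (the paper does not specify the mechanism) is to average the descriptors of the selected result tiles into a new query descriptor and rerun the search, reusing texture_descriptor from the previous sketch:

```python
# Hypothetical refinement: selected results define a new example descriptor.
def refine_query(selected_tiles):
    descs = [texture_descriptor(t) for t in selected_tiles]
    return np.mean(descs, axis=0)  # averaged descriptor for the next search
```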
The next example demonstrates construction of a composite object definition. The definition, shown in Figure 10, comprises two objects: a semantic object and the feature-based object constructed in the previous example. Two constraints define the composite: a distance-based constraint and a directional constraint. Figure 11 shows the single result object that satisfied all constraints for the given region. The displayed bounding rectangle encompasses all components of the composite object.
The final example demonstrates the use of a pixel-based matching operator. Figure 12 shows the object definition based on an image example and the "Find" query that is submitted to the search engine. Note that in this example we are searching over a set of images (specified by the "all images in instrument" clause), rather than a single image. Figure 13 shows the set of results returned from this query. Note that a number of images were returned, with the best match displayed for each.
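A sketch of a pixel-level matching operator of this kind, using normalized cross-correlation and keeping the best match per image, as in the multi-image "Find" query above; all names are illustrative.

```python
import numpy as np

def ncc(window, template):
    """Normalized cross-correlation in [-1, 1]."""
    w = window - window.mean()
    t = template - template.mean()
    denom = np.sqrt((w * w).sum() * (t * t).sum())
    return (w * t).sum() / denom if denom > 0 else 0.0

def best_match(image, template):
    """Exhaustive scan; returns (score, row, col) of the best window."""
    th, tw = template.shape
    best = (-2.0, 0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            best = max(best, (ncc(image[r:r+th, c:c+tw], template), r, c))
    return best

def find_in_collection(images, template):
    """One best match per image, mirroring the result set of Figure 13."""
    return {name: best_match(img, template) for name, img in images.items()}
```

In the progressive framework this scan would itself run coarse-to-fine on the hierarchical representation, as sketched earlier, rather than exhaustively at full resolution.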
In this paper, we have presented a scalable architecture to perform progressive content-based search from large satellite image archives. Objects at both the semantic and the feature levels are extracted to facilitate efficient and flexible retrieval. In addition, a hierarchical representation is constructed for each image. Consequently, content-based search operations can be applied hierarchically, resulting in a significant speedup.
This project is supported in part by NASA grant CAN NCC5-101. The authors would like to acknowledge Dr. Harold S. Stone for making the initial proposal, and Dr. John J. Turek for leading the project between January 1995 and June 1996.
hdl:cnri.dlib/october97-li