Handbook of Information Security

Search Engines (security, privacy and ethical issues)

Raymond F. Wisman

 

ABSTRACT - Search engines are the front door of the Web, the tool people most often use when looking for any type of information. For online companies, a high ranking by a search engine can bring in customers, while a low ranking can render the site invisible. Search engine attention is therefore commercially valuable and is often the subject of manipulation attempts. As a well-known and highly centralized part of the Web infrastructure, search engines are also an attractive target for direct attacks by those seeking to cause disruption.

 

The chapter gives an overview of search engine design and examines three areas: security, including common attacks on search engines; privacy, including the concerns of search engine users and concerns about information made public; and ethics, including how sites attempt to manipulate search engines and how the information is used. The literature reviewed includes sources on:

·        text searching [Salton, 1971]

·        current search engine design [Schwartz, 1998]

·        security vulnerabilities [Marchiori, 1997], [Microsoft, 2000]

·        site positioning techniques [Marckini, 2001]

·        common browser vulnerabilities [CERT® Advisory CA-2003-22, 2003]

Additional research is needed on privacy and ethics on the part of search engine users.

 

OUTLINE

 

1 INTRODUCTION

1-1 Security

1-2 Privacy

1-3 Ethical Issues

 

2 WHAT IS SEARCHED

2-1 Types of Search

2-2 Information Sources

2-2-1 List

2-2-2 Search Engine 

2-2-3 Portal 

 

3 HOW SEARCH WORKS

3-1 Human Organized Lists

3-2 Search Engines

3-2-1 Indexing

3-2-2 Retrieval

3-3 What Search Engines Search

3-3-1 Content

3-3-2 Tags

3-4 What Search Engines Ignore

 

4 SEARCH ENGINE SECURITY

4-1 External Search

4-2 Local Search

4-2-1 Reasons for Your Own Search Engine

4-2-2 Vulnerabilities

 

5 BROWSER SECURITY

5-1 Common Vulnerabilities

 

6 PRIVACY

6-1 Personal Information

6-2 Exploitation

 

7 ETHICS

7-1 Search Engine Positioning

7-2 Paid Placement

7-3 Personal Information

7-4 Intellectual Property

 

8 CONCLUSION

 

References

 

 

Arasu, A., Cho, J., Garcia-Molina, H., Paepcke, A., and Raghavan, S. Searching the Web. ACM Transactions on Internet Technology, Vol. 1, No. 1, August 2001, Pages 2-43.

 

Byers, S., Rubin, A., and Kormann, D. Defending Against an Internet-based Attack on the Physical World. WPES’02, November 21, 2002, Washington, DC, USA, Pages 11-18. Copyright 2002 ACM.

 

Chakrabarti, S. Recent results in automatic Web resource discovery. ACM Computing Surveys, Vol. 31, No. 4es, December 1999.

 

CERT® Advisory CA-2003-22. Multiple Vulnerabilities in Microsoft Internet Explorer. Original issue date: August 26, 2003. Last revised: October 6, 2003. Source: CERT/CC. http://www.cert.org/advisories/CA-2003-22.html

 

CERT/CC Vulnerability Note VU#612843 ‘Sun iPlanet and ONE Web Servers contain a buffer overflow in the search engine’ http://www.kb.cert.org/vuls/id/612843

 

CERT/CC Vulnerability Note VU#146704 ‘iWeb Systems Hyperseek search engine may allow malformed URL requests to access files outside the document root of a vulnerable system’ http://www.kb.cert.org/vuls/id/146704

 

CERT/CC Understanding Malicious Content Mitigation For Web Developers http://www.cert.org/tech_tips/malicious_code_mitigation.html

 

Gordon, M., and Pathak, P. Finding information on the World Wide Web: the retrieval effectiveness of search engines. Information Processing and Management, 35(2):141-180, 1999.

 

Hernández, J. C., Sierra, J. M., Ribagorda, A., and Ramos, B. Search Engines as a Security Threat. IEEE Computer, Vol. 34, No. 10, October 2001, Pages 25-30.

 

Lawrence, S., and Giles, L. Accessibility of Information on the Web. Nature, 400:107-109, 1999.

 

Marckini, F. Search Engine Positioning. Wordware Publishing, 2001. ISBN 1-55622-804-X

 

Marchiori, Massimo. Security of World Wide Web search engines. IFIP TC5 WG5.4 3rd International Conference on Reliability, Quality and Safety of Software-Intensive Systems, Athens, Greece, Pages 161-174. Chapman & Hall, Ltd., London, UK, 1997. ISBN 0-412-80280-5

 

Microsoft Security Bulletin (MS00-078). Patch Available for 'Web Server Folder Traversal' Vulnerability. Originally posted: October 17, 2000. http://www.microsoft.com/technet/security/bulletin/ms00-078.asp

 

Peltonen, K. Adding Full Text Indexing to the Operating System. Proceedings of the Thirteenth International Conference on Data Engineering, 1997, Pages 386-390.

 

Robots Exclusion Protocol. 2000. http://info.webcrawler.com/mak/projects/robots/exclusion.html

 

Salton, G., and Buckley, C. Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, 24(5):513-523, 1988.

 

Salton, G., Wong, A., and Yang, C. S. A Vector Space Model for Automatic Text Retrieval. Communications of the ACM, 18(11):613-620, 1971.

 

SANS Top 20 Internet Security Vulnerabilities. http://www.sans.org/top20

 

Schwartz, C. Web Search Engines. Journal of the American Society for Information Science, 49(11):973-982, 1998.

 

Search Engine Watch. http://www.searchenginewatch.com.