SEARCH SNIPPETS
The Online Research Gamble

Raising questions and defining guidelines.

By Genie Tyburski

Genie Tyburski is a Web research applications specialist at Ballard Spahr Andrews & Ingersoll.
Already, one recent lawsuit filed against Harding Earley Follmer & Frailey, a Pennsylvania law firm, accuses the firm’s researchers of using hacking techniques to obtain information from old versions of the plaintiff company’s Web site. While some might question the merit of the claim, it raises concerns about online research techniques, such as what constitutes hacking and where researchers should draw the line when seeking nonpublic information.

For example, is it ethical to browse portions of a Web site the owner blocks from search access? To prevent a search engine from indexing all or some of the pages that make up a Web site, developers created a computer text file called “robots.txt.” The file contains instructions that all major search engines voluntarily obey. It might even contain individual instructions for specific search engines. The robots.txt file at The Virtual Chase, for example, gives separate commands to the Internet Archive (e.g., ia_archiver) and the site’s own search engine (i.e., Atomz/1.0). The purpose of these commands is to block general search access to the site’s licensed, copyrighted images while enabling the proper display of earlier versions at the Internet Archive.

But what a robots.txt file really tells researchers, particularly those who seek confidential information, is where to start looking. If you display the robots.txt file at The Virtual Chase, you will find three sets of instructions commanding all search engines to stay out of a subdirectory called “unc.” Consequently, visitors don’t have search access to documents stored in this part of the site.

Depending on the site’s security, you might be able to browse the section’s contents by entering a URL that ends with the subdirectory name. If poor Web site security permits browse access, you would see a list of files. You then could display the contents of each file simply by following the link. But does the act constitute computer hacking?
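To make the mechanics concrete, here is a minimal Python sketch of how a compliant crawler might consult a robots.txt file before fetching a page. The file contents are hypothetical and only loosely modeled on the description above; they are not the actual file at The Virtual Chase. The `example.com` URLs, the `/images/` path and the helper name `build_parser` are likewise assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption for illustration,
# NOT the real file from The Virtual Chase). Three groups of rules
# each tell a crawler to stay out of /unc/; only the catch-all group
# also blocks the (assumed) /images/ directory.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /unc/

User-agent: Atomz
Disallow: /unc/

User-agent: *
Disallow: /unc/
Disallow: /images/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser a crawler can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    checks = [
        ("ia_archiver", "https://example.com/images/logo.gif"),
        ("SomeOtherBot", "https://example.com/images/logo.gif"),
        ("SomeOtherBot", "https://example.com/unc/notes.html"),
    ]
    for agent, url in checks:
        # can_fetch() only reports what the file requests.
        # Nothing here prevents a client from requesting the URL anyway.
        print(agent, url, parser.can_fetch(agent, url))
```

The check is purely advisory. The file states the owner's wishes, and a well-behaved crawler honors them, but the file itself secures nothing and is readable by anyone who asks for it.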
Or is it the responsibility of the Web site owner to secure documents it doesn’t want available to prying eyes?

Suppose you are conducting a background investigation on an individual, and while running Google queries, you find documents containing the person’s university grades, medical information, Social Security number, date of birth or financial accounts. Is the information public because you found it via Google? Should you give it to the client or use it to expand the research?

If you think it’s unlikely a search would retrieve such information, check out the Google Hacking Database at professional hacker Johnny Long’s Web site. Long is a network security expert and author of the book “Google Hacking for Penetration Testers” (Syngress Publishing, 2005). The book and Web site provide fascinating insight into nefarious uses of search engines.

However, innocent queries might also retrieve sensitive information. In fact, neither the book nor the Google Hacking Database provides hacking techniques. Rather, they illustrate the use of clever and legitimate search commands. Often, the sensitive information these queries uncover resides in the Google cache, a copy of the Web page made at the time the engine indexes it. Therefore, no infiltration is necessary to retrieve it.

Do you want to find credit card numbers and other personal financial accounts? Or a recent bankruptcy or divorce filing? While divorce filings generally are not available on the Internet, you can find full-text bankruptcy filings through the U.S. Party/Case Index Web site. Depending on the type of bankruptcy and its status, the financial or credit accounts listed might still be active.

Do you want to find a person’s driver’s license number and vehicle registration? If the information isn’t available from the state Department of Motor Vehicles, check the public status of vehicle accident reports in the relevant jurisdiction.
These reports often contain drivers’ names, home addresses, phone numbers, genders and dates of birth, as well as vehicle ownership information, vehicle descriptions, vehicle identification numbers, license plate numbers and insurance carriers. Reports in some localities are even available in full text on the Web for free or at a relatively low cost.

While the easy accessibility of such information online might concern consumers, the question for researchers is more complex. Few would think twice about giving clients information found in a public record. But should you? Should routine background investigation reports, for example, contain a full Social Security number and date of birth? Does it matter how the client intends to use the information? Or is intended use only relevant when seeking private information regulated by federal laws, such as the Gramm-Leach-Bliley Act?

What if, through the use of a search engine, you found personal financial information in an unsecured spreadsheet on a Web site? Do you have an ethical obligation to protect the individual? Or would it be professionally irresponsible not to reveal what you found to your client?

The time for addressing the issue of online research ethics is upon us. If we fail to develop professional guidelines, we soon could learn the answers to these questions in court.

Entire contents copyright © 2005 James Publishing, Inc.