Hack 98. Pull the HTML Source Code from a Web Site

Integrate web data into your application.

"Use a Browser Inside Access" [Hack #97] shows you how to use the Microsoft Web Browser control to display a web page. This hack takes that functionality a step further and shows how to get to the source code. Being able to access the source code makes it possible to extract data from a web site.

Figure 10-8 shows a web site displayed in the browser control, along with a message box that displays the site's HTML.

Figure 10-8. Reading the HTML source from a web site


The Microsoft Web Browser control has an extensive programmatic model. Visit http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/prog_browser_node_entry.asp for more information.


This line of code retrieves the HTML and displays it in a message box:

   MsgBox Me.WebBrowser1.Document.documentElement.innerhtml

The programmatic model for the web browser control follows the document object model (DOM). Once the browser has loaded a page, the document's documentElement (the <html> element) and its child nodes become available. In this example, the page's markup is read through the innerHTML property, which returns everything inside the <html> element. The outerHTML property would also include the <html> tag itself. Because the HTML is available as a string, you can pass it to any routine you want. For example, you can write a routine that pulls data from the page's HTML tables, or one that searches through the HTML for keywords.
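As an illustration of the keyword-search idea, here is a minimal sketch of a routine you could place in a standard module. The function name CountKeyword is hypothetical, not part of Access or the browser control. It does a plain, case-insensitive string search with InStr rather than walking the DOM, so it counts raw text matches and knows nothing about HTML structure.

```vba
' Hypothetical helper: counts non-overlapping, case-insensitive
' occurrences of Keyword within an HTML string.
Public Function CountKeyword(ByVal Html As String, ByVal Keyword As String) As Long
    Dim pos As Long
    Dim total As Long

    ' An empty keyword would match everywhere, so report no matches (returns 0)
    If Len(Keyword) = 0 Then Exit Function

    ' vbTextCompare ignores case, so "<TABLE" also matches "<table"
    pos = InStr(1, Html, Keyword, vbTextCompare)
    Do While pos > 0
        total = total + 1
        ' Continue searching just past the end of the current match
        pos = InStr(pos + Len(Keyword), Html, Keyword, vbTextCompare)
    Loop

    CountKeyword = total
End Function
```

From the form that hosts the browser control, you could then pass it the same HTML the message box displays, for example to count the tables on the page:

   MsgBox CountKeyword(Me.WebBrowser1.Document.documentElement.innerHTML, "<table")

A string search like this is quick to write but crude; for pulling values out of table cells, working with the DOM elements themselves is usually more reliable than matching text.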
