Hack 98. Pull the HTML Source Code from a Web Site
Integrate web data into your application.
"Use a Browser Inside Access" [Hack #97] shows you how to use the Microsoft Web Browser control to display a web page. This hack takes that functionality a step further and shows how to get at the page's HTML source. Once you can read the source, you can extract data from the web site.
Figure 10-8 shows a web site displayed in the browser control, along with a message box that displays the site's HTML.
Figure 10-8. Reading the HTML source from a web site
The HTML is returned with this line of code:
The web browser control's programming model follows the Document Object Model (DOM). Once the browser has loaded a page, the page's documentElement (its root <html> element) and that element's child nodes become available. In this example, the HTML is read from the innerHTML property of documentElement, which returns everything inside the <html> element. Because the HTML is available as a string, you can pass it to any routine you want: for example, a routine that looks for HTML tables from which to pull data, or one that searches the HTML for keywords.
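The following is a minimal sketch of that idea, not the hack's original code. It assumes an Access form containing a Microsoft Web Browser control named WebBrowser1 and a command button named cmdShowHTML; both names, and the helper routines GetPageHTML, CountKeyword, and DumpTables, are hypothetical. The button reads the HTML only after the control reports that the page has finished loading (ReadyState 4, READYSTATE_COMPLETE), then passes it to a keyword counter and walks the page's tables through the DOM.

```vba
Option Explicit

' Hypothetical button handler: assumes a Web Browser control named WebBrowser1.
Private Sub cmdShowHTML_Click()
    Dim html As String
    html = GetPageHTML(Me.WebBrowser1.Object)
    If Len(html) = 0 Then
        MsgBox "The page has not finished loading yet."
        Exit Sub
    End If
    ' MsgBox prompts are limited to roughly 1,024 characters, so show only the start.
    MsgBox Left$(html, 1000)
    MsgBox "Occurrences of 'access': " & CountKeyword(html, "access")
    DumpTables Me.WebBrowser1.Object.Document
End Sub

' Returns the HTML inside the page's root <html> element,
' or an empty string if the browser has not finished loading.
Private Function GetPageHTML(ByVal browser As Object) As String
    Const READYSTATE_COMPLETE As Long = 4
    If browser.ReadyState <> READYSTATE_COMPLETE Then Exit Function
    GetPageHTML = browser.Document.documentElement.innerHTML
End Function

' Counts non-overlapping, case-insensitive occurrences of keyword in html.
Private Function CountKeyword(ByVal html As String, ByVal keyword As String) As Long
    Dim pos As Long, n As Long
    If Len(keyword) = 0 Then Exit Function
    pos = InStr(1, html, keyword, vbTextCompare)
    Do While pos > 0
        n = n + 1
        ' Resume searching just past the match; InStr returns 0 past the end.
        pos = InStr(pos + Len(keyword), html, keyword, vbTextCompare)
    Loop
    CountKeyword = n
End Function

' Prints every table row to the Immediate window, cells separated by tabs.
Private Sub DumpTables(ByVal doc As Object)
    Dim tbl As Object, tblRow As Object, tblCell As Object
    Dim rowText As String
    For Each tbl In doc.getElementsByTagName("table")
        For Each tblRow In tbl.Rows
            rowText = ""
            For Each tblCell In tblRow.Cells
                rowText = rowText & tblCell.innerText & vbTab
            Next tblCell
            Debug.Print rowText
        Next tblRow
    Next tbl
End Sub
```

Checking ReadyState in the button handler keeps the example simple; the control also raises a DocumentComplete event, which is another reasonable place to read the HTML once a page has loaded.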