GeoExtract
Added by geKow, last edited by jhonen jones on Aug 10, 2005

Description

GeoExtract is a Plugin which extracts text from a file or web page and displays it on a GeoBar.

Download

GeoExtract can be downloaded from here. You will need the tcp4w32.dll file in addition to the distribution you select.

Installation

You must copy the tcp4w32.dll file into the directory where geoshell.exe is (otherwise, you'll get load-time problems).

Registry Settings

The numeric settings have minimums and maximums. If you enter a value that is too large or too small (or negative, for values that can't be negative), the number will be brought to within the acceptable range.
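The range handling described above amounts to clamping. The following is a minimal Python sketch of that idea, not GeoExtract's actual code; the limits 0 and 3600 are hypothetical, since the real minimums and maximums are not listed on this page.

```python
def clamp(value: int, minimum: int, maximum: int) -> int:
    """Bring value into the inclusive range [minimum, maximum]."""
    return max(minimum, min(value, maximum))


# Hypothetical limits, chosen only for illustration.
print(clamp(-5, 0, 3600))     # too small: raised to the minimum
print(clamp(99999, 0, 3600))  # too large: lowered to the maximum
print(clamp(60, 0, 3600))     # already in range: unchanged
```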

Display Settings

Font: This acts the same as in GeoDateTime.

Style: Can be either standard or inset.


Display Width
This is the width of the display area in pixels. If the extracted text's width exceeds this, then the text will side-scroll.

Scroll Interval
How quickly the text side-scrolls. Smaller numbers give faster scrolling.

Tooltip Duration
How long, in seconds, the tooltip will stay visible once it is displayed.

Tooltip X Offset, Tooltip Y Offset
The position of the upper-left corner of the tooltip box relative to the upper-left corner of the GeoExtract plugin, measured in pixels. Larger positive numbers offset the tooltip further down and to the right; larger negative numbers offset it further up and to the left.

Tooltip on New Content
Whether the tooltip should pop up when new content is available (after a shift-click). Set to 1 to pop it up, or 0 to leave it hidden.

Text Alignment
Can be left, center, or right. Only applies when the text is not scrolling.

Link Colour
If links are detected in the extracted text, they will be displayed in this colour. The colour is specified in RGB, with each component ranging from 0 to 255.

Never Scroll
If set to 1, the text will never scroll, even if it is too long for the display area.

Network Settings

Proxy Server
If you need to access the Internet via a proxy server, set your server here in the same manner as in the Host entry (see below).

Proxy Port
The port to connect to the proxy server on.

Identify As
You can set a string here to use as the User-Agent identification string. This lets you spoof as a specific browser, in case you want to parse a page that blocks out robots, or if you want to get the server to serve a specific version of the page.
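To illustrate what setting a User-Agent string means at the HTTP level, here is a small Python sketch using the standard urllib module. It is not how GeoExtract itself sends requests; the helper name build_request and the browser string are assumptions made for the example, and no network access takes place.

```python
import urllib.request


def build_request(host: str, path: str, identify_as: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that carries a custom User-Agent."""
    url = f"http://{host}{path}"
    return urllib.request.Request(url, headers={"User-Agent": identify_as})


# Hypothetical identification string imitating an old desktop browser.
req = build_request(
    "www.cert.org",
    "/advisories/CA-2002-24.html",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
)
print(req.full_url)
print(req.get_header("User-agent"))  # urllib stores header names capitalized this way
```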

Fetch Interval
How often to go fetch the web page from the web server.

Fetch Timeout
How long, at most, GeoExtract will take to download a page before parsing it.

Temporary File Directory
This is the directory where GeoExtract will place downloaded pages. These pages can be viewed to help troubleshoot problems with your search expressions. The default is C:\temp.

Sources

These registry values define the sources for the extractions. The # symbol will be the number for the source. So, you will have Host 0, Host 1, etc., and matching entries for the Absolute Paths and Search Expressions.

Host #
This is the name of the web server. This should not have any http:// prefix, nor should it have any trailing slash.
Example: For the URL http://www.cert.org/advisories/CA-2002-24.html, the Host is www.cert.org.
Alternatively, this can specify a drive, in order to point to a local file.
Example: For the path C:\Program Files\mirc\logs\#geoshell.log, the Host should be C:.

Absolute Path #
For URLs, this should always begin with a forward slash (/). If you just want the main page for a site, this should be simply the slash character by itself. Otherwise, put the filepath part of the URL here.
Example: For the URL http://www.cert.org/advisories/CA-2002-24.html, the Absolute Path is /advisories/CA-2002-24.html.
Alternatively, if Host # is pointing to a drive, then the Absolute Path should point to a local file.
Example: For the path C:\Program Files\mirc\logs\#geoshell.log, the Absolute Path should be \Program Files\mirc\logs\#geoshell.log. Note the use of backslashes, and not forward slashes.
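The split between Host and Absolute Path can be sketched in Python as follows. The function split_source is a hypothetical helper for working out the two registry values by hand; it is not part of GeoExtract, and the local path used below is made up for the example.

```python
from urllib.parse import urlsplit


def split_source(location: str) -> tuple[str, str]:
    """Return (host, absolute_path) for a URL or a Windows drive path."""
    # Local file: a drive letter, a colon, then a backslash.
    if len(location) >= 3 and location[0].isalpha() and location[1:3] == ":\\":
        return location[:2], location[2:]
    parts = urlsplit(location)
    # A bare site address maps to the slash character by itself.
    return parts.netloc, parts.path or "/"


print(split_source("http://www.cert.org/advisories/CA-2002-24.html"))
print(split_source("http://www.cert.org"))
print(split_source(r"C:\temp\example.log"))  # hypothetical local file
```

The first value goes into Host # without any http:// prefix or trailing slash, and the second into Absolute Path #.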

Sound on New Content #
Here you can specify a sound (e.g. a wav file) to play when new content is available. This is for use in conjunction with the wait for new content feature (see below). If you don't want any sound to play, specify none.

Continuous Parsing #
Set to either on (1) or off (0). If set on, parsing begins at the end of the file when the plugin starts, and continues from there after each wait for new content. In continuous parsing mode, the wait for new content mode is not only activated by shift-click, but also by tooltip timeout and mouse-out.
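The idea of starting at the end of the file and continuing from there can be sketched in Python as below. This is only an illustration of the behaviour described above, not GeoExtract's implementation; the class name LogTailer and the file contents are assumptions.

```python
import os
import tempfile


class LogTailer:
    """Reads only the text appended to a file since the previous read."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Start at the current end of the file, skipping existing content.
        self.offset = os.path.getsize(path)

    def read_new(self) -> str:
        """Return text appended since the last call and remember the new end."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
            self.offset = f.tell()
        return data.decode("utf-8", errors="replace")


# Demonstration on a throwaway file (hypothetical log content).
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "example.log")
    with open(log, "wb") as f:
        f.write(b"old line\n")
    tailer = LogTailer(log)
    first = tailer.read_new()   # nothing has been appended yet
    with open(log, "ab") as f:
        f.write(b"new line\n")
    second = tailer.read_new()  # only the appended text
    print(repr(first), repr(second))
```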

Search Expression #
Place your parsing instructions here. Details below...

Search Expressions

Search expressions contain both regular text and wildcards. The wildcards are called escape sequences because they always begin with the escape character, which is the percent sign (%). The escape sequences are listed below, each with an explanation and, where available, an example:

Escape Sequence: %!

  • Explanation
    This is the ignore wildcard. It matches anything and everything until the text which follows. %! is assumed at the beginning of the search, so you can start your search expression with whatever the first text is that you want to home in on in the page.
  • A Simple Example
    Let's suppose you wanted to find the number of emails in the following HTML text:
    You have <b>4</b> new messages waiting.

    You would want to skip over everything at the beginning, so you would write:

    %!<b>

Escape Sequence: %*

  • Explanation
    This is the use wildcard. It works exactly the same as the %! wildcard, except that any text matched by it will be shown by the plugin. This wildcard ignores HTML tags!
  • A Simple Example
    Assuming the same example as above, we should now use %* to get the number:
    %!<b>%*</b>

    At this point, our GeoExtract plugin would show just the number: 4

Escape Sequence: %%

  • Explanation
    This is for when you need to use the actual % sign without turning it into a wildcard. It is treated as regular text, and therefore can follow a wildcard.
  • A Simple Example
    Now we can expand our example above. Let's say the HTML text was:
    You have <b>4</b> new messages waiting.  Your inbox is <i>about 34% full.</i>

    To grab the word messages and the percentage, we could do this:

    %!<b>%*</b> new%*waiting.%!about %*%% full.

    This would produce text like: 4 messages 34

Escape Sequence: %_

  • Explanation
    This is the quote escape sequence. It is also treated as plain text, except that it allows us to enter text we want to show up in the GeoExtract plugin that does not appear on the page. Any text between two %_ sequences is automatically added to the GeoExtract text.
  • A Simple Example
    Obviously the example above suffers from leaving off the % at the end, so a better way to do the trick above might be:
    %!<b>%*</b>%_ e-mails, %_%!about %*%%%_ percent full.%_

    Which would produce:

    4 e-mails, 34 percent full.

Escape Sequence: %n

  • Explanation
    This is the line break escape sequence. You can use it to place line breaks in the tooltip. It has no effect on the displayed text.

Escape Sequence: %l

  • Explanation
    This is used to capture a single line. That is, it captures all text until it encounters an end-of-line.

This system is clearer once you look at some examples, and work from them to build your own search expressions. Refer to the GeoExtractSearchExpressionLibrary for numerous examples. See also GeoExtractAsLogTailer.
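To experiment with the escape sequences away from the plugin, the following Python sketch interprets them in a simplified way. It is an approximation written from the descriptions above, not GeoExtract's parser. Several behaviours are assumptions: text with no wildcard in front of it is searched for from the current position, matching stops at the first text that cannot be found, and %n is skipped because only the display text is produced. The function names tokenize and extract are hypothetical.

```python
import re

TAG = re.compile(r"<[^>]*>")  # used to drop HTML tags from text captured by %*


def tokenize(expr):
    """Split a search expression into (kind, value) tokens."""
    tokens, buf, in_quote, i = [], "", False, 0
    while i < len(expr):
        ch = expr[i]
        nxt = expr[i + 1] if i + 1 < len(expr) else ""
        if ch != "%" or nxt == "":
            buf += ch  # ordinary character (or a trailing lone %)
            i += 1
            continue
        i += 2
        if nxt == "%":
            buf += "%"  # %% is a literal percent sign
            continue
        if buf:  # any other escape ends the current run of text
            tokens.append(("quote" if in_quote else "text", buf))
            buf = ""
        if nxt == "_":
            in_quote = not in_quote
        elif nxt in "!*ln":
            tokens.append(("esc", nxt))
        else:
            raise ValueError(f"unknown escape sequence %{nxt}")
    if buf:
        tokens.append(("quote" if in_quote else "text", buf))
    return tokens


def extract(expr, page):
    """Return the display text that expr would pull out of page."""
    out, pos, mode = [], 0, "!"  # %! is assumed at the start
    for kind, value in tokenize(expr):
        if kind == "quote":
            out.append(value)
        elif kind == "esc":
            if value == "l":  # capture up to the end of the line
                end = page.find("\n", pos)
                end = len(page) if end == -1 else end
                out.append(page[pos:end].rstrip("\r"))
                pos = end
            elif value != "n":  # %n only affects the tooltip; skipped here
                mode = value
        else:
            hit = page.find(value, pos)
            if hit == -1:
                break  # assumption: give up when the text is not found
            if mode == "*":
                out.append(TAG.sub("", page[pos:hit]))
            pos = hit + len(value)
            mode = "!"
    else:
        if mode == "*":  # a trailing %* captures the rest of the page
            out.append(TAG.sub("", page[pos:]))
    return "".join(out)


page = "You have <b>4</b> new messages waiting.  Your inbox is <i>about 34% full.</i>"
print(extract("%!<b>%*</b>", page))
print(extract("%!<b>%*</b> new%*waiting.%!about %*%% full.", page))
print(extract("%!<b>%*</b>%_ e-mails, %_%!about %*%%%_ percent full.%_", page))
```

The three calls at the end use the example expressions from the escape sequence list, so the results can be compared with the outputs given there.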

Operation by Mouse

click: Hides the tooltip if it is being displayed.
double-click: Brings up the source URL in your default browser.
Shift + click: Clears the display area and enters wait for new content mode. The display area will only be filled with text again when new, different content is available from the source URL; otherwise it will remain blank. This can be used to be notified of changes (such as with sports scores), keeping your GeoBar free from unnecessary visual activity.
Ctrl + click: Disables downloading. Useful for people on dial-up. Use the plugin recycle (see below) to re-enable downloading.
Shift + double-right-click: Recycles the plugin; that is, it loads all changes made in the registry, and starts a new fetch from the specified sources.
Mouseover: Brings up a tooltip which contains up to 2048 characters of the currently extracted text.
Mouseout: If Continuous Parsing is turned on, a mouseout will also activate wait for new content mode.
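The wait for new content behaviour can be pictured with the small Python sketch below. It models only the logic described above (blank the display, then refill it once the fetched text differs from what was last seen); the class and method names are hypothetical and the score strings are made up.

```python
class WaitForNewContent:
    """Toy model of the display area and the wait for new content mode."""

    def __init__(self) -> None:
        self.shown = ""        # what the display area currently holds
        self.last_seen = None  # most recent text that was displayed
        self.waiting = False

    def on_shift_click(self) -> None:
        """Clear the display and wait for different content."""
        self.shown = ""
        self.waiting = True

    def on_fetch(self, text: str) -> None:
        """Handle newly extracted text from the source."""
        if self.waiting:
            if text == self.last_seen:
                return  # nothing new: the display stays blank
            self.waiting = False
        self.last_seen = text
        self.shown = text


display = WaitForNewContent()
display.on_fetch("Home 1 - 0 Away")
display.on_shift_click()
display.on_fetch("Home 1 - 0 Away")  # unchanged, so still blank
blank = display.shown
display.on_fetch("Home 2 - 0 Away")  # changed, so shown again
print(repr(blank), repr(display.shown))
```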

Troubleshooting

If you encounter the error message "<path>\GeoExtract.dll couldn't be loaded or isn't a plugin dll.", you need to copy the tcp4w32.dll file into the same directory as geoshell.exe.

If GeoExtract hangs at the Downloading <site>... part, make sure your temporary directory exists. The default is C:\temp\.

Known Issues

  • Links will not be detected properly in multi-source instances of GeoExtract.
  • Text will not be rendered properly in center-aligned instances of GeoExtract.

Although this plugin was abandoned by its author, you can report problems on the Geoshell.com board.
