Forum Ferret
Forum Ferret is a simple Internet forum scanner developed using
RapidJ v1.1 and now distributed with RapidJ as an example project.
Code Canvas hopes that you might find it useful and that it gives
you an insight into how applications generated by RapidJ can be customised.
Note: Prior to RapidJ v1.1, Forum Ferret was provided as a separate download.
Note: Before scanning a website, make sure that doing so does not
violate the site's terms of use.
1. Entity configuration
2. Creating the database
3. GUI component configuration
4. Generated code
5. Viewing customisations
6. Scanner configuration
7. Scanning for messages
Features
Forum Ferret has the following features:
- Scans Internet forums for new messages.
- Identifies messages with new replies.
- Ability to mark messages as important.
- Flexible scanner configuration using regular expressions.
- Ability to sort messages by various fields.
Think something is missing? Try adding it yourself!
Installation
As of RapidJ v1.1, Forum Ferret is included with RapidJ as an example project.
To install the Forum Ferret example project:
- Open the Forum Ferret example project from the examples directory
in the RapidJ installation directory.
- Install Forum Ferret in the same way as the other example projects.
(For help with this see the Customer Database example
under Getting Started in the Help Contents).
NOTE: By default, Forum Ferret uses the forumferret database
in the bundled HSQLDB database server. Make sure you uncheck
Test records when generating the database script so that
test data is not created.
- Open Forum Ferret. If you have installed it into the bundled Resin
application server, open it at http://localhost:8080/forumferret/
- Now you are ready to configure some scanners.
Scanner Configuration
To configure a scanner you need the following:
- URL of the forum - The URL must be a link
to a forum page that displays a list of messages, each with
a title, link and optionally either the number of posts or
the number of replies. The URL should include any necessary
query parameters.
- A regular expression - The regular expression must
match each message on the page and capture the URL, title
and the number of replies (or posts) as a
group.
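The two requirements above can be sketched in Java as a loop that applies the expression repeatedly to the page text. This is only an illustration, not Forum Ferret's actual code: the ScanSketch class, the ForumMessage type and the sample markup are hypothetical, and only java.util.regex from the standard library is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScanSketch {

    /** One message found on a forum listing page (hypothetical type). */
    static final class ForumMessage {
        final String url;
        final String title;
        final int replies;

        ForumMessage(String url, String title, int replies) {
            this.url = url;
            this.title = title;
            this.replies = replies;
        }
    }

    // Assumed markup: <a href="URL">TITLE</a> (N replies)
    // Group 1 = URL, group 2 = title, group 3 = number of replies.
    static final Pattern MESSAGE = Pattern.compile(
            "<a href=\"(.*?)\">(.*?)</a> \\((\\d+) replies\\)");

    /** Returns every message that the expression matches in the page text. */
    static List<ForumMessage> scan(String pageHtml) {
        List<ForumMessage> found = new ArrayList<>();
        Matcher m = MESSAGE.matcher(pageHtml);
        while (m.find()) { // one iteration per message on the page
            found.add(new ForumMessage(
                    m.group(1), m.group(2), Integer.parseInt(m.group(3))));
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up page content standing in for a downloaded forum page.
        String page =
                "<li><a href=\"/thread?id=1\">Install problem</a> (3 replies)</li>\n"
              + "<li><a href=\"/thread?id=2\">Feature request</a> (0 replies)</li>\n";
        for (ForumMessage msg : scan(page)) {
            System.out.println(msg.title + " -> " + msg.url + " (" + msg.replies + ")");
        }
    }
}
```

The lazy quantifier `.*?` keeps each capture as short as possible, which stops one match from swallowing several messages that sit on the same line.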
To add a new scanner, go to the Scanner List page and press the Add New button.
The following table describes the scanner fields and shows an example
configuration that searches the struts-user mailing list on the
Mailing list ARChives
for the month of June 2005.
| Field | Description | Example |
| Name | A meaningful name. | Struts User - Mailing list ARChives |
| Url | The location of the forum. | http://marc.theaimsgroup.com/?l=struts-user&r=1&b=200506&w=2 |
| Reg Expr | The regular expression used to match each message on the page. | \[((<a href=".*?">)|(<font color=".*?">))(\d*?)((</a>)|(</font>))] <a href="(\?l=struts-user&m=.*?)">(.*?)</a> |
| Title Group | The group index that corresponds to the title of the message. | 9 |
| Url Group | The group index that corresponds to the URL of the message. | 8 |
| Num Replies Group | The group index that corresponds to the number of replies to the message. | -1 |
| Num Posts Group | The group index that corresponds to the total number of posts for the message thread. | 4 |
Note: Group 0 is the entire matched string. The first group that
you explicitly specify starts from an index of 1. Specify a group
of -1 to indicate that there is no group for a particular field.
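To make the group numbering concrete, the sketch below applies the example Reg Expr to one line of HTML. Java numbers capturing groups by their opening parenthesis, counting from left to right, so nested groups count too; that is why the title ends up as group 9. The class name, the groupOrNull helper and the sample line are assumptions made for illustration and are not taken from the Forum Ferret source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexSketch {

    // The Reg Expr value from the example table, escaped as a Java string.
    static final Pattern MESSAGE = Pattern.compile(
            "\\[((<a href=\".*?\">)|(<font color=\".*?\">))(\\d*?)((</a>)|(</font>))] "
          + "<a href=\"(\\?l=struts-user&m=.*?)\">(.*?)</a>");

    // Group indexes from the example table; -1 means "no group".
    static final int TITLE_GROUP = 9;
    static final int URL_GROUP = 8;
    static final int NUM_REPLIES_GROUP = -1;
    static final int NUM_POSTS_GROUP = 4;

    /** Returns the text of the given group, or null when the index is -1. */
    static String groupOrNull(Matcher m, int index) {
        return index < 0 ? null : m.group(index);
    }

    public static void main(String[] args) {
        // Hypothetical line of page HTML, shaped to fit the expression.
        String line = "[<a href=\"?t=1\">12</a>] "
                + "<a href=\"?l=struts-user&m=1234\">Validation question</a>";

        Matcher m = MESSAGE.matcher(line);
        if (m.find()) {
            System.out.println("groups:  " + m.groupCount());                    // 9
            System.out.println("match:   " + m.group(0));                        // whole match
            System.out.println("title:   " + groupOrNull(m, TITLE_GROUP));       // Validation question
            System.out.println("url:     " + groupOrNull(m, URL_GROUP));         // ?l=struts-user&m=1234
            System.out.println("posts:   " + groupOrNull(m, NUM_POSTS_GROUP));   // 12
            System.out.println("replies: " + groupOrNull(m, NUM_REPLIES_GROUP)); // null
        }
    }
}
```

Groups that belong to the alternative that did not match (here group 3, the font variant) are also returned as null by Matcher.group, so an unused group and an unmatched group look the same to the caller.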
Note: Usually forums will display either the number of replies for a message
or the total number of messages in that thread. This means you only need to
specify one of Num Replies Group or Num Posts Group. Specify a
group of -1 for the field that is not used.
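One way to treat the two fields uniformly is to convert whichever count is available into a number of replies. The sketch below assumes that a thread's total posts are the original message plus its replies, so replies = posts - 1. Whether Forum Ferret itself performs this conversion is not stated here, and the ReplyCountSketch class and its methods are hypothetical.

```java
public class ReplyCountSketch {

    /** True when a captured count is usable (the group existed and was not empty). */
    static boolean isPresent(String captured) {
        return captured != null && !captured.isEmpty();
    }

    /**
     * Derives a reply count from whichever value the forum shows.
     * Pass null for the value whose group index is set to -1.
     * Assumption: total posts = original message + replies.
     */
    static int replies(String numReplies, String numPosts) {
        if (isPresent(numReplies)) {
            return Integer.parseInt(numReplies);
        }
        if (isPresent(numPosts)) {
            return Math.max(0, Integer.parseInt(numPosts) - 1);
        }
        return 0; // neither count was captured
    }

    public static void main(String[] args) {
        System.out.println(replies("3", null)); // forum shows replies: prints 3
        System.out.println(replies(null, "4")); // forum shows total posts: prints 3
    }
}
```

The empty-string check matters for expressions such as the example above, where `(\d*?)` can legally match zero digits.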
Note: Before scanning a website, make sure that doing so does not
violate the site's terms of use.
Regular Expressions
The following resources can help you learn more about regular expressions in Java:
Feedback/Bugs
Want to let us know what you think of Forum Ferret? Found a bug? Please send feedback.