ClickOnChris

Christopher G Johnson: programmer, entrepreneur, father

Archive for the ‘Java’ Category

Full Text Search for the enterprise with Oracle Text

You work in software and your stack includes an Oracle database.  One day the business approaches you and says ‘I want a search page for our product/order/customer data.  Make it work like Google’.  You think to yourself, “If I could make a search page work like Google I would work at Google”!!!  Fear not, developer.  This problem has been solved many times in the past.  In this blog post I’m going to show you how to approach this problem, and show you a shortcut in case your environment’s stack includes an Oracle database.

Approaching Full Text Search

The problem you’re solving has a name, and that name is Full Text Search.  The problem is that your relational database, while presumably well normalized, is not good at searching for single words across huge data sets.  You need a different kind of database, one that is optimized for full text search.  A search database physically stores the data differently so that it can quickly look up your search terms and return some metadata associated with those terms.  In your RDBMS, records are identified by keys.  In your search index, the keys are the search terms.
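To make “the keys are the search terms” concrete, here is a toy inverted index in Java.  It is a minimal sketch.  It lower-cases text, splits on anything that isn't a letter or digit, and maps each term to the set of record ids that contain it.  Real engines such as Lucene or Oracle Text build stemming, stop words, ranking, and on-disk storage on top of this basic idea.

```java
import java.util.*;

/**
 * A toy inverted index: each lower-cased term maps to the ids of the
 * records containing it. Illustrative only; not how Oracle Text or
 * Lucene lay out their indexes internally.
 */
class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    private static String[] tokenize(String text) {
        // Split on runs of characters that are neither letters nor digits.
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /** Records every term of the text under the given record id. */
    void add(int recordId, String text) {
        for (String token : tokenize(text)) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(recordId);
            }
        }
    }

    /** Returns ids of records containing every query term (AND semantics). */
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String token : tokenize(query)) {
            if (token.isEmpty()) continue;
            Set<Integer> ids = postings.getOrDefault(token, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);   // first term seeds the result
            } else {
                result.retainAll(ids);         // later terms narrow it down
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "Recycled printer paper, 500 sheets");
        index.add(2, "Paper towels");
        index.add(3, "Printer ink cartridge");
        System.out.println(index.search("paper"));          // [1, 2]
        System.out.println(index.search("printer paper"));  // [1]
    }
}
```

A lookup touches only the entries for the query's terms and never scans every record.  That access pattern is what a search index is optimized for.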

There are several well-known full text search solutions.  The bare minimum list you should probably know about is Solr/Lucene, Sphinx, and Elasticsearch.  These are all great full text search solutions, but they all require a lot of overhead to operate: new servers, new software to install, new syntaxes to learn, admin consoles, and new interfaces or libraries to build into your front end application.

Oracle Stack Solution: Oracle Text

One drawback of each of the aforementioned search solutions is that you will likely want to run it on a dedicated machine (or VM).  If you work in an Oracle shop, you likely work in an enterprise where provisioning hardware (even virtual hardware) can be an annoyingly difficult and time-consuming process.  In this environment, Oracle Text jumps out as a really nice solution.  Oracle Text is a full text search solution that is built into all modern versions of Oracle’s database.  This means that you don’t have to request a new machine and then request that new software be installed on it in each of your QA and Production environments (or request root access to do it yourself).  With Oracle Text you just run some DDL to create the index and start using it!*  The only hardware issue you should consider is the amount of disk in use on your Oracle database.

Here’s a simple example of how to take advantage of an Oracle Text search index.  Let’s assume that I have a database with products and reviews (a product has many reviews) and I want to be able to return search results for both at once.

The most straightforward way to start is to gather all of the data you want to index into a single VARCHAR2 column named SEARCH_TEXT on our PRODUCT table.  If you need to index more than 4000 bytes (the default VARCHAR2 limit), use a CLOB.

Now we need to populate that column with the search data we want to index from the PRODUCT and REVIEW tables.  We are going to concatenate the data into the search text column as one big space-delimited string.  The below query is called a correlated update.  You can accomplish the same thing with a procedure, but I find this more concise.
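The original query isn't reproduced here, so the following sketches the new column and a correlated update, run through JDBC.  The column names (product_id, title, description, review_id, review_text) are hypothetical.  LISTAGG requires Oracle 11g Release 2 or later.  Concatenating into a VARCHAR2(4000) fails with an error once a product's text exceeds 4000 bytes.  At that point you would switch to a CLOB and a procedure.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Hypothetical schema: PRODUCT(product_id, title, description, search_text)
 * and REVIEW(review_id, product_id, review_text).
 */
class PopulateSearchText {
    static final String ADD_COLUMN =
        "ALTER TABLE product ADD (search_text VARCHAR2(4000))";

    // Correlated update: the subquery is evaluated per PRODUCT row and
    // joins that product's reviews into one space-delimited string.
    static final String POPULATE =
        "UPDATE product p SET p.search_text = "
      + "p.title || ' ' || p.description || ' ' || "
      + "(SELECT LISTAGG(r.review_text, ' ') WITHIN GROUP (ORDER BY r.review_id) "
      + "   FROM review r WHERE r.product_id = p.product_id)";

    /** Adds the column and fills it; returns the number of rows updated. */
    static int run(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(ADD_COLUMN);          // DDL commits implicitly in Oracle
            return st.executeUpdate(POPULATE);  // committed by JDBC's default auto-commit
        }
    }
}
```

A product with no reviews makes the subquery return NULL.  Oracle treats NULL as an empty string in ||, so that product still gets its title and description indexed.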

Next we create an Oracle Text index on that column.  The important part is the ctxsys.context at the end of the statement.  CONTEXT is one of the three types of text index that Oracle offers, and it is the best fit for large blocks of text.
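Here is a sketch of that DDL issued through JDBC, assuming the PRODUCT.SEARCH_TEXT column from the previous step.  The index name product_search_idx is hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class CreateTextIndex {
    // INDEXTYPE IS CTXSYS.CONTEXT makes this an Oracle Text index
    // rather than an ordinary B-tree index. The index name is hypothetical.
    static final String CREATE_INDEX =
        "CREATE INDEX product_search_idx ON product (search_text) "
      + "INDEXTYPE IS CTXSYS.CONTEXT";

    static void create(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(CREATE_INDEX);
        }
    }
}
```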

It is worth noting that you can configure the index to use a separate tablespace so that you can control where on the disk your index lives.  See the docs for more info.

Next we run a command to ‘sync’ the index.  By default, CREATE INDEX indexes the rows that already exist.  However, later inserts and updates are not indexed until you sync.  Run it again after you’ve inserted or updated data.  In fact, you should plan on running this command periodically, either as a DBMS_SCHEDULER job or with whatever your enterprise’s favorite scheduler is.
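Below is one way to run the sync once and then schedule it, sketched via JDBC.  CTX_DDL.SYNC_INDEX and DBMS_SCHEDULER.CREATE_JOB are standard Oracle packages.  The job name, the five-minute interval, and the index name are placeholders.  The schema is assumed to have EXECUTE on CTX_DDL (usually through the CTXAPP role) and the CREATE JOB privilege.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

class SyncTextIndex {
    // One-off sync of the (hypothetical) index.
    static final String SYNC_NOW =
        "BEGIN CTX_DDL.SYNC_INDEX('product_search_idx'); END;";

    // Scheduler job repeating the sync every five minutes (placeholder interval).
    // Quotes inside job_action are doubled because it is a PL/SQL string literal.
    static final String SCHEDULE_SYNC =
        "BEGIN DBMS_SCHEDULER.CREATE_JOB("
      + " job_name        => 'PRODUCT_SEARCH_SYNC',"
      + " job_type        => 'PLSQL_BLOCK',"
      + " job_action      => 'BEGIN CTX_DDL.SYNC_INDEX(''product_search_idx''); END;',"
      + " repeat_interval => 'FREQ=MINUTELY;INTERVAL=5',"
      + " enabled         => TRUE); END;";

    static void syncAndSchedule(Connection conn) throws SQLException {
        try (CallableStatement sync = conn.prepareCall(SYNC_NOW);
             CallableStatement job = conn.prepareCall(SCHEDULE_SYNC)) {
            sync.execute();
            job.execute();
        }
    }
}
```

If fresh results matter more than DML speed, Oracle 10g and later also let you create the index with PARAMETERS ('SYNC (ON COMMIT)') instead of scheduling a job.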

Now we can run a full text search query and see some results.  A statement like this will return all product records that have the word ‘paper’ in the title, description, or reviews.  Yay!  It’s pretty awesome that we can run searches on this index in our existing RDBMS and apply whatever filters, sorts, and joins we want without having to call out to another system.
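Here is a sketch of such a query from Java.  The search terms are bound as a parameter.  The label 1 passed to CONTAINS is what SCORE(1) refers to, so results come back ranked by relevance.  The table, column, and class names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class SearchProducts {
    // CONTAINS(...) > 0 keeps matching rows; label 1 links it to SCORE(1).
    static final String SEARCH =
        "SELECT p.product_id, p.title, SCORE(1) AS relevance "
      + "  FROM product p "
      + " WHERE CONTAINS(p.search_text, ?, 1) > 0 "
      + " ORDER BY SCORE(1) DESC";

    /** Returns product titles matching the terms, best match first. */
    static List<String> search(Connection conn, String terms) throws SQLException {
        List<String> titles = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SEARCH)) {
            ps.setString(1, terms);   // e.g. "paper"
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }
}
```

Because this is ordinary SQL, you can add further WHERE conditions or joins (say, a category filter) alongside the CONTAINS predicate.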

Finally, we create a job to periodically ‘optimize’ the index.  According to the docs, your index gets fragmented and slower over time, and optimizing fixes that.  I’ve had luck running this nightly, but YMMV.
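The nightly optimize job can be sketched in the same style as the sync job.  CTX_DDL.OPTIMIZE_INDEX with the 'FULL' level is a standard call.  The 2:00 AM schedule, the job name, and the index name are placeholders.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

class OptimizeTextIndex {
    // Runs a FULL optimize of the (hypothetical) index every night at 02:00.
    static final String SCHEDULE_OPTIMIZE =
        "BEGIN DBMS_SCHEDULER.CREATE_JOB("
      + " job_name        => 'PRODUCT_SEARCH_OPTIMIZE',"
      + " job_type        => 'PLSQL_BLOCK',"
      + " job_action      => 'BEGIN CTX_DDL.OPTIMIZE_INDEX(''product_search_idx'', ''FULL''); END;',"
      + " repeat_interval => 'FREQ=DAILY;BYHOUR=2;BYMINUTE=0',"
      + " enabled         => TRUE); END;";

    static void schedule(Connection conn) throws SQLException {
        try (CallableStatement job = conn.prepareCall(SCHEDULE_OPTIMIZE)) {
            job.execute();
        }
    }
}
```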

After you’ve got your index up and running, you can get some useful info and stats out of it with the CTX_REPORT package.  Among other things, it will tell you how fragmented your index is and which words are indexed most frequently.

I’ve really just scratched the surface to show you how to get a text index up and running fast.  Oracle has a ton of options to tune the index, and search features like fuzzy searching, stemming, and wildcards.
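To give a flavor of those search features, here is a hypothetical helper that builds CONTAINS query strings.  It uses Oracle Text's stem ($), fuzzy, and % wildcard operators.  It is only a sketch.  Whether stemming applies depends on the index's wordlist settings.  Real user input should also be escaped first, since Oracle Text uses curly braces to escape reserved words and characters.

```java
/** Hypothetical helpers for building Oracle Text CONTAINS query strings. */
class TextQueries {
    /** Stemming: matches words sharing the term's stem, e.g. "printing" for "print". */
    static String stem(String term) { return "$" + term; }

    /** Fuzzy matching, useful for likely misspellings. */
    static String fuzzy(String term) { return "fuzzy(" + term + ")"; }

    /** Right-truncated wildcard: "pap%" matches "paper", "paperback", ... */
    static String prefix(String term) { return term + "%"; }

    /** Requires every sub-query to match. */
    static String and(String... parts) { return String.join(" AND ", parts); }

    public static void main(String[] args) {
        // Pass the result as the bound search-terms parameter of a CONTAINS query.
        System.out.println(and(stem("print"), fuzzy("papr"), prefix("recycl")));
        // $print AND fuzzy(papr) AND recycl%
    }
}
```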

*Ok, maybe you should still consult a DBA first if you have access to one.

Resources:

http://www.oracle.com/us/products/database/enterprise-edition/oracletext12cpresentation-1961514.pdf?ssSourceSiteId=otnen

http://docs.oracle.com/cd/E11882_01/text.112/e24436.pdf

 

Written by clickonchris

February 3rd, 2014 at 9:10 pm

Reducing JBoss’s Memory Footprint

I am working on a project to convert a handful of J2EE applications from Oracle’s OC4J application server (no longer supported) to JBoss 5.1.0.  Among the many challenges in the conversion is the fact that JBoss’s default profile has a significantly larger memory footprint than OC4J.  In the past I have just accepted that JBoss uses over 400MB of heap space before you even deploy anything.  This time, however, we were hoping to reuse the old application server’s hardware for the new application server.  When the test system started paging and eventually used up all of the available physical memory, we were forced to choose between ordering more memory and tuning JBoss to reduce its memory footprint.

We ended up having a lot of success reducing the footprint through tuning.  Bottom line: we reduced the memory footprint by 120MB, and the startup time from 53s to 24s.

Here were the steps we took:

Step | Heap Size (MB) | Used (MB) | Reduction in Used (MB)
Starting heap | 419 | 314 | n/a
Commented out debug-level MBeans annotation in deployers.xml | 322 | 247 | 67
Removed ejb3 services | 317 | 238 | 9
Removed messaging folder & props | 310 | 238 | 0
Removed seam & admin-console | 256 | 205 | 33
Removed xnio-deployer and xnio-provider | 256 | 203 | 2
Removed ROOT.war | 256 | 203 | 0
Removed management | 256 | 199 | 4
Removed jbossws.sar | 256 | 193 | 6

The instructions for each step can be found here: http://community.jboss.org/wiki/JBoss5xTuningSlimming

Notes on my environment and testing process:

  • Windows XP, JDK 1.6.0_22
  • JBoss 5.1.0.GA, -Xmx512M, -Xms256M (which is why the heap size never dropped below 256MB)
  • I used jvisualvm to watch the heap size and “used” memory values
  • For the “Used” memory, I took the maximum observed value while JBoss was starting.  A graph of memory usage over time follows a sawtooth pattern as objects are instantiated and garbage collected; I took the value from the tip of the highest tooth.
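The same “highest tooth” measurement can be sketched in code with the JDK's MemoryMXBean.  This version samples the JVM it runs in, not a remote JBoss instance like the one jvisualvm attaches to.  getCommitted and getUsed roughly correspond to the “Heap Size” and “Used” columns above.  The sample count and interval are arbitrary.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Keeps the highest used-heap sample seen: the tip of the tallest tooth. */
class PeakHeapSampler {
    private long peakBytes = 0;

    /** Records one sample and returns the peak so far. */
    long record(long usedBytes) {
        peakBytes = Math.max(peakBytes, usedBytes);
        return peakBytes;
    }

    long peakBytes() { return peakBytes; }

    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        PeakHeapSampler sampler = new PeakHeapSampler();
        for (int i = 0; i < 10; i++) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            sampler.record(heap.getUsed());
            // >> 20 converts bytes to whole megabytes
            System.out.printf("committed=%d MB used=%d MB%n",
                    heap.getCommitted() >> 20, heap.getUsed() >> 20);
            Thread.sleep(500);
        }
        System.out.printf("peak used=%d MB%n", sampler.peakBytes() >> 20);
    }
}
```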

Written by clickonchris

June 1st, 2011 at 11:58 am

Posted in Java,JBoss

No Fluff Just Stuff 2011, Madison, WI

I attended the No Fluff Just Stuff conference in Madison, WI this weekend. It was a great chance to learn the newest Java trends and share struggles in programming with people much like myself, even if most of them were cheeseheads. Here are some of my reflections on the state of Java tech post-conference:

-Java 7 is underwhelming, mostly because it will not have closures. It will, however, introduce enhancements such as the new invokedynamic instruction (JSR 292) to speed up Groovy, JRuby, and Scala.

-Reading between the lines on which HTML5 features Chrome already supports, I think HTML5 plus Chrome could easily turn into a gaming platform!

-There is no question that Groovy is the “next big thing” for Java. Get on board.

-A Java developer could easily add Hadoop to his resume (and dollars to his pocket) by learning Hadoop with Cascading. Hadoop-scale data sets are available for free from Amazon: http://aws.amazon.com/publicdatasets/

-Grails is really cool. I wish there were a Grails hosting platform anywhere close to what Heroku is for Rails. I think it is unlikely that we will hear any good startup stories with Grails for that reason.

Written by clickonchris

February 27th, 2011 at 1:54 pm

Setting the Default Version of Java in Windows

You’re a Java programmer on a Windows development environment.  You fire up the command prompt and run “java -version”.  It’s some JRE, and it’s not even the version you want!  Eclipse is throwing a fit.  There’s no good way to figure out which directory the “java.exe” you are executing lives in.

The most surefire way to solve this problem is to take the Java bin path you want (ex: C:\program files\Java\jdk1.6.0_07\bin) and prepend it to your System-level PATH environment variable, as highlighted below.  This will ensure that you are executing the java.exe from the expected Java installation every time.
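To confirm which installation won after changing PATH, a tiny Java program can report the running JVM's own java.version and java.home system properties.  On JDK 6 through 8, java.home points at the JDK's jre subdirectory (for example C:\program files\Java\jdk1.6.0_07\jre) rather than at the JDK root.

```java
/** Prints which Java installation is actually running this program. */
class WhichJava {
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
             + System.lineSeparator()
             + "java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Newer versions of Windows also ship a where command.  Running where java lists every java.exe on the PATH in the order Windows searches them.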

Written by clickonchris

October 27th, 2010 at 8:32 am

Posted in Java

JPA Annotation Cheatsheet

Whenever I need help configuring JPA annotations I turn to Google, and I always find it difficult to find a good cheatsheet.

Well, here is my favorite JPA annotation cheatsheet.  Even though it says Oracle and TopLink, it applies to any Spring/JPA/Hibernate technology stack:

http://www.oracle.com/technetwork/middleware/ias/toplink-jpa-annotations-096251.html

Oh yeah, and here’s my tip on using cascades: Never use CascadeType.ALL!  Use MERGE, PERSIST, and maybe REMOVE if you want to cascade your deletes.  CascadeType.ALL will result in poor performance and unintended consequences.

Written by clickonchris

August 31st, 2010 at 7:51 am

Posted in Java,Programming

Configuring Jetty, Maven, and Eclipse together with Hot Swap

For over a year I’ve been developing a Java webapp with Hibernate, Maven, and Jetty.  Recently I’ve figured out how to make them all play nicely with each other.  For too long I had to restart my application server, which takes upwards of 45 seconds, for any code change to make it to my development server.  This tutorial will show you how to set up Jetty in embedded mode and how to attach the Eclipse debugger to enable true hot swap of code onto your Jetty server.

Environment Information:

JDK 1.5+
Eclipse 3.4.0
maven 2.0.10
m2eclipse 0.9.7 (maven plugin for eclipse)
Jetty 6.1.10
Spring
JPA,Hibernate

Read the rest of this entry »

Written by clickonchris

May 27th, 2010 at 11:06 pm

Posted in Databases,Java,Programming

Java Project Versioning with perforce plus ant

I recently developed a useful Ant task to automatically increment the version number of your Java project when using Perforce as your source control application. This task is intended to be run as part of an automated build (via CruiseControl). It checks out version.properties and checks it back in after incrementing.
Notes:

  1. The task looks for the following files in the same directory as your build.xml: version.properties, buildnumber.properties, and p4.properties. You should be able to figure out which parameters belong in each file by looking at the task.
  2. In the lines where I print the full version number I have broken it up into two lines for display purposes. In practice you will want to keep it on one line.
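The original task isn't reproduced here, so this is a minimal Java sketch of just the increment step, using java.util.Properties.  The build.number key is hypothetical.  The Perforce checkout and check-in are assumed to happen around this step in the Ant build.

```java
import java.io.*;
import java.util.Properties;

/** Increments a build number kept in a properties file (key name is hypothetical). */
class BumpBuildNumber {
    static final String KEY = "build.number";   // hypothetical key name

    /** Returns a copy of props with KEY incremented; a missing key counts as 0. */
    static Properties bump(Properties props) {
        Properties next = new Properties();
        next.putAll(props);
        int current = Integer.parseInt(props.getProperty(KEY, "0").trim());
        next.setProperty(KEY, Integer.toString(current + 1));
        return next;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args.length > 0 ? args[0] : "version.properties");
        Properties props = new Properties();
        if (file.exists()) {
            try (InputStream in = new FileInputStream(file)) {
                props.load(in);
            }
        }
        Properties next = bump(props);
        try (OutputStream out = new FileOutputStream(file)) {
            next.store(out, "incremented by BumpBuildNumber");
        }
        System.out.println(KEY + "=" + next.getProperty(KEY));
    }
}
```

If you'd rather avoid custom code, Ant's built-in <propertyfile> task can do the same increment with an <entry key="build.number" type="int" operation="+" default="0"/>.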

Written by clickonchris

April 8th, 2008 at 8:03 pm

Posted in Java
