Thursday, November 6, 2008

seriously simplified postgresql relationship notations/generation

I wanted to blog this small hack while it was still fresh in my head. I am using this to define and fine-tune a postgresql database prior to loading up the data (separate scripts, perhaps for another post).

from a bash command line, paste in:


function m2m () {
  local i
  for i in "$@"; do
    echo "create table $i (id bigserial primary key , meta xml );"
  done
  echo "create table $1_$2 ($1 bigint references $1, $2 bigint references $2, primary key ($1,$2 )); "
}

function relate () {
  local i j src c
  for i in "$@"; do
    IFS='+' read -ra c <<< "$i"   # split "a+b" on '+' without eval, and without leaking IFS into the shell
    src=${c[0]}
    for j in "${c[@]:1}"; do
      m2m "$src" "$j"
    done
  done
}

function bouncedb () {
  local db=$1
  shift
  dropdb "$db"
  createdb "$db"
  relate "$@"
}




then:

bouncedb jim artist+album album+song song+lyrics

create table artist (id bigserial primary key , meta xml );
create table album (id bigserial primary key , meta xml );
create table artist_album (artist bigint references artist, album bigint references album, primary key (artist,album ));
create table album (id bigserial primary key , meta xml );
create table song (id bigserial primary key , meta xml );
create table album_song (album bigint references album, song bigint references song, primary key (album,song ));
create table song (id bigserial primary key , meta xml );
create table lyrics (id bigserial primary key , meta xml );
create table song_lyrics (song bigint references song, lyrics bigint references lyrics, primary key (song,lyrics ));

Wednesday, October 1, 2008

terraform java project to maven


This is a hack I put together to convert texai from netbeans to maven, with some success.

terraform java and netbeans project to maven


what it is:

a shell script; it leans on a few GNU-specific amenities from a MacPorts shell environment to take advantage of the gnu tools

what it does: converts multiple java projects to maven 'parent' and 'parent-child' directories

syntax:

$ cd projectRoot
$ DEBUG=t terraform.sh subprj1 subprj2 ...

Features:
  • if present, reads netbeans project info and adds some metadata
  • runs the basic archetype to create a relatively sane project
  • creates parent/module links
  • this one is HUGE -- *** converts lib/*.jar to LOCAL MAVEN REPO entries in the poms ***
  • tries to infer packages it can reuse from the default repos, with a small set of presets (such as in texai)
  • converts src/** to the maven convention default of src/main/java
  • converts **/test/** to the maven convention default of src/test/java
  • other stuff


Monday, September 22, 2008

CRUD for a random pojo



many years ago I wrote the most minimal and barebones java bean CRUD form to do generic data entry in some distributed agent use cases.

I have more or less brushed this off and given it an autoboxing nudge.

in the particular instance used for demonstration, XStream *can* be in the class path for our test pojo "Subject", as shown in the debugger below.

XStream is used opportunistically via reflection when found in the classpath, and its absence is no interruption otherwise.
 
The editor is pretty simple; it makes a best effort to edit to and from strings.
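a minimal sketch of that best-effort round trip, with a stand-in "Subject" pojo with public fields (the class and field names here are invented; the real form walks an arbitrary pojo):

```java
import java.lang.reflect.Field;

public class ReflectiveCrudSketch {
    // hypothetical stand-in for the post's "Subject" pojo
    static class Subject {
        public String name = "test";
        public int count = 42;
    }

    // best-effort conversion of an edited String back to the field's declared type
    static Object fromString(Class<?> type, String s) {
        if (type == int.class || type == Integer.class) return Integer.valueOf(s);
        return s; // no converter known: leave it as a String
    }

    public static void main(String[] args) throws Exception {
        Subject subject = new Subject();
        for (Field f : Subject.class.getFields()) {
            String edited = String.valueOf(f.get(subject)); // value out to the text field...
            f.set(subject, fromString(f.getType(), edited)); // ...and back into the pojo
        }
        System.out.println(subject.name + " " + subject.count);
    }
}
```

note that Field.set happily auto-unboxes the Integer back into the primitive int slot -- the autoboxing nudge mentioned above.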

For this test case we use the swing JOptionPane to provide a dialog box, using one line of code around the JPanel.

the next slide shows a little bit of extra credit, putting the original values in the tooltips. 


next, the final result after hitting "Yes". the pojo "Subject" class has managed round-trip editing of a primitive int as well as a string.







Monday, September 15, 2008

java polymorphic functional metaprogramming

I picked up an old project this weekend and knocked it into orbit along with most of the common assumptions I had about implementing languages in java. 


What I had 6 months ago was an artificial heap tool to create cursors in a tract of RAM that act as structs, and the means to generate pojo facades, or visitors, very effective for mmap'd stuff.


this weekend I added the concept of an execution environment to the heap metaprogramming. I'm also working out the generator [re]generating the successive generators (a nod to The Diamond Age's Seed technology :))


I created a mmap'd abstraction of heap, a small cache-friendly register page, and an int[] stack abstraction to provide a context mechanism.


I decided upon predicate-subject execution pairing in order to provide an execution environment directly within a graph-store of triples, thus a large mmap'd extent of triples can themselves be executing their representative functions upon graphs so designed as programs, where function vectors assign meanings to triple predicates.


the java binding of predicate->subject interaction to heap objects falls upon an invented vtable.


the vtable uses enum elements as methods (the Enum extends VTable, actually). vtable instances carry a 'traits' bitmap to facilitate 'state' transform 'views' of the byte extents, rather than relying on java's inherited type system. thus two objects with common traits, or a path to common traits, can polymorphically 'view transform' their bytes of heap state to a lowest-common-denominator medium. additionally, traits path resolutions can be transferred among vtables at runtime as a growing cache of type-impedance resolutions.
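the traits check itself is just bit arithmetic; a minimal sketch (the trait names and the single-int bitmap here are invented for illustration -- the real traits scheme is richer):

```java
public class TraitsSketch {
    // hypothetical trait bits
    static final int POINT = 1, TRIPLE = 2, URI = 4;

    // a 'view transform' is legal when `from` carries every trait `to` needs
    static boolean canView(int fromTraits, int toTraits) {
        return (fromTraits & toTraits) == toTraits;
    }

    public static void main(String[] args) {
        int vertice = POINT | TRIPLE;                      // an object with two traits
        System.out.println(canView(vertice, POINT));       // true: POINT is covered
        System.out.println(canView(vertice, POINT | URI)); // false: URI is missing
    }
}
```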


the vtable's functions, which are java enum elements, have symbolic names matching their predicates -- matching as many predicates as possible, in fact, in an execution order.


to illustrate the simplest component, here's a genuine metaprogram: a java enum creating 3 concrete struct storage slots of int heap, and what happens at runtime.


java syntax:


enum Triple implements VTable{
    subject(4),               //offset = 0; offset is a computed final value in the elements
    predicate(4),             //offset += 4
    $as$Predicate(predicate), //copy ctor of the 'predicate' element; size and offset are copied, not computed
    object(4),                //offset += size == 8 now; total final length of the bytes is 12 (8 offset + 4 bytes data)
    $as$Uri(0,0){[...]}       //size = offset = 0; a pointer to the triple; has methods but no writable extent
    ;[...]
}


layout of heap bytes:

enum Triple{            0__________b
subject                 0__3
predicate|                  4__7
         |$as$Predicate     4__7
object                          8__b
$as$Uri                 0-- // as in just a pointer to the struct head
}


narrative:

$as$Predicate is an enum element that looks and acts like "predicate" element, but differs in a few ways by very cheap arrangement of the java source code.

1) the state of $as$Predicate as it pertains to its artificial heap struct is identical.

2) the name of $as$Predicate itself contains a predicate, "as", allowing one to 'infer' that there is a triple "as", or providing a 'predicate' on its behalf; a trait would specify that Triple needs Predicate traits

3) predicate and $as$Predicate java enum elements don't share methods, or names, but do share data.
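the offset bookkeeping above can be sketched as a runnable toy (names assumed; the cursor lives in a nested class because java forbids an enum constructor from touching the enum's own non-constant statics, and the copy-ctor element aliases its target by name):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeapLayoutSketch {
    enum Triple {
        subject(4), predicate(4), $as$Predicate("predicate"), object(4);

        // offset cursor and extents map, kept outside the enum proper
        static final class Layout {
            static int cursor;
            static final Map<String, int[]> extents = new LinkedHashMap<>();
        }

        Triple(int size) {               // a fresh slot: claim `size` bytes
            Layout.extents.put(name(), new int[]{Layout.cursor, size});
            Layout.cursor += size;
        }

        Triple(String alias) {           // copy ctor: share an existing extent
            Layout.extents.put(name(), Layout.extents.get(alias));
        }

        int offset() { return Layout.extents.get(name())[0]; }
        int size()   { return Layout.extents.get(name())[1]; }
    }

    public static void main(String[] args) {
        for (Triple t : Triple.values())
            System.out.println(t + " " + t.offset() + ".." + (t.offset() + t.size() - 1));
        System.out.println("record length = " + Triple.Layout.cursor + " bytes");
    }
}
```

running it reproduces the layout above: subject 0..3, predicate 4..7, $as$Predicate 4..7, object 8..11, record length 12 bytes.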


$as$Uri -- one of several ways to provide an enum element with struct pointer offset information and methods


the java enum elements provide a type inference hint as to what they intend to do. they contain as many methods as required for the traits they start out with to make this happen, and, it is hoped, with as much type inference as can be achieved in java.


thus 'predicate' seems a little less constricting than choosing either the term "function" or "method".


the java method names within the java enum elements are short. incidentally, as a type inference and polymorphism experiment, the java$enum$element$Methods are all named '$', or '$$', and so on: the outermost '$' is exposed externally, local methods '$$' are reached by type-inference dispatch from within '$' execution, and '$$$' from there, etc., via generics -- an experimental work in test to impose maximum type inference via javac and generics wildcard-capture.


Figure-A ; Literate programming using symbol declarations as Predicate Assertions


a simple syntax to seed VTable definitions and typedefs: 


enum Graph{
    $as$Set$$Vertice(Set),
    $as$Set$$Edge(false),;
}


declarations representative of the semantic linkage produced at the initial and successive code generations:

enum Vertice{
    /** a java enum slot before code-regen; runtime bound
        automatically in time as $as$Point(Point) */
    $as$Point,

    /** the same effect, after reflecting the previous generation. */
    $as$Triple($as$Value$Type.){{$as$Value$Type=Triple.class;}},;
}


enum Pair{
    $as$first(4),       //pointer ->
    $as$second(false),;
}


enum Edge{
    $as$Pair$$Vertice(Pair),;
}




Triple t = (Triple) Triple.$(ByteBuffer heap, byte[] register, int[] stack)



 //semantic typedef: (my) $ (slot within) as $ (my) Set $$ (Set having ) Vertice



as close to 

public final static ((Vtable)MyType).java$enum$element$method.$(...)

as possible but no closer.


...unifying java method names is hoped to reduce the complexity of the n*m Functor*Methods combinatorial at runtime, allowing the type specifications to be a hidden ugliness of java rather than a geometrically growing typecast typing exercise -- the alternative being to toss out type-introspective inference entirely by forcing method clients to call (Object)foo


and finally, by (manually) naming the function symbols as closely descriptive of their intent as possible, need all simple functions be retained or executed once so declared?  this is a question for inference at a later juncture, but the hope is to produce a system of pure java compilation, and to aid in features which promote the greatest runtime entropy.


there is an additional component to outwardly expressive symbol names -- semantic inference.


I already produced a simple reflection based source code regenerator from the reflection artifacts of the objects at runtime.  a package's enums are regenerated.


these enums, designed as stateless visitors prior to the vtable binding concepts, also semantically link to each other based on symbol names and predicate hints in the first edition. they do this linking both at runtime in java and at reflection/generation time in the tool, with the same outcome from slightly different implementations.


I have added $as$comment$String and $as$java$sourcecode$String attributes to the generative elements, both enum classes and enum elements, so that reflection can access these via String field.get(), put them into the source code being generated, and -- something ASM and BCEL cannot accomplish -- display source code and comments at runtime.


Using Annotations as a data fork on the reproductive elements of the source representation was considered, but as forks go, it makes more work for the reflection process of re-assembling source code, and increases dependencies.


I'm interested in some amount of syntactic facilitation of other predicate-logic representations and grammars, and not opposed to a runtime regex thunk in the background if needed, to align the manual and syntactically facilitated tricks that are generator-refined over time.


there is enough here to go forward writing a really, seriously twisted performance form of java triple store on shmem heap, aided as well by a second project of mine underway building a sparql query and caching relational mapper in c++. sparql queries in a JVM context are probably not an important criterion; it's more about the bus throughput of working on the triples once they are in a common blob on disk (unified structurally with the c++ triple cache layout, and mmap'd).


Monday, May 26, 2008

The new hotness: My silent living room server hack.


Inside of... the new hotness...

new toy...

I had a recent discovery: VMWARE CRUMBLES in situations where the shutdown is less than planned. I could reboot and fsck vista, but vmware would not sync up, due to untold and countless sync, lock, and snapshot inconsistencies. after a dozen rescues and one nasty failure, this got old. it was time to move gentoo onto my former AMD dual-core and reuse some DDR2 sticks that were mismatched with other sets of 4.

now it's running (and building) ResearchCyc 1.0 and performing filesystem experiments for the Billion Triples Challenge.

Not bad for a silent living room box with a console on my hd-tv

Silent But gladly Setback More….
case $120.00 Thermaltake Lanbox
Motherboard $80.00 MSI K9AGM2-FIH Socket AM2 AMD 690G DDR2 mATX Motherboard w/ HDMI, FireWire, Radeon 1250 Video, 8ch Audio, and GigaBit LAN.
PSU $130.00 Zalman ZM-600 HP Heatpipe Cooled PSU. blue internal light.
CPU cooler $80.00 ZALMAN CNPS 9500 AM2 2 Ball CPU Cooling Fan/Heatsink-- squashed about 1.5 inches. fan bent 50 degrees aimed at chipset riser. flush to PSU intake fan above at about 1-2 cm apart.
System Media $50.00 (4gb) scsi 0:0:0:0: Direct-Access LEXAR JD LIGHTNING II 1100 PQ: 0 ANSI: 0 CCS.
CPU $240.00 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02 - 2.0 ghz
RAM $130.00 Patriot Viper 4GB (2 x 2GB) 240-Pin DDR2 SDRAM DDR2 1066 (PC2 8500) Dual Channel Kit Desktop Memory @ 800mhz
keyboard $20.00 1998 M$ Internet Explorer Keyboard, usb+ps2+hub used for xbox, this machine, approx 30-40 machines before it, 100's of OS installs across 4 generations of CPU-- 300 bogomips up to 22000 OC q6600.
noise levels $0.00 PSU noise only, silent to within 1 foot of case. our flourescent bulbs make more noise than the box. qfan on max cooling 40 degree 5 degree
case fans $0.00 3, fixed-voltage, disconnected for noise and wear. 2 on back deck above accessories panel. 1 blue lighted fan in front. The case fans were stock however the connectors are 2-pin and the fans make a lot more noise than the PSU.
video $0.00 hdmi over radeon 500/AMD/Ati 1250.
Ethernet $0.00 cat 5e tested good to 75% gig-e utilization
Cha-Ching!!! $850.00

First, the inspired record-setting non-trunked ethernet saturation that has heretofore never been witnessed on my personal hardware:
This was a mounted cifs volume to my quad-core vista with raid-0 across 3 drives (~170 M/s during the rare opportunity for non-interrupted linear IO under vista)
livecd slow # mount |grep slow
//10.0.12.223/in on /mnt/slow type cifs (rw,user=guest)
a write test, and speed-indicator prior to loop-mounting over cifs.
livecd slow # dd if=/dev/zero of=ntfs.test count=16 bs=256M
16+0 records in
16+0 records out
4294967296 bytes (4.3 GB) copied, 52.1532 s, 82.4 MB/s
the LEXAR media benchmark:
livecd slow # hdparm --direct -t /dev/sda
/dev/sda:
 Timing O_DIRECT disk reads: 100 MB in 3.05 seconds = 32.79 MB/sec
the general size of the running image with portage rsync'd, and kernel sources installed and built without clean.
livecd slow # du / -xs
3926493 /
livecd slow # df -h
Filesystem Size Used Avail Use% Mounted
/dev/sda1 3.8G 3.5G 262M 94.00% /
/tmp 1.9G 28K 1.9G 1.00% /tmp
/vol/image.squashfs 1.6G 1.6G 0B 100.00% /mnt/livecd
udev 10M 112K 9.9M 2.00% /dev
vartmp 1.9G 554M 1.4G 29.00% /var/tmp

Other Features:

this working gentoo installation is a live-cd run from a thumb drive -- the livecd squashfs image provides 1.6 gigs of AMD64 gentoo, which is about 4 gigs uncompressed, even with reiserfs tails.
Rather than run in a read-only FS, the command:
cp -asx /mnt/livecd/{usr,opt,lib,dev} /mnt/gento
...provides the read-only but replaceable base filesystem (all directories full of symlinks pointing back to the read-only compressed volume), leaving the other dirs to exist as first-class persistent files. without /etc, /bin, /sbin, /lib, and /var as genuine files, the system doesn't bootstrap well.

portage plays nicely with this configuration. using the livecd (on USB dvd) seems to provide the grub package without forcing the emerge (of grub) step in stage 3. big time saver.

Yet TODO:

my original hunch was to build a thumbdrive linux, and then transfer the filesystem to a flat image file with an nfs server boot and tftpd with pxegrub.
benchmarking the network filesystem confirms this (over the gig-e link) as a valid route to performance. the vista (x64) server has 8 gigs, which will help with access turnaround, but building the userspace nfsd on vista might be a chore; cygwin nfsd will chew up an entire core on high-speed transfers, so that's not likely to help much.

update: vista has been retired.  a new mac book pro has taken its place.  The quadcore and terabytes await razing some time soon with a gentoo clone of the living room machine (at 82M/s !!).

Monday, February 18, 2008

Another dusty benchmark: Filesystem Torque Curves, gentoo 2.4.18




Filesystem Torque Curves, gentoo linux 2.4.18 5400rpm maxtor 80gig on athlon xp1700



Doing more cleaning, I dusted off an old filesystem benchmark deserving of notice.

This benchmark was built from a bash script that formatted the same partition each time with different filesystems and the -o options shown on the legend.

dbench was run with n processes, then the filesystem was wiped, reformatted as above, and the buffers purged. I ran this many times and got comfortable with the relative numbers being consistent, so while this may represent only one test run, it was pretty stable at the time.

dbench replays a recorded session of deliberate samba abuse across n processes: a collection of large-file operations and inode abuse.

These days this machine is an anonymous motherboard sitting on a shelf, in a stack next to a collection of anonymous old hard drives.

I wrote this benchmark to see if I was actually missing out on something that XFS brought to the table. I tend to go with reiser3 (tails rock for all things portage) for general purpose filesystems and ext2 or huge swapfs for performance workloads such as processing multimedia or mythtv sandbox.

XFS was touted as the coolest filesystem and supposedly excelled at all things for all needs, according to several zealots I've run across. I always thought it looked a bit contrived compared to ext2, with less attention to detail than reiserfs. I had no idea those 2 filesystems would leave it in the dust until I ran the benchmark.

I'm fascinated these days with LinLogFS -- A Log-Structured Filesystem For Linux or Linux Log-structured Filesystem Project but the applications haven't presented themselves frequently.

Other items of note from this benchmark
  • dbench probably offers some opportune areas for inflated numbers corresponding to the force of numbers of developer man-years exposed to the linux driver module in question... where..
  • ext2's implementation is brutally terse with good results and lots of eyes and hands contributing vfs enhancements based on ext2 being the gold standard to cater around. (vfs doesn't seem so simple since this benchmark's kernel version)
  • Reiser shows the oddest process load harmonic. there's some nitrous in concurrent power of 2 access there...
  • JFS was having a really awful hair day at the time.
  • Minix crapped out on more than 2 processes, effectively breaking the script; 0's were padded in.
  • pagesize as blocksize makes all the difference on fsbench.
  • journaling overhead seems to be somewhat costly for general diversified workloads. if you don't care about data persistence, then you probably want the fast edition of the data, and won't need an fsck when a mkfs is plenty sufficient and takes a fraction of the time (such as with high-performance computing processing nodes and various web state).
PDF: filesystem benchresults
Google Doc: spreadsheet include interval timings

System.Nanotimer benchmarks

I wrote a small java benchmark to grab the System.nanoTime() value. on this machine there's a hard lower limit of about 558 ns on the x64 jdk 1.6.0_10, and 698 ns on the 32-bit jre.
In this test I also tried out the atomics, and the difference in capture times between buffers and arrays; buffers and arrays give virtually identical performance surrounding the nanoTime pull.

Direct buffer storage access seems to be the consistent laggard, as expected. the atomics in java.util.concurrent.atomic.AtomicLongArray are not so fast... boxing also has a lower average than the norm.

of note: the jit works well; they all tend to share the lowest access times (the hard limit mentioned above)

it looks like factoring out the call to System.nanoTime() will shave perhaps 500 or so nanoseconds from each loop iteration.
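that cost can be calibrated directly: take back-to-back System.nanoTime() readings and keep the smallest delta as the timer's own floor, then subtract it from per-iteration timings (a sketch; the actual numbers vary by machine and jvm):

```java
public class NanoTimerProbe {
    // returns {lowest delta, average delta} over n back-to-back nanoTime reads
    static long[] probe(int n) {
        long[] delta = new long[n];
        long prev = System.nanoTime();
        for (int i = 0; i < n; i++) {
            long now = System.nanoTime(); // two timer reads with nothing between
            delta[i] = now - prev;
            prev = now;
        }
        long low = Long.MAX_VALUE, sum = 0;
        for (long d : delta) { low = Math.min(low, d); sum += d; }
        return new long[]{low, sum / n};
    }

    public static void main(String[] args) {
        long[] r = probe(1_000_000);
        System.out.println("timer floor ~" + r[0] + " ns, average " + r[1] + " ns");
    }
}
```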

edit:
it was noted by others that the variance was pretty high here... the goal of the effort was partially to explore JIT-local mediums as well.

Further bench results will probably have some cool visualizations to check out.

see here for source:

http://www.google.com/notebook/public/18239550943564485025/BDQ5wIgoQxbPnoIIj#SDR5nIgoQlITsoIIj

O:\opt\java\jdk1.6.0_10\bin\java -Didea.launcher.port=7536 -Didea.launcher.bin.path=M:\opt\JetBrains\intellij-702\bin -Dfile.encoding=windows-1252 -classpath O:\opt\java\jdk1.6.0_10\jre\lib\charsets.jar;O:\opt\java\jdk1.6.0_10\jre\lib\jce.jar;O:\opt\java\jdk1.6.0_10\jre\lib\jsse.jar;O:\opt\java\jdk1.6.0_10\jre\lib\management-agent.jar;O:\opt\java\jdk1.6.0_10\jre\lib\resources.jar;O:\opt\java\jdk1.6.0_10\jre\lib\rt.jar;O:\opt\java\jdk1.6.0_10\jre\lib\ext\dnsns.jar;O:\opt\java\jdk1.6.0_10\jre\lib\ext\localedata.jar;O:\opt\java\jdk1.6.0_10\jre\lib\ext\sunjce_provider.jar;M:\work\enigmatrie\target\production\enigmatrie;M:\opt\JetBrains\intellij-702\lib\junit.jar;M:\opt\JetBrains\intellij-702\lib\idea_rt.jar com.intellij.rt.execution.application.AppMain com.glamdringinc.benchmark.SystemNanotimerCapture

[{{tester; stat_arr} {avg; 663} {low; 558} {mid; 629} {hi; 13969} {variance;6705}
, {{tester; stat_fin_arr} {avg; 664} {low; 558} {mid; 629} {hi; 22419} {variance;10930}
, {{tester; arr} {avg; 667} {low; 558} {mid; 629} {hi; 31499} {variance;15470}
, {{tester; vol_arr} {avg; 676} {low; 558} {mid; 629} {hi; 38482} {variance;18962}
, {{tester; fin_arr} {avg; 677} {low; 558} {mid; 629} {hi; 48610} {variance;24026}
, {{tester; heap_buff_absolute} {avg; 683} {low; 558} {mid; 629} {hi; 33314} {variance;16378}
, {{tester; array_backed_buf_absolute} {avg; 689} {low; 558} {mid; 629} {hi; 33594} {variance;16518}
, {{tester; atom_arr_lazyset} {avg; 697} {low; 558} {mid; 629} {hi; 39739} {variance;19590}
, {{tester; array_backed_buffer} {avg; 698} {low; 558} {mid; 698} {hi; 38203} {variance;18822}
, {{tester; vol_val} {avg; 710} {low; 558} {mid; 698} {hi; 42952} {variance;21197}
, {{tester; atomic_arr_cmp_set} {avg; 714} {low; 558} {mid; 698} {hi; 39739} {variance;19590}
, {{tester; atomic_arr_set} {avg; 719} {low; 558} {mid; 698} {hi; 87022} {variance;43232}
, {{tester; heap_buf_unbox} {avg; 723} {low; 558} {mid; 698} {hi; 65302} {variance;32372}
, {{tester; mmap_buf_get_prv} {avg; 725} {low; 558} {mid; 698} {hi; 65790} {variance;32616}
, {{tester; atomic_arr_getset} {avg; 728} {low; 558} {mid; 698} {hi; 35479} {variance;17460}
, {{tester; dir_buf_get} {avg; 742} {low; 558} {mid; 698} {hi; 41765} {variance;20603}
, {{tester; mmap_buf_get_rw} {avg; 751} {low; 558} {mid; 698} {hi; 65721} {variance;32581}
]


Process finished with exit code 0

Sunday, February 17, 2008

What happened here?

http://apod.nasa.gov/apod/image/0802/crabmosaic_hst_big.jpg

Origins

Main article: SN 1054

First observed in 1731 by John Bevis, the nebula was independently rediscovered in 1758 by Charles Messier as he was observing a bright comet. Messier catalogued it as the first entry in his catalogue of comet-like objects. The Earl of Rosse observed the nebula at Birr Castle in the 1840s, and referred to the object as the Crab Nebula because a drawing he made of it looked like a crab.[4]

In the early 20th century, the analysis of early photographs of the nebula taken several years apart revealed that it was expanding. Tracing the expansion back revealed that the nebula must have formed about 900 years ago. Historical records revealed that a new star bright enough to be seen in the daytime had been recorded in the same part of the sky by Chinese and Arab astronomers in 1054.[5][6] Given its great distance, the daytime "guest star" observed by the Chinese and Arabs could only have been a supernova—a massive, exploding star, having exhausted its supply of energy from nuclear fusion and collapsed in on itself.

Recent analyses of historical records have found that the supernova that created the Crab Nebula probably occurred in April or early May, rising to its maximum brightness of between apparent magnitude −7 and −4.5 (brighter than everything in the night sky except the Moon) by July. The supernova was visible to the naked eye for about two years after its first observation.[7] Thanks to the recorded observations of Far Eastern and Middle Eastern astronomers of 1054, Crab Nebula became the first astronomical object recognized as being connected to a supernova explosion.[6]