NYCPHP Meetup

Wed May 16 07:35:49 EDT 2007

csnyder wrote:
> On 5/15/07, Jon Baer <jonbaer at jonbaer.com> wrote:
>> This is really a 2 layer approach not using MySQL for FT indexing.
>> Any other option gives you
>> way more IR functions than MySQL does.  Although it would be fair to
>> point out that it is pluggable: ( <http://dev.mysql.com/doc/refman/
>> 5.1/en/plugin-full-text-plugins.html> ) But still pretty new.
>
> I'm all for the 2 layer approach -- at that point it doesn't really
> matter how you store the data in the db. YAML seems especially
> friendly, but it could be RDF, some other flavor of XML, or even a
> serialized PHP object. As a developer you get a tremendous amount of
> flexibility at a negative cost, because you don't have to write any
> complex SQL, ever, once you take care of CRUD.
>
> It scares the beejeesus out of anyone with a background in databases,
> though. The people I work with think I'm completely off my nut for
> storing data this way. They're asking me to use custom tables for even
> the simplest business objects, despite the fact that the model changes
> constantly. I'd love to find a technique that we could all embrace.
>
> If Ken Downs is reading this thread, I hope next Tuesday's talk will
> touch on this topic...

I've been following but I don't know enough about the underlying problem 
to comment beyond general rules.

The approach under discussion is known as Entity-Attribute-Value and, 
yes, it is considered a major mistake when used in a SQL database 
because SQL is all about using rows and tables, and you can't do any of 
that anymore.  You've left yourself with no reasonable query language.  
What used to be "select name,city from mytable" now requires a join (or 
left join) at very least.  The trouble only gets worse.  Since 
everything you are doing with a relational database requires SQL, and 
since SQL has no tables to query, simple concepts like JOINS become 
insane, performance drops, and as you said the database people start 
calling you names and will refuse to help.  Expect a sort of "it's your 
funeral" attitude.

Here is an unusually non-bitter presentation on wikipedia:

http://en.wikipedia.org/wiki/Entity-Attribute-Value_model

The real underlying problems usually come down one or more of:

1) Unknown structure
2) Known structure, but seen as "too complex"
3) Structure *appears* to change often
4) Confusion between structured data and text data
5) Desire for "flexibility" in structure
6) Weak tools for creating/updating structures

For #1, we can probably agree if that if the structure is not yet known, 
nobody should be coding.  Go back to the users for more interviews.

For #2, the standard reply is that EAV is not going to solve the 
underlying problem, its going to make it worse.  Complex structures will 
require complex queries, which are impossible using EAV in a SQL 
database.  This is probably a sticky situation where somebody needs to 
go ask the paymaster for more money and time.

For #3, I underscore "appears" to change.  While structures do change, 
the appearance of rapid change can often be traced to other factors.  A 
skilled table designer can listen to the apparently random noise 
generated by the users and discern patterns that the users are unaware 
of.  They will often think in details, we are supposed to be thinking 
systematically.

For #4, what I mean is that you may have things like product 
descriptions for an electronic catalog, where many different product 
categories contain zillions of little details.  A great example of this 
is mwave.  EAV looks good when confronting the bewildering prospect of 
making a table for this stuff.  But it often turns out that all items 
have real attributes that must be in the database, not the least of 
which are price and weight, but a lot of the rest of it is descriptive, 
and can go into a text field (it can be html, or wiki text as well).

For #5, This is usually a symptom that the programmer does not yet know 
if his problem is 1,2,3,4 or 6.  He is having problems matching code to 
table structure, and wants to solve that problem with "flexibility", 
some way of lessening the pain of trying to keep these two coordinated.  
It may be that he needs a better tool to do so (see #6), or he may be 
confronting #3 and #4 and not realizing that he needs to nail down the 
structure more.

For #6, in Ken's humble opinion the absence of strong tools is the 
number one problem for those who handle tables every day.  It is quite 
simply a royal PITA to build tables, modify them, and then track changes 
and get them into production.  If you get into stored procedures and 
triggers the problems with basic chores of version control, debugging 
and so forth only get worse.  This basic chore was the first thing I 
coded in Andromeda, it's something that's crying out to be done.  The 
absence of these tools drives people to avoid structure changes and is 
the number one reason people find themselves trying to write code that 
will somehow not depend on the table structure.  My own answer was to 
write Andromeda so that it would be easy to match my code to the tables.

-- 
Kenneth Downs
Secure Data Software, Inc.
www.secdat.com    www.andromeda-project.org
631-689-7200   Fax: 631-689-0527
cell: 631-379-0010

NYCPHP Meetup

NYPHP.org

[nycphp-talk] Flexible Forms & How to store them...