Model and implications
This section does not intend to go through all the cases; other books centered on JPA do it very well. In order to avoid doing things that can have a negative impact on the performances, this part will show you that the abstraction JPA does need some attention.
To illustrate this statement, we will reuse the Customer/Quote relationship. As it is @ManyToMany, it relies on a join table. Here is a representation of the model:
The use case is when you want to access the other side of the relationship: Quotes from a Customer (getQuotes()) or the opposite (getCustomers().size()).
Here, the provider will find all the entities that have the current entity identifier in the join table.
This sounds perfectly fine but how can it affect the performance? If you check the structure of the join table in MySQL, you will immediately see a minor difference:
The quotes_ID column has an index, whereas the customers_ID column does not. Do not be fooled by the picture and the fact that both the columns have a yellow key. The primary key is the composed key of both the columns, so the index is not useless and allows us to select fast rows from quotes_ID. Why does quotes_ID have an index and customers_ID hasn't? Because the Quote entity is the owner of the relationship. However, it will always be faster to select columns by the Quote identifier rather than by the Customer identifier.
Now the interesting part is to compare both the calls:
quote.getCustomers()
customer.getQuotes()
The first call will load the customers from an already loaded quote whereas the second call will load the quotes related to an already loaded customer.
Now let's see what the corresponding generated SQL will be. The first invocation will be converted to the following statement:
SELECT t1.ID, t1.NAME FROM QUOTE_CUSTOMER t0, CUSTOMER t1 WHERE ((t0.quotes_ID = ?) AND (t1.ID = t0.customers_ID))
The second invocation (customer.getQuotes()) will be converted to the following:
SELECT t1.ID, t1.NAME, t1.VALUE FROM QUOTE_CUSTOMER t0, QUOTE t1 WHERE ((t0.customers_ID = ?) AND (t1.ID = t0.quotes_ID))
A join is done with the known sides of the relationship, which means the entity containing the relationship (set of entities). Yet, we saw that only one of the two columns of the join table has an index. This means that one side will be slower than the other side. If you use bi-directional relationships, you should ensure that you make the owner of the relationship either of the following:
- The one that is way more used than the other one (if there is a huge difference)
- The one that will bring back a smaller set of entities than the other one
This is just an example of how a very fast model can impact the performance. This is a general statement that is valid for any modeling. Anyway, since JPA makes modeling very easy and not as much database-related, it is easier to make it wrong.