←  IT pentru to(n)ţi

Sinapse

»

Care Stii Sql?!

miki's Photo miki 27 Jul 2017

Treaba e cam asa:

se dau 2 tabele, unu cu vreo 220.000.000 de randuri si altul cu vreo 150.000.000

se cere join  4 col din primul si 2 din al doi-lea( nu prea conteaza de care) si ar iesi o tabla cu vreo 33.000.000.000.000 randur :P

evident ca la faza simpla crapa SQL-ul.... (temp tablespace insuficient) iar spatiu n-am de unde-i da

 

are careva vreo idee?!

Io am incercat cu cursor, da mi-am cam prins urechile...

Quote

kum's Photo kum 27 Jul 2017

Pai ce zici matale acolo nu-i join, e produs cartezian.

S-o luam incet incet... ai 2 tabele pentru facturi: una in care ai informatiile despre facturi la modul general (gen: numar factura, data factura, suma totala, cod fiscal, adresa firma) si alta cu detaliile facturilor (gen numar factura, data factura, produs1, cantitate produs1, pret unitar produs1, pret total produs1, TVA produs1 and so on...). Acum, daca vrei situatia vanzarilor din zilele de luni, marti si miercuri, vei avea un select de genul:

SELECT a.invoice_no, a.invoice_date, b.prod1, b.cant1, b.price1, b.VAT1, b.net_amount1
  FROM invoice_header a
  JOIN invoice_details b ON (a.invoice_no = b.invoice_no AND a.invoice_date = b.invoice_date)
 WHERE invoice_date BETWEEN to_date('24.07.2017', 'dd.mm.yyyy') AND to_date('26.07.2017', 'dd.mm.yyyy');

sau, cu sintaxa veche:

SELECT a.invoice_no, a.invoice_date, b.prod1, b.cant1, b.price1, b.VAT1, b.net_amount1
  FROM invoice_deader a, invoice_details b 
 WHERE a.invoice_no = b.invoice_no AND a.invoice_date = b.invoice_date
  AND invoice_date BETWEEN to_date('24.07.2017', 'dd.mm.yyyy') AND to_date('26.07.2017', 'dd.mm.yyyy');

Presupunand ca in zilele mentionate (24, 25 si 26) tabela invoice_header are 50 de inregistrari iar tabela invoice_details are 500, fara clauza de JOIN

SELECT a.invoice_no, a.invoice_date, b.prod1, b.cant1, b.price1, b.VAT1, b.net_amount1
  FROM invoice_header a, invoice_details b
 WHERE invoice_date BETWEEN to_date('24.07.2017', 'dd.mm.yyyy') AND to_date('26.07.2017', 'dd.mm.yyyy');

ai obtine 25000 de inregistrari, pe cand cu clauza de join obtii exact 500.

 

Am explicat la modul simplu, fara sa te bag in outer join-uri. Daca vrei mai multe despre JOIN-uri, vezi aici. Sau zi-mi mai exact ce vrei sa obtii (ce coloane au cele 2 tabele si ce vrei sa selectezi) si te pot ajuta.

 

L.E. Acu vazui ca de fapt e un outer join ce zici matale acolo. Ce anume vrei sa faci? Ca nu ai cum sa te joci cu un asemenea result set... e prea mare si-ti crapa orice DB.

Quote

vv's Photo vv 27 Jul 2017

220.000.000 de randuri si altul cu vreo 150.000.000

Daca vorbim de produs cartezian, vorbim de approx 3*10^16.
Lasa spatiul pe disc... daca tu poti procesa 1 miliard de randuri pe secunda (unlikely), ai nevoie de 35 zile. Iti permiti sa astepti?

Cel mai probabil, n-ai nevoie de produs cartezian.
Quote

miki's Photo miki 27 Jul 2017

Cu 130.000.000 si 22.000
000 a mers. ,:)
Quote

vv's Photo vv 27 Jul 2017

Pai probabil nu e produs cartezian. Da si matale query-ul (modifica numele coloanelor, sa fie A, B, C - daca e ceva "sensibil")
Quote

miki's Photo miki 27 Jul 2017

Pai nu  e "cartezian' pentru ca am conditie de join-are :P,  am o coloana "comuna" 

 

create table xxxx  as
select   
c.col1,
c.col2,
c.col3,
c.colID,
b.colID,
b.col1,
from tableyyy c
join tablezzzz b
on trim(b.fileid) = trim(c.fileid)
 
 
Bine, acuma trim-ul ala nu crerd ca incurca
Quote

kum's Photo kum 28 Jul 2017

Trim-ul ala exact iti fute indexul (daca exista). Daca vrei sa mergi pe index dar sa folosesti si trim, vezi ca poti defini index pe trim(coloana). Daca nu folosesti index, ar trebui.

Quote

miki's Photo miki 29 Jul 2017

BEGIN
  FOR c in (select * from Buffer buff) LOOP
    for d in (select *
                from Invent sfi
               where trim(c.clmn1) = trim(sfi.clmn1)) loop
      insert into test_rezultat
        (col1, col2, col3)
      values
        (d.clmn1,
         c.clmn2,
         d.clmn3,
         c.clmn4,
         c.clmn5,
         c.clmn6,
         );
    end loop;
    COMMIT;
  end loop;
END;
 
Asta a pornit ieri....
vedem luni ce iese :D
macar nu ia spatiu....:D :D :D
Quote

kum's Photo kum 29 Jul 2017

aidepulata... cursor in cursor = moartea pasiunii.

 

De ce nu incerci sa faci ce-ti trebuie doar in SQL? Scapi de contextual switch intre motorul sql si cel de plsql... asta iti mananca mult timp si resurse

insert /*+ APPEND */ into test_rezultat
select sfi.clmn1, buff.clmn2, sfi.clmn3, buff.clmn4, buff.clmn5, buff.clmn6
  from Invent sfi
  join Buffer buff on (trim(buff.clmn1) = trim(sfi.clmn1));

In plus: ce ai scris tu acolo nuti va merge in veci pentru ca incerci sa inserezi in 3 coloane

insert into test_rezultat (col1, col2, col3)

un rezult-set de 6 coloane

values  (d.clmn1,  c.clmn2,  d.clmn3,  c.clmn4,  c.clmn5,  c.clmn6);

Plus ca ai si o virgula dupa ultima coloana din rezult-set.

 

Incearca ce ti-am sugerat eu si zi-mi daca e mai bine. Cu rezerva ca hint-ul /*+ append*/ are urmatoartele particularilati:

 

Insert /*+ APPEND */ - why it would be horrible for Oracle to make that the "default".

a) it isn't necessarily faster in general. It does a direct path load to disk - bypassing the buffer cache. There are many cases - especially with smaller sets - where the direct path load to disk would be far slower than a conventional path load into the cache.

B) a direct path load always loads above the high water mark, since it is formatting and writing blocks directly to disk - it cannot reuse any existing space. Think about this - if you direct pathed an insert of a 100 byte row that loaded say just two rows - and you did that 1,000 times, you would be using at least 1,000 blocks (never reuse any existing space) - each with two rows. Now, if you did that using a conventional path insert - you would get about 70/80 rows per block in an 8k block database. You would use about 15 blocks. Which would you prefer?

c) you cannot query a table after direct pathing into it until you commit.

d) how many people can direct path into a table at the same time? One - one and only one. It would cause all modifications to serialize. No one else could insert/update/delete or merge into this table until the transaction that direct paths commits.

 

 

 

 

Si fa-ti index pe trim(buff.clmn1) si pe trim(sfi.clmn1).

CREATE INDEX inv_clmn1_idx ON Invent (TRIM(clmn1));
CREATE INDEX buff_clmn1_idx ON Buffer (TRIM(clmn1));

Vezi aici cum si ce fel.

Quote