Teradata error 2646

There is a snippet of product code that does a row check. It is migrated code that came into Teradata as-is, and no one has bothered to make it TD-savvy, so to speak.
This code now throws

2646 : No More spool...

but this is not really a spool shortage; it is caused by data skew, as would be evident to any Teradata master.

The code logic is plainly bad, but they are running it in Prod, and a code change is NOT an option now because this is production. I could rewrite it using a simple NOT EXISTS and the query would run fine.

    EXPLAIN
    SELECT  COALESCE(FF.SKEW_COL, -99999) AS Cnt1,
            COUNT(*)                      AS Cnt
    FROM    DB.10_BILLON_FACT FF
    WHERE   FF.SKEW_COL IN (
                SELECT  F.SKEW_COL
                FROM    DB.10_BILLON_FACT F
                EXCEPT
                SELECT  D.DIM_COL
                FROM    DB.Smaller_DIM D
            );

It's failing because it wants to redistribute on SKEW_COL. WHATEVER I DO, THIS WILL NOT CHANGE: SKEW_COL is 99% skewed.

Here's the EXPLAIN. It FAILS ON STEP 4.1:

 This query is optimized using type 2 profile insert-sel, profileid
 10001.
  1) First, we lock a distinct DB."pseudo table" for read on a
     RowHash to prevent global deadlock for DB.F.
  2) Next, we lock a distinct DB."pseudo table" for read on a
     RowHash to prevent global deadlock for DB.D.
  3) We lock DB.F for read, and we lock DB.D for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from DB.F by way of an
          all-rows scan with no residual conditions into Spool 6
          (all_amps), which is redistributed by the hash code of (
          DB.F.SKEW_COL) to all AMPs.  Then we
          do a SORT to order Spool 6 by row hash and the sort key in
          spool field1 eliminating duplicate rows.  The size of Spool 6
          is estimated with low confidence to be 989,301 rows (
          28,689,729 bytes).  The estimated time for this step is 1
          minute and 36 seconds.
       2) We do an all-AMPs RETRIEVE step from DB.D by way of an
          all-rows scan with no residual conditions into Spool 7
          (all_amps), which is built locally on the AMPs.  Then we do a
          SORT to order Spool 7 by the hash code of (
          DB.D.DIM_COL).  The size of Spool 7 is
          estimated with low confidence to be 6,118,545 rows (
          177,437,805 bytes).  The estimated time for this step is 0.11
          seconds.
  5) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
     all-rows scan, which is joined to Spool 7 (Last Use) by way of an
     all-rows scan.  Spool 6 and Spool 7 are joined using an exclusion
     merge join, with a join condition of ("Field_1 = Field_1").  The
     result goes into Spool 1 (all_amps), which is built locally on the
     AMPs.  The size of Spool 1 is estimated with low confidence to be
     494,651 rows (14,344,879 bytes).  The estimated time for this step
     is 3.00 seconds.
  6) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by
          way of an all-rows scan into Spool 5 (all_amps), which is
          redistributed by the hash code of (
          DB.F.SKEW_COL) to all AMPs.  Then we
          do a SORT to order Spool 5 by row hash.  The size of Spool 5
          is estimated with low confidence to be 494,651 rows (
          12,366,275 bytes).  The estimated time for this step is 0.13
          seconds.
       2) We do an all-AMPs RETRIEVE step from DB.FF by way of an
          all-rows scan with no residual conditions into Spool 8
          (all_amps) fanned out into 24 hash join partitions, which is
          built locally on the AMPs.  The size of Spool 8 is estimated
          with high confidence to be 2,603,284,805 rows (
          54,668,980,905 bytes).  The estimated time for this step is
          24.40 seconds.
  7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
     an all-rows scan into Spool 9 (all_amps) fanned out into 24 hash
     join partitions, which is duplicated on all AMPs.  The size of
     Spool 9 is estimated with low confidence to be 249,304,104 rows (
     5,235,386,184 bytes).  The estimated time for this step is 1.55
     seconds.
  8) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
     all-rows scan, which is joined to Spool 9 (Last Use) by way of an
     all-rows scan.  Spool 8 and Spool 9 are joined using a inclusion
     hash join of 24 partitions, with a join condition of (
     "SKEW_COL = SKEW_COL").  The
     result goes into Spool 4 (all_amps), which is built locally on the
     AMPs.  The size of Spool 4 is estimated with index join confidence
     to be 1,630,304,007 rows (37,496,992,161 bytes).  The estimated
     time for this step is 11.92 seconds.
  9) We do an all-AMPs SUM step to aggregate from Spool 4 (Last Use) by
     way of an all-rows scan , grouping by field1 (
     DB.FF.SKEW_COL).  Aggregate Intermediate
     Results are computed globally, then placed in Spool 11.  The size
     of Spool 11 is estimated with low confidence to be 494,651 rows (
     14,344,879 bytes).  The estimated time for this step is 35.00
     seconds.
 10) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
     an all-rows scan into Spool 2 (group_amps), which is built locally
     on the AMPs.  The size of Spool 2 is estimated with low confidence
     to be 494,651 rows (16,323,483 bytes).  The estimated time for
     this step is 0.01 seconds.
 11) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 2 are sent back to the user as the result of
     statement 1.  The total estimated time is 2 minutes and 52 seconds.

    

There are some 900K unique values of the skewed column and, interestingly, 6 million unique values for DIM_COL, which is why I think the optimizer veers towards the fact-table column. Still, it knows from the low unique-value count in the bigger table that the column is badly skewed.

My question is: after knowing that SKEW_COL is 99% skewed due to a constant value like -9999, WHY does the optimizer still redistribute by this skewed column instead of using an alternate PRPD (partial redistribution, partial duplication) approach? A similar (but not identical) situation happened in the past, and it went away when we upgraded to a faster box with more AMPs.

Does anything come to mind that would make it change plans? I tried most diagnostics with no result. I created a SI (on a similar volatile table, but it still skews). The skewing is inevitable; I am aware you can artificially change the data to minimize it, but all of that is moot after the fact. We are in Prod now; everything is done. Even so: once it knows the column is skewed, why redistribute on it when other options are available?

It's not a NULL value causing the skew; it's a constant flag value, probably a NULL surrogate like -9999, as I mentioned in the post. If you rewrite the query as I updated it, it works fine. I preferred NOT EXISTS because it does not need NULL checking (though from my data-dictionary knowledge I know both columns are declared NOT NULL). I have updated the post with an alternative query that works, though as explained I finalized on the NOT EXISTS version.

SELECT  COUNT(*), f.SKEW_COL
FROM   (SELECT  ff.SKEW_COL
        FROM    DB.10_BILLON_FACT ff
        WHERE   ff.SKEW_COL NOT IN (SELECT  d.DIM_COL
                                    FROM    DB.Smaller_DIM d)) AS f
GROUP   BY f.SKEW_COL

Can I get the optimizer's query-rewrite feature to think the query through and rewrite it with the above logic? The version above will NOT redistribute; it will just SORT by the skewed column.
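For reference, the NOT EXISTS rewrite mentioned above would look roughly like this. This is only a sketch: the table and column names come from the post, and the GROUP BY is assumed from the aggregation step visible in the EXPLAIN.

```sql
SELECT  COALESCE(ff.SKEW_COL, -99999) AS Cnt1,
        COUNT(*)                      AS Cnt
FROM    DB.10_BILLON_FACT ff
WHERE   NOT EXISTS (SELECT 1
                    FROM   DB.Smaller_DIM d
                    WHERE  d.DIM_COL = ff.SKEW_COL)
GROUP   BY 1;
```

Because both columns are declared NOT NULL, the NOT EXISTS form needs no extra NULL checking, which is why the post prefers it over NOT IN.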

Thanks for your answer!

This is an edit :

SELECT 'SYSA',
       t1.lieu_stkph_cd,
       SUM(t1.mt_pnu_cpta_dev_rep),
       SUM(t2.mt_util_dev_rep)
FROM   (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd,
               a.mt_pnu_cpta_dev_rep
        FROM   prod_v_ec_dossier_a_sysa c
               INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a
                       ON a.id_doss = c.dosscta_no
                      AND a.cd_prd_cpta = c.prct_no
                      AND a.cd_entite_cpta = c.entite_cd
        WHERE  c.pma_cd = '') AS t1
LEFT JOIN (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd,
                  b.mt_util_dev_rep
           FROM   prod_v_ec_dossier_a_sysa c
                  INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a
                          ON a.id_doss = c.dosscta_no
                         AND a.cd_prd_cpta = c.prct_no
                         AND a.cd_entite_cpta = c.entite_cd
                  INNER JOIN db_ftg_srs_prod_v.v_utilisation_a b
                          ON a.dt_art = b.dt_art
                         AND a.id_auto = b.id_auto
           WHERE  c.pma_cd = '') AS t2
       ON  t1.id_auto = t2.id_auto
       AND t1.dt_art = t2.dt_art
       AND t1.lieu_stkph_cd = t2.lieu_stkph_cd
GROUP  BY 1, 2

This is the outcome of this query:

LIEU_STKPH_CD    PNU Amount    UTILIZATION AMOUNT
1                200 €         250 €

It's not accurate; let me explain:

db_ftg_srs_prod_v.v_autorisation_a is linked to db_ftg_srs_prod_v.v_utilisation_a by
— ID_AUTO
— DT_ART
but one ID_AUTO can have X utilisations, so this query multiplies the PNU amount by the X utilisations, which is not correct.

Authorization table

ID_AUTO       PNU amount
1                     100 €

Utilization table

ID_AUTO    ID_UTILIZATION    UTILIZATION AMOUNT
1          1                 100 €
1          2                 150 €

So I have to separate those values:

Expected outcome

LIEU_STKPH_CD    PNU Amount    UTILIZATION AMOUNT
1                100 €         250 €
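One way to get the expected outcome is to aggregate each side separately before joining, so the utilisation rows cannot fan out the PNU amount. This is a hedged sketch using only the views and columns from the query above; it has not been run against the real views.

```sql
-- Sketch: pre-aggregate per lieu_stkph_cd on each side, then join,
-- so one authorization is counted once regardless of its utilisations.
SELECT 'SYSA',
       t1.lieu_stkph_cd,
       t1.pnu_amt,
       t2.util_amt
FROM   (SELECT c.lieu_stkph_cd,
               SUM(a.mt_pnu_cpta_dev_rep) AS pnu_amt
        FROM   prod_v_ec_dossier_a_sysa c
               INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a
                       ON a.id_doss = c.dosscta_no
                      AND a.cd_prd_cpta = c.prct_no
                      AND a.cd_entite_cpta = c.entite_cd
        WHERE  c.pma_cd = ''
        GROUP  BY 1) AS t1
LEFT JOIN (SELECT c.lieu_stkph_cd,
                  SUM(b.mt_util_dev_rep) AS util_amt
           FROM   prod_v_ec_dossier_a_sysa c
                  INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a
                          ON a.id_doss = c.dosscta_no
                         AND a.cd_prd_cpta = c.prct_no
                         AND a.cd_entite_cpta = c.entite_cd
                  INNER JOIN db_ftg_srs_prod_v.v_utilisation_a b
                          ON a.dt_art = b.dt_art
                         AND a.id_auto = b.id_auto
           WHERE  c.pma_cd = ''
           GROUP  BY 1) AS t2
       ON t1.lieu_stkph_cd = t2.lieu_stkph_cd;
```

With the sample data above this yields PNU = 100 € and utilization = 250 € for lieu 1, since the PNU side never sees the utilisation rows.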

Do you have any idea?

Thanks in advance
Christophe

Have you ever come across a situation called NO MORE SPOOL SPACE? A friend of mine did, and so I studied a bit about Teradata.

The query:
SELECT DISTINCT fieldname FROM tablename;

The error:
“NO MORE SPOOL SPACE”.

Correct; the problem wasn't related to MySQL, it was with something called Teradata.

Initially, I could see that the query must be using temporary space (from the name SPOOL SPACE, I guessed) to get the distinct data; maybe it is a huge table or something.
So one has to look into the size of that temporary space, which Teradata calls SPOOL SPACE.

Next it was Google's turn to explain more. I searched with appropriate keywords and came to the following understanding:
Spool is temporary disk space used to hold intermediate rows during query processing, and to hold the rows in the answer set of a transaction.

Spool space is allocated to a user as a total amount available for that user, however this total amount is spread across all AMPS. This means if you’re allocated 200G of spool and you’re on a system with 24 AMPS, you’re allocated ~8.3G of spool on each AMP. If any AMP exceeds its allocated amount of spool during a transaction you submit, your query is cancelled with error: NO MORE SPOOL SPACE.
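The per-AMP breakdown can be inspected directly. A hedged sketch, based on the DBC.DiskSpace query shown later in this page; 'your_user' is a placeholder, and view/column names can vary by release:

```sql
-- Sketch: per-AMP (Vproc) spool usage for one user, to spot a "hot AMP".
SELECT Vproc, CurrentSpool, PeakSpool
FROM   DBC.DiskSpace
WHERE  DatabaseName = 'your_user'   -- placeholder user name
ORDER  BY CurrentSpool DESC;
```

If one Vproc shows a CurrentSpool or PeakSpool far above the others, that AMP is the one tripping the 2646 error.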

Common reasons for “No More Spool Space Error”:

1) All the rows in the query are being processed on a few AMPs, creating a "hot AMP" situation. In this case, just a few AMPs rack up spool while the others sit comparatively idle. This happens when the tables in the query are missing stats, have been improperly PI'd, or are otherwise "untuned".

2) The query is excluding critical join criteria resulting in Cartesian products.

3) You just don’t have enough space to perform the query.
From other readings I suggested one query change:

SELECT fieldname FROM tablename GROUP BY fieldname;

The reason behind above query is:
Teradata uses two different methods to produce the result sets of the above queries; the performance varies dramatically depending on the data.

SELECT DISTINCT is designed for data sets that are nearly unique to begin with and works by sorting the entire intermediate spool file and discarding duplicate values.

SELECT/GROUP BY is designed for data sets that have relatively few unique values and works by performing an AMP local grouping operation and then merging the partial result sets for final processing.

A few other steps I can suggest for solving such a problem:

– Run EXPLAIN to see the execution plan and get an idea of what is happening.
– Check the table definition for indexes: SHOW TABLE table-name;
– CREATE INDEX index-name (field-name) ON table-name;

Finally, if nothing helps, contact your DBA or the "right person" 🙂

I hope this will help someone, somewhere, someday.

If you think I've misunderstood something, please comment with the relevant links; I will correct it ASAP.


Question

Error Code: 2646 No more spool space in %DBID

Answer :

What does the error mean?

Error code 2646 is returned when spool space is not available in the specified database or user for the query.

In practice it indicates that spool space has run out on at least one AMP.

Why does the error occur?

The user does not have enough spool space to complete the query, or the profile that the user is assigned to does not have enough. Teradata uses the profile's spool space as the maximum when it is the lower of the two.

Spool has been exhausted on a single AMP. This is a frequent issue caused by an extremely skewed request.

Spool space is allocated to a user, not a session. If multiple people sign on using the same User Id, the last request may run out of spool space because of an excessive request from another session using the same User Id.

What should be done when the error is received?

Check whether the query being run exceeds the maximum allowed amount of spool space:

– The user may have insufficient spool space to run the query: increase the user's spool space.
– The profile the user is assigned to may not have sufficient spool space to run the query: increase the profile's spool space.
– If spool space keeps running out for the same query: check that the statistics are up to date.

Workaround:

Run UPDATESPACE against database/user who has phantom spool.

> update spool space for systemfe

The query below checks for phantom spool; it can be used to check leftover spool, too.

Example (CurrentSpool should be 0 once the query has finished):

SELECT DatabaseName (CHAR(15)), Vproc, CurrentSpool, PeakSpool
FROM DBC.DiskSpace WHERE DatabaseName = 'systemfe';

*** Query completed. 20 rows found. 4 columns returned.
*** Total elapsed time was 1 second.

DatabaseName    Vproc    CurrentSpool       PeakSpool
------------    -----    -----------------  -----------------
systemfe        4        1,200,000,823,296  1,200,000,823,296

My error is "no more spool space" when I create a table with a QUALIFY ROW_NUMBER() clause. The goal is to get the first 100 samples per key; each key is composed of the following fields: (top_typ_vision, instid, don_gener3, don_gener4, rg_no, lieu_stkph_cd, id_sect_base_resp).

When I execute the SELECT, the code works very well; once I add the CREATE, I get the "no more spool space" error.

Thank you!!

```sql
CREATE MULTISET TABLE mdc_cobalt_det AS (
    SELECT top_typ_vision,
           instid,
           type_enr AS type_obj_ofs,
           don_gener1,
           don_gener2,
           don_gener3,
           don_gener4,
           rg_no,
           lieu_stkph_cd,
           id_sect_base_resp
    FROM   PROD_V_CTRL_ANOMALIE
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY top_typ_vision, instid, don_gener3, don_gener4,
                     rg_no, lieu_stkph_cd, id_sect_base_resp
        ORDER BY rg_no) <= 100
) WITH DATA
PRIMARY INDEX (top_typ_vision, rg_no, don_gener3, don_gener4,
               lieu_stkph_cd, id_sect_base_resp);
```
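One workaround sometimes tried in this situation (a sketch, not verified against this table) is to split the CREATE ... AS (SELECT ...) into two requests: create the table empty first, then populate it with a plain INSERT...SELECT, so the DDL and the spool-heavy retrieval run separately and stats can be collected in between.

```sql
-- Sketch: create the target empty, copying the column definitions.
CREATE MULTISET TABLE mdc_cobalt_det AS (
    SELECT top_typ_vision, instid, type_enr AS type_obj_ofs,
           don_gener1, don_gener2, don_gener3, don_gener4,
           rg_no, lieu_stkph_cd, id_sect_base_resp
    FROM   PROD_V_CTRL_ANOMALIE
) WITH NO DATA
PRIMARY INDEX (top_typ_vision, rg_no, don_gener3, don_gener4,
               lieu_stkph_cd, id_sect_base_resp);

-- Then load the top-100-per-key rows in a separate request.
INSERT INTO mdc_cobalt_det
SELECT top_typ_vision, instid, type_enr,
       don_gener1, don_gener2, don_gener3, don_gener4,
       rg_no, lieu_stkph_cd, id_sect_base_resp
FROM   PROD_V_CTRL_ANOMALIE
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY top_typ_vision, instid, don_gener3, don_gener4,
                 rg_no, lieu_stkph_cd, id_sect_base_resp
    ORDER BY rg_no) <= 100;
```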

When I get an error stating 'NO MORE SPOOL SPACE IN USER', what does it mean?

If a user has a spool space limit of 10GB and there are 10 AMPs, then 1GB of spool space is available on each AMP for this user.
Spool space problems depend on:
1. Volume of data queried/accessed
2. Index selection on the table
3. Stats on the table
4. Number of active sessions per user

If the data being processed all lands on one AMP, which holds only 1GB of spool, the query fails with 'NO MORE SPOOL SPACE IN USER', i.e. the data is skewed.
Information on the current user's profile and active sessions:

Session info: the query below shows how many queries are being run at this moment by the user.

LOCKING ROW FOR ACCESS
SELECT * FROM dbc.sessioninfo WHERE username = '######';

Spool space info: the query below shows how much spool space has been allocated to the user:

LOCKING ROW FOR ACCESS
SELECT username, spoolspace, profilename
FROM dbc.users WHERE username = '#####';

Note:
If profilename is not null, then spool is probably being allocated by the amount in dbc.profiles.
If profilename is null, then spool is being allocated by the amount defined in dbc.users.

SQL: LOCKING ROW FOR ACCESS SELECT * FROM dbc.Users WHERE username = '#####';
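The two-case note above can be resolved in one pass. A hedged sketch: it assumes dbc.profiles exposes profilename and spoolspace columns alongside dbc.users (view and column names may differ by release).

```sql
-- Sketch: effective spool limit = profile's value when a profile is
-- assigned, else the user's own value.
SELECT u.username,
       u.profilename,
       COALESCE(p.spoolspace, u.spoolspace) AS effective_spool
FROM   dbc.users u
LEFT JOIN dbc.profiles p
       ON p.profilename = u.profilename
WHERE  u.username = '#####';
```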

Approach to resolving the "spool space error":

Follow the steps below. These are the potential causes:

1. Statistics are stale. Run HELP STATS on each table involved in the SQL.
2. Verify and confirm (using step 1) that there is no other active session running too many large concurrent queries.
Note: Spool space is shared across sessions under the user and can cause the error if there are sessions running too many large queries.
3. The data is skewed and is running out of spool on one AMP. After 10 minutes, you can check PDCRDATA to get skew information.
SQL:

LOCKING ROW FOR ACCESS
SELECT T1.STARTTIME "StartTime"
, T1.FIRSTSTEPTIME "FirstStepTime"
, T1.FIRSTRESPTIME "FirstResponseTime"
, T1.USERNAME "UserName"
, T1.SESSIONID "SessionId"
, T1.QUERYID "QueryId"
, T1.PROCID "ProcessId"
, (
EXTRACT(HOUR FROM ((T1.FIRSTRESPTIME - T1.FIRSTSTEPTIME) HOUR(4) TO SECOND)) * 3600
+ EXTRACT(MINUTE FROM ((T1.FIRSTRESPTIME - T1.FIRSTSTEPTIME) HOUR(4) TO SECOND)) * 60
+ EXTRACT(SECOND FROM ((T1.FIRSTRESPTIME - T1.FIRSTSTEPTIME) HOUR(4) TO SECOND))
) "FirstStepProcessedTime"
, (
EXTRACT(HOUR FROM ((T1.FIRSTRESPTIME - T1.STARTTIME) HOUR(4) TO SECOND)) * 3600
+ EXTRACT(MINUTE FROM ((T1.FIRSTRESPTIME - T1.STARTTIME) HOUR(4) TO SECOND)) * 60
+ EXTRACT(SECOND FROM ((T1.FIRSTRESPTIME - T1.STARTTIME) HOUR(4) TO SECOND))
) "ElapsedTime"
, (
EXTRACT(HOUR FROM ((T1.FIRSTSTEPTIME - T1.STARTTIME) HOUR(4) TO SECOND)) * 3600
+ EXTRACT(MINUTE FROM ((T1.FIRSTSTEPTIME - T1.STARTTIME) HOUR(4) TO SECOND)) * 60
+ EXTRACT(SECOND FROM ((T1.FIRSTSTEPTIME - T1.STARTTIME) HOUR(4) TO SECOND))
) "GapTime"
, T1.DELAYTIME "DelayTime"
, T1.NUMOFACTIVEAMPS "NumOfActiveAmps"
, T1.TDWMALLAMPFLAG "TDWMAllAmpFlag"
, T1.TDWMESTTOTALTIME "TDWMEstTotalTime"
, T1.ESTPROCTIME "EstProcTime"
, T1.ESTRESULTROWS "EstResultRows"
, T1.ESTMAXROWCOUNT "EstMaxRowCount"
, T1.AMPCPUTIME "AMPCPUTime"
, T1.TOTALIOCOUNT "TotalIOCount"
, T1.PARSERCPUTIME "ParserCPUTime"
, T1.NUMRESULTROWS "NumResultRows"
, T1.CACHEFLAG "CacheFlag"
, ZEROIFNULL(T1.MAXAMPCPUTIME / NULLIFZERO(T1.AMPCPUTIME / (HASHAMP() + 1))) "CPUSkew"
, ZEROIFNULL(T1.MAXAMPIO / NULLIFZERO(T1.TOTALIOCOUNT / (HASHAMP() + 1))) "IOSkew"
, ZEROIFNULL(T1.AMPCPUTIME * 1000 / NULLIFZERO(T1.TOTALIOCOUNT)) "PJI"
, ZEROIFNULL(T1.TOTALIOCOUNT / NULLIFZERO(T1.AMPCPUTIME * 1000)) "UII"
, T1.MAXAMPCPUTIME * (HASHAMP() + 1) "ImpactCPU"
, T1.MAXAMPIO * (HASHAMP() + 1) "ImpactIO"
, T1.MAXCPUAMPNUMBER "MaxCPUAmpNumber"
, T1.MAXIOAMPNUMBER "MaxIOAmpNumber"
, T1.NUMSTEPS "NumSteps"
, T1.SPOOLUSAGE "SpoolUsage"
, T1.ERRORCODE "ErrorCode"
, T1.ERRORTEXT "ErrorText"
, T1.STATEMENTTYPE "StatementType"
, T1.DEFAULTDATABASE
, T1.WDID "WDId"
, T1.FINALWDID "FinalWDId"
, CASE
WHEN CHARS(T2.SQLTEXTINFO) <= 15000
THEN SUBSTR(T2.SQLTEXTINFO, 1, 15000)
ELSE NULL
END "SQLTextInfo"
, CASE
WHEN CHARS(T2.SQLTEXTINFO) > 15000
THEN T2.SQLTEXTINFO
ELSE NULL
END "LongSQLTextInfo"
FROM DBC.DBQLOGTBL T1
LEFT JOIN DBC.DBQLSQLTBL T2
ON T1.PROCID = T2.PROCID
AND T1.QUERYID = T2.QUERYID
WHERE T2.SQLROWNO = 1
AND T1.USERNAME = 'GHHUSDWETL'   --OR T1.SESSIONID=<#####>;
Note: Modify the index columns of the tables if required.
4. Analyze the volume of each table involved in the SQL.
5. Run EXPLAIN for the query in verbose mode and collect stats on the recommended columns.
6. If incorrect PIs were chosen (skewed table), INSERT statements can also spool out.

diagnostic verbose explain on for session;
diagnostic helpstats on for session;

Conclusion: the above steps optimize the spool space already allocated. Alternatively, you can also look at allocating more spool space, depending on the user's query.

  • Question

  • Hi

    My source was Teradata, and I use ADO.NET to connect to it.

    The table has more than 100 million rows; when I try to move the data to SQL Server 2008, I get a "no spool space" error.

    The error message looks like this:

    ADO NET Source [1]] Error: Teradata.Client.Provider.TdException: [Teradata Database] [2646] No more spool space in aloha_tpp_prd_etl.
      at Teradata.Client.Provider.WpRequestManager.HandleRequestError()
      at Teradata.Client.Provider.WpStartRequestManager.ProcessRequestResponse()
      at Teradata.Client.Provider.WpStartRequestManager.Action()
      at Teradata.Client.Provider.Request.ExecuteStartRequest(String commandText, TeraTypeBase[][] parameters, ExecutionMode executionMode, Boolean asynchronous, Boolean isTrustedRequest)
      at Teradata.Client.Provider.TdCommand.ExecuteRequest(CommandBehavior cmdBehavior, Boolean asynchronousCall)
      at Teradata.Client.Provider.TdCommand.ExecuteReader(CommandBehavior behavior)
      at Teradata.Client.Provider.TdCommand.ExecuteDbDataReader(CommandBehavior behavior)
      at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior behavior)
      at Microsoft.SqlServer.Dts.Pipeline.DataReaderSourceAdapter.PreExecute()
      at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)

    Could someone please help me with this issue?


    nethan

No more spool space

Why do I get a «No more spool space» error?

This is the most familiar error message in Teradata world:

«Failure 2646 No more spool space»

What does it really mean, what is it caused by?

Let’s get back to the basics.

What is spool space?

Spool space is a temporary area that is used to store partial results during query processing, as well as volatile tables. All free space in the database that is not allocated to PERM data can technically be used as spool area, since PERM data does not occupy it.

Each database user may have a «spool limit» that prevents the user from allocating more spool area at a time than that limit. Keep in mind that all active sessions of a username share the spool limit.

Teradata is a massively parallel system, therefore the spool limit must be interpreted at the AMP level.
E.g. on a 100-AMP system, a user with a 10G spool limit gets 100M of spool per AMP.

What is the spool space limit good for?

This limitation is a quite simple way to cut off queries that would suck up too many resources. There is no exact relationship between high spool usage and an ineffective query, but statistically the correlation is high.
Practically: a bad query is kicked off before it consumes too many resources unnecessarily.

No more spool space scenarios

System ran out of spool space

This is the rarest situation; you can almost forget about it. There is too little free space on the system, but this situation is usually avoided by defining a «SpoolReserve» database in which no objects are created, so that area is always available for spool.
If many «big spool limit» users run high-spool queries in parallel, this rare situation can still occur.

Multiple session of the user are active together

This is also a quite rare situation. Check the active sessions in dbc.sessioninfo.

Volatile tables

All existing volatile tables reside in your available spool space, reducing what is available. If you create many of them, especially with skewed distribution, you can fill your spool up. Choose the «primary index» carefully when defining volatile tables, too.
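For instance, a volatile table defined with a deliberately chosen primary index might look like this. The table and column names are made up purely for illustration.

```sql
-- Hypothetical example: pick a high-cardinality, evenly distributed
-- column as the PI so the volatile table's rows spread across AMPs,
-- rather than a skewed flag column that would pile rows onto one AMP.
CREATE VOLATILE TABLE vt_orders AS (
    SELECT order_id, customer_id, order_amt
    FROM   sales.orders            -- hypothetical source table
) WITH DATA
PRIMARY INDEX (order_id)           -- near-unique column, not a flag
ON COMMIT PRESERVE ROWS;
```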

Improper execution plan

These account for more than 90% of the cases that cause «No more spool space» errors. Let's see how:

  • «Duplication to all AMPs» of a non-small set of records
    The root cause is typically missing or stale statistics. Either the system thinks far fewer records will be duplicated than actually are (sometimes billions of records end up in this kind of spool), or it knows the number exactly, but there are more low-quality estimations on the other branch of the query and this execution seems cheaper.
  • Redistribution of records by a hash that causes skewed distribution
    Check the corresponding blog post: Accelerate skewed joins
  • Retrieval of a huge number of records into spool (locally or redistributed onto the AMPs)
    Specific query structures imply this execution, e.g. joining to a view that «union all»-s big tables.

Thanks for your answering !

This is an edit :

SELECT 'SYSA', 
   t1.lieu_stkph_cd, 
   Sum (t1.mt_pnu_cpta_dev_rep), 
   Sum (t2.mt_util_dev_rep) 
FROM   (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd, 
           a.mt_pnu_cpta_dev_rep 
    FROM   prod_v_ec_dossier_a_sysa c 
           INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a 
                   ON a.id_doss = c.dosscta_no 
                      AND a.cd_prd_cpta = c.prct_no 
                      AND a.cd_entite_cpta = c.entite_cd 
    WHERE  c.pma_cd = '') AS t1
LEFT JOIN (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd, 
           b.mt_util_dev_rep 
    FROM   prod_v_ec_dossier_a_sysa c 
           INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a 
                   ON a.id_doss = c.dosscta_no 
                      AND a.cd_prd_cpta = c.prct_no 
                      AND a.cd_entite_cpta = c.entite_cd 
           INNER JOIN db_ftg_srs_prod_v.v_utilisation_a b 
                      ON a.dt_art = b.dt_art 
                      AND a.id_auto = b.id_auto 
    WHERE  c.pma_cd = '') AS t2 
             ON  T1.id_auto = t2.id_auto 
            and T1.dt_art = T2.dt_art and t1.lieu_stkph_cd = t2.lieu_stkph_cd
GROUP  BY 1, 
          2

This is the outcome of this query:

LIEU_STKPH_CD       PNU Amount        UTILIZATION AMOUNT
1                                  200 €
                         250 €

It’s not accurate, I explain:

db_ftg_srs_prod_v.v_autorisation_a is linked to db_ftg_srs_prod_v.v_utilisation_a with
— ID_AUTO
— DT_ART
but I can have 1 ID_AUTO for X UTILISATION, so with this query I will multiply by X utilisation the PNU amount, which is not correct

Authorization table

ID_AUTO       PNU amount
1                     100 €

Utilization table

ID_AUTO       ID_UTLIZATION        UTILIZATION AMOUNT
1                     1
                                100 €
1                     2
                                150 €

So I have to separate those value:

Expected outcome

LIEU_STKPH_CD       PNU Amount        UTILIZATION AMOUNT
1                                  100 €
                         250 €

Do you have any idea ?

Thanks in advance
Christophe

There is a snipped of product code that does some row check. It’s actually migrated code that came into teradata and no one has bothered to change it to be TD savvy, should I say.
This code now throws

2646 : No More spool...

Error and that is not really a spool shortage but due to data-skew as would be evident to any Teradata Master.

Code logic is plain stupid but they are running it in Prod. Code change is NOT an option now because this is production. I can rewrite it using a Simple NOT Exists and the Query will run fine.

    EXPLAIN SELECT  ((COALESCE(FF.SKEW_COL,-99999)))  AS Cnt1,
        COUNT(*) AS Cnt 
 FROM   DB.10_BILLON_FACT FF 
WHERE   FF.SKEW_COL IN(
SELECT  F.SKEW_COL 
FROM    DB.10_BILLON_FACT F 

EXCEPT  
SELECT  D.DIM_COL 
FROM    DB.Smaller_DIM D



) 
 
 

Its failing because it wants to redistribute on SKEW_COL. WHATEVER I DO THIS WILL NOT CHANGE. SKEW_COL is 99% skewed.

here’s the explain.FAILS ON STEP # 4.1

 This query is optimized using type 2 profile insert-sel, profileid
 10001.
  1) First, we lock a distinct DB."pseudo table" for read on a
     RowHash to prevent global deadlock for DB.F.
  2) Next, we lock a distinct DB."pseudo table" for read on a
     RowHash to prevent global deadlock for DB.D.
  3) We lock DB.F for read, and we lock DB.D for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from DB.F by way of an
          all-rows scan with no residual conditions into Spool 6
          (all_amps), which is redistributed by the hash code of (
          DB.F.SKEW_COL) to all AMPs.  Then we
          do a SORT to order Spool 6 by row hash and the sort key in
          spool field1 eliminating duplicate rows.  The size of Spool 6
          is estimated with low confidence to be 989,301 rows (
          28,689,729 bytes).  The estimated time for this step is 1
          minute and 36 seconds.
       2) We do an all-AMPs RETRIEVE step from DB.D by way of an
          all-rows scan with no residual conditions into Spool 7
          (all_amps), which is built locally on the AMPs.  Then we do a
          SORT to order Spool 7 by the hash code of (
          DB.D.DIM_COL).  The size of Spool 7 is
          estimated with low confidence to be 6,118,545 rows (
          177,437,805 bytes).  The estimated time for this step is 0.11
          seconds.
  5) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
     all-rows scan, which is joined to Spool 7 (Last Use) by way of an
     all-rows scan.  Spool 6 and Spool 7 are joined using an exclusion
     merge join, with a join condition of ("Field_1 = Field_1").  The
     result goes into Spool 1 (all_amps), which is built locally on the
     AMPs.  The size of Spool 1 is estimated with low confidence to be
     494,651 rows (14,344,879 bytes).  The estimated time for this step
     is 3.00 seconds.
  6) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by
          way of an all-rows scan into Spool 5 (all_amps), which is
          redistributed by the hash code of (
          DB.F.SKEW_COL) to all AMPs.  Then we
          do a SORT to order Spool 5 by row hash.  The size of Spool 5
          is estimated with low confidence to be 494,651 rows (
          12,366,275 bytes).  The estimated time for this step is 0.13
          seconds.
       2) We do an all-AMPs RETRIEVE step from DB.FF by way of an
          all-rows scan with no residual conditions into Spool 8
          (all_amps) fanned out into 24 hash join partitions, which is
          built locally on the AMPs.  The size of Spool 8 is estimated
          with high confidence to be 2,603,284,805 rows (
          54,668,980,905 bytes).  The estimated time for this step is
          24.40 seconds.
  7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
     an all-rows scan into Spool 9 (all_amps) fanned out into 24 hash
     join partitions, which is duplicated on all AMPs.  The size of
     Spool 9 is estimated with low confidence to be 249,304,104 rows (
     5,235,386,184 bytes).  The estimated time for this step is 1.55
     seconds.
  8) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
     all-rows scan, which is joined to Spool 9 (Last Use) by way of an
     all-rows scan.  Spool 8 and Spool 9 are joined using a inclusion
     hash join of 24 partitions, with a join condition of (
     "SKEW_COL = SKEW_COL").  The
     result goes into Spool 4 (all_amps), which is built locally on the
     AMPs.  The size of Spool 4 is estimated with index join confidence
     to be 1,630,304,007 rows (37,496,992,161 bytes).  The estimated
     time for this step is 11.92 seconds.
  9) We do an all-AMPs SUM step to aggregate from Spool 4 (Last Use) by
     way of an all-rows scan , grouping by field1 (
     DB.FF.SKEW_COL).  Aggregate Intermediate
     Results are computed globally, then placed in Spool 11.  The size
     of Spool 11 is estimated with low confidence to be 494,651 rows (
     14,344,879 bytes).  The estimated time for this step is 35.00
     seconds.
 10) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
     an all-rows scan into Spool 2 (group_amps), which is built locally
     on the AMPs.  The size of Spool 2 is estimated with low confidence
     to be 494,651 rows (16,323,483 bytes).  The estimated time for
     this step is 0.01 seconds.
 11) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 2 are sent back to the user as the result of
     statement 1.  The total estimated time is 2 minutes and 52 seconds.

    

There are some 900K unique values in SKEW_COL (interestingly, there are 6 million unique values for DIM_COL, which is probably why the optimizer leans towards the fact-table column; still, from the low number of unique values in the much bigger table it knows the column is badly skewed).

My question: knowing that SKEW_COL is 99% skewed on a constant value like -9999, why does the optimizer still redistribute by this skewed column instead of using an alternative approach such as PRPD (partial redistribution, partial duplication)? A similar (but not identical) situation happened in the past, and it went away when we upgraded to a faster box with more AMPs.

Does anything come to mind that would make it change plans? I tried most of the diagnostics with no result. I created a secondary index (on a similar volatile table, but it still skews). The skew itself is inevitable; I know the data could be massaged artificially to minimize it, but none of that is possible after the fact, and we are already in production. Even so: once the optimizer knows the column is skewed, why redistribute on it when other options are available?
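
As a side note, the AMP-level distribution can be confirmed with Teradata's hash functions; a quick sketch against the tables named above (assuming the same DB.10_BILLON_FACT and SKEW_COL):

```sql
-- Sketch: count how many fact rows would hash to each AMP if the system
-- redistributed by SKEW_COL.  HASHROW/HASHBUCKET/HASHAMP are standard
-- Teradata hash functions.
SELECT  HASHAMP(HASHBUCKET(HASHROW(ff.SKEW_COL))) AS target_amp,
        COUNT(*)                                  AS row_cnt
FROM    DB.10_BILLON_FACT ff
GROUP BY 1
ORDER BY row_cnt DESC;
```

If one AMP owns roughly 99% of the rows (the -9999 value), any spool redistributed by SKEW_COL lands almost entirely on that AMP, which is exactly what turns into a 2646 even when total spool space is plentiful.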

It is not a NULL value causing the skew; it is a constant flag value (probably a NULL surrogate like -9999, as mentioned above). If you rewrite the query as below, it runs fine. I preferred NOT EXISTS because, unlike NOT IN, it does not need NULL checking as a matter of practice (although from the data dictionary I know both columns are declared NOT NULL). Here is an alternative rewrite that also works (though, as explained, I finalized the NOT EXISTS version):

Select   count(*), f.SKEW_COL
from     (
         select  ff.SKEW_COL
         from    DB.10_BILLON_FACT ff
         where   ff.SKEW_COL not in (
                 select  d.DIM_COL
                 from    DB.Smaller_DIM d )) as f
Group by f.SKEW_COL
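
On the NOT IN vs NOT EXISTS point: with both columns declared NOT NULL the two forms behave the same here, but in general a single NULL in DIM_COL would make the NOT IN predicate UNKNOWN for every row and return nothing. A defensive variant (a sketch, same tables as above):

```sql
-- Guard the subquery so a NULL in DIM_COL cannot wipe out the result
-- (equivalent to the query above when DIM_COL is declared NOT NULL).
SELECT  COUNT(*), ff.SKEW_COL
FROM    DB.10_BILLON_FACT ff
WHERE   ff.SKEW_COL NOT IN (
        SELECT  d.DIM_COL
        FROM    DB.Smaller_DIM d
        WHERE   d.DIM_COL IS NOT NULL )
GROUP BY ff.SKEW_COL;
```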

Can I get the optimizer's query-rewrite feature to think the query through and rewrite it with the above logic? The version above does NOT redistribute; it just sorts by the skewed column.
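
For reference, a sketch of the NOT EXISTS form the poster says they finalized (assuming the same tables and columns as in the original query):

```sql
-- Correlated NOT EXISTS: no EXCEPT/IN spool on the skewed column is needed.
SELECT  COUNT(*) AS Cnt, ff.SKEW_COL
FROM    DB.10_BILLON_FACT ff
WHERE   NOT EXISTS (
        SELECT  *
        FROM    DB.Smaller_DIM d
        WHERE   d.DIM_COL = ff.SKEW_COL )
GROUP BY ff.SKEW_COL;
```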


Thanks for your reply!

Here is my edit:

SELECT 'SYSA', 
   t1.lieu_stkph_cd, 
   Sum (t1.mt_pnu_cpta_dev_rep), 
   Sum (t2.mt_util_dev_rep) 
FROM   (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd, 
           a.mt_pnu_cpta_dev_rep 
    FROM   prod_v_ec_dossier_a_sysa c 
           INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a 
                   ON a.id_doss = c.dosscta_no 
                      AND a.cd_prd_cpta = c.prct_no 
                      AND a.cd_entite_cpta = c.entite_cd 
    WHERE  c.pma_cd = '') AS t1
LEFT JOIN (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd, 
           b.mt_util_dev_rep 
    FROM   prod_v_ec_dossier_a_sysa c 
           INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a 
                   ON a.id_doss = c.dosscta_no 
                      AND a.cd_prd_cpta = c.prct_no 
                      AND a.cd_entite_cpta = c.entite_cd 
           INNER JOIN db_ftg_srs_prod_v.v_utilisation_a b 
                      ON a.dt_art = b.dt_art 
                      AND a.id_auto = b.id_auto 
    WHERE  c.pma_cd = '') AS t2 
             ON  T1.id_auto = t2.id_auto 
            and T1.dt_art = T2.dt_art and t1.lieu_stkph_cd = t2.lieu_stkph_cd
GROUP  BY 1, 
          2

Here is the result of this query:

LIEU_STKPH_CD | PNU AMOUNT | UTILIZATION AMOUNT
1             | 200 €      | 250 €

That is not accurate; let me explain:

db_ftg_srs_prod_v.v_autorisation_a is linked to db_ftg_srs_prod_v.v_utilisation_a via:
- ID_AUTO
- DT_ART
but I can have 1 ID_AUTO for X utilizations, so with this query I multiply the PNU amount by the X utilizations, which is not correct.

Authorization table:

ID_AUTO | PNU AMOUNT
1       | 100 €

Utilization table:

ID_AUTO | ID_UTILIZATION | UTILIZATION AMOUNT
1       | 1              | 100 €
1       | 2              | 150 €

So I need to separate these values:

Expected result:

LIEU_STKPH_CD | PNU AMOUNT | UTILIZATION AMOUNT
1             | 100 €      | 250 €

Do you have any ideas?

Thanks in advance,
Christophe

3 Answers

Best answer

There are several ways to get rid of the multiplied values, e.g. aggregating before the join:

SELECT 'SYSA', 
   t1.lieu_stkph_cd, 
   t1.mt_pnu_cpta_dev_rep, 
   t2.mt_util_dev_rep 
FROM   (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd, 
           a.mt_pnu_cpta_dev_rep 
    FROM   prod_v_ec_dossier_a_sysa c 
           INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a 
                   ON a.id_doss = c.dosscta_no 
                      AND a.cd_prd_cpta = c.prct_no 
                      AND a.cd_entite_cpta = c.entite_cd 
    WHERE  c.pma_cd = '') AS t1
LEFT JOIN (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd, 
           Sum(b.mt_util_dev_rep) AS mt_util_dev_rep 
    FROM   prod_v_ec_dossier_a_sysa c 
           INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a 
                   ON a.id_doss = c.dosscta_no 
                      AND a.cd_prd_cpta = c.prct_no 
                      AND a.cd_entite_cpta = c.entite_cd 
           INNER JOIN db_ftg_srs_prod_v.v_utilisation_a b 
                      ON a.dt_art = b.dt_art 
                      AND a.id_auto = b.id_auto 
    WHERE  c.pma_cd = ''
    GROUP  BY 1, 
          2, 3     ) AS t2 
             ON  T1.id_auto = t2.id_auto 
            AND T1.dt_art = T2.dt_art AND t1.lieu_stkph_cd = t2.lieu_stkph_cd

But it seems you do not need to join two derived tables at all; this should return the same result:

SELECT 'SYSA', 
   t1.lieu_stkph_cd, 
   -- this value is multiplied by the number of rows 
   -- so simply divide by that number to revert the multiplication 
   Sum (a.mt_pnu_cpta_dev_rep) / Count(*), 
   Sum (b.mt_util_dev_rep) 
FROM prod_v_ec_dossier_a_sysa c 
JOIN db_ftg_srs_prod_v.v_autorisation_a a 
  ON a.id_doss = c.dosscta_no 
AND a.cd_prd_cpta = c.prct_no 
AND a.cd_entite_cpta = c.entite_cd 
JOIN db_ftg_srs_prod_v.v_utilisation_a b 
  ON a.dt_art = b.dt_art 
 AND a.id_auto = b.id_auto 
WHERE  c.pma_cd = ''
GROUP  BY 1, 
          2


0

dnoeth
2 Nov 2017 at 16:51

Your first query is fine, because you are doing

 SELECT ( select ... ) as field 1, 
        ( select ... ) as field 2, 

But in your second, you are doing a cross join

 SELECT *
 FROM ( select ... ) as query1,
      ( select ... ) as query2

This produces a result with query1 x query2 rows

You want:

 SELECT query.*
 FROM ( SELECT ( select ... ) as field1,
               ( select ... ) as field2
        .....
      ) as query


0

Juan Carlos Oropeza
31 Oct 2017 at 15:45

Making big assumptions about how your data are related: you should join using an actual JOIN clause with an ON condition that explains how these two subqueries should be combined. Something like:

SELECT 'SYSA', 
       t1.lieu_stkph_cd, 
       Sum (t1.mt_pnu_cpta_dev_rep), 
       Sum (t2.mt_util_dev_rep) 
FROM   (SELECT c.lieu_stkph_cd, 
               a.mt_pnu_cpta_dev_rep 
        FROM   prod_v_ec_dossier_a_sysa c 
               INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a 
                       ON a.id_doss = c.dosscta_no 
                          AND a.cd_prd_cpta = c.prct_no 
                          AND a.cd_entite_cpta = c.entite_cd 
        WHERE  c.pma_cd = '') AS t1
   INNER JOIN (SELECT c.lieu_stkph_cd, 
               b.mt_util_dev_rep 
        FROM   prod_v_ec_dossier_a_sysa c 
               INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a 
                       ON a.id_doss = c.dosscta_no 
                          AND a.cd_prd_cpta = c.prct_no 
                          AND a.cd_entite_cpta = c.entite_cd 
               INNER JOIN db_ftg_srs_prod_v.v_utilisation_a b 
                       ON a.dt_art = b.dt_art 
                          AND a.id_auto = b.id_auto 
        WHERE  c.pma_cd = '') AS t2 
    ON T1.lieu_stkph_cd = t2.lieu_stkph_cd
GROUP  BY 1, 
          2 


1

JNevill
31 Oct 2017 at 15:45


