SQL Server querying revision

  • The relational model is the foundation of SQL. A relation has a heading (its columns) and a body (its rows). The relational model is based on 2 mathematical foundations:
    • Set Theory
      • A set is a collection M into a whole of definite, distinct objects m( the "elements" of M).
    • Predicate logic
      • A predicate is an expression that, when attributed to some object, makes a proposition that is true, false or unknown (unknown exists in T-SQL because of NULLs).
  • Logical Query processing:
    • The phases are From -> Where -> Group By -> Having -> Select -> Order By. The output of each phase is a virtual table which serves as the input of the next phase.
      • From: specifies the tables you want to query, along with any table operators (Join, Apply) applied to them.
      • Where: Filters rows based on the predicate specified. Only rows for which the predicate evaluates to true are returned.
      • Group By: defines a group for each distinct combination of values in the grouped elements from the input table. All expressions processed in subsequent phases must guarantee a single value per group.
      • Having: Filters based on a predicate but is only evaluated per group and filters groups as a whole. Only groups for which the predicate evaluates to true are returned
      • Select: If a query is a grouped query, each group is represented by a single row in the result. It has two steps
        • Evaluating the expressions in the select list and producing the result attributes. Aliasing is also carried out here. Aliases defined here aren’t visible to phases processed earlier, nor to other items in the same select list.
        • using the Distinct clause to remove duplicates.
      • Order By: returns the result in presentation order according to the expressions that appear in the order by list. The output of this clause isn’t relational
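    • A minimal sketch of the processing order (hypothetical Sales.Orders table with custid, orderdate and freight columns; the comments number each clause in the order it is logically processed):

                          SELECT custid, YEAR(orderdate) AS orderyear, SUM(freight) AS totalfreight  -- 5
                          FROM Sales.Orders                                                           -- 1
                          WHERE freight > 10.00                                                       -- 2
                          GROUP BY custid, YEAR(orderdate)                                            -- 3
                          HAVING COUNT(*) > 1                                                         -- 4
                          ORDER BY orderyear, custid;                                                 -- 6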
Using the from and select Clauses
  • From Clause
    • Where you indicate the tables that you want to query
    • where you can apply table operators like joins to input tables.
  • Select Clause
    • Evaluates expressions that define the attributes in the query’s result, assigning them aliases if needed
    • Eliminates duplicate rows using the DISTINCT clause (if specified).
Data Types and Built in functions
  • Data Types
    • A data type does the following for the objects that use it
      • Constrains the supported data: a type has a domain of supported values and will not allow values outside that domain.
      • Encapsulates behavior that operates on it: if data is stored in the wrong data type, it has to be converted before it can be used, e.g. + is addition for INT but concatenation for CHAR.
      • Assigns the size of elements: the bigger the type used, the more storage it consumes and the more I/O is required to process queries that include it.
      • Exposes the data through operators.
    • Fixed-length types use the storage for the declared size, so updates will not require the row to physically expand and no data shifting is needed; hence they offer better update performance. Variable-length types use storage only for what you enter plus offset information; they can save a lot of storage and offer better read performance.
    • Choosing a data type for keys
      • When choosing keys, we go with either intelligent keys (attributes with business meaning, generated by the application we interact with) or surrogate keys (attributes chosen solely for the purpose of being used as keys). The most common techniques to generate surrogate keys are
        • Identity column property: Generates keys in an attribute(of a numeric type) with a scale of 0.
        • Sequence object: An independent object from which you can obtain sequence values. It is not tied to any column and can be generated before use.
        • Non-sequential GUID (Globally Unique ID): type is UniqueIdentifier, generated with the NEWID() function. GUIDs are globally unique across space and time.
        • Sequential GUID: generated with NEWSEQUENTIALID(). Can only be used in a default constraint on an attribute.
      • When choosing surrogate keys, size matters for (read) performance reasons, as larger data types consume more storage. If indexes are built on the key, the storage requirement increases further.
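      • A minimal sketch of the four surrogate-key techniques in one table definition (hypothetical dbo.Customers table and dbo.SeqCustomerId sequence; the names are illustrative only):

                          CREATE SEQUENCE dbo.SeqCustomerId AS INT START WITH 1 INCREMENT BY 1;

                          CREATE TABLE dbo.Customers
                          (
                              custid       INT              NOT NULL IDENTITY(1, 1),                              -- identity column property
                              custid_seq   INT              NOT NULL DEFAULT (NEXT VALUE FOR dbo.SeqCustomerId),  -- sequence object
                              custguid     UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID()),                           -- non-sequential GUID
                              custguid_seq UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWSEQUENTIALID()),                 -- sequential GUID (default constraint only)
                              custname     NVARCHAR(100)    NOT NULL
                          );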
    • Date and time functions
      • To get current date and time
        • Getdate(): T-SQL only, returns datetime
        • current_timestamp: standard, returns datetime
        • Sysdatetime(): T-SQL only, returns datetime2
        • sysdatetimeoffset(): T-SQL only, returns datetimeoffset
      • To get the date and time component of a datetime attribute
        • cast the attribute as either date or time.
        • datepart() is used to extract a part of a datetime attribute and return it as an integer, e.g. datepart(month, current_timestamp). It can also calculate the week number, e.g. datepart(week, current_timestamp). The day(), month() and year() functions are abbreviations of datepart().
        • datename() is similar to datepart() but returns a character string. Mostly used to extract the month name from a datetime attribute, e.g. datename(month, current_timestamp)
      • To build a datetime attribute from components.
        • Datefromparts(): build a date from a numeric type.
        • datetime2fromparts()
        • datetimefromparts()
        • datetimeoffsetfromparts()
        • smalldatetimefromparts()
        • timefromparts()
      • To get the last day of a month, use EOMONTH().
      • To add to a datetime attribute use DateAdd(). You must select which of the components you want to add to. format is Dateadd(component, difference, datetime). e.g. dateadd(day, -100, current_timestamp).
      • To calculate the difference between 2 datetime attributes use datediff(). You must select which of the components you want to calculate the difference for e.g. datediff(day, current_timestamp, dateadd(week, -100, current_timestamp))
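      • A minimal sketch exercising the functions above (no tables needed; every value is computed from the current date and time or from literals):

                          SELECT
                              CAST(SYSDATETIME() AS DATE)              AS today,              -- date component only
                              DATEPART(week, SYSDATETIME())            AS week_number,
                              DATENAME(month, SYSDATETIME())           AS month_name,
                              DATEFROMPARTS(2014, 3, 19)               AS built_date,
                              EOMONTH(SYSDATETIME())                   AS last_day_of_month,
                              DATEADD(day, -100, SYSDATETIME())        AS hundred_days_ago,
                              DATEDIFF(day, '20140319', SYSDATETIME()) AS days_since;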
    • Character Functions
      • Concatenation: You can use either + or concat
        • +: If any of the inputs is null, the operator returns a null.
        • Concat(): If any of the inputs is null, it substitutes the null with an empty string.
      • Substring extraction and position
        • Substring(): To extract a substring from a string. Format is substring(string, start_position, no_of_characters). If no_of_characters goes past the end of the string, it just returns the characters available from start_position to the end of the string.
        • Left() and Right(): To extract a specified number of characters from the left or right of a string. Format is left(string, No_of_Characters).
        • Charindex(): Returns the position of the first occurrence of a string provided as the first argument in the string provided as the second. format is charindex(string_to_search_for, String_to_be_searched).
        • PatIndex(): returns the position of the first occurrence of a pattern provided as the first argument in the string provided as the second. Use % for any string, _ for a character and [] to represent a range of characters. Format is PatIndex(pattern, String_to_be_searched). e.g. Patindex(‘%[0-9]%’, @String_to_be_searched).
      • String Length
        • LEN(): measures the number of characters in a string. Ignores trailing spaces.
        • Datalength(): measures the length of a string in bytes. Doesn’t ignore trailing spaces.
      • String Alteration
        • Replace(): Replace in a string provided as the first argument, all occurrences of the string provided as the second argument, with the string provided as the third argument. Format is replace(@String_to_be_used, @string_to_replace, @replacement_string)
        • Replicate(): Replicate an input string a requested number of times. Format is Replicate(@string, @number_of_times).
        • Stuff(): From the string input as the first argument, it starts at the position input as the second argument, removing the number of characters input as the third argument and replacing them with the string input as the 4th argument. Format is Stuff(@String, @Start_position, @Number_of_characters_to_remove, @replacement_string).
      • String Formatting
        • Upper(): returns the input string in upper case.
        • Lower(): returns the input string in lower case.
        • Ltrim(): returns the input string with the leading spaces removed.
        • RTrim(): returns the input string with the ending spaces removed.
        • Format(): Formats an input value based on a format string; a culture can also be supplied as the third parameter. E.g. format(1759, ‘00000000’) returns 00001759.
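      • A minimal sketch of the string functions above (literals only, so it can be run as-is; the expected results are in the comments):

                          DECLARE @s AS VARCHAR(50) = 'SQL Server 2012';

                          SELECT
                              CONCAT(@s, NULL, '!')            AS concat_result,     -- NULL treated as empty string
                              SUBSTRING(@s, 5, 6)              AS substring_result,  -- 'Server'
                              LEFT(@s, 3)                      AS left_result,       -- 'SQL'
                              CHARINDEX('Server', @s)          AS charindex_result,  -- 5
                              PATINDEX('%[0-9]%', @s)          AS patindex_result,   -- 12, position of the first digit
                              REPLACE(@s, '2012', '2014')      AS replace_result,    -- 'SQL Server 2014'
                              STUFF(@s, 1, 3, 'Microsoft SQL') AS stuff_result,      -- 'Microsoft SQL Server 2012'
                              FORMAT(1759, '00000000')         AS format_result;     -- '00001759'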
    • Case Expressions and related Function
      • CASE returns a specified value depending on the result of evaluating its input expression or predicates. It is either simple or searched.
        • Simple when the Case simply compares an input expression to multiple when expressions and returns a result expression. Format is
                                             Case @input_expression
                                             When @value then @result_expression
                                             When @value2 then @result_expression2
                                             Else @result_expression3
                                             End as @Alias.
        • Searched when case uses predicates in the when clauses to determine the result expression to return.
                                             Case
                                             When @input_expression @operator(e.g. >) @value(e.g. 10) then @result_expression
                                             When @input_expression @operator(e.g. >) @value2(e.g. 10) then @result_expression2
                                             When @input_expression @operator(e.g. >) @value3(e.g. 10) then @result_expression3
                                             Else @result_expression4
                                             End as @Alias
      • Coalesce(): Standard. Accepts a list of expressions and returns the first that is not null. The data type of the result is determined by data type precedence of the inputs. When used in a select…into statement, the target column is defined as NOT NULL only if all the inputs are non-nullable.
      • NullIf(): Standard. Accepts 2 expressions and returns null if they’re equal. returns the first if they’re not.
      • IsNull(): T-SQL only. Accepts 2 expressions and returns the first that is not null. The data type of the result is determined by the first input. It sets the attribute to NOT NULL when used in a select…into statement.
      • IIF(): returns one value if input predicate is true and another value otherwise. Format is IIF(@predicate, @true_result, @false_result)
      • Choose(): allows you to provide a position and a list of expressions, and returns the expression in the indicated position. Format is Choose(@position, @expression1, @expression2, @expression3, @expression4)
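      • A minimal sketch combining CASE and the related functions (hypothetical Sales.Orders table with orderid, orderdate, a nullable shippeddate, and freight):

                          SELECT orderid, shippeddate,
                              CASE                                                         -- searched CASE
                                  WHEN shippeddate IS NULL THEN 'Not shipped yet'
                                  WHEN shippeddate < orderdate THEN 'Data error'
                                  ELSE 'Shipped'
                              END AS ship_status,
                              COALESCE(CONVERT(VARCHAR(10), shippeddate, 112), 'N/A') AS shipped_yyyymmdd,
                              IIF(freight > 100, 'high', 'normal')                     AS freight_band,
                              CHOOSE(MONTH(orderdate), 'Q1','Q1','Q1','Q2','Q2','Q2',
                                                       'Q3','Q3','Q3','Q4','Q4','Q4') AS order_quarter
                          FROM Sales.Orders;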
Filtering Data with Predicates
  • T-SQL supports 3 query clauses that enable you to filter data based on predicates.
    • Where
    • On
    • Having
  • Predicates, 3-valued logic and search arguments.
    • Predicates follow 2-valued logic when an attribute is not nullable. An expression can only evaluate to true or false.
    • Predicates follow 3-valued logic when an attribute is nullable. An expression can evaluate to true, false or unknown (when comparing with a NULL mark). For a statement to capture all tuples under 3-valued logic, you must take care of NULLs. You can do that using the following predicates
      • Is Null: returns true when the tested operand is null.
      • Is not null: returns true when the tested operand is not null.
    • Filtering also has a performance side: with the right filters, the DBMS can choose to use indexes instead of doing a full table scan.
    • Search arguments
      • This refers to how predicates are arranged to maximize the efficiency of queries by making them use indexes.
      • Predicates that are Search arguments are in the format @attribute @operator @value OR @value @operator @attribute. e.g. col1 = 10.
      • Applying manipulation to the filtered column prevents the predicate from being a search argument, e.g. where coalesce(col1, ‘3’) = coalesce(@value, ‘3’) prevents indexes defined on col1 from being used. Instead, consider using where col1 = @value or (col1 is null and @value is null); see the sketch below.
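      • A minimal sketch contrasting a non-SARGable and a SARGable version of the same NULL-tolerant filter (hypothetical dbo.T1 table with keycol and a nullable col1):

                          DECLARE @val AS INT = NULL;

                          -- Non-SARGable: manipulating col1 hides it from the optimizer, so an index on col1 can't be seeked
                          SELECT keycol, col1
                          FROM dbo.T1
                          WHERE COALESCE(col1, -1) = COALESCE(@val, -1);

                          -- SARGable: col1 is left untouched, so an index on col1 can be used
                          SELECT keycol, col1
                          FROM dbo.T1
                          WHERE col1 = @val OR (col1 IS NULL AND @val IS NULL);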
    • Manipulating predicates
      • Combining predicates: you can use AND and OR to combine predicates.
      • Negating predicates: negating true gives false and vice versa. However, negating unknown still yields unknown. To include NULLs in a result, use the IS NULL predicate.
      • When combining predicates, there’s precedence: NOT is highest, then AND, then OR. You can use parentheses to control the order of evaluation.
    • Filtering Character Data
      • When filtering character data, ensure the literal in the filter is of the same data type as the attribute being filtered. Otherwise, SQL will have to carry out implicit conversion to get them to the same data type.
      • Like Predicate: this supports filtering string data based on pattern matching using wildcards. The wildcards are
        • %: Any string including an empty one
        • _: A single character
        • []: A character list. E.G. ‘[AC]%’
        • [-]: A range of characters. E.G. ‘[A-C]%’
        • [^]: A single character that is not in the list or range. E.G. ‘[^A-CG]%’
      • If you want to search for a character that is itself a wildcard, define an escape character using the ESCAPE keyword and place it before that character in the pattern, e.g. Like ‘!_%’ Escape ‘!’ lets you search for strings starting with _.
    • Filtering Date and Time Data
      • For date literals,
        • it’s best to express them in the language-neutral pattern YearMonthDay, i.e. ‘20140319’. This is neutral for all date and time types and ensures the date is interpreted the same even if a user’s language sets the DATEFORMAT to something else.
        • You can also use convert or parse to specify the format
          • Convert(): this lets you specify the style number as the conversion style
          • Parse(): this lets you specify a culture name
      • If only date is specified during an insert into a Datetime attribute, the time is assumed to be midnight.
      • When specifying a range, use the closed-open form @date >= @firstdate and @date < @dayafterlastdate. If a time value is specified in the end point, SQL Server can round it up and cause wrong rows to be included in the result (see the sketch below).
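      • A minimal sketch of a closed-open date range filter (hypothetical Sales.Orders table with a DATETIME orderdate column; catches every time of day in March 2014 without relying on rounding):

                          SELECT orderid, orderdate
                          FROM Sales.Orders
                          WHERE orderdate >= '20140301'
                            AND orderdate <  '20140401';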
Sorting Data
  • A table represents a relation and as it is a set, no order is guaranteed for its elements. A query that doesn’t have an explicit instruction to return the rows in a particular order doesn’t guarantee the order of rows in the result.
  • Order BY clause
    • This is the only way to guarantee the rows are returned in an order.
    • The default sorting order is ascending. otherwise, specify desc.
    • Determinism: When the ordering isn’t done by a unique attribute, the order of the query’s output isn’t deterministic; it is deterministic only when the order by list is unique, and only deterministic queries have a repeatable result. You can add a unique attribute to the order by list (a tiebreaker) to make the ordering deterministic. Format is
                         Order by @Attribute1 desc, @attribute2
    • You can order by elements you’re not selecting as long as they would normally have been allowed there. This doesn’t apply when distinct is specified, because the result rows no longer map to the source rows one-to-one but one-to-many; a single result row can represent several source tuples in that scenario.
    • It is executed after select so column aliases in the select clause are visible here.
    • Nulls are sorted before non-nulls in ascending order. To sort them after, use a coalesce to replace them with a cheat value.
    • Indexes help SQL server avoid actually sorting the data to address a order by request.
Filtering Data with TOP and OFFSET-FETCH
  • Top: Used to filter a requested number or percent of rows based on indicated ordering. Specified in select clause along with the requested number of rows in parentheses. The ordering is dependent on the order by clause. Format is
                     select top (@No_Of_Rows) @attribute1, @attribute2, @attribute3
                     select top (@percent_of_rows) percent @attribute1, @attribute2, @attribute3
    • You can use a variable as the @no_of_rows.
    • To ensure the result is deterministic, you can either
      • Use the with ties option: This ensures that all ties are included. It usually results in more rows being included than were requested. Format is
                                   Select top(@No_of_rows) with ties @attribute1, @attribute2, @attribute3
      • Guarantee determinism by adding a tiebreaker: add an element to the order by that makes the result deterministic.
  • Offset-Fetch: A standard alternative to TOP. Can be used to skip a certain number of rows.
    • Appears right after the order by clause which must be present.
    • When offset-fetch is used, the order by clause plays an additional role of telling the offset-fetch which rows it needs to filter.
    • Offset
      • You must specify the offset clause, which states the number of rows you want to skip (or 0 if you don’t want to skip any). Format is
                                   Offset @no_of_rows_to_skip rows
      • When used without Fetch, it skips @no_of_rows_to_skip and fetches all the remaining rows.
    • Fetch
      • The Fetch clause is optional and sets how many rows you want to return. You must specify first or next before @number_of_rows_to_retrieve. Format is
                                   Offset @no_of_rows_to_skip rows fetch first @number_of_rows_to_retrieve rows only
    • You can use expressions as input for the @no_of_rows_to_* value.
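    • A minimal sketch of both filters (hypothetical Sales.Orders table; the first query returns the 5 most recent orders plus ties, the second skips 50 rows and returns the next 25):

                          SELECT TOP (5) WITH TIES orderid, orderdate, custid
                          FROM Sales.Orders
                          ORDER BY orderdate DESC;

                          SELECT orderid, orderdate, custid
                          FROM Sales.Orders
                          ORDER BY orderdate DESC, orderid DESC
                          OFFSET 50 ROWS FETCH NEXT 25 ROWS ONLY;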
Using Joins
  • Joins are used to query data from different tables and match the rows that need to be related. There are 3 types.
  • Cross Joins
    • Performs a cartesian product of two tables, multiplying between the tables and yielding a row for each combination of rows from both sides. If @RelationA has @X tuples and @RelationB has @Y tuples, then their cross join will have @X * @Y tuples.
    •  The format is
                         Select A.@Attribute1, B.@Attribute2
                         From @relationA as A
                              cross join @relationB as B
                          where A.@Attribute1 <= @Value
  • Inner Joins
    • Allows matching of rows from two tables based on a predicate (usually comparing a primary key in one table to a foreign key in another). Rows for which the predicate evaluates to false or unknown are discarded.
    • The join’s matching predicate is specified in the ON clause.
    • If an attribute name appears in both tables, it is ambiguous and must be qualified with the table name or an alias.
    • Filtering predicates can be specified in either the Where or On clauses; for an inner join this is equivalent to concatenating the predicates with an AND operator.
    • Format is
                         Select A.@Attribute1, B.@Attribute2
                         From @relationA as A
                              inner join @relationB as B
                                   On A.@Attribute1 = B.@Attribute2
                          where A.@Attribute1 <= @Value
  • Outer Joins
    • Allows you to preserve all rows from one or both sides of the join, irrespective of whether there are matching rows on the other side based on the ON predicate. e.g. a left outer join preserves the @first_relation, inserting nulls for the attributes of the @second_relation where no match is found.
    • The ON and Where clauses play different roles. 
      • ON: This plays a strictly matching role as it purely matches rows on the preserved side with the other side, inserting nulls into the attributes on the other side if it doesn’t find a match. This isn’t final with respect to the preserved side of the join.
      • Where: This carries out filtering. This is final with respect to the preserved side of the join.
    • The full outer join is an outer join that preserves both sides.
    • The Format is 
                         Select A.@Attribute1, B.@Attribute2
                         From @relationA as A
                              left outer join @relationB as B
                                   On A.@Attribute1 = B.@Attribute2
                          where A.@Attribute1 <= @Value
  • Multi Join Queries
    • A join takes place between 2 tables(physical or virtual) at a time from left to right. The result of one is used as the left input of the next.
    • You can use parentheses to separate joins into different logical phases. e.g.
                          Select A.@Attribute1, B.@Attribute2
                          From @relationC as C
                               Left outer join
                                    (
                                         @relationA as A
                                         inner join @relationB as B
                                              on A.@Attribute1 = B.@Attribute2
                                    )
                                    On C.@Attribute1 = B.@Attribute2
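    • A minimal sketch of the parenthesized form (hypothetical Sales.Customers, Sales.Orders and Sales.OrderDetails tables; customers are preserved even when they have no orders, while order details are matched only to existing orders):

                          SELECT C.custid, O.orderid, OD.productid, OD.qty
                          FROM Sales.Customers AS C
                               LEFT OUTER JOIN
                                    (Sales.Orders AS O
                                         INNER JOIN Sales.OrderDetails AS OD
                                              ON O.orderid = OD.orderid)
                                    ON C.custid = O.custid;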
Using Subqueries, Table Expressions and Apply
  • Nesting queries allows you to refer to one query’s result from another, removing the need to store the result of one query in a variable before another query can access it.
  • Exists Predicate
    • This accepts a subquery as input and returns true when the subquery returns at least one row and false otherwise.
    • Usually used in correlated subqueries with where for filtering. Filters out rows from the outer query for which the subquery specified in the exists predicate returns no rows.
    • format is

                         Select A.@Attribute1

                         from @Relation1 as A
                          where EXISTS
                              (    select B.@Attribute1
                                   From @Relation2 as B
                                   where B.@Attribute1 <> @Value 
                                        and A.@Attribute1 = B.@Attribute1)
  • In Predicate
    • This accepts a subquery (or a list of expressions) as input and returns true when the tested expression is equal to at least one of the values returned.
    • Usually used with where for filtering. Filters out rows where the specified attribute from the @input_query does not appear in the result set of the subquery specified in the IN predicate.
    • Format is
                 Select A.@Attribute1
                from @Relation1 as A
                where A.@Attribute1 IN
                (    select B.@Attribute1
                     From @Relation2 as B
                     where B.@Attribute1 <> @Value 
                          and A.@Attribute1 = B.@Attribute1)
  • Subqueries can be categorized in terms of their containment and result
    • Containment:
      • Self-contained: Subquery that has no dependency on the outer query. They can be run independently of the outer query. E.g.
                                   Select  A.@Attribute1, A.@Attribute2
                                   From @relationA as A
                                   where A.@Attribute3 >= (select Min(@Attribute4) from @relationB);
      • Correlated: Subqueries where the inner query has a reference to a column from the table in the outer query. They cannot be run independently of the outer query.

                                   Select  A.@Attribute1, A.@Attribute2

                                   From @relationA as A
                                   where A.@Attribute3 >= 
                                       (select Min(@Attribute4) 
                                        from @relationB as B
                                        where B.@Attribute2 = A.@Attribute2 );
    • Result:
      • Scalar: Subqueries that return a single value and can be used where a single-valued expression is expected. If the subquery returns more than one value, the query fails; if it returns an empty set, the result is converted to a NULL. e.g.
                                   Select  A.@Attribute1, A.@Attribute2
                                   From @relationA as A
                                   where A.@Attribute3 >= (select Min(@Attribute4) from @relationB);                                   
      • Multi-valued: Subqueries that return multiple values in a single column. Can be used where a multi-valued result is expected(e.g. when using the IN predicate)

                                   Select A.@Attribute1, A.@Attribute2

                                   From @relationA as A
                                   where A.@Attribute1 not IN
                                        (select @Attribute4 from @relationB where @Attribute3=@Literal)

      • Table-valued: See Table Expressions for details.
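  • A minimal sketch tying these pieces together: a correlated EXISTS subquery (hypothetical Sales.Customers and Sales.Orders tables; returns customers that placed at least one order in 2014):

                          SELECT C.custid, C.companyname
                          FROM Sales.Customers AS C
                          WHERE EXISTS
                               (SELECT *
                                FROM Sales.Orders AS O
                                WHERE O.custid = C.custid
                                  AND O.orderdate >= '20140101'
                                  AND O.orderdate <  '20150101');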
  • Table Expressions: These are named queries. You write an inner query that returns a relational result set, name it, and query it from an outer query. Their result must be relational. There are 4 types
    • Derived Tables: These are visible only to the statement that defines them.
      • The derived table’s query is defined in parentheses in the from Clause of the outer query and the inner query is named.
      • Due to the all-at-once property of SQL, the name assigned to a derived table isn’t visible to other elements in the same query processing phase as the derived table’s definition. To join multiple instances, you must duplicate the code.
      • Format is below where  @Inner_Query = select @Attribute4 from @relationB where @Attribute3=@Literal
                         Select A.@Attribute1, A.@Attribute2
                         From (@Inner_Query) as A                         
    • Common Table Expressions (CTEs): These are visible only to the statement that defines them.
      • As the CTE name is assigned in another query processing phase, the same CTE can be referenced several times.
      • Format is below where  @Inner_Query = select @Attribute4 from @relationB where @Attribute3=@Literal

                                   With @CTE_NAME

                                   As
                                   (
                                        @Inner_Query
                                   )
                                   Select @Attribute1, @Attribute2, @Attribute3
                                   From @CTE_NAME
      • To define multiple CTEs, use the format

                                   With @CTE_NAME1

                                   As
                                   (
                                        @Inner_Query
                                   ),
                                    @CTE_NAME2
                                   AS
                                   (
                                        @Inner_Query2
                                   )
      • CTEs have a recursive form. 
        • The body of the recursive query has 2 or more queries separated by a union all operator (for a union all, the attribute types of both sides have to be the same).
        • One of the Queries in the CTE serves as the anchor member and returns a valid relational result. It is invoked only once.
        • The other queries have a reference to the CTEname in their recursive member. This query is repeatedly invoked until it returns an empty set.  
        • In each iteration, the reference to the CTE name from the recursive member represents the previous result set. 
        • The reference to the CTE name from the outer query represents the unified results of the invocation of the anchor member and all invocations of the recursive member.
        • Format is below, where @Anchor_Member = ‘select @Attribute1, @Attribute2 from @relation1 as @Alias1 where @Attribute5 = @literal’ and @Recursive_Member = ‘select @Attribute1, @Attribute2 from @CTE_NAME1 join @relation1 as @Alias2 on @Alias1.@Attribute2 = @Alias2.@Attribute2’

                                              With @CTE_NAME1

                                              As
                                              (
                                                   @Anchor_Member
                                                   Union All
                                                   @Recursive_Member
                                              )
                                              @Outer_query
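        • A minimal sketch of a recursive CTE (hypothetical HR.Employees table with empid, mgrid and empname; returns employee 2 and all direct and indirect subordinates):

                          WITH EmpsCTE AS
                          (
                               SELECT empid, mgrid, empname
                               FROM HR.Employees
                               WHERE empid = 2                         -- anchor member, invoked once

                               UNION ALL

                               SELECT E.empid, E.mgrid, E.empname      -- recursive member, invoked until it returns an empty set
                               FROM EmpsCTE AS P
                                    INNER JOIN HR.Employees AS E
                                         ON E.mgrid = P.empid
                          )
                          SELECT empid, mgrid, empname
                          FROM EmpsCTE;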

    • Views: These are database objects, hence they’re re-usable. They don’t accept parameters.
      • Queries can select from them as if they’re tables, but they’re not persisted in the database
      • Format for creation is below where  @Inner_Query = select @Attribute4 from @relationB where @Attribute3=@Literal 

                                   Create view @View_name

                                   As
                                   @inner_Query;
                                   GO 
      • Format for execution is

                                   Select @Attribute1, @Attribute2

                                   From @View_Name                  

    • Inline table-valued functions: These are database objects, hence they’re re-usable. 
      • They support input parameters.
      • Format for creation is below where  @Inner_Query = select @Attribute4 from @relationB where @Attribute3=@Literal

                                   Create Function @Function_Name (@Variable as Int) returns table

                                   As
                                   Return
                                   @inner_Query    
                                   GO
      • Format for execution is

                                   Select @Attribute1 

                                   From @Function_name(@variable);
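      • A minimal sketch (hypothetical Sales.Orders table; the function returns the orders of the customer whose ID is passed in):

                                    CREATE FUNCTION dbo.GetCustOrders (@custid AS INT) RETURNS TABLE
                                    AS
                                    RETURN
                                         SELECT orderid, orderdate, freight
                                         FROM Sales.Orders
                                         WHERE custid = @custid;
                                    GO

                                    SELECT orderid, orderdate
                                    FROM dbo.GetCustOrders(71);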
  • Apply
    • The apply operator allows you to apply a table expression given to it as the right input to each row from a table expression given to it as the left input.
    • The inner query in the right table expression can have a reference to an element from the left table; the right table expression is evaluated separately for each left row.
    • Sometimes used to replace cursors.
    • There are 2 types
      • Cross apply
        • The right table expression is applied to each row from the left input.
        • If the right table expression returns an empty set for a left row, the left row isn’t returned.
        • The Format is 

                                             Select A.@Attribute1, A.@Attribute2, C.@Attribute2

                                             From @RelationA as A
                                             Cross Apply ( 
                                                  Select B.@Attribute2, B.@Attribute3
                                                  From @RelationB as B
                                                  Where A.@Attribute3 = B.@Attribute4) as C

      • Outer apply
        • The right table expression is applied to each row from the left input
        • if the right table expression doesn’t return a set for a left row, the right expression returns null as a place holder. 
        • The format is 
                                             Select A.@Attribute1, A.@Attribute2, C.@Attribute2
                                             From @RelationA as A
                                             outer Apply ( 
                                                  Select B.@Attribute2, B.@Attribute3
                                                  From @RelationB as B
                                                  Where A.@Attribute3 = B.@Attribute4) as C
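    • A minimal sketch of CROSS APPLY used as a top-N-per-group query (hypothetical Sales.Customers and Sales.Orders tables; returns each customer's 3 most recent orders and drops customers with no orders):

                          SELECT C.custid, A.orderid, A.orderdate
                          FROM Sales.Customers AS C
                               CROSS APPLY (
                                    SELECT TOP (3) O.orderid, O.orderdate
                                    FROM Sales.Orders AS O
                                    WHERE O.custid = C.custid
                                    ORDER BY O.orderdate DESC, O.orderid DESC) AS A;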
Set Operators
  • Set Operators operate on two result sets of queries, comparing rows between the results. Depending on the result of the comparison and the set operator used, the operator determines whether to return the row or not.
  • They have the following guidelines
    • As complete rows are matched between the result sets, the number of columns in the queries must be the same and the column types of corresponding columns need to be compatible(implicitly convertible).
    • They consider 2 nulls as equal
    • The individual queries must return a relation.
    • the column names of result columns are determined by the first query
    • They have an implied distinct. They do not return duplicate rows
  • The format for set operators is
               @Query1
               @Set_operator
               @Query2
  • They are
    • Intersect: It returns only rows common to both result sets.
    • Union: It unifies the results of 2 input queries. 
    • Except: It performs a set difference between the first result set and the second returning only rows present in the first but not in the second. The order of the input queries matters
    • Union all: A multiset operator. It unifies the results of the 2 input queries while keeping duplicates (the implied distinct for set operators doesn’t apply).
      • Format is

                                   @Query1

                                   Union all
                                   @Query2

  • Their order of precedence: INTERSECT precedes UNION and EXCEPT, while UNION and EXCEPT have equal precedence and are evaluated left to right. This can be controlled with parentheses.
Data Analysis Operations
  • Data analysis functions are functions applied to a set of rows and return a single value.
  • They are either Group functions or window functions.
Grouped Queries
  • Grouping allows you to define groups in your data, and then perform data analysis computations(e.g. aggregate functions) per group.
  • Grouped queries include an aggregate function or the group by clause (or both).
    • Aggregate Functions: If no group by clause is specified, they arrange all rows in one group and perform the computation for that single group.
      • Examples include MIN, MAX, SUM, COUNT, AVG
      • You can work with distinct occurrences by specifying the DISTINCT option
        • Format is

                                                       select @Attribute1, count(distinct @Attribute2)

                                                       from @Relation1
                                                        Group by @Attribute1
    • Group By clause: 
      • Allows grouping the rows based on a specified grouping set of expressions specified in the group by clause.
      • All elements with the same value in the grouping set’s attribute are grouped and subsequent computation give a value per group.
        • Count(): COUNT(@Attribute1) can return a smaller count than COUNT(*), because counting an attribute ignores NULLs while COUNT(*) counts all rows in the group.
      • The grouping set can be made of multiple elements(Attributes).
      • All Attributes with Null are considered equal.
      • Having Clause
        • This uses a predicate that evaluates per group unlike WHERE that’s evaluated per row.
      • Format is 

                                     Select @Attribute1, @Attribute2, @AggregateFunction(@Attribute3) as @Attribute4

                                    From @Relation1
                                    Group By @Attribute1, @Attribute2

      • All clauses that operate after a group by clause must be guaranteed to return a single result  value per group. 
        • You can refer to elements from the group by list as they return one distinct value per group.
        • If you want to refer to Attributes from the underlying table that aren’t in the group by list, 
          • apply an aggregate function to them to show they’re guaranteed to return one value per group 
          • add the attribute to the group by clause.
          • Carry out the grouping in a CTE and join the CTE to the base table adding the elements during the join.
      • Multiple Grouping Sets
        • They allow you to group data in more than one way.
        • They’re specified in the group by clause.
        • There are three ways to define them
          • Grouping Sets: 
            • Used to list all grouping sets you want to define in the query.
            • the empty (         ) grouping set defines one group with all rows for computation of total aggregates.
            • Format is 

                                                                 Select @Attribute1, @Attribute2, @AggregateFunction(@Attribute3) as @Attribute4

                                                                 From @Relation1
                                                                 Group by grouping sets
                                                                 (
                                                                  (@Attribute1, @Attribute2), -- @GroupingSet1
                                                                  (@Attribute2),              -- @GroupingSet2
                                                                  (     )                     -- @GroupingSet3
                                                                 );

          • Rollup:
            • Used when there’s a hierarchy formed by the input elements and only a subset of the possible grouping sets is interesting.
            • When the elements form a hierarchy, you use rollup to avoid computing unnecessary aggregates.
            • E.G. rollup(@Attribute1, @Attribute2, @Attribute3) groups by
              • (@Attribute1, @Attribute2, @Attribute3)
              • (@Attribute1, @Attribute2)
              • (@Attribute1)
              • (                 )
          • Cube:
            • Accepts a list of expressions as inputs and defines all possible grouping sets that can be generated from the inputs including the empty grouping set.
            • E.G. Group by Cube(@Attribute1, @Attribute2) groups by
              • (@Attribute1, @Attribute2)
              • (@Attribute1)
              • (@Attribute2)
              • (                 )
        • Nulls are used are as place holders in rows where an attribute isn’t part of the grouping set. 
          • If all grouped attributes are defined as not null, you can identify the rows that are associated with a single grouping set, based on a unique combination of nulls and non-nulls in those columns. 
          • However, if the attributes allow nulls, you need to use either grouping() or grouping_ID() to differentiate.
            • Grouping(): Accepts a single element as input and returns 0 when the element is part of the grouping set and 1 when it is not.
              • You can identify a grouping set by looking for 0s in the elements that are part of the grouping sets and 1 in the others.
              • Format is
                          Select @Attribute1, grouping(@Attribute1), @Attribute2, grouping(@Attribute2), @Attribute3, grouping(@Attribute3), @AggregateFunction(*)
                          from @Relation1
                          Group by rollup(@Attribute1, @Attribute2, @Attribute3);
            • Grouping_ID(): 
              • This accepts the list of grouped columns as inputs and returns an integer representing a bitmap. Calculate the bitmap to determine which of them(having 0) is part of the grouping set and which(having 1) is not part of the grouping set.
              • The resulting integer is the sum of the values representing the elements that are not part of the grouping set, because their bits are turned on
              • Format is

                          select @Attribute1, @Attribute2, @Attribute3, grouping_id(@Attribute1, @Attribute2, @Attribute3), @Aggregate_Function(*)

                          From @relation1
                          group by rollup(@Attribute1, @Attribute2, @Attribute3)
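        • A minimal sketch of multiple grouping sets (hypothetical Sales.Orders table; returns freight totals per customer and year, per year, and a grand total, with GROUPING_ID identifying the grouping set of each row):

                          SELECT custid, YEAR(orderdate) AS orderyear,
                                 GROUPING_ID(custid, YEAR(orderdate)) AS grp_id,
                                 SUM(freight) AS totalfreight
                          FROM Sales.Orders
                          GROUP BY GROUPING SETS
                          (
                               (custid, YEAR(orderdate)),
                               (YEAR(orderdate)),
                               ()
                          );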
Pivoting and Unpivoting data: done using table operators.
  • Pivoting: A technique of grouping and aggregating data, transitioning it from a state of rows to a state of columns. 
    • You use the Pivot operator.
    • In all pivots, you need to specify
      • Grouping element(on rows): what do you want to see on the rows
      • Spreading element( on columns): what do you want to see on the columns
      • Aggregation element( Data): what do you want to see where columns and rows intersect.
    • It is recommended to use a CTE because the grouping element is determined by elimination(i.e. the elements in the table not specified in the pivot statement). If the base table is queried directly, other elements will be included as grouping elements. In the example below, grouping is done by the @grouping_column. 
    • The @spreading_column and @aggregation_column cannot be the results of an expression. They must be column names from the CTE. However, an expression can be used and aliased in the CTE.
    • Count(*) cannot be used as the aggregate function. If you want a count, define a dummy column in the CTE and reference that.
    • Only one aggregate function can be used.
    • The IN clause only accepts a list of static values; you can use dynamic SQL if you don’t know the values when defining the query.
    • @select list = @Grouping_column, @distinct_spreading_value1, @Distinct_Spreading_Value2, @Distinct_Spreading_Value3
    • Format is 

                         With Pivotdata as 

                         (
                              Select @grouping_column, @spreading_column, @aggregation_column
                              from @Relation1
                         )
                         Select @Select_list     
                         from Pivotdata
                              Pivot (@Aggregate_function (@aggregation_column)
                                   For @spreading_column in (@distinct_Spreading_Value1, @Distinct_Spreading_Value2, @Distinct_Spreading_Value3))as @Alias;
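    • A minimal sketch (hypothetical Sales.Orders table; total freight per customer, spread across the years 2012-2014 as columns):

                          WITH PivotData AS
                          (
                               SELECT custid, YEAR(orderdate) AS orderyear, freight
                               FROM Sales.Orders
                          )
                          SELECT custid, [2012], [2013], [2014]
                          FROM PivotData
                               PIVOT (SUM(freight) FOR orderyear IN ([2012], [2013], [2014])) AS P;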
  • UnPivoting: A technique for rotating the input data from a state of columns to a state of rows.
    • It takes a set of source columns and rotates them to multiple rows, generating 2 target columns: One to hold the source column names, another to hold the values.
    • You need to specify 3 elements.
      • The set of source columns you’re unpivoting
      • The name to assign to the target names column
      • The name to assign to the target values column
    • The unpivot operator filters out rows in the @value_column with NULLS as they represent inapplicable cases.
    • The types of all columns that you’re unpivoting must be the same
    • The Format is
                          Select @other_columns, @names_column, @values_column
                          From @Relation1
                               Unpivot(@values_column for @names_column in (@source_Column1, @Source_column2, @Source_column3)) as @Alias;
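    • A minimal sketch (hypothetical dbo.FreightByYear table with columns custid, [2012], [2013] and [2014]; rotates the year columns back into rows):

                          SELECT custid, orderyear, freight
                          FROM dbo.FreightByYear
                               UNPIVOT(freight FOR orderyear IN ([2012], [2013], [2014])) AS U;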
Window Functions
  • Window Functions enable you to perform data analysis computations. You define a set of rows (a window) per function using the over clause, and the function returns one result value for each underlying row.
  • Aggregate functions can be used as window functions; they’re then applied to the window of rows defined by the over clause.
  • Over Clause
    • Used to define a window of rows for aggregate functions. The window is defined in respect to the current row. 
    • an empty over clause Over() represents the underlying query’s result set.
    • You can use window partition clause to restrict the window.
    • Format is

                         Select @Attribute1, @Attribute2, @Attribute3, @Aggregate_function(@Attribute1) over(partition by @Attribute2)

                         From @Relation1;
  • Framing: 
    • Used to define ordering within the partition using a window order clause, and based on that order, confining a frame of rows(rows or range) between 2 window frame extents/delimiters.
    • You define the Windows frame extents using a window frame clause, which requires a window order clause to be present.
    • Window frame extents/delimiters define which rows of the partition participate in the computation for the current row.
    • Window frame units
      • Rows: 
        • Allows 3 Window Frame Extents/delimiters.
          • Unbounded preceding/following: the beginning or end of the partition
          • Current Row: representing the current row. assumed as default if none is specified.
          • @number Rows preceding/following: @number rows before or after the current row, respectively
          • Format is

                          Select @Attribute1, @Attribute2, @Attribute3, @Attribute4, @Aggregate_function(@Attribute2) over(partition by @attribute1 order by @Attribute1, @Attribute3 rows between @delimiter1 and @delimiter2)

      • Range
        • Allows 2 Window Frame Extents/delimiters.
          • Unbounded preceding/following: the beginning or end of the partition
          • Current Row: representing the current row. assumed as default if none is specified.
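    • A minimal sketch of a framed window aggregate (hypothetical Sales.OrderTotals table with custid, orderid, orderdate and val; computes a running total of val per customer):

                          SELECT custid, orderid, orderdate, val,
                                 SUM(val) OVER(PARTITION BY custid
                                               ORDER BY orderdate, orderid
                                               ROWS BETWEEN UNBOUNDED PRECEDING
                                                        AND CURRENT ROW) AS running_total
                          FROM Sales.OrderTotals;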
Window Ranking
  • Allow you to rank rows within a partition based on an ordering specified in the over clause.
  • There are 4 ranking functions
    • Row_number(): computes a unique sequential integer within the window partition based on the window ordering.
      • Isn’t deterministic if the ordering isn’t unique.
    • Rank(): Assigns the same ranking value to all rows with the same ordering value. If @n rows share an ordering value and receive rank @r, the next distinct ordering value receives rank @r + @n, so gaps can appear.
    • Dense_rank(): Also assigns the same ranking value to all rows with the same ordering value, but it ranks distinct ordering values, so if @n rows share an ordering value with rank @r, the next distinct ordering value receives rank @r + 1 and there are no gaps.
    • Ntile(@NumOfTiles): Allows you to arrange the rows within the partition in a requested number of equally sized tiles, based on the specified ordering.
      • Isn’t deterministic if the ordering isn’t unique.
  • Format is
                Select @Attribute1, @Attribute2, @Window_Ranking_function() over (partition by @Attribute2 order by @Attribute1) as @Alias
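  • A minimal sketch of the four ranking functions side by side (hypothetical Sales.OrderTotals table with orderid, custid and val):

                SELECT orderid, custid, val,
                       ROW_NUMBER() OVER(ORDER BY val) AS rownum,
                       RANK()       OVER(ORDER BY val) AS rnk,
                       DENSE_RANK() OVER(ORDER BY val) AS dense_rnk,
                       NTILE(10)    OVER(ORDER BY val) AS tile
                FROM Sales.OrderTotals;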
Window offset Functions
  • Return an element from a single row that is in a given offset from the current row in the window partition, or from the first or last row in the window frame
  • There are 4 of them
    • LAG & LEAD : 
      • Rely on an offset with respect to the current row. They support window partition and order clauses, but not window frame clauses. LAG returns an element from the row in the current partition that is a requested number of rows before the current row (based on the window ordering); LEAD returns it from the row a requested number of rows after the current row.
      • Allow you to specify the offset and the value to return when there is no row at the requested offset (instead of NULL).
      • Format is

                                   Select @Attribute1, @Attribute2, @Attribute3, Lag(@Attribute1, @offset, @value_If_Attribute_is_null)over (partition by @Attribute2 order by @Attribute3)

                                   From @Relation1
                                   

    • FIRST_VALUE & LAST_VALUE:
      • Returns the first/last value from the first or last rows in the window frame.
      • Supports Partitions, order and frame clauses.
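  • A minimal sketch (hypothetical Sales.OrderTotals table; compares each order's value with the customer's previous order and with the customer's first order):

                          SELECT custid, orderid, orderdate, val,
                                 LAG(val, 1, 0) OVER(PARTITION BY custid
                                                     ORDER BY orderdate, orderid) AS prev_val,
                                 FIRST_VALUE(val) OVER(PARTITION BY custid
                                                       ORDER BY orderdate, orderid
                                                       ROWS BETWEEN UNBOUNDED PRECEDING
                                                                AND CURRENT ROW) AS first_val
                          FROM Sales.OrderTotals;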
Querying Full-Text Data

Full-Text search 
  • Full-text search allows approximate searches. It requires full-text indexes, which live inside full-text catalogs. Full-text indexes can be created over char, text, image, XML and varbinary(max) type attributes.
  • Full-Text search components
    • You can store documents in binary or XML columns and use full-text queries on them.
    • Word breakers and stemmers perform linguistic analysis on all full-text data. they’re language specific.
      • Word breakers identify individual words(tokens) which are inserted into full-text indexes in compressed format.
      • Stemmers generate inflectional forms of a word based on the rules of a language.
    • You can specify a different language for each full-text indexed column.
    • Stopwords: used to define words that shouldn’t be indexed because they’re irrelevant.
    • Full-text queries can also search for synonyms as defined in the thesaurus file. The following can be configured in thesaurus files
      • diacritics_sensitive: set to 0 if a language is accent insensitive and 1 if it is.
      • Expansion: to add expansion words( words to search for in addition e.g. student, scholar) for a word.
      • replacement: To define replacement words or terms for a specific word or term e.g. Windows 2008 instead of win2k8.
    • To search a document’s properties, you can specify a search property list and add properties to the list.Document properties have predefined GUID and integer IDs.
    • Full Text Catalogs and indexes
      • Full text indexes are stored in full-text catalogs which are virtual objects that contain full-text indexes. 
      • Statistical semantics are an option for full-text indexes
        • They create additional key phrase and document similarity indexes that are part of statistical semantic indexing. Full-text search uses the key phrases to identify and index documents that are similar.
        • They allow you to query the meaning of the documents
        • Allow you to create text-mining solutions.
        • You need to install the semantic language statistics database (semanticsdb).
Using the contains and freetext predicates
Contains and freetext are predicates for limiting the result set of a query using full-text indexes
  • Contains 
    • Allows you search for Words.
    • You can add custom weights to words you are searching for.
  • FreeText
    • Less specific than contains.
    • Searches for the values that match the meaning of a phrase not just exact words. 
    • The engine performs word breaking of the search phrase, stems and identifies expansions and replacements for the words in the searched term with words from the thesaurus.
    • Format is 

                         Select…..From…..Where freetext(Ftcolumn, ‘Searchword1’).
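  • A minimal sketch of both predicates (hypothetical dbo.Documents table with id, title and a full-text indexed doccontent column; assumes the full-text index already exists):

                          -- CONTAINS: exact words, with custom weights via ISABOUT
                          SELECT id, title
                          FROM dbo.Documents
                          WHERE CONTAINS(doccontent, 'ISABOUT (data WEIGHT (0.8), index WEIGHT (0.2))');

                          -- FREETEXT: matches the meaning of the phrase, not just the exact words
                          SELECT id, title
                          FROM dbo.Documents
                          WHERE FREETEXT(doccontent, 'query optimization');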

Full-text search functions
  • Containstable() and freetexttable() return 2 columns: Key and Rank.
    • The key column is the unique key from the index used in the key index clause of the create fulltext index statement. Rank returns a value between 0 and 1000 which states how well a row matches your search criteria; the higher the rank, the higher the relevance.
    • Freetexttable supports fewer parameters.
    • Format is 
                         CONTAINSTABLE ( table , { column_name | ( column_list ) | * } , ‘ <contains_search_condition> ‘ [ , LANGUAGE language_term] [ , top_n_by_rank ] )
    • Search conditions are the same as for a contains predicate.
  • Semantic search functions
    • There are 3 functions for this
      • SemanticKeyPhraseTable
        • returns a table with key phrases associated with the full-text indexed column from the column list.
        • The source_key parameter specifies the unique key from the index used in the key index clause of the create fulltext index.
        • Format is 

                                                            SEMANTICKEYPHRASETABLE ( table, { column | (column_list) | * } [ , source_key ] )

      • SemanticSimilarityDetailsTable
        • Returns a table with key phrases that are common across 2 documents.
        • Format is

                                         SemanticSimilarityDetailsTable( table, source_column, source_key, matched_column, matched_key )

      • Semanticsimilaritytable
        • Returns a table with documents scored by semantic similarity to the searched document specified with the source_key parameter. 
        • Format is
                                             SEMANTICSIMILARITYTABLE( table, { column | (column_list) | * }, source_key )

Querying and Managing XML Data

Intro to XML
  • XML uses tags to name part of an XML document. The parts are called elements.
  • XML documents are ordered: The position of elements matters
  • XML is case-sensitive unicode text
  • Some characters in XML have special meanings as they are considered markup. to include them,
    •  you must escape them using an ampersand (&) followed by a special code, followed by a semicolon (;).
    • you can use the XML CDATA section, written as <![CDATA[@String_with_character_to_insert]]> which prevents the special characters in the string being parsed.
  • processing instructions are written in <?@processing_instruction?>
  • Comments are in the format <!--@Comment-->
  • XML is a document when it has a single root node and a fragment when it doesn’t.
                Document: <Customers><Customer custid="1"/></Customers>
                Fragment: <Customer custid="1"/><Customer custid="2"/>
  • Elements have attributes
    • Attributes have their own names and values. They can be presented in 2 ways
      • attribute-centric: attribute values are enclosed in quotation marks within the element’s start tag.
      • element-centric: attributes are represented as nested elements of the original element. Use the ELEMENTS keyword to ensure the output is element-centric.
  • Namespaces are used to distinguish between elements from different sources. 
    • they are declared in the root element of an XML document.
    • They can be aliased; you then prefix element names with the alias
  • XML data should have a fixed schema (metadata), as the actual data is mixed with element and attribute names. The most common schema language is XSD.
    • Validating an XML document is checking whether the document complies with a schema.
    • XMLSCHEMA directive: 
      • This is used to return the XSD schema of the document. It is returned before the actual XML data. It accepts a parameter that defines a target namespace.
Producing XML from Relational Data.
  • You can convert the result set to XML using the FOR XML clause of the select T-SQL statements. 
  • The FOR XML clause comes after the order by clause.
  • There are several modes
    • For XML raw
      • The output is similar to the tabular expression of the data. Every row from the returned rowsets converts to a single element named row, and the columns become the attributes of this element.
      • You must use the elements keyword for the output to have element-centric attributes
      • You can enhance the RAW mode by renaming the row element, adding a root element, including namespaces and making the XML returned element centric.
      • The XMLSchema directive can be used here to return the XML schema
    • For XML auto
      • Elements are nested based on the tables the columns come from: columns from the same table become attributes of one element, and the element for a table that appears later in the select list is nested inside the element for the table that appears earlier.
      • You must use the elements keyword for the output to have element-centric attributes
      • Due to the nesting, the order by clause must be selected carefully as well as the select clause so the nesting is properly done.
        • The ordering should follow a one-to-many relationship
      • The XMLSchema directive can be used here to return the XML schema
    • For XML Path
      • Allows you manually define the XML returned.
      • It uses standard XML XPath expressions to define the elements and attributes you are creating.
      • Column names and aliases serve as Xpath expressions defining the path to the element in the XML generated. 
        • Path is expressed in a hierarchical way; levels are delimited with slash /
        • Every column becomes an element. If you want attribute-centric XML, prefix the alias name with @.
      • To create XML with nested elements for child tables, you need subqueries in the select part of the query in the path mode. The subqueries have to return a scalar value in a select clause.
        •  You return a scalar value by returning XML from the subquery; the result is then a single scalar XML value. You format the nested XML from the subquery with its own FOR XML clause, like you format XML in the outer query. Additionally, you have to use the TYPE directive of the FOR XML clause to produce a value of the XML data type, and not XML as text, which could not be consumed by the outer query.
  • The With Namespaces clause 
    • Used before the select clause to define namespaces and aliases in the returned XML.
    • A colon is used to separate the namespace from the element name.
  • The Root Keyword is used to specify a root element.
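  • A minimal sketch of FOR XML PATH with a nested subquery (hypothetical Sales.Customers and Sales.Orders tables; custid becomes an attribute, companyname an element, and each customer's orders nested <Order> elements):

                          SELECT C.custid AS [@custid],
                                 C.companyname AS [companyname],
                                 (SELECT O.orderid AS [@orderid],
                                         O.orderdate AS [orderdate]
                                  FROM Sales.Orders AS O
                                  WHERE O.custid = C.custid
                                  ORDER BY O.orderid
                                  FOR XML PATH('Order'), TYPE) AS [Orders]
                          FROM Sales.Customers AS C
                          ORDER BY C.custid
                          FOR XML PATH('Customer'), ROOT('Customers');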
Shredding XML to tables
You can convert XML to tables. There are 2 methods
  • OPENXML rowset function: this provides a rowset over in-memory XML documents by using the Document Object Model(DOM) presentation.
    • Useful when a document will be shredded multiple times.
    • Before parsing the DOM, you prepare the DOM presentation of XML using sys.sp_xml_preparedocument. After shredding the document, you remove the DOM presentation using sys.sp_xml_removedocument.
    • OpenXML uses the following parameters.
      • An XML DOM document handle, returned by sp_xml_preparedocument. 
        • This is an integer.
      • An XPath expression to find the nodes you want to map to rows of a rowset returned. 
        • Specified as rowpattern and defines how XML nodes translate to rows. The path to a node is used as a pattern
        • nodes below the selected node define rows of the returned rowset.
      • A description of the rowset returned
      • Mapping between XML nodes and rowset columns.
        • Done using the WITH clause of the openXML function. 
          • In the WITH clause, you either specify an existing table to be used as a template for the rowset returned, or you define the rowset columns in place using table-definition syntax.
        • You can use the flag parameter of the openXML function to specify the mapping. 
          • 1 is for attribute-centric mapping
          • 2 is for element-centric mapping
          • 11 is for both mappings
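  • Example (an illustrative sketch, not from these notes; the XML content and column names are hypothetical) showing the prepare/shred/remove pattern with attribute-centric mapping (flag 1):

        DECLARE @doc AS nvarchar(max) = N'
        <Customers>
          <Customer custid="1" companyname="Contoso">
            <Order orderid="10" />
            <Order orderid="11" />
          </Customer>
        </Customers>';
        DECLARE @hdoc AS int;

        EXEC sys.sp_xml_preparedocument @hdoc OUTPUT, @doc;    -- build the DOM and get the handle

        SELECT *
        FROM OPENXML(@hdoc, '/Customers/Customer/Order', 1)    -- rowpattern: one row per Order node
             WITH (custid      int          '../@custid',      -- ../ walks up to the parent Customer
                   companyname nvarchar(40) '../@companyname',
                   orderid     int          '@orderid');

        EXEC sys.sp_xml_removedocument @hdoc;                   -- free the DOM memory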
Querying XML data with XQuery
XQuery allows you to navigate XML, loop over nodes, shape the returned XML instance, and much more.
  • Xquery is case-sensitive
  • Variables start with $, e.g. $i
  • Xquery returns sequences which can include atomic or complex values(XML nodes).
    • Any node(element, attribute, text, processing instruction) can be included in the sequence.
    • Sequences can be formatted to get well-formed XML. 
  • Query() is used to query XML.
    • What to return from a query can be specified with the following
      • query(‘*’) returns the complete XML instance.
      • data() is used to specify what to return when you query XML. 
        • Used with query(). e.g. @XML_Variable.query(‘data(@attributes_to_return)’)
        • It returns the atomic data values rather than the nodes themselves.
        • data(*) returns all the atomic data values in the xml.
        • data(root/@Attribute1/@Attribute2) is used to return the atomic data values in the path(root/@Attribute1/@Attribute2) specified.
  • Every identifier is a qualified name(Qname). A Qname consists of a local name and optionally a namespace prefix.
    • Standard namespaces are
      • xs: namespace for an XML schema
      • xsi: The XML schema instance namespace. Used to associate XML schemas with instance documents
      • xdt: namespace for data types.
      • fn: functions namespace
      • sqltypes: namespace for mapping for SQL data types
      • xml: default XML namespace.
    • You can use standard namespaces in queries without defining them. 
  • You define your own data types in the prolog (at the beginning of your XQuery), which is separated from the query body with a semicolon.
  • XQuery Data Types
    • Xquery data types are either 
      • node types: attribute, comment, element, namespace, text, processing-instruction and document
      • atomic types: xs:string, xs:boolean, xs:QName, xs:date, xs:time, xs:datetime, xs:float, xs:double, xs:decimal, xs:integer.
  • Xquery functions
    • Xquery functions are organized into categories.
  • Navigation
    • With Xquery, you can specify a path absolutely or relatively from the current node. XQuery takes care of the current position in the document.
    • Every path consists of a sequence of steps, listed from left to right.
      • Steps are separated with slashes
      • A step may consist of 3 parts
        • Axis: the direction of travel. there are 6 axes
          • Child::
            • children of the current node. default and can be skipped
          • descendant::
            • all descendants of the node. Direction is down
          • self::
            • the current node. direction is here
          • descendant-or-self::
            • Retrieves the context node and all its descendants. direction is here and then down
          • attribute::
            • Retrieves the specified attribute of the current node. Direction is right
          • parent::
            • Retrieves the parent of the current node. direction is up.
        • Node test: specifies the criterion for selecting nodes.
          • This follows the axis you specify. Can be as simple as a name test.
          • Specifying a name means you want nodes with that name.
          • You can use wildcards.
            • Asterisk(*) means you want any principal node with any name.
            • A principal node is the default node kind for an axis. It’s an attribute if the axis is attribute:: and an element for all others
          • You can also perform node type tests
            • They help you query nodes that are not principal nodes.
            • Types of node type tests 
              • comment(): select comment nodes
              • node(): returns any kind of node
              • processing-instruction(): retrieve a processing instruction node
              • text(): retrieve text nodes or nodes without tags
        • Predicate: narrows down the search
      • The format of a path is

                                   @Node_name/@Axis@Node_test[@predicate]

  • Predicates: There are 2 types
    • Numeric:
      • Select nodes by position. they’re included in brackets.
      • e.g. (/x/y)[1] means the first y child of elements x
    • Boolean:
      • Select all nodes for which the predicate evaluates to true.
      • Can work on sequences and atomic values
        • For sequences, if one atomic value in the sequence makes the comparison true, the whole expression evaluates to true.
          • Done using general comparison operators(=, <, > ,!=)
        • For Atomic values,
          • use value comparison operators (eq, ne, lt, le, gt, ge), which do not work on sequences
    • Xquery supports if…then…else expression.
      • Used to return one expression or another depending on the value of the expression in the if. Similar to case.
  • FLWOR
    • They’re a for each loop used to iterate through a sequence of nodes returned by an XPath expression.
    • Signify For..let..where..order by..return
      • For: used to bind iterator variables to input sequences. 
      • Let: Optional. Lets you assign the value of an expression to a variable for a specific iteration
      • Where: optional. lets you filter the iteration
      • order by: controls the order in which the elements of the input sequence are processed. The order is based on atomic values.
        • Expressions passed here must be compatible with the gt operator.
      • return: evaluated once per iteration. Also used to format the result XML.
        • Use {} around expressions so the expression is evaluated and its value returned, rather than the literal text
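    • Example (an illustrative sketch, not from these notes; the XML content is hypothetical) of a FLWOR expression run through the query() method:

        DECLARE @x AS xml = N'
        <CustomersOrders>
          <Customer custid="1">
            <Order orderid="10" />
            <Order orderid="11" />
          </Customer>
        </CustomersOrders>';

        -- iterate over the Order elements, filter, sort and reshape the result
        SELECT @x.query('
          for $o in /CustomersOrders/Customer/Order
          let $id := $o/@orderid
          where $id > 10
          order by ($id)[1]
          return <OrderId>{data($id)}</OrderId>');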
Using the XML data type
  • XML data type has 5 methods that accept XQuery as a parameter.
    • query(): use to query XML data.
    • value(): retrieving atomic values
      • It returns a scalar value. you must specify the position of the element in the sequence you are browsing.
      • value(@Xquery_Expression, @dataTypeToReturn)
    • exist(): checking existence of a node in an XML instance.
      • Commonly used in the where clause.
      • returns 1 where the node searched for exists in the XML instance
      • returns 0 where the node searched for doesn’t exist in the XML instance
      • returns null if the XML instance is NULL
    • modify(): modifying sections within the XML data
      • Due to the size of XML data, it’s more efficient to change a small portion of it.
      • Invoked in an update T-SQL statement
      • There are 3 keywords for data modification
        • insert
        • delete
        • replace value of
    • nodes(): shredding XML data into relational data. 
      • Usually faster than using the OPENXML function.
      • returns a result set that contains logical copies of the XML instance. 
      • it has to be invoked for every row in the table.
  • XML is a Large object(LOB) type. 
  • Using XML data type for Dynamic Schema
    • You can validate XML data against an XML schema collection. 
      • This is useful for a dynamic schema and allows you to support data of different schemas.
      • XML schemas allow you to constrain the possible nodes, their data types, and their values.
      • Validating XML against a schema results in typed XML.
    • to prevent binding a product to the wrong schema, use a trigger or a check constraint that checks whether the namespace is the same as the product category name.
  • XML indexes
    • They allow index seek operation instead of a table scan on XML data.
    • The first index on an XML column is the primary XML index. It contains a shredded persisted representation of the XML values. For each XML value in the column, the index creates several rows of data. It can only be created on tables with a clustered primary key.
    • there are 3 types of secondary XML indexes
      • Path: useful if queries specify path expressions. 
      • value: useful if queries are value-based and the path is not fully specified or it includes a wildcard
      • Property: useful for queries that retrieve one or more values from xml instances using value()
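  • Example (an illustrative sketch, not from these notes; the dbo.Products table, its productinfo xml column and the node names are hypothetical) of value(), exist() and nodes():

        SELECT productid,
               productinfo.value('(/Product/Name/text())[1]', 'nvarchar(50)') AS productname,
               productinfo.exist('/Product/Color') AS hascolor
        FROM dbo.Products
        WHERE productinfo.exist('/Product[@listprice > 100]') = 1;

        -- shredding with nodes(): CROSS APPLY invokes it per row, returning one result row per Feature element
        SELECT p.productid,
               f.feature.value('.', 'nvarchar(100)') AS feature
        FROM dbo.Products AS p
        CROSS APPLY p.productinfo.nodes('/Product/Features/Feature') AS f(feature);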
Creating tables and enforcing data integrity.
Tables are created in two ways
  • Select into: creates a table automatically using the output of a query for the basic table definitions
  • Create table statement: you explicitly define the components of the table
    • The definition of a table (its table schema) can be modified after the table's creation using the ALTER TABLE command.
    • The database Schema  is a container that you can use to group database objects
      • A table can belong to only one schema.
      • Every table’s name has two parts @Database_schema.@table_name.
      • Schema’s cannot be nested in other schemas.
      • Schema’s are owned by users.
      • To move a table from one schema to another, use the format

                                   Alter schema @destination_Schema transfer  @table_name.

    • Naming tables and columns
      • The name of tables and columns must be valid SQL identifiers.There are 2 types of identifiers
        • Regular: comply with the rules for the format of identifiers (e.g. start with a letter or underscore, contain no spaces, and are not reserved keywords) and need no delimiters.
        • Delimited: identified by [] or "" surrounding the name.
      • As the maximum identifier length is 128 characters, make sure identifier names aren’t too long
    • Choosing column data types
      • Use the most efficient data type.
    • Null and default values.
      • You can specify whether a column allows null in its value or not by specifying null or not null after the column’s data type.
      • If you don’t want to allow nulls, but there is a possibility of the data being unknown at insertion time, you can create a default constraint to assign a default value

                                   @column_name @datatype  not Null default(‘@default_value’).

    • Identity property and sequence numbers
      • The identity property can be assigned to a column to automatically generate a sequence of numbers upon insert
        • Only one column in a table can have it.
        • You specify both seed and increment values for the numbers generated.
        • To insert into an identity column, use the format

                                             set identity_insert @table_name On;

    • Computed columns
      • These are columns that are computed based on expressions.
      • A computed column based on a non-deterministic expression (e.g. one using getdate()) cannot be persisted or indexed.
      • Format is

                                    @computed_column_name As @Attribute1 @Operator @Attribute2

    • Table Compression
      • You can compress the data in a table to get more efficient storage in the enterprise versions of SQL server. Data can be compressed in
        • Row: SQL server applies a more compact storage format to each row of a table
        • Page: Include row-compression plus additional compression algorithms.
        • Format is

                                             Create table @Table_name

                                             (
                                                  Table_schema
                                             ) with (Data_compression = @CompressionType)

  • Altering a table: used to alter a table's structure and add or remove table properties.
    • To add a column, you use the format

                         Alter table @Tablename

                         Add @columnName @column_Definition

    • To alter a column, use the format

                         Alter table @tablename

                         Alter column @columnname @column_definition

    • Cannot be used to
      • Change a column name
      • add an identity property
      • remove an identity property
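  • Example (an illustrative sketch, not from these notes; dbo.Orders and its columns are hypothetical) combining an identity column, a default, a computed column and page compression, followed by ALTER TABLE changes:

        CREATE TABLE dbo.Orders
        (
            orderid    int   NOT NULL IDENTITY(1,1),                         -- seed 1, increment 1
            custid     int   NOT NULL,
            orderdate  date  NOT NULL DEFAULT (CAST(SYSDATETIME() AS date)),
            qty        int   NOT NULL,
            unitprice  money NOT NULL,
            totalprice AS qty * unitprice,                                   -- computed column
            CONSTRAINT PK_Orders PRIMARY KEY (orderid)
        )
        WITH (DATA_COMPRESSION = PAGE);

        ALTER TABLE dbo.Orders ADD shippeddate date NULL;                    -- add a column
        ALTER TABLE dbo.Orders ALTER COLUMN custid int NULL;                 -- alter a column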
Enforcing data integrity
  • Tables in a database sometimes need to be validated to ensure
    • The range of values the column accepts
    • Whether the values should reference another column
    • Whether they should be unique
  • These validations are controlled using constraints.
    • Constraints are database objects and must have unique names; their names should reflect their function and the table they constrain.
    • Can be declared during the create table and alter table commands
    • With Check
      • This causes the constraint creation to fail if any data already in the column violates the constraint
    • Format is 
Create table @table_name
(
     @table_definition
     constraint @constraint_name @constraint_type(@column_name)
)
Alter table @table_name
Add constraint @constraint_name @constraint_type(@column_name)
Go
    • There are 5 types
      • Primary Key constraints
        • These are used to enforce uniqueness of values for every row in a column.
        • There can be only one primary key per table.
        • They can only be created on columns that don’t allow nulls and have only unique values
        • SQL server enforces uniqueness using a unique index.
      • Unique constraint
        • To enforce uniqueness of rows in a table.
        • Only one null can exist in a column with a unique constraint.
        • A unique non-clustered index is created for enforcing uniqueness.
      • Foreign Key constraints
        • A column or combo of columns in one table used to look up data in another table. 
          • In the other table, the referenced column has a primary key or unique constraint applied.
          • The referencing column can contain duplicate values.
        • Used to enforce that values entered into the referencing table are from a referenced table
        • Very useful for joining.
          • Creating a non-clustered index on foreign key columns helps query performance.
        • Format is 

                                             Alter table @table_name with check

                                              Add constraint @constraint_name foreign key(@column_name) references @Referenced_table(@referenced_column)
      • Check Constraint
        • Used to limit the range of values in a column, beyond the limits of the data type.
        • Format

                                             Add constraint @constraint_name check(@column @comparison_operator @value)

      • Default Constraint
        • This supplies a default value during an insert if no other value is supplied.
        • Values don’t have to be supplied to columns with default during insertion
        • Format is 

                                             Add constraint @constraint_name default(@default_value)
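  • Example (an illustrative sketch, not from these notes; dbo.Customers, dbo.Orders and their columns are hypothetical) adding the five constraint types with ALTER TABLE:

        ALTER TABLE dbo.Customers
          ADD CONSTRAINT PK_Customers PRIMARY KEY (custid);

        ALTER TABLE dbo.Customers
          ADD CONSTRAINT UQ_Customers_email UNIQUE (email);

        ALTER TABLE dbo.Orders WITH CHECK
          ADD CONSTRAINT FK_Orders_Customers
              FOREIGN KEY (custid) REFERENCES dbo.Customers (custid);

        ALTER TABLE dbo.Orders
          ADD CONSTRAINT CHK_Orders_qty CHECK (qty > 0);

        ALTER TABLE dbo.Orders
          ADD CONSTRAINT DFT_Orders_orderdate DEFAULT (CAST(SYSDATETIME() AS date)) FOR orderdate;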

Designing and creating views, inline function and synonyms
Views, inline functions and synonyms are ways of presenting a logical view of a table without exposing the physical base table directly.
Views and inline functions
  • They allow you to present the contents of tables to users and to encapsulate complex query logic so a user doesn't have to remember it.
  • Views
    • Appear as tables and can be used to store and re-use queries in the database.
    • Properties for a view can be 
      • With Encryption: this obfuscates the view text so the user can’t easily see the select text of the view
      • With Schemabinding: binds the view to the table schemas of the underlying tables. The underlying tables cannot be altered without dropping the view, and the view cannot have its schema changed unless the view is dropped.
      • With View_metadata: returns the metadata of the view instead of the base tables.
      • With Check option: ensures that if you modify rows through a view that has a where clause, you cannot change rows in a way that makes them fall outside the view's filter.
    • A view can only have one result set, so it can only have one select statement unless a UNION is used.
    • Views must return a relational result set
    • Views cannot create a table or reference a temporary table.
    • You can alter a view using the alter view command. You must reissue the entire view definition when you alter it.
    • Partitioned views
      • You can use views to partition large tables on one server, in one or more tables across different databases and across multiple servers.
      • A partitioned view uses a union across partitioned tables.
    • To carry out DML(insert, update, delete) with a view,
      •  you must reference exactly one table at a time, no matter how many tables the view references.
      • The view columns must not have expressions surrounding them e.g. aggregate functions.
      • You cannot modify a view column that is computed from a union, cross join, except, intersect.
      • you cannot modify a column whose result is from grouping or distinct.
      • you cannot modify a view that uses TOP or OFFSET-FETCH together with the WITH CHECK OPTION.
    • Format is
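      (a minimal sketch, illustrative rather than from these notes; the Sales.OrderTotals view and Sales.Orders table are hypothetical names)

                          Create view Sales.OrderTotals
                          with schemabinding
                          As
                          select o.custid, SUM(o.qty * o.unitprice) as totalvalue
                          from Sales.Orders as o
                          group by o.custid;
                          Go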
  • Inline Functions
    • They are parameterized views(views with parameters)
    • returns a rowset based on the select statement.
    • Properties of an inline function can be
      • With Encryption
      • With Schemabinding
    • Format is 

                         Create function @Function_name(@parameter1 @parameter1type, @parameter2 @parameter2type)

                         returns table    
                         As
                         return
                         (@select_statement)
Synonyms
  • These provide an abstraction layer for all schema-scoped database objects. They are names stored in a database that can be used as substitutes for other object names
  • You can refer to a synonym anywhere you’ll refer to a table except in DDL statements(e.g. alter)
  • they can refer to objects in a different database or a linked server.
  • Synonyms cannot refer to other synonyms.
  • Synonyms do not expose the metadata of the underlying object to users.
  • Synonyms have a late-binding behavior
    • The object referenced doesn’t have to exist and SQL server doesn’t test for its existence
    • This means you can use a single synonym for many different objects over time, re-creating the synonym for each object as you need it, or create the same synonym in different DBs referring to the same object.
  • Format is

               Create synonym @Synonym_name for @TableName

Inserting, Updating and deleting data
Inserting data
  • You can Specify the column names in the insert  statement
    • Lets you skip columns that have identity/default values defined.
      • To assign a value to an identity column, you need to issue set identity_insert @Table_name on.
      • During an insert, If a value or default isn’t specified for a column,
        • SQL checks whether the column gets its value automatically e.g. identity or default constraint. If it doesn't
        • SQL checks if the column allows nulls and inserts a null. If it doesn’t
        • An error is thrown
  • Data can be inserted into a table using
    • Insert values: Allows you to insert rows into a table based on value expressions.
      • Allows you change the arrangement of values in the insert statement. 
      • Format is

                                   Insert into @Relation(@attribute1, @attribute2, @attribute3) 

                                   values(@Value11,@value12,@value13),(@value21,@value22,@value23)
    • Insert Select: Allows you to insert the result set of a query into the specified target table.
      • Allows you change the arrangement of values in the insert statement.
      • Format is

                                   Insert into @Relation1(@attribute1, @attribute2, @attribute3) 

                                   Select @select_list
                                   From @Relation2
                                   where @predicateExpression
    • Insert Exec: Allows you to insert the result set returned by dynamic SQL or a stored procedure into the specified table.
      • Allows you change the arrangement of values in the insert statement.
      • Format is

                                   Insert into @Relation1(@attribute1, @attribute2, @attribute3) 

                                   Exec @Stored_Procedure1

    • Select Into: Creates the target table based on the definition of the source and inserts the result rows from the query into the table.
      • Copies the data from the source.
      • Copies some aspects of the definition(e.g. column names, types, nullability and identity). If you don’t want them copied, you need to apply some modifications to the data. 
        • e.g. @identityColumn + 0 (adding zero turns the column into an expression, which removes the identity property in the destination without changing the values).
        • If you want to change the data type, use cast() or convert().
        • Modifying a column with an expression makes it nullable. To prevent it from being nullable, wrap it in ISNULL(), e.g. isnull(cast(@Column1 as @datatype), '@AlternateValue')
      • Some parts of the table definition, like indexes, permissions, constraints and triggers, aren't copied. If you want to include them, you need to script them from the source to the target.
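    • Example (an illustrative sketch, not from these notes; Sales.Orders, dbo.OrdersArchive and the column names are hypothetical):

                          SELECT ISNULL(orderid + 0, -1) AS orderid,          -- +0 drops the identity property, ISNULL keeps the column NOT NULL
                                 custid,
                                 CAST(orderdate AS datetime2) AS orderdate     -- change the data type while copying
                          INTO dbo.OrdersArchive
                          FROM Sales.Orders
                          WHERE orderdate < '20140101';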

Update data

  • The update statement is used to update existing rows in your tables.
  • Format is

               Update @target_table

               Set @Attribute1 = @Expression1, @Attribute2 = @Expression2, @Attribute3 =  @Expression3
               where @predicateExpression
  • Updates can be based on Joins.
    • Update rows in a table while referring to related rows in other tables for filtering and assignment
    • Format is

                         Update @Target_table_alias

                         Set @Attribute1 = @Expression1, @Attribute2= @Expression
                         From @Target_table as @target_table_alias
                         Inner Join @Source_table as @Source_table_alias
                         on @Source_table_alias.@Attribute2 = @target_table_alias.@Attribute1
                         where @predicateExpression
    • Nondeterministic Updates can happen when multiple source rows match a target row.
      • SQL server updates one of the rows.
  • To prevent non-deterministic updates, use the merge statement instead of updates based on joins. It fails if there are multiple source rows for a target row.
  • Update can be done through table expressions
    • Useful when you want to see the rows that will be modified before the actual update.
    • Define a Table expression based on the select, then modify using update
    • Useful when you want to modify rows with expressions not allowed in set clause(e.g. window functions)
    • Format is
                          With @TE_Alias as (@Table_Expression)
                          update @TE_Alias
                          set @Attribute1 = @Expression1, @Attribute2 = @Expression2
                         where @predicateExpression
  • Update Based on a variable
    •  Modify a row and collect the result. Instead of using an update and a select, use the short version
    • Format is

                         Declare @variable as @DataType

                         Update @relation1
                          Set @variable = @Attribute1 += @Expression
                         where @predicateExpression
Deleting Data
There are 2 statements you can use to delete rows from a table: 
  • Delete: Used to delete rows from a table. A predicate can be used to limit the rows deleted.
    • Fully logged and can take a long time to complete.
      • Causes the transaction log to increase in size.
    • Takes exclusive locks on the rows it deletes; deleting a large amount of data can therefore cause lock escalation to a table lock.
      • To prevent this, split the delete into smaller chunks using TOP and use an infinite loop

                                    While 1=1

                                    Begin
                                         Delete Top(1000) from @relation1
                                         where @predicateExpression
                                         if @@rowcount < 1000 break
                                    End

    • Delete based on a join
      • Enables you to delete rows from one table based on information that you evaluate in related rows in other tables.
      • You can also use a subquery instead(with in or exists)
      • Format is

                                   Delete from @relation1_alias

                                   From @relation1 as @relation1_alias
                                   Inner join @relation2 as @relation2_alias
                                    On @relation1_alias.@attribute1 = @relation2_alias.@attribute1
                                   where @predicateExpression

    • Delete based on table expressions
      • Use a table expression to define the rows you want to delete, then issue a delete against the table expression. 
      • Format is

                                    With @TableExpression1 as (@Select_statement)

                                    Delete from @TableExpression1

    • Format is 

                         Delete From @relation1

                         where @predicateExpression

  • Truncate: Deletes all rows from the target table.
    • Doesn’t allow an optional predicate filter.
    • Format is

                         Truncate table @relation1

  • Truncate is minimally logged and is faster than delete.
  • Delete doesn’t reset the identity property of a column and truncate does
  • Delete works for a table with a column referenced by a foreign key as long as the rows being deleted aren't referenced; truncate isn't allowed at all on a table referenced by a foreign key
  • Delete is allowed against a table involved in an indexed view. truncate isn’t
Data Modification
Using sequence and identity column property
  • Identity column property
    • Automatically assigns a value to the column upon insertion. 
      • You can’t put a value in the identity column during an insert except you enable identity_insert on the table.
        • Set identity_insert @Table on
    • You cannot update an identity column.
    • Only one column can have this property.
    • The data type must have a scale of 0(no decimal). Numeric/Decimal with a scale of 0 is allowed
    • You can define a seed(first value) and increment( value to increment previous supplied value by). default for both is 1.
    • To query the last identity value generated use
      • Scope_identity(): returns the last identity value generated in your session in the current scope
      • @@identity: returns the last identity value generated in your session regardless of scope.
      • Ident_current(@relation): accepts a table as input and returns the last identity value generated in that table regardless of session.
    • Delete doesn’t reset the identity property of a column and truncate resets the identity column to its seed.
    • Identity doesn’t guarantee uniqueness as you can turn on identity_insert and can reseed the value. To enforce uniqueness, you need a unique index(primary or unique constraint)
    • Identity doesn’t guarantee there’ll be no gaps between values.
      • If an insert fails, the identity value isn’t reset, so the value the column returns is lost.
    • Identity doesn’t support cycling. After you reach the maximum value in the data type, the next insertion will fail. you need to reseed the identity value
      • To reseed an identity value, use the DBCC CheckIdent command.
        • dbcc checkident(‘@tableName’, reseed, @NewSeedValue)
  • Sequence Object
    • An independent object in the database.
    • Can be used for keys that shouldn’t conflict across different tables.
    • The value is generated before use.
    • Supports cycling.
    • The data type must have a scale of 0(no decimal). Numeric/Decimal with a scale of 0 is allowed
      • Default data type is bigint
      • E.g create sequence @sequenceName as @datatype;
    • Has the following properties
      • Increment by: The increment value. Default is 1
      • MinValue: the minimum value to support. Default is the minimum value in the type.
      • MaxValue. The Maximum value to support. Default is the maximum value in the type.
      • Cycle|No cycle: defines whether to allow the sequence to cycle or not. default is no cycle
      • Start with: sequence start value. default is MinValue for an increasing sequence and MaxValue for a decreasing (negative increment) sequence.
    • To request a new value from the sequence, use the Next Value For function.
      • The next value for is usually part of a default constraint to generate values automatically.
      • e.g. Add constraint @constraint name default(next value for @SequenceName)
    • You can change all properties of a sequence except it’s data type.
      • Use the Alter Sequence command. 
        • To change the current value, use the restart with command
        • E.g. Alter sequence @sequence1 restart with @restartValue.
    • Has a cache option, which controls how often the current sequence value is written to disk. Values are handed out from memory until the defined number of cached values is used up, and only then is a new range written to disk.
      • This gives a very large performance improvement
      • e.g. alter sequence @sequenceName cache @numberTillDiskWrite
    • Doesn’t guarantee there’ll be no gaps between values.
      • If an insert fails, the value isn’t reset, so the value the sequence returns is lost.
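    • Example (an illustrative sketch, not from these notes; dbo.SeqInvoiceIDs, dbo.Invoices and invoiceid are hypothetical names):

                          Create sequence dbo.SeqInvoiceIDs as int
                            start with 1
                            increment by 1
                            minvalue 1
                            no cycle
                            cache 50;

                          select next value for dbo.SeqInvoiceIDs;             -- request a value explicitly

                          Alter table dbo.Invoices                             -- or hand values out through a default constraint
                            Add constraint DFT_Invoices_invoiceid
                                default (next value for dbo.SeqInvoiceIDs) for invoiceid;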

 Merging Data

  • The merge statement allows you to merge data from a source table or expression into a target table.
    • Has the following clauses
      • Merge Into @Target_Table: Defines the target table for the operation.
      • Using @source_Table: Defines the source table for the operation. operates similarly to a from clause. 
        • You can define joins, refer to table expressions, apply, pivot, unpivot or use table functions. The output is a table result that is used as the source of the merge
        • you can use any table as the source.
      • On @merge_Predicate: specify a predicate that matches rows between the source and target and defines whether a source row is or isn’t matched by a target row.
      • When Matched [and @predicate] Then @action: Defines an action to take when a source row is matched by a target row. 
        • Only Update and Delete are allowed, because the target row already exists, so it cannot be inserted again.
        • You can have two when matched clauses if you want different predicates to cause either an update or a delete.
      • When not matched [by target] [and @predicate] then @action: This defines what action to take when a source row is not matched by a target row. 
        • You can only use insert here as the row doesn’t exist.
      • When not matched by source [and @predicate] then @action: This defines an action to take when a target row exists but isn't matched by a source row.
        • Only Update and Delete are allowed, because the target row already exists, so it cannot be inserted again.
    • At minimum, only one clause is required and it can be any of the when clauses.
    • It’s best to compare all the columns in the predicate of the when matched clause to ensure unnecessary updates aren’t carried out 
    • Format is
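      (a minimal sketch, illustrative rather than from these notes; dbo.Customers, dbo.CustomersStage and their columns are hypothetical)

                          Merge into dbo.Customers as tgt
                          using dbo.CustomersStage as src
                            on tgt.custid = src.custid
                          when matched and (tgt.companyname <> src.companyname
                                            or tgt.phone <> src.phone) then
                            update set tgt.companyname = src.companyname,
                                       tgt.phone       = src.phone
                          when not matched by target then
                            insert (custid, companyname, phone)
                            values (src.custid, src.companyname, src.phone)
                          when not matched by source then
                            delete;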
              
                    
The output Clause
  • Modification statements support an output clause, which you can use to return information from the modified rows for auditing and archiving.
  • Similar to Select
    • To return values from the modified rows, prefix the column names with inserted (new image) or deleted (old image). You can use it with:
      • Insert
        • Used to return information from the inserted rows.
        • Add an output clause after the insert and select the attributes you want to return.
        • Format

                                             insert into @Destination_Table(@Attribute1, @Attribute2, @Attribute3)

                         output inserted.@Attribute1, inserted.@Attribute2, inserted.@Attribute3
      • Delete
        • Used to return information from deleted rows in a delete statement. you need to prefix the columns with the keyword deleted.
        • Add an output clause after the delete and select the attributes you want to return.
        • Format is 

                                             delete from @Destination_Table

                         output deleted.@Attribute1, deleted.@Attribute2, deleted.@Attribute3
      • Update
        • Used to return information from modified rows
          • You can return both the old and new images of the modified rows
          • To return columns before the update, use the prefix deleted
          • To return columns after the update, use the prefix inserted
          • Format is
Update @Destination_table
set @Attribute1 = @Expression1
output

 deleted.@attribute1, inserted.@Attribute2

      • Merge
        • Merge can carry out inserts, updates and deletes.
          • You can refer to columns from both the target and source.
          • Use $action to determine the action the merge carried out.
            • $action returns a string ('INSERT', 'UPDATE' or 'DELETE') to signify the action that was taken.
          • You use coalesce(inserted.@attribute1, deleted.@attribute1) to return the applicable one
          • Format is
@Mergestatement
output $action as The_action, coalesce(inserted.@Attribute1, deleted.@Attribute1)
      • Composable DML
        • to capture output from a modification statement when you only want a subset of the output rows and not all of them.
        • To do this
          • You define a table based on a modification with an output clause.
          • You have an outer insert select against a target table, with the source table being the derived table. 
        • Format is
Insert into @ArchivingTable(@Attribute1, @Attribute2, @Attribute3)
select @Attribute1, @Attribute2, @Attribute3
from(@ModificationStatement
          output $action as the_action, inserted.@Attribute2, inserted.@Attribute3) as @Alias
where the_action = ‘Insert’
  • You can use an into to write to a table.
    • If you use into, the target cannot participate in a foreign key relationship and cannot have triggers on it. 
    • Add the into after the output clause.
  • Format is

                    insert into @Destination_Table(@Attribute1, @Attribute2, @Attribute3)

output inserted.@Attribute1, inserted.@Attribute2, inserted.@Attribute3
into @ArchiveTable(@Attribute1, @Attribute2, @Attribute3)
Select @Attribute1, @Attribute2, @Attribute3
from @SourceTable

 

Implementing transactions, Error handling and dynamic SQL
SQL maintains transactional control over all changes to database data. this helps ensure the integrity of database data will never be compromised by partially completed transactions, constraint violations, interference from other transactions and service interruptions
Understanding Transactions
  • A transaction is a logical unit of work: either all of it completes as a whole unit, or none of it does.
  • Controlling the result of transactions
    • Commit: When the work of a transaction has been approved by the user, SQL server completes the transaction by committing them
    • Rollback: If an unrecoverable error occurs or the user decides not to commit, then the transaction is rolled back.
  • Properties of a transactions
    • Atomic: Every transaction is an atomic unit of work; either all database changes in the transaction succeed or none of them do.
      • SQL server treats every DML or DDL command individually and will not allow any command to only partially succeed.
      • If more than 1 command is present in a transaction, SQL server will not allow the entire transaction to be committed unless all commands are successful
        • This behavior can be controlled using error handling techniques.
    • Consistent: Every transaction whether successful or not, leaves the database in a consistent state as defined by all object and database constraints. If an inconsistent state results, SQL server will roll back the transaction to maintain a consistent state.
      • It ensures that all constraints in the database are enforced.
    • Isolation: Every transaction looks as though it occurs in isolation from other transactions in regards to database changes. The degree of isolation can vary based on isolation level.
      • It ensures that when a transaction makes multiple changes to the database, none of the objects being changed by that transaction are allowed to be changed by any other transaction. If another wants to change the data, it must wait for the first to conclude.
      • Maintained by locking and row-versioning
    • Durable: Every transaction endures through an interruption of service. when service is restored, all committed transactions are rolled forward( committed changes to the database are completed) and all uncommitted transactions are rolled back(Uncommitted changes are removed).
      • Maintained using the transaction log. 
      • Every database change is first written to the transaction log, along with the original version of the data, before the change is written to the database data files. When a transaction is committed, the fact that it has been committed is written to the log.
  • Transaction commands
    • Begin transaction(Begin tran): marks the start of a transaction.
    • Commit transaction/work(commit): Used to commit a transaction
    • Rollback transaction/work(rollback): Used to roll-back a transaction
  • Transactions can be nested 
  • Transaction levels and states
    • @@Trancount: This can be queried to find the level of transaction
      • 0 means the code is not within a transaction
      • >0 means that there is an active transaction
      • >1 indicates the nesting level of nested transactions
      • Doesn’t report uncommittable transactions
    • XACT_STATE(): this can be queried to find the state of the transaction
      • 0 indicates there’s no active transaction
      • 1 indicates that there is an uncommitted transaction and it can be committed, but the nesting level is not reported.
      • -1 means there’s an uncommitted transaction, but it cannot be committed due to a prior fatal error.
      • Doesn’t report the transaction nesting level.
  • Transaction modes: There are 3 ways of working with transactions
    • Autocommit:
      • It is the default mode
      • Single data modification and DDL are executed in the context of a transaction that will be automatically committed when the statement succeeds or automatically rolled back if the statement fails.
      • No transactional command is issued.
      • Changes to the database are automatically handled statement by statement as transactions.
    • Implicit transactions:
      • When you issue DML, DDL or select, A transaction starts, increments @@trancount for the session, but doesn’t automatically commit or roll back the statement. You must issue a commit or rollback to finish the transaction.
      • Turned on with : SET IMPLICIT_TRANSACTIONS ON
      • Running a command sets @@Trancount to 1 (you are 1 level deep in the transaction). You must manually issue a commit or rollback to finish the transaction.
        • Any locks taken by your command are held till you commit. if you don’t commit, you can leave locks open.
        • It is set per session.
        • You can rollback a transaction if you notice mistakes.
    • Explicit transactions
      • Begins when you issue the begin tran command to start a tran.
        • This causes the value of @@Trancount to be 1.
        • When used with implicit transactions, @@trancount is set to 2 when Begin tran is run.
          • This makes it a nested transaction.
      • Nested Transactions: This occurs when explicit transactions are nested
        • The behavior of rollback and commit changes.
        • Every new Begin Tran increments @@trancount by 1. A commit decrements @@trancount by 1 while a rollback resets @@trancount to 0. Hence, only the commit that reduces @@trancount to 0 is relevant.
      • Marking a transaction: 
        • You can name an explicit transaction. 
          • The name must be a valid identifier
          • SQL only records tranName for the outermost tran
          • E.g. begin tran @TranName
        • Named transactions are used for placing a mark in the transaction log to which the database can be restored.
          • E.g. Begin tran @TranName with Mark
        • To restore to the mark, reference the mark when restoring the transaction log.
          • Use StopAtMark to reference the TranName.
          • Use StopBeforeMark to restore to just before the transaction.
          • E.g. 

                                                       Restore Database @DBName from disk = ‘@Backuppath\backup.bak’ with noRecovery
                                                       Restore Log @DBName from disk = ‘@Backuppath\backup.trn’ with StopAtMark = ‘TranName’

  • Transactions can span batches.
  • Transaction Options
    • Savepoints: Locations within transactions that you can use to roll back a selective subset of work.
      • Define a savepoint using the Save Transaction command.
      • Rollback must reference the savepoint. 
    • Cross-database transactions: a single transaction can span databases on the same instance; SQL server still treats it as one atomic unit of work.
Basic Locking
  • Locking is used to preserve the isolation of transactions.
  • There are 2 general modes
    • Shared: For sessions that read data
      • Shared locks are compatible with each other
        • Several sessions can issue shared locks on an object
    • Exclusive: For sessions that write data
      • Used to exclusively lock an object so no other transaction can change the data until the locking transaction commits or rolls back.
      • In most isolation levels, most sessions cannot even read data from exclusively locked objects.
      • Exclusive locks aren’t compatible with other locks
        • No other tran can read or write to the object
      • Blocking: 
        • If a session has an exclusive lock on an object, it blocks any other session from using the object until the first session either commits or rolls back.
        • If a session has a shared lock on an object, It blocks any others attempting an exclusive lock until the first session is done.
          • Exclusive locks aren’t compatible wth shared locks
          • Transactions with READ COMMITTED isolation ensures shared locks are dropped as soon as a read is completed.
      • Deadlocking
        • A result of mutual blocking between 2 or more sessions. 
        • Occurs when the blocking cannot be resolved by waiting for another transaction to finish, because of a cyclical locking relationship between several commands.
        • SQL server aborts one of the trans and returns error 1205 to the client.
Transaction Isolation Levels
  • This is a method of controlling the isolation requirements for readers.
  • Isolation levels can reduce or increase blocking, based on whether readers require locks and how long the locks are held
  • This is configured per session
    • It determines whether a transaction can read other transactions' uncommitted data, and whether data the current transaction has read can be changed by other transactions while it is still reading
  • Read Committed:
    •  Default. 
    • All readers in the session will only read data changes that have been committed
    • Select will use shared locks, and exclusive locks will block sessions running under Read Committed.
    • To read uncommitted data, queries need to use the (NOLOCK) or (READUNCOMMITTED) table hints.
  • Read Uncommitted
    • Allows readers to read uncommitted data.
    • Removes the shared locks taken by select statements so readers no longer block writers.
    • Dirty reads can occur.
      • the result of the select can include information that was rolled back by another tran.
  • Read Committed Snapshot(RCSI)
    • Default for Windows Azure SQL database.
    • A method of using Read Committed.
    • Uses row versioning in tempdb to store original versions of changed data. 
      • The versions are stored as long as needed to allow readers to read the data in its original state; hence shared locks aren't needed on the underlying resource while reading committed data
    • Set at the DB level.
  • Repeatable Read
    • guarantees that data read in a tran can be re-read later in the tran.
    • Updates and deletes of rows already read are prevented.
    • Shared locks are kept until the end of a tran.
    • Phantom reads can occur
      • New data can be added after the first read.
  • Snapshot
    • Uses row versioning in tempdb.
    • Configured at the DB level and set per tran.
      • A tran using snapshot isolation will be able to repeat any reads and will not see any phantom reads.
      • New rows may be added, but the tran will not see them.
    • Doesn’t require shared locks as it uses row-versioning.
  • Serializable
    • Strongest isolation level
    • set per session
    • Doesn’t allow any inserts that would satisft the select.
  • Format is

               Set Transaction Isolation level @TranIsolationLevel.

Error Handling
Detecting and raising Errors
  • When an error is encountered, An error condition is generated and SQL server takes an action.
  • If T-SQL code isn’t in a try/catch block , error messages will be passed through to the client and cannot be intercepted in T-SQL code. 
  • When SQL generates an error condition, the function @@ERROR will have a positive integer value indicating the error number.
  • Analyzing error messages
    • Error messages are usually in the form 

                         @ErrorNumber, @SeverityLevel, @State, @Line_Number

                         @Error Message
    • E.g.
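      A typical message looks like the following (illustrative values; the constraint name is hypothetical):

                          Msg 547, Level 16, State 0, Line 11
                          The INSERT statement conflicted with the FOREIGN KEY constraint "FK_Orders_Customers".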

                    

    • Error Number: 
      • SQL server error messages are numbered 1 to 49999
      • Custom error messages are 50001 and higher
      • 50000 is for a custom message without an error number.
    • Severity Level
      • There are 26 levels(0 -25)
      • Errors with severity  >=16 are logged to SQL server log and eventVwr
      • Errors with severity >= 19 can only be specified by members of sysadmin
      • errors with severity >=20 cause the connection to be terminated and any open transactions to be rolled back
      • errors with severity between(0 -10) are informational only
    • State: 
      • Maximum value of 127.
      • Used internally by Microsoft
    • Error message: 
      • 255 unicode characters long.
      • all stored in sys.messages.
      • You can add using sp_addmessage
  • You can raise your own errors using RaisError and Throw
    • Raiserror()
      • Message (a message ID, string, or string variable) along with severity and state are required. E.g. raiserror(@ErrorMessage, 16, 0);
      • To issue errors with severity >20, you need to include the WITH LOG option (you must be a sysadmin).
        • The connection will be terminated
      • Use NOWAIT to send messages immediately to the client.
      • Doesn’t terminate the batch
        • Code after it will continue to run
      • Format is
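        (a minimal sketch, illustrative rather than from these notes; the message text and substitution values are hypothetical)

                          raiserror (N'Order %d was not found.', 16, 1, 10249) with nowait;

                          Declare @msg as nvarchar(200) = N'Custom error message.';
                          raiserror (@msg, 16, 1);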
                         
    • Throw command
      • Similar to Raiserror except for the following differences.
        • Parameters are error_number, message, state
          • Can be used without parameters in CATCH
        •  the error number doesn’t require a matching defined message in sys.messages.
        • Any parameter can be a variable
        • state can be a maximum of 255
        • The severity is always 16
        • It always terminates the batch except when in a try block.
      • Format is
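        (a minimal sketch, illustrative rather than from these notes; the error number and message are hypothetical)

                          throw 50001, N'The order was not found.', 1;

                          -- inside a CATCH block, THROW with no parameters re-raises the original error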

                                   

  • Try_Convert and Try_Parse
    • Try_Convert: Attempts to cast a value as a target data type.
      •  if it succeeds, returns the value, and returns a null if it doesn’t succeed. 
      • E.g. Try_convert(@destination_data_Type, @ValueToConvert);
    • Try_parse: Attempts to parse a string value into a specified data type (numeric or date/time).
      • If it succeeds, it returns the value, and returns a null if it doesn't succeed.
      • E.g. try_parse(@ValueToConvert as @DestinationDataType)
  • Handling Errors after detection:
    •  There are 2 error handling methods available
      • Unstructured
        • You handle errors as they happen by accessing @@Error.
          • Consists of testing individual statements for their error status immediately after they execute.
            • When an error occurs, the error number is recorded in @@error 
            • If no error occurs, the value in @@error is 0.
          • @@error is reset by every new statement, including the one that queries it, because it always reports the status of the most recently executed command.
            • copy the value into a variable and test the variable.
        • Using XACT_ABORT with transactions
          • Affects the entire batch
            • you can make an entire batch fail if any error occurs by beginning it with SET XACT_ABORT ON.
          • Set per session.
            • All other transactions are subject to it until it’s turned off
          • Causes a transaction to roll back based on any error with severity >10
          • you cannot trap the error or the error number.
          • None of the remaining code is executed
      • Structured
        • You designate a central location (the catch block) to handle error.
        • Done using TRY/CATCH
          • You wrap the code to test for errors in a try block
            • Every try block must be followed by a catch block where you handle its errors. they must be in the same batch
          • If an error is detected in a try block, control is passed to its catch
            • Remaining statements in the try aren’t executed
            • after the catch, control is passed to the first statement after the try_catch
          • No error message is sent to the client
            • Any error with a severity >10 that does not close the connection is handled by the catch.
            • to test for the state of a transaction, query XACT_STATE()
              • XACT_STATE values
                • 1: An open transaction that can either be committed or rolled back
                • 0: no open transaction. same as @@trancount = 0
                • -1: an open transaction that can not be committed
          • You cannot trap errors that are due to compilation errors, syntax errors or references to non-existent objects.
          • Functions to report error details
            • Error_number(): returns the error number
            • Error_Message(): returns the error message
            • Error_Severity(): returns the severity of the error
            • Error_line(): returns the line number in the batch where the error occurred
            • Error_procedure(): Returns the function, trigger or procedure executing when the error occurred.
            • Error_state(): Returns the state of the error.
          • Raiserror used in a try block must have a severity between 11 and 19 to pass control to the catch.
          • In the catch, you can use 
            • Raiserror
            • throw with params
              • Causes execution in the catch to end.
            • Throw: raises the original error message and passes it to the client.
              • Causes execution in the catch to end.
            • XACT_ABORT
              • Transfers control to the Catch.
              • The transaction is uncommittable in the Catch.
                • XACT_State = -1
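  • Example (an illustrative sketch, not from these notes; dbo.Accounts and its columns are hypothetical) of a transaction wrapped in TRY/CATCH:

        Begin try
            Begin tran;
            update dbo.Accounts set balance -= 100 where accountid = 1;
            update dbo.Accounts set balance += 100 where accountid = 2;
            Commit tran;
        End try
        Begin catch
            if XACT_STATE() <> 0 rollback tran;                -- roll back whatever is still open
            print 'Error ' + cast(ERROR_NUMBER() as varchar(10)) + ': ' + ERROR_MESSAGE();
            -- throw;                                          -- optionally re-raise the original error to the caller
        End catch;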
Dynamic SQL
The technique of using T-SQL to generate and potentially execute other T-SQL: your code dynamically constructs a string of code, which is then executed as its own batch.
  • Used in scenarios where SQL doesn't let you specify a variable in a clause, e.g. the FROM clause, the USE clause, column names, and the contents of lists in IN and PIVOT
  • Execute Command is used to Execute Dynamic SQL
    • You can use string variables in the execute command
      • The string variable must contain a single batch (no GO)
  • SQL Injection
    • Dynamic SQL puts your database at risk of SQL injection, as a user can inject code of their own into the string that gets executed
    • To prevent it, parameterize the dynamic SQL generation and execution using sp_executesql.
      • Sp_executesql is an alternative to Exec.
      • It supports parameters. The statement and the parameter declarations must be Unicode strings (nvarchar).
      • The format is
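        (a minimal sketch, illustrative rather than from these notes; Sales.Orders and the @custid parameter are hypothetical)

                          Declare @sql as nvarchar(200) =
                            N'select orderid, orderdate from Sales.Orders where custid = @custid;';

                          Exec sys.sp_executesql
                            @stmt   = @sql,
                            @params = N'@custid int',
                            @custid = 37;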

                                   

Designing and Implementing T-SQL routines
Stored Procedures
  • These are routines that reside in a DB and encapsulate code.
  • A Stored procedure is a single batch of code.
  • Called using Execute.
  • They can be passed data through input params and receive data using output params.
    • Params can be passed by position
    • Params can be passed by association.
    • Default parameters(parameters that are assigned default values) do not need to be entered.
  • Can encapsulate T-SQL code.
  • They cannot have the Use statement
  • Created using create Proc command
  • Declare a variable as output for it to be returned to the caller
    • output variables are always optional.
    • after declaring the parameter, use the keyword output(out) to make it an output parameter.
    • When executing the procedure, the output keyword must also be specified after the variable that receives the output value.
  • Set nocount on
    • This prevents the messages that report how many rows were affected from being sent to the client.
  • Return sends an integer status code back to the caller; to return a result set, simply include a select statement in the procedure body, no return is needed.
  • Return and return codes
    • This is used to cause a procedure to exit. 
    • Can be used with conditional statements to control the flow of queries
    • There can be several returns in a SP.
    • Return alone causes the return of 0 for successful and a negative number for an error. 
  • You can call one stored procedure from inside another.
    • temporary tables of a stored procedure are visible down not up.
      • A SP it calls can see its contents
      • The calling SP cannot see the called SPs contents
    • Variables and parameters are invisible to called SPs.
  • To Declare, Format is
               create proc @SP_Name
               @variable1 as @variable1type, @variable2 as @variable2type=@DefaultValue, @variable3 as @variable3type output
               AS
               Begin
               Set Nocount on
               @Select_Statement_including_Variables
               [Return]
               END
  • To Execute, Format is

               Exec @SP_NAME @variable1 = @Literal1, @Variable2 = @Literal2, @variable3 =@variable3 output

Branching Logic
  • Branching is used to control the flow of your code. It enables your code to handle complex situations that require different actions based on inputs.
  • Statement blocks are statements inside a Begin..End that allow branching statements to execute more than one statement after evaluating an input expression.
    • E.G.
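      (a minimal sketch, illustrative rather than from these notes; Sales.Orders and customer 37 are hypothetical)

                          if exists (select * from Sales.Orders where custid = 37)
                          Begin
                               print 'Customer 37 has orders.';
                               print 'Both statements run because they sit in one Begin..End block.';
                          End
                          Else
                          Begin
                               print 'Customer 37 has no orders.';
                          End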

                         

  • Statements used for branching logic
    • IF/ELSE
      • Allows you to conditionally execute code.
      • If the expression after the if keyword is true, the statement after the if is executed. If it isn’t, the statement after the else is executed. 
        • You can handle more than two outcomes by chaining conditions with else if
    • While
      • Allows you to create loops that execute a statement block as long as a condition continues to evaluate to true.
      • The condition must be modified so that it eventually leads to a false.
      • You can use the break statement to end a while statement and go to the next block of code.
      • You can use the continue statement to end a loop and cause control to start at the beginning of the statement block again
    • WaitFor
      • It causes the execution of statements to pause for a specified period of time
      • There are 3 options
        • WaitFor Delay: used to specify how much time to wait for
          • E.G. Waitfor delay '00:00:10' (pause for ten seconds)
        • Waitfor Time: used to specify the time execution should continue
          • E.G. Waitfor time '23:00:00'
        • WaitFor Receive: used only with service broker
    • Goto
      • Causes your code to jump to a defined T-SQL label.
Triggers
  • Triggers are a kind of Stored procedure associated with DML events on a table or view. 
    • It cannot be explicitly executed
  • Triggers are fired when a DML event the trigger is associated with occurs. 
  • There are 2 types of trigger events
    • DML triggers: A TSQL batch associated with a table that is defined to respond to a particular DML event such as insert, update or delete, or a combination of those events.
      • A trigger executes only once for each DML statement, no matter how many rows were affected
      • The schema must be the same as that of the table or view it is associated with.
      • Triggers execute as part of the transaction associated with the event statement.
        •  Issuing a rollback in a trigger causes all changes to rollback. 
      • Exit with a return statement.
      • You can access the inserted and deleted tables.
        • They contain rows affected by the modification that caused the trigger to fire.
      • There are 2 types
        • After
          • Fires after the event it is associated with finishes 
          • Can only be defined on permanent tables
          • Checking if @@rowcount =0 in the first line of a trigger improves performance
          • The trigger executes only after the DML statement has passed all constraint checks.
          • After triggers can be nested to a depth of 32.
          • Format is

                Create trigger @Trigger_name
                On @Table_name
                After @DML_event1, @DML_event2, @DML_event3
                As
                Begin
                If @@rowcount = 0 return;
                Set Nocount on
                END
        • Instead of
          • Fires instead of the event it is associated with.
          • can be defined on temporary and permanent tables and views.
          • Executes a batch of code instead of the DML statement.
          • commonly used on views to take care of constraints that prevent direct actions from being taken (e.g. a foreign key constraint).
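  • As a concrete sketch, an AFTER trigger that audits inserts by reading the inserted table (the tables and columns are hypothetical):

                -- Hypothetical audit trigger on Sales.Orders.
                Create trigger trg_Orders_Insert_Audit
                On Sales.Orders
                After Insert
                As
                Begin
                If @@rowcount = 0 return;           -- cheap early exit when no rows were affected
                Set Nocount on
                Insert into dbo.OrderAudit(orderid, custid, auditdate)
                Select orderid, custid, sysdatetime()
                From inserted;                      -- the rows inserted by the statement that fired the trigger
                END
                GO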

User defined functions(UDF)

  • Encapsulate reusable code and return a scalar value or table to the caller.
  • They accept parameters which are accessed in the function as variables.
  • they are embedded in SQL and they execute as part of a SQL command. 
    • They do not require the EXECUTE command.
  • They cannot perform DDL, or change data in permanent tables using DML statements
  • They can be nested
  • There are 5 options
    • Encryption: obfuscates the source code
    • Schemabinding: binds the schemas of all referenced objects, so those objects cannot be changed while the function exists.
    • returns null on null input: any null parameters cause a scalar UDF to return Null without executing the body of the function.
    • Called on null input: default, Function will execute even if null is passed as a parameter
    • Execute as: execute under various contexts
  • There are 2 types
    • Scalar UDFs: return a single value to the caller
      • They can appear anywhere an expression that returns a single value can appear.
      • They are run for every row in the result set and need to be very efficient.
      • to create them, the Format is

                                    Create function @FunctionName
                                    (
                                         @param1 @param1DataType,
                                         @param2 @param2DataType
                                    )
                                    returns @ReturnDataType
                                    AS begin
                                    return @param1 @operator @param2
                                    End
      • To execute them, call them from any clause that accepts a scalar expression (SELECT, WHERE, etc.), using a two-part name (schema.function); a fuller sketch follows at the end of this section. E.g.

                                    select @Attribute1, @Attribute2, @SchemaName.@FunctionName(@Attribute1, @Attribute2)
                                    From @Relation1
    • Table-valued UDFs: return a table to the caller. 
      • They can appear anywhere a table is expected in the from clause.
      • there are 2 types
        • Inline: 
          • They contain a single select statement that returns a table.
          • Format for creation is 
                                    Create function @FunctionName
                                    (
                                         @param1 @param1DataType,
                                         @param2 @param2DataType
                                    )
                                    returns table as return
                                    (
                                    Select @Select_list From @relation @whereIncludingParams
                                    )
          • To execute, simply select from the function 
            • e.g. Select @Attribute1, @Attribute2, @Attribute3 from @functionName(@param1, @param2).
        • Multistatement
          • You define the table to be returned as a table variable and insert data into the table variable.
          • Return only ends the function and isn’t used to send data to the caller.
          • Format for creation is
                                    Create function @FunctionName
                                    (
                                         @param1 @param1DataType,
                                         @param2 @param2DataType
                                    )
                                    returns @TableVariable table
                                    (
                                         @Attribute1 @Attribute1DataType,
                                         @Attribute2 @Attribute2DataType
                                    )
                                    As
                                    Begin
                                         insert @TableVariable
                                         Select @Select_list From @relation @whereIncludingParams
                                         Return
                                    End
                                    GO
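  • Concrete sketches of a scalar UDF and an inline table-valued UDF (the function, table, and column names are hypothetical):

                -- Scalar UDF: must be invoked with at least a two-part name (dbo.Add2).
                Create function dbo.Add2 (@a as int, @b as int)
                returns int
                As Begin
                return @a + @b;
                End
                GO

                -- Inline table-valued UDF: a single SELECT, used like a table in FROM.
                Create function dbo.CustomerOrders (@custid as int)
                returns table as return
                (
                Select orderid, orderdate
                From Sales.Orders
                Where custid = @custid
                )
                GO

                -- Calling them:
                Select orderid, dbo.Add2(qty, 10) as qtyplus10
                From Sales.OrderDetails;

                Select orderid, orderdate
                From dbo.CustomerOrders(71);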
Analyzing Query performance
Query Optimization
  • The number of execution plans grows with query complexity
  • Query Execution phases
    • T-SQL query:
      • This is the statement to execute.
      • Performed by the relational engine
        • This uses logical operators(e.g. Join)
    • Parsing: 
      • checks whether the query is syntactically correct. 
      • The output is a parse tree( a tree of logical operators)
      • Performed by the relational engine
    • Binding: carries out name resolution for the objects in the query, binding objects to logical operators. The output is an algebrized tree( a parse tree associated with objects)
      • Performed by the relational engine
    • Optimization: generation of candidate plans and selection of a plan: the output is the actual execution plan(a single tree with physical operators)
      • Performed by the relational engine
    • Execution: 
      • This executes the query, compiling the code to binary before execution.
      • The output is the result set.
      • SQL can cache an execution plan so it doesn’t have to do optimization again.
      • Performed by the storage engine
        • This uses physical operators(e.g. merge join operation)
  • The query optimizer is a cost-based optimizer.
    • It assigns a cost to each candidate plan; a higher cost means the plan is estimated to be more expensive (slower). The optimizer tries to find the plan with the lowest cost. 
      • The cost of a plan is calculated from the algorithm used by each physical operator and the cardinality estimation (the estimated number of rows to be processed). By adding up the cost of all operators in a plan, it can calculate the cost of the whole plan.
        • The estimated number of rows comes from optimizer statistics.
  • Caching the selected execution plan in the plan cache can speed up the execution of the same query, or of an equivalent query, because the optimization phase can be skipped. SQL Server parameterizes queries so one plan can serve multiple similar queries (see the sketch below).
    • Use of stored procedures lets SQL Server parameterize queries.
  • SQL Server caches the execution plan separately from the actual values (the execution context), allowing it to reuse the execution plan many times.
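  • One way to get parameterized plans without a stored procedure is sp_executesql; a minimal sketch (the query and table are hypothetical):

                -- The same cached plan can be reused for different @custid values.
                Exec sys.sp_executesql
                    N'Select orderid, orderdate From Sales.Orders Where custid = @custid;',
                    N'@custid int',
                    @custid = 71;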
System Monitoring for query optimization
  • Monitoring has an impact on the system being monitored.
  • SQL server Extended events is a very lightweight performance monitoring system
    • allows you to correlate data from SQL Server with data from the OS and the application.
    • A package serves as a container for all Extended Events objects (see the sketch at the end of this section for creating an event session):
      • Events:
      • Targets
      • Actions
      • Predicates
      • Types
      • Maps
  • SQL trace: an internal server mechanism for capturing events.
    • The source for a trace event can be a batch or an event like a deadlock.
    • After an event, the trace gathers the event information
    • SQL server profiler is the UI for trace. 
      • When monitoring a Production system, use trace without profiler as profiler uses system resources.
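  • A minimal sketch of an Extended Events session that captures completed statements to a file target (the session name, duration threshold, and file path are arbitrary):

                -- Create and start a lightweight Extended Events session.
                Create event session QueryPerf on server
                Add event sqlserver.sql_statement_completed
                (
                    Action (sqlserver.sql_text, sqlserver.session_id)
                    Where duration > 1000000            -- statements longer than 1 second (duration is in microseconds)
                )
                Add target package0.event_file (Set filename = N'C:\Temp\QueryPerf.xel');
                GO
                Alter event session QueryPerf on server state = start;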
Using SET session options and analyzing query plans
  • Set Session Options
    • Set Statistics IO command
      • One of the targets of optimizing a query is to lower disk I/O by lowering the number of pages SQL Server has to read.
      • A session command (stays on until you turn it off or disconnect from SQL Server); see the sketch at the end of this section for turning these options on.
      • returns the following data
        • Scan count: the number of index or table scans performed
        • Logical reads: the number of pages read from the data cache. When you read a whole table, this gives you an estimate of the table's size.
        • Physical reads: the number of pages read from disk. Usually lower than the number of logical reads because some pages are already cached.
        • Read-ahead reads: the number of pages SQL reads ahead
        • LOB logical reads: the number of LOB pages read from the data cache.
        • LOB physical reads: the number of LOB pages read from the disk
        • LOB read-ahead reads: the number of LOB pages SQL Server reads ahead
    • Set Statistics time command
      • This returns time statistics, including CPU time and total elapsed time, for both the execution and the pre-execution (parse and compile) phases.
  • SQL caches data in memory to prevent it having to go to disk to pull data.
  • Execution Plans
    • This shows how a query is executing. 
      • the physical operators used during the execution
      • the relative cost of each operator as a percentage of the total query cost.
      • Arrows showing how data flows from one physical operator to another.
      • The actual and estimated cardinality (number of rows). 
        • A large difference between them indicates errors in cardinality estimation.
      • To see more details about an operator, select it and press F4.
    • the Execution plans(estimated and actual plans) are output as text, XML and graphically.
      • To see text
        • Set Showplan_text and Set Showplan_all for estimated plans
        • Set statistics profile for actual plans
      • To see XML
        • Set showplan_XML for estimated plans
        • set statistics XML for actual plans.
      • Graphically
        • Ctrl + L for estimated plans
        • Ctrl +M for actual plans
    • Some of the operators shown
      • Table scan: scan of a whole table stored as a heap(a table can be stored as a heap or a clustered index).
      • Clustered index scan: scan of a whole table stored as a clustered index (indexes are stored as balanced trees).
      • Clustered index seek: SQL Server seeks for the first value in the seek argument (e.g. a column value in WHERE) in a clustered index and performs a partial scan.
      • Index scan: scan of a whole non-clustered index
      • Index seek: SQL server seeks for the first value in the seek argument(e.g. a column value in where) in a non-clustered index and performs a partial scan.
      • RID lookup: Lookup for a single row in a table stored as a heap using the row identifier.
      • Key lookup: Lookup for a single row in a table stored as a clustered index using the key of the index.
      • Hash match join: a join that uses the hash algorithm.
      • merge join: a join that uses the merge algorithm.
      • Nested loops: Joins that use the nested loops algorithm
      • Stream Aggregate: Aggregation of ordered rows.
      • Hash Match Aggregate: Hash algorithm used for aggregating
      • Filter: a filter of rows based on a predicate.
      • Sort: a sort of incoming rows.
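  • A minimal sketch of turning these session options on around a query (the table is hypothetical):

                -- Measure I/O and time; the SET options stay on until turned off or you disconnect.
                Set statistics IO on;
                Set statistics time on;

                Select custid, count(*) as numorders
                From Sales.Orders
                Group by custid;

                Set statistics IO off;
                Set statistics time off;

                -- Actual execution plan as XML for the statements that follow:
                Set statistics XML on;
                Select orderid, orderdate From Sales.Orders Where custid = 71;
                Set statistics XML off;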
Dynamic management objects
  • SQL Server monitors itself and gathers information useful for monitoring the health of an instance, finding problems such as missing indexes, and optimizing queries. This information is exposed through dynamic management objects (functions and views).
  • DMOs are in the sys schema and start with dm_.
  • DMOs are virtual objects that give you access to data SQL server stores in memory.
  • They store cumulative data which are reset after the restart of an instance.
  • Common DMOs for query tuning
    • SQL server OS(SQLOS) related DMOs: manages OS resources specific to SQL server
      • sys.dm_os_sys_info: returns info about the system.
      • sys.dm_os_waiting_tasks: info about sessions that are currently waiting on something. E.g. they’re waiting on a block.
    • Execution-related DMOs: Info into queries that have been executed, including their query text, execution plan, number of executions. 
      • The sql_handle column is a hash value of the batch text that was executed. You can use it to retrieve the text from sys.dm_exec_sql_text (see the sketch below).
    • index-related DMOs: info about index usage and missing indexes.
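  • A commonly used pattern, sketched here, joins execution statistics to the query text through sql_handle:

                -- Top 10 cached queries by total logical reads, with their batch text.
                Select top (10)
                    qs.execution_count,
                    qs.total_logical_reads,
                    st.text as batch_text
                From sys.dm_exec_query_stats as qs
                Cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
                Order by qs.total_logical_reads desc;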
Implementing indexes and statistics
Indexes
  • The logical structure of data
    • A page is an 8Kb unit and belongs to a single object. It is the smallest unit of reading and writing.
    • An extent is 8 consecutive pages
      • Mixed extent: if an extent consists of pages belonging to multiple objects.
      • Uniform extent: if an extent consists of pages belonging to a single object. 
    • SQL server traces which pages and extents belong to an object through Index Allocation Map(IAM) pages. An IAM page can point to ~4GB of data.
      • Every table/index has a first IAM page.
      • IAM pages are organized as a doubly linked list (each page points to its predecessor and successor).
      • SQL server stores a link to first IAM pages in its own tables
      • When a table is created, no pages are allocated. After the first insert, the first page and IAM are assigned.
      • Internal fragmentation means that pages are not full. The more rows you have on a page, the fewer pages SQL Server must read to retrieve those rows, and the less memory is used to cache pages for the same number of rows.
    • SQL server organizes tables as heaps or as balanced trees(a clustered index/table).
      • Heaps
        • Data stored in any logical order.
        • If you do not create a clustered index, then the table is a heap.
        • Data is found by scanning the whole heap in physical or allocation order. 
        • New rows are stored anywhere in a heap. This leads to the heap becoming fragmented.
      • Clustered index
        • A table is organized as a balanced tree when you create a clustered index on it.
        • a balanced tree has a single root page and one or more leaf pages. 
          • It can have zero or more intermediate pages
          • Data is stored in leaf pages.
          • Data is stored in logical order of the clustering key.
          • Intermediate pages point to leaf-level pages. They contain a clustering key value and a pointer to the page where this value starts in the logically ordered leaf level.
          • If a single page can point to all leaf-level pages, then only a root page is allocated above the leaf level. 
          • If more than one page is needed to point to the leaf-level pages, an intermediate level is added.
          • The root page points to the intermediate-level pages. If the root page cannot point to all first-level intermediate pages, another intermediate level is created.
          • Pages on the same level are organized as a doubly linked list
            • SQL server can find the previous and next page in logical order for any specific page.
          • SQL server uses the IAM pages to track the physical allocation of the balanced tree pages.
        • Columns with unique or non-unique values can be the clustering key(key of a clustered index).
          • Uniqueness is maintained by adding a sequential integer to the clustering key’s value if it occurs again.
        • To seek for a row, SQL Server only needs to read one page per index level (root, intermediate, and leaf) instead of scanning the whole table.
        • Inserting a new row into a full page causes SQL server to split the page into 2 pages and move half of the rows to the second page as it needs to maintain the logical order of the row. 
          • This leads to internal fragmentation.
        • New pages can be reserved anywhere in a data file as physical order of pages needn’t correspond to the logical order.
          • This means the pages are out of order and are logically/ externally fragmented.
        • You can control internal fragmentation with the FILLFACTOR option for leaf-level pages and PAD_INDEX for the higher-level pages of the CREATE INDEX statement (see the sketch at the end of this section). 
        • You can rebuild or reorganize an index to get rid of external fragmentation using ALTER INDEX...REORGANIZE and ALTER INDEX...REBUILD.
        • A short clustering key means more rows can fit on pages above leaf level and fewer lower levels are needed. 
          • Fewer levels mean a more efficient index as SQL server needs to read fewer pages to find a row. Hence shorter keys are better for seeks
          • Good for OLTP environments
        • A large clustering key is useful where queries read huge amount of ordered data.
          • This is better for scans.
          • Good for warehousing environments
    • Non-clustered indexes
      • Indexes that do not contain all of the data and serve as pointers to table rows for quick seeks are called non-Clustered indexes.
      • Similar in structure to clustered indexes.
        • Root and intermediate levels look the same as a clustered index.
      • The leaf level is different because it doesn’t hold all the data.
        • What is stored depends on the underlying table organization(either balanced tree or heap).
        • It contains the index keys and row locators.
          • Row locators point to a row in the underlying table. 
            • If the table is a heap, the row locator is called a row identifier(RID) 
              • An 8-byte pointer containing the database file ID and page ID of the target row, and the target row's ID on that page.
              • To seek for a row, SQL server needs to traverse the index to the leaf level, and read the appropriate page from the heap , retrieving the row from the page with a RID lookup.
                • RID lookups are efficient when your query searches for a small number of rows only. 
              • If a row moves in a heap, all non-clustered indexes need to be updated to reflect their new physical location. However, SQL server is optimized to leave a forwarding pointer if a row has to move to another page.
            • If the table is a balanced tree, then the row locator is the clustering key.
              • To seek for a row, SQL server has to traverse all levels on a non-clustered index and also all levels of a clustered index in a key lookup
              • If a row moves, non-clustered indexes don't need to be updated to reflect the new physical location, as the row locator points to a logical location.
      • If the clustering key is narrow, SQL server can accommodate more rows on a leaf-level page than when RID(8-bytes) is used as a row locator.
      • You can have up to 999 non-clustered indexes on a table.
      • Filtered non-clustered indexes
        • These span a subset of column values only and apply only to a subset of table rows. 
        • Useful when some values in a column occur rarely, whereas others occur frequently. You could just create a filtered index over the rare values only
        • SQL server only updates this index for a change in rare values.
        • Can be used to maintain uniqueness in a column that contains many NULLs, by indexing only the non-NULL values.
        • Created by adding a where to a create index statement. 
      • Columnstore Indexes
        • a method of storing index data column by column.
        • You need to create a column store index on the table.
        • It is stored compressed. when a query references a single column that is part of a columnstore index, SQL server fetches only that column from disk, not the entire row like in row storage.
        • However, to return rows, they must be reconstructed
        • Useful for queries that aren’t very selective. especially in data warehouse queries.
        • Tables with a columnstore index are read-only (in SQL Server 2012). To update the table, you must drop the columnstore index.
        • Divided into segments which are stored as LOB
          • it is the unit of transfer from disk to memory
          • Segments consist of multiple pages
          • Has metadata that stores the min and max value for each column for that segment. Only segments requested by a query are loaded into memory.
    • Indexed views
      • An indexed view is a materialized view. 
      • The view must be created with schemabinding
      • If the view aggregates data, it must use the COUNT_BIG(*) aggregate function.
    • The clustering key’s uniqueness is maintained using a uniquifier(which is added to repeating values). this ensures lookups don’t return wrong values.
    • If you update the clustering key or change the table from a balanced tree to a heap, all non-clustered indexes must be rebuilt.
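  • Some of these options in one sketch (the tables, columns, and index names are hypothetical):

                -- Clustered index; FILLFACTOR/PAD_INDEX control internal fragmentation.
                Create clustered index idx_cl_orderdate
                On Sales.Orders(orderdate)
                With (fillfactor = 90, pad_index = on);

                -- Filtered unique index: uniqueness is enforced over the non-NULL values only.
                Create unique nonclustered index idx_ssn_notnull
                On dbo.Customers(ssn)
                Where ssn is not null;

                -- Removing external fragmentation:
                Alter index idx_cl_orderdate on Sales.Orders reorganize;
                -- Alter index idx_cl_orderdate on Sales.Orders rebuild;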
Using search arguments
  • Search arguments are a way of writing predicates that enables SQL Server to use indexes.
    • Indexes are used only when the arguments in the predicate are search arguments: the column name stands alone on one side of the expression and the calculated value appears on the other side.
    • A column that has an index on it appears in the predicate alone, not as an argument of a function. 
  • Efficient queries use the WHERE clause to filter rows.
    • The WHERE clause benefits from indexes. 
    • Where doesn’t guarantee that indexes will be used.
    • The clause has to be supported by an appropriate index and it must be selective enough. 
      • If it’s not very selective, it is less expensive to do a table/clustered index scan than to do a non-clustered index seek and then RID or key lookups. 
  • Join can benefit from indexes as well.
  • For queries that aggregate data and use GROUP BY, you should support the GROUP BY with an index. 
    • Aggregating data is done either with a hash or a stream physical aggregate operator.
      • Stream aggregate is faster; however, it needs a sorted input.  
  • Index usage can be tracked using sys.dm_db_index_usage_stats.
  • If SQL server finds all data in nonclustered indexes, then the query is covered by the nonclustered indexes and the indexes are covering indexes.
    • covering more queries by adding more columns causes the key to be longer and the index less efficient. 
      • You can include a column in a nonclustered index at the leaf level only, not as part of the key.
      • You do this by using the INCLUDE clause of the CREATE INDEX statement (see the sketch below).
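  • A sketch of a non-sargable predicate, its sargable rewrite, and a covering index (the table, columns, and index name are hypothetical):

                -- Non-sargable: the function on the column prevents an index seek.
                Select orderid
                From Sales.Orders
                Where year(orderdate) = 2016;

                -- Sargable rewrite: the column stands alone on one side of the predicate.
                Select orderid
                From Sales.Orders
                Where orderdate >= '20160101' and orderdate < '20170101';

                -- Covering index for the query above: orderid is included at the leaf level only.
                Create nonclustered index idx_nc_orderdate
                On Sales.Orders(orderdate)
                Include (orderid);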
Statistics
  • SQL server maintains statistics of the distribution of key values in system statistical pages.
    • The SQL Server optimizer uses these pages for cardinality estimation.
  • Auto-created statistics
    • SQL server maintains statistics automatically by default.
    • Statistics are created for indexes, and for columns used as search arguments.
    • There are 3 database options that influence auto creation of statistics
      • Auto_create_statistics: When On, SQL server creates statistics automatically.
        • On by default
      • Auto_update_statistics: When on, SQL Server automatically updates statistics when there are enough changes in the underlying tables and indexes. SQL Server updates out-of-date statistics during optimization. On by default.
      • Auto_update_statistics_async: determines whether SQL Server uses synchronous or asynchronous statistics updates during query optimization. If statistics are updated asynchronously, SQL Server cannot use them for the optimization of the query that triggered the update, as it doesn't wait for the statistics to be updated. 
    • Statistics are stored in a binary LOB and are created on one or more columns. The LOB contains a histogram with the distribution of values in the first column. 
  • Manual maintenance of statistics is done using the 
    • Create statistics command
    • Drop statistics command
    • update statistics command
    • sys.sp_updatestats: to update stats for all tables in a DB
    • Useful after a bulk import to maintain statistics, or after an upgrade of the DBMS.
  • To see the histogram of a statistics object, use the DBCC SHOW_STATISTICS command
    • e.g. dbcc show_statistics(N'@TableName', N'@Index_Name') with histogram
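  • A sketch of manual statistics maintenance (the statistics object and table names are hypothetical):

                -- Create, update, and inspect a statistics object manually.
                Create statistics st_orderdate on Sales.Orders(orderdate);

                Update statistics Sales.Orders st_orderdate with fullscan;

                Exec sys.sp_updatestats;    -- update statistics for all tables in the database

                dbcc show_statistics(N'Sales.Orders', N'st_orderdate') with histogram;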
