742

Business Intelligence Engineer interview questions shared by candidates

Top Interview Questions

Sort: Relevance|Popular|Date

Basic SQL questions. Describe a join to a non-technical person. How do you handle a query that does not perform quickly? They want to know that you can use 'explain plans', which I currently do not use (I'm still entry level). Select all customers who purchased at least two items on two separate days. Given a table with a combination of flight paths, how would you identify unique flights if you don't care which city is the destination or arrival location.

Customer problem: select customerId from orders group by customerId having count(distinct date(orderDate)) &gt; 1; -- Assuming the orderDate has time associated with it. Flights problem: select arrival, departure from flights union select departure, arrival from flights; Less

select distinct(b.id) from ( select a.id, a.d, a.#items, row_number() over (partition by id order by d) as rn from ( select id, d, sum(q) as #items from cust group by id, d having sum(q) &gt;=2 ) a)b where b.rn&gt;2 Less

CREATE TABLE test_flights ( origin VARCHAR(255), destination VARCHAR(255) ); INSERT INTO test_flights (origin, destination) VALUES ('Boston', 'Los Angeles'), ('Los Angeles', 'Boston'), ('New York', 'Pittsburgh'), ('Pittsburgh', 'New York') SELECT * FROM test_flights WHERE origin &lt; destination Less

probability of the product coming from location A is 0.8 and from location B is 0.6. What is the probability the customers will receive the product from location A or location B SQL - tested on different join , lead , lag, pivoting in sql , sub query, group by having, where, aggregate and think about how you would find outliers)

probability of the product coming from location A is 0.8 and from location, B is 0.6. What is the probability the customers will receive the product from location A or location B P(A)=0.8 P(B)=0.6 Assuming the events are independent: P(A OR B) = 1 - P(not A AND not B) = 1-(0.2*0.4) = 1-0.08 = 0.92 The other ways: P(A or B) = P(A) + P(B) - P(A AND B) = 0.8 + 0.6 - (0.8*0.6) = 1.4 - 0.48 = 0.92 OR P(A or B) = P(A) + P(B )*P(not A) = 0.8 + (0.6*0.2) = 0.8 + 0.12 = 0.92 OR P(A OR B) = P(B) + P(A)*P(not B) = 0.6 + (0.8*0.4) = 0.6 + 0.32= 0.92. Less

P(A)=0.8 P(A')=1-0.8= 0.2 P(B)= 0.6 P(B')= 0.40 P(A or B)= P(A)P(B')+P(B)P(A') It means from location A and not from B, OR, from location B and not from A Ans: 0.8*0.4+0.6*0.2 =&gt; 0.44 Less

Derive customer's account status as of month end for all the months in 2019. If for given month, there are more than one rows, pick the data from the latest date within the month. If for given month, there is no data, pick the data from latest date prior to the month You can use last_day function to get month ending date(Eg: last_day(01/01/2015) = 01/31/2015) customer_id event_date status credit_limit 1 1/1/2019 C 1000 1 1/5/2019 F 1000 1 3/10/2019 1000 1 3/10/2019 1000 1 8/27/2019 L 1000 2 1/1/2019 L 2000 2 1/5/2019 2500 2 3/10/2019 2500 3 1/1/2019 S 5000 3 1/5/2019 6000 3 3/10/2019 B 5000 4 3/10/2019 B 10000

select customer_id, last_day(a.event_date) month, coalesce( a.status,b.latest_status_ever) latest_status_per_month from table a join( -- latest if no record select customer_id, last_day(event_date) month, status as latest_status_ever from table QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_date desc) = 1 group by 1,2 )b on a.customer_id = b.customer_id and last_day(a.event_date) &gt; b.month QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id, month(event_date) ORDER BY event_date desc) = 1 group by 1,2 Less

Try this - Assumptions - The first record for a customer begins in the month it first had a status. Data is restricted to 2019. End month for all customers will be Dec 2019. All the status provided are valid status even including NULL status. with temp_data as ( SELECT 1 as customer_id, '1/1/2019'::date as event_date, 'C' as status, 1000 as credit_limit union all SELECT 1, '1/5/2019', 'F', 1000 union all SELECT 1, '3/10/2019','', 1000 union all SELECT 1, '3/10/2019','', 1000 union all SELECT 1, '8/27/2019', 'L', 1000 union all SELECT 2, '1/1/2019', 'L', 2000 union all SELECT 2, '1/5/2019', '', 2500 union all SELECT 2, '3/10/2019', '', 2500 union all SELECT 3, '1/1/2019', 'S', 5000 union all SELECT 3, '1/5/2019', '', 6000 union all SELECT 3, '3/10/2019', 'B', 5000 union all SELECT 4, '3/10/2019', 'B', 10000 ) ,tab2 as ( select customer_id ,last_day(event_date) as event_month ,status ,credit_limit ,dense_rank() over (partition by customer_id, event_month order by event_date desc ) as status_rank from temp_data group by customer_id,event_date,status,credit_limit ) ,cal as ( select cal_month_end_date from calendar_table where cal_year_id = 2019 group by 1 ) ,tab3 as ( select t.* ,event_month as start_month ,coalesce(lead(add_months(event_month,-1)) over(partition by customer_id order by event_month),year_end) as end_month from tab2 t ,(select max(cal_month_end_date) as year_end from cal) cal_end_month where status_rank = 1 ) select customer_id, cal_month_end_date, status, credit_limit from tab3, cal where cal_month_end_date between start_month and end_month group by 1,2,3,4 order by 1,2 Less

Assumptions - The first record for a customer begins in the month it first had a status. Data is restricted to 2019 End month for all customers will be Dec 2019 All Values in Status column is a valid status including NULLs with temp_data as ( SELECT 1 as customer_id, '1/1/2019'::date as event_date, 'C' as status, 1000 as credit_limit union all SELECT 1, '1/5/2019', 'F', 1000 union all SELECT 1, '3/10/2019','', 1000 union all SELECT 1, '3/10/2019','', 1000 union all SELECT 1, '8/27/2019', 'L', 1000 union all SELECT 2, '1/1/2019', 'L', 2000 union all SELECT 2, '1/5/2019', '', 2500 union all SELECT 2, '3/10/2019', '', 2500 union all SELECT 3, '1/1/2019', 'S', 5000 union all SELECT 3, '1/5/2019', '', 6000 union all SELECT 3, '3/10/2019', 'B', 5000 union all SELECT 4, '3/10/2019', 'B', 10000 ) ,tab2 as ( select customer_id ,last_day(event_date) as event_month ,status ,credit_limit ,dense_rank() over (partition by customer_id, event_month order by event_date desc ) as status_rank from temp_data group by customer_id,event_date,status,credit_limit ) ,cal as ( select cal_month_end_date from calendar_table where cal_year_id = 2019 group by 1 ) ,tab3 as ( select t.* ,event_month as start_month ,coalesce(lead(add_months(event_month,-1)) over(partition by customer_id order by event_month),year_end) as end_month from tab2 t ,(select max(cal_month_end_date) as year_end from cal) cal_end_month where status_rank = 1 ) select customer_id, cal_month_end_date, status, credit_limit from tab3, cal where cal_month_end_date between start_month and end_month group by 1,2,3,4 order by 1,2 Less

SQL question - Table1 year| month| order_id| seller_id| book| quantity| prices 2008|June| 1|888|HP| 2| 2000 2008|June| 1|888|LoTR| 1| 1000 2009|July| 2|999|HP| 1| 1000 Q1. find avg quantity of books solder for every order_id every year? Q2. find max units of books sold for every order_id

select distinct year, order_id, avg(quantity) over(partition by year, order_id order by year) as avg_books from table a group by 1, 2 Less

If the question is correct, we can just use group by command, No partition needed. Select orderid, avg(qty) from table group by year, orderid Less

q1. select distinct year, order_id, avg(quantity) over(partition by year, order_id order by year) as avg_books from table a group by 1, 2 q2. USE MAX(QUANTITY) OVER(PARTITIOB BY YEAR, ORDER_ID ORDER BY YEAR) Less

Schemas - Sales (sales_id, date , customer_id, Product_id, purchase_amount): Product (P_id, P_Name, Brand_id,B_name) Top 10 products in year XXXX Top 10 products in each year List of customers whose total purchase increased from XXXX-XXXX but decreased from XXXX-XXXX. List of customers who bought both brands "X" & "Y" and at-least 2 products in each brand.

elect t.customer_id "Customer ID" from ( select s.customer_id, count(distinct p.brand_id) over (partition by s.customer_id) brands_counter, count(distinct p.product_id) over (partition by s.customer_id, p.brand_id) products_counter from sales s inner join product p on p.product_id = s.product_id where p.brand_name in ('X', 'Y') ) t where t.brands_counter = 2 group by t.customer_id having min(t.products_counter) &gt;= 2 Less

WITH X_SAL AS ( SELECT customer_id, COUNT(DISTINCT Product_id) CNT FROM Sales Sal JOIN Product AS Prod ON Sal.Product_id = Prod.P_id WHERE B_name IN ('X') GROUP BY customer_id HAVING COUNT(DISTINCT Product_id) &gt;= 2 ) , Y_SAL AS ( SELECT customer_id, COUNT(DISTINCT Product_id) CNT FROM Sales Sal JOIN Product AS Prod ON Sal.Product_id = Prod.P_id WHERE B_name IN ('Y') GROUP BY customer_id HAVING COUNT(DISTINCT Product_id) &gt;= 2 ) SELECT customer_id FROM X_SAL JOIN Y_SAL ON Y_SAL.customer_id = X_SAL.customer_id ORDER BY customer_id Less

with t as ( select customer_id, Brand_id, COUNT(DISTINCT(Product_id)) AS C_PID from sales s join product p on s.product_id = p.p_id where Brand_id in ('X','Y') GROUP BY customer_id,BRAND_ID HAVING COUNT(DISTINCT(Product_id)) &gt;= 2 ) select customer_id group by customer_id having count(distinct BRAND_ID) = 2; Less

Q1) Find the number of unique days each employee worked Emp Id Task Id Start date End date 1 1 Monday Wednesday 1 2 Monday Tuesday 1 3 Friday Friday 2 1 Monday Friday 2 1 Tuesday Wednesday Hint: Calendar day table or date dimension table Calendar_day Calendar_day_of_week Calendar_year Calendar_month 1900/01/01 Wednesday (3) 1990 1 Q2) How many customers placed orders every month? Table 1: Customer Date customer_id order_id units country 2019/07/01 A 112 5 US 2019/07/02 A 211 4 US 2019/08/02 B 511 4 EU 2019/09/01 C 322 1 JP 2019/09/01 C 322 2 JP 2019/08/05 A 378 6 US 2019/09/10 A 456 7 US

with t as ( SELECT Emp_Id, Task_Id, c.date FROM ACTIVITY A join calender c on c.date between a.start_date and a.end_date ) select emp_id, count(distinct(date)) as c1 from t group by emp_id; ---------------------------------- with t as ( select customer_id, extract(month from date) month, order_id from Customer ) select customer_id from t group by customer_id having (count(distinct month)) = 12; Less

create table unq_days( emp_id number, task_id number, start_dy varchar2(20), end_dy varchar2(20) ); insert into unq_days values(2,1,'Tuesday','Wednesday'); insert into unq_days values(2,1,'Monday','Friday'); insert into unq_days values(1,3,'Friday','Friday'); insert into unq_days values(1,2,'Monday','Tuesday'); insert into unq_days values(1,1,'Monday','Wednesday'); with range as ( select (to_date('4/5/2020','MM/DD/YYYY') + level -1) dt, trim(to_char(to_date('4/5/2020','MM/DD/YYYY')+level-1,'Day')) dy from dual connect by level &lt;= 7 ) select emp_id, count(distinct p.dt) unq_days from ( select u.emp_id, u.task_id, r.dt as start_dt, m.dt as end_dt from unq_days u left join range r on u.Start_dy = r.dy left join range m on u.end_dy = m.dy ) join range p on p.dt between start_dt and end_dt group by emp_id; EMP_ID UNQ_DAYS 1 4 2 5 Less

create table cus_ord ( ord_dt date, cus_id varchar2(1), order_id number, units number, country varchar2(2) ); insert into cus_ord values(to_date('2019/07/01','YYYY/MM/DD'), 'A',112,5,'US'); insert into cus_ord values(to_date('2019/07/02','YYYY/MM/DD'), 'A',211,4,'US'); insert into cus_ord values(to_date('2019/08/02','YYYY/MM/DD'), 'B',511,4,'EU'); insert into cus_ord values(to_date('2019/09/01','YYYY/MM/DD'), 'C',322,1,'JP'); insert into cus_ord values(to_date('2019/09/01','YYYY/MM/DD'), 'C',322,2,'JP'); insert into cus_ord values(to_date('2019/08/05','YYYY/MM/DD'), 'A',378,6,'US'); insert into cus_ord values(to_date('2019/09/10','YYYY/MM/DD'), 'A',456,7,'US'); select * from cus_ord; with dts as ( select count(distinct to_char(ord_dt,'YYYYMM')) cnt from cus_ord )select T.*, (case when dts.cnt = T.cnt then 'Y' ELSE 'N' END) ORDERED_ALL_MONTHS from dts join ( select cus_id, country, count(distinct to_char(ord_dt,'YYYYMM')) cnt from cus_ord group by cus_id, country)T on 1=1; Customer A ordered all 3 months. Less

Monthly Revenue : company_name,month,revenue Write a query to pull the monthly revenue as columns instead of rows.

select date_part('year',invoice_date) yr, sum(case when date_part('Month',invoice_date)=1 then revenue else 0 end) as January, sum(case when date_part('Month',invoice_date)=2 then revenue else 0 end) February, sum(case when date_part('Month',invoice_date)=3 then revenue else 0 end) March, sum(case when date_part('Month',invoice_date)=4 then revenue else 0 end) April, sum(case when date_part('Month',invoice_date)=5 then revenue else 0 end) May, sum(case when date_part('Month',invoice_date)=6 then revenue else 0 end) June, sum(case when date_part('Month',invoice_date)=7 then revenue else 0 end) July, sum(case when date_part('Month',invoice_date)=8 then revenue else 0 end) August, sum(case when date_part('Month',invoice_date)=9 then revenue else 0 end) September, sum(case when date_part('Month',invoice_date)=10 then revenue else 0 end) October, sum(case when date_part('Month',invoice_date)=11 then revenue else 0 end) November, sum(case when date_part('Month',invoice_date)=12 then revenue else 0 end) December FROM revenue group by 1 order by 1; Less

select company_name, [January], [February], [March], [April], [May], [June], [July], [August], [September], [October], [November], [December] from Monthly_Revenue pivot( sum(revenue) for month in ([January], [February], [March], [April], [May], [June], [July], [August], [September], [October], [November], [December]) ) as PivotTable Less

Use PIVOT function

(ordr_date, cust_id, product_id, prc_per_unit, unit) ('2020-10-01', '1', 'AA', 10.5, 2), ('2020-11-01', '1', 'BB', 1.5, 3), ('2020-01-01', '1', 'CC', 3.5, 8), ('2020-01-01', '2', 'CC', 4.5, 3), ('2020-01-01', '2', 'BB', 1.5, 3); 1. List the top 20 customers who have the highest spending 2. Find the highest spent product for each customer

select * from (select cust_id,amount,dense_rank()over(partition by cust_id order by amount desc) rn from ( select cust_id,product_id,sum(prc_per_unit*unit)as amount from Test_order group by cust_id,product_id order by amount desc)) a where a.rn=1; Less

WITH CTE AS ( SELECT cust_id, SUM(prc_per_unit*unit) spt FROM table GROUP BY cust_id) SELECT cust_id FROM CTE ORDER BY spt DESC LIMIT 20; Less

WITH CTE AS ( SELECT cust_id, product_id, SUM(prc_per_unit*unit) spt, RANK() OVER(PARTITION BY cust_id, product_id ORDER BY SUM(prc_per_unit*unit) DESC) rnk FROM table GROUP BY cust_id, product_id) SELECT cust_id, product_id FROM CTE WHERE rnk = 1; Less

From SQL: 1. If there are 2 tables with m and n rows respectively, then what will be the minimum and maximum number of rows in all the joins. 2. Get the 3rd highest employee salary from employee salary table. 3. Given a product table, the new record gets appended in table with the new timestamp for the product ID. Find out the latest row for that particular product with or without analytical functions. 4. Given there are 2 tables of employee details and employee project details, which all employee are not yet assigned a project. For python: 1. Given a product data frame, the new record gets appended in table with the new timestamp for the product ID. Find out the latest data frame for that particular product with pandas.

SELECT productID,timestamp,productField1,productField2 FROM productTable GROUP BY productID ORDER BY max(timestamp); Less

This sql isn’t even remotely right lol

SQL: 1. Minimum = 0 and Maximum = m * n. 2. Select salary from Emp_salary order by salary desc limit 3,1; 3. Select * from Products where timestamp = (Select MAX(timestamp) from Products) ; 4. Select * from Employees where employee_id NOT IN (select employee.id from Projects) Less