Questions d'entretien

Entretien pour Data Scientist, Analytics

-

Meta

Write a SQL query to compute a frequency table of a certain attribute involving two joins. What if you want to GROUP or ORDER BY some attribute? What changes would you need to make? How would you account for NULLs?

Répondre

Réponses aux questions d'entretien

25 réponse(s)

2

I've tested all these on a mock data set and none of them work! Does anyone have the correct solution? I'm stuck on this one..

QR le

2

Posts and comments in the same table looks weird. Here's my attempt (made easy with CASE) to exclude all the posts from the table and grouping/counting comments. SEL parent_id ,COUNT(*) as comment_count ( SEL * ,CASE WHEN perent_id IS NULL THEN 'Post' ELSE 'comment' END as post_or_comment FROM Submissions ) a WHERE post_or_comment = 'comment'

Alberto le

2

Here is the solution. You need a left self join that accounts for posts with zero comments. Select children , count(submission_id) from ( Select a.submission_id, count(b.submission_id) as children from Submissions a Left Join submissions b on On a.submission_id=b.parent_id Where a.parent_id is null Group by a.submission_id ) a Group by children

Utilisateur anonyme le

7

Can you explain why you would even need the self-join here? Can you not just group by parent_id and do the COUNT() on each group, since the parent_id values correspond to the post values when they're not null?

Utilisateur anonyme le

5

If you group by parent_id, you'll be leaving out all posts with zero comments.

Utilisateur anonyme le

3

select number_comments, count(submission_id) as number_posts from ( # more than zero comments select submission_id, count(post_id) as number_comments from ( select submission_id, case when parent_id is null 1 else 0 end as post, case when parent_id is not null parent_id else null end as post_id, body from Submissions )k where post =0 group by submission_id ) k1 group by number_comments union select number_comments, count(submission_id) as number_posts from ( # comments= 0 select submission_id, 0 as number_comments from ( select submission_id, case when parent_id is null 1 else 0 end as post, case when parent_id is not null parent_id else null end as post_id, body from Submissions )k where post =1 group by submission_id ) k1 group by number_comments

Utilisateur anonyme le

1

@ RLeung shouldn't you use left join? You are effectively losing all posts with zero comment.

anonymous le

0

select t.post_id, count(t.submission_id) -1 from (select submission_id, case when parent_id is null then submission_id else parent_id end as post_id from submissions) t group by post_id

Utilisateur anonyme le

0

select comments_count, count(submission_id) as post_count from ( select submission_id, count( distinct parent_id) as comments_count from Table A group by submission_id )A group by comments_count

Utilisateur anonyme le

0

I think all of the Posts are missing Parent_ID. I am editing the code shared above. This will solve the duplicate problem select parent_id as post, count(parent_id) as num_of_comments from submissions group by parent_id union select submission_id as post, 0 as num_of_comments from submissions where parent.id not in (select submission_id from submissions)

Utilisateur anonyme le

0

Not the shortest answer but I think much clearer than anything posted here. Also gives output table that could actually be fed directly into a histogram which was part of the question. SELECT CASE WHEN num_comments IS NULL THEN 0 ELSE num_comments END AS num_comments, COUNT(parent_post_id) AS cnt_posts FROM ( SELECT submission_id AS parent_post_id, comment_count.num_comments FROM Submissions WHERE parent_id IS NULL LEFT JOIN ( SELECT parent_id, COUNT(parent_id) AS num_comments FROM Submissions WHERE parent_id IS NOT NULL GROUP BY 1 ) comment_count ON submission_id = comment_count.parent_id ) GROUP BY 1 ORDER BY 1

Cam le

0

select p.parent_id as posts, count(c.submission_id) as commentcount from submissions c inner join submissions p on c.parent_id = p.submission_id group by p.parent_id;

Harish le

0

select case when parent_id is not null then parent_id else sub_id end as post_id, sum(case when parent_id is not null then 1 else 0 end) as comment_count from submissions group by case when parent_id is not null then parent_id else sub_id end;

Priyanka Shukla le

0

Create table: create table submissions ( submission_id int null, body varchar(500) null, parent_id int null ); Insert records: (change your database name) INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (1, 'POST1', null); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (null, 'C1', 1); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (2, 'POST2', null); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (3, 'POST3', null); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (null, 'C2', 3); INSERT INTO employees.submissions (submission_id, body, parent_id) VALUES (null, 'C3', 3); Solution: SELECT a.submission_id AS post_id, a.body, sum(CASE WHEN t.parent_id > 0 THEN 1 ELSE IFNULL(t.parent_id,0) END) AS comment_id FROM submissions AS a LEFT JOIN (SELECT b.parent_id FROM submissions AS b) t ON a.submission_id = t.parent_id WHERE a.submission_id IS NOT NULL GROUP BY post_id; Results: 1 POST1 1 2 POST2 0 3 POST3 2

Complete solution le

0

CREATE TABLE users( sid INT , pid INT , body Varchar(255)); insert into users Values ( 2,null, "cover"), (1,2,"Ami is"),(3,2,"hi"),(4,2,"good pic"),(5,null ,"profil pic"),(6,5,"nice"); (select pid , COUNT(pid) as total from users where pid is not null group by pid)

Utilisateur anonyme le

0

create table subs( sub_id integer, parent_id integer ) insert into subs values(1,null); insert into subs values(2,null); insert into subs values(3,null); insert into subs values(4,null); commit; insert into subs values(5,1); insert into subs values(6,1); insert into subs values(7,1); insert into subs values(8,1); insert into subs values(9,2); insert into subs values(10,2); insert into subs values(11,3); insert into subs values(12,3); insert into subs values(12,4); commit; select * from subs select cc, count(sub_id) from ( select a.sub_id, count(b.sub_id) cc from subs a inner join subs b on(b.parent_id = a.sub_id) group by 1) group by 1

Utilisateur anonyme le

0

I found it easier to explain when I broke it out into named sub tables to handle the case when there are no comments on a post and you want the end result to be the histogram of the number of comments per post: with parent_comment_ct as ( SELECT parent_id, COUNT(parent_id) AS num_comments FROM submissions WHERE parent_id IS NOT NULL GROUP BY parent_id ), submission_comment_ct as ( SELECT su.submission_id AS parent_post_id, pcc.num_comments AS num_comments FROM submissions su LEFT JOIN parent_comment_ct pcc ON su.submission_id = pcc.parent_id WHERE su.parent_id IS NULL ) SELECT CASE WHEN scc.num_comments IS NULL THEN 0 ELSE scc.num_comments END AS num_comments, COUNT(scc.parent_post_id) AS cnt_posts FROM submission_comment_ct scc GROUP BY 1 ORDER BY 1

icarus le

0

SELECT recommended_page FROM (SELECT f.user1_id as users, f.user2_id as freinds, l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user2_id = l.user_id WHERE f.user1_id = 1 UNION ALL SELECT f.user2_id as users,f.user1_id as friends,l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user1_id = l.user_id WHERE f.user2_id = 1) MINUS (SELECT page_id as recommended_page FROM likes WHERE user_id = 1);

Utilisateur anonyme le

0

SELECT recommended_page FROM (SELECT f.user1_id as users, f.user2_id as freinds, l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user2_id = l.user_id WHERE f.user1_id = 1 UNION ALL SELECT f.user2_id as users,f.user1_id as friends,l.page_id as recommended_page FROM friendship f INNER JOIN likes l ON f.user1_id = l.user_id WHERE f.user2_id = 1) MINUS (SELECT page_id as recommended_page FROM likes WHERE user_id = 1);

Utilisateur anonyme le

0

select c.subID as SubmissionID, count(c.body)-1 as Counts_Comments from subm c LEFT JOIN subm b ON c.subID = b.pID where b.pID is null AND c.pID is NULL group by c.subID UNION ALL select a.pID as SubmissionID, count(a.body) as Counts_Comments from ( select *, case when pID IS NULL then 'P' Else 'C' END as P_O_C from subm)a where P_O_C = 'C' group by a.pID Order by SubmissionID;

Aslesha le

0

select a.user_name,b.user_name,page_liked from services_db.pages_liked a, services_db.user_friends b where 1=1 and a.user_name = b.friend_user and a.page_liked not in ( select page_liked from services_db.pages_liked c where 1=1 and c.user_name = b.user_name ) ;

GV le

0

select parent_id as post, count(parent_id) as num_of_comments from submissions group by parent_id union select submission_id as post, 0 as num_of_comments from submissions where parent.id=null

Utilisateur anonyme le

0

select k.post_id, count(submission_id) -1 from (select submission_id, case when parent_id is null then submission_id else parent_id end as post_id from submissions) t group by post_id

Utilisateur anonyme le

3

In my case this question was like: 'you have a table Submissions with the submission_id, the body, and the parent_id. Submissions can be posts, or comments to a post. In posts, parent_id is null, and in comments, the parent_id is the post the comment is commenting about. How would you go and make a histogram of number of posts per comment_count?' I think i solved it along the lines of: SELECT comment_counts.n_comments, count distinct(n_comments.submission_id) ( select s1.submission_id, COUNT DISTINCT(s2.parent_id) as n_comments OUTER join submissions on s1.submission_id = s2.parent_id group by submission_id) comment_counts GROUP BY comment_counts.n_comments

Utilisateur anonyme le

0

I think it is pretty straight forward. All the posts will have null parent_id. Considering the table schema to be something like this: CREATE TABLE submissions ( submission_id INT, body VARCHAR(500), parent_id INT ); SELECT DISTINCT nvl(parent_id::TEXT,'Post with no comments') AS post_id, COUNT(CASE WHEN parent_id IS NOT NULL THEN submission_id ELSE 0 END) AS number_of_comments_or_post FROM submissions GROUP BY 1; This will give results like this: post_id number_of_comments_or_post Post with no comments 8 1 10 7 11 13 8 19 9 25 7 So, the first row will give the number of posts with no comments which is 8 and remaining rows tell the number of comments per post. Is there a flaw in this?

ig le

Ajouter des réponses ou des commentaires

Pour commenter ceci, connectez-vous ou inscrivez-vous.