2011 Search Computing Course

Course Description

Topics

This course deals with the new technologies and applications that characterize the Web, seen as a large information system. It can be seen as one of the many possible continuations of Database 2: it builds upon the foundations of information retrieval, a subject which is not covered in the basic data management courses, and then turns to Web information retrieval, the key technology of search engines.
Information retrieval will cover classical aspects: text processing, index structures, classic data retrieval methods, retrieval evaluation.
Web information retrieval will include: search engine evolution, modeling the Web, Web crawling, link analysis, the PageRank and HITS methods.
The course then touches on several monographic topics: the semantic, multimedia, and social Web, human computing, natural language processing for IR, and technologies for massive Web data management (Hadoop) and for stream data management. Applied lessons will be dedicated to SPARQL and HTML5. Most monographic lectures will be given by invited guest lecturers.

Format

The format of the course is atypical and experimental. Students will be asked to participate in small projects (for 3 credits). For this reason, the first two lectures of the course will be dedicated to the Search Computing research project, which will inspire most of the proposed student projects. These will be presented during the third lesson of the course and then monitored through follow-up and feedback, using the format which has proven successful in Alta Scuola Politecnica.
Students will select projects by indicating their preferences after the project presentations; the matching will attempt to satisfy the majority of students while at the same time allocating students to the majority of projects.
In addition, students will be asked to read papers from an extensive reading list and then to present (mostly orally, in some cases in written form) their personal interpretation of each reading; presentations or papers will contribute to the evaluation for the remaining 2 credits. The reading list will also be provided at the beginning of class. A maximum of 40 students will be evaluated with this method, determined on a FIFO basis.
Should some students be excluded by this constraint, they will be graded through a conventional written exam followed by an oral discussion covering all the lessons of the course and some of the students' presentations.
By using extra time on Mondays (the class ends at 6 but we will extend some lectures until 7), the course will be completed in December, with a mix of 32 hours of lessons and 16 hours of (unconventional) exercises and project presentations. By giving short presentations at three points during the course, students will be driven toward completing their project work in due time, minimizing the risk of failure.

More about Projects and recommended course readings

Schedule

  1. Thu 6/10: Course Intro
  2. Mon 10/10: Search Computing: Intro [L1a] + User Interaction [L1b]
  3. Thu 13/10: Search Computing: Service Registration [L2a] + Engine [L2b]
  4. Mon 17/10: Project Presentations – possibly requiring extra time
  5. Thu 20/10: Student allocation to projects [E1] + Foundations of Information Retrieval [L3]
  6. Mon 24/10: Foundations of Information Retrieval [L4]
  7. Thu 27/10: Web Information Retrieval [L5]
  8. Thu 3/11: Web Information Retrieval [L6]
  9. Mon 7/11: Project Plan – possibly requiring extra time [E2]
  10. Thu 10/11: Semantic Web [L7]
  11. Mon 14/11: A Practical Introduction to SPARQL [L8]
  12. Thu 17/11: Invited seminar by Andrei Broder, Yahoo! Barcelona [L9]
  13. Mon 21/11: Multimedia Information Retrieval [L10]
  14. Thu 24/11: Natural Language Processing for Information Retrieval [L11]
  15. Mon 28/11: Project Midterm Review – possibly requiring extra time [E3]
  16. Thu 1/12: Rank-Join Methods [L12]
  17. Mon 5/12: A Practical Introduction to HTML5 [L14]
  18. Mon 12/12: Games With A Purpose (GWAP) [L15]
  19. Thu 15/12: Data Streams [L16a] + Massive Web Data Computing: Hadoop [L16b]
  20. Mon 19/12: Principles of Human Computation [L13]

Lecture Notes

  • S. Ceri
    [L1a] Search Computing Introduction (PDF | PPTX)
  • A. Bozzon
    [L1b] User Interaction (PDF)
  • S. Quarteroni
    [L2a] Search Computing: Semantic Framework & Service Registration
  • D. Braga, S. Vadacca
    [L2b] Search Computing: Engine (PDF | PPTX)
  • S. Ceri
    [L3 - 4] Foundations of Information Retrieval (PDF | PPTX)
  • S. Ceri
    [L5 - 6] Web Information Retrieval (PDF | PPTX)
  • E. Della Valle
    [L7] Semantic Web (PPT)

    Please read before class:

    1. T Berners-Lee, J Hendler, O Lassila. The semantic web. Scientific American, 2001
    2. N Shadbolt, W Hall, T Berners-Lee. The semantic web revisited. Intelligent Systems, IEEE, 2006
    3. C Bizer, T Heath, T Berners-Lee. Linked Data - The Story So Far. Int. J. Semantic Web Inf. Syst., 2009
  • E. Della Valle
    [L8] A Practical Introduction to SPARQL (PPT)
    Please read this chapter before class.
  • A. Bozzon
    [L10] Multimedia Information Retrieval (PDF)
  • S. Quarteroni
    [L11] Natural Language Processing for Information Retrieval (PDF)
  • D. Martinenghi, M. Tagliasacchi
    [L12] Rank-Join Methods (PDF)
  • A. Bozzon
    [L14] A Practical Introduction to HTML5 (PDF)
  • P. Fraternali
    [L15] Games With A Purpose (GWAP) (PDF | PPT)
  • D. Barbieri
    [L16a] Data Streams (PDF)
  • S. Vadacca
    [L16b] Massive Web Data Computing (Hadoop) (PDF)
  • M. Brambilla
    [L13] Principles of Human Computation