Discussion:
DPS Initial Ideas
David Naylor
2007-05-11 21:02:31 UTC
Permalink
Hi,

Thank you all for your responses; they have given me much to think about. I
guess there is consensus that there is room for improvement in the current
pkg system. Attached are some of my initial ideas about what is required and
expected in any (and all future) package systems.

Since I am going away this weekend I have had limited time to work on this
document and as such it is in early stages of development. My objective is
to get a clear understanding and target of what is required for this new
package system.

I am looking at a hybrid approach to storing the package metadata, a
combination of SQLite and compressed text files. I am hoping to create a
situation where if either gets corrupted it can be recreated from the other.
Furthermore, any changes to the text files (even without recompressing them)
will be integrated back into the database. This will allow administrators
to fiddle around with the text files without having to touch the database.
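
A minimal sketch of how such a two-way store might behave, not the design from the attached document. The table layout, the one-file-per-package convention and the function names are all assumptions made for illustration, and the text files are left uncompressed to keep the example short.

```python
import hashlib
import sqlite3
from pathlib import Path


def digest(text: str) -> str:
    """Content hash used to notice hand-edited text files."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Hypothetical schema: one row per package, body is the raw text record.
    db.execute(
        "CREATE TABLE IF NOT EXISTS pkg ("
        " name TEXT PRIMARY KEY, body TEXT NOT NULL, sha TEXT NOT NULL)"
    )
    return db


def sync_from_text(db: sqlite3.Connection, textdir: Path) -> list[str]:
    """Re-import every text file whose content differs from the database."""
    changed = []
    with db:  # one transaction: all updates land or none do
        for f in sorted(textdir.glob("*.txt")):
            body = f.read_text(encoding="utf-8")
            row = db.execute(
                "SELECT sha FROM pkg WHERE name = ?", (f.stem,)
            ).fetchone()
            if row is None or row[0] != digest(body):
                db.execute(
                    "INSERT OR REPLACE INTO pkg (name, body, sha)"
                    " VALUES (?, ?, ?)",
                    (f.stem, body, digest(body)),
                )
                changed.append(f.stem)
    return changed


def rebuild_text(db: sqlite3.Connection, textdir: Path) -> None:
    """Recreate the text files from the database, e.g. after corruption."""
    textdir.mkdir(parents=True, exist_ok=True)
    for name, body in db.execute("SELECT name, body FROM pkg ORDER BY name"):
        (textdir / (name + ".txt")).write_text(body, encoding="utf-8")
```

Calling sync_from_text() after an administrator edits a file picks up only the changed records; rebuild_text() covers the opposite direction.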

((
Another idea I have is the ability for the whole package data storage to be
recreated from the repository data if all stored data got "destroyed". This
will occur through a process of inspection of the file system. This could be
extended to allow all non-package files to be combined into a "special"
package with the correct dependencies, allowing a system to be restored
without a single error... Any thoughts?
))

I would also like to create package groupings, whereby many packages share
the same package file. This will allow easy distribution of ports such as
Xorg 7.2, reducing the 400 or so packages to only a few physical files
that can then be installed and managed individually.

Ultimately I would like this project to be compatible with the current pkg
system (allowing easy transition from one to another), proper integration
into the ports system and possibly into the pkgsrc system, but the future
will only be revealed in time.

All feedback is welcome, I do have some questions to ask:

1) What would random access of a package be required for and how often
will/does it occur?

2) I have chosen XML for storing the data. Is there any practical alternative?
(Please keep in mind that multiple packages' metadata could be stored in a
single file.)

I apologise if the document is too dense or too cryptic; all ideas are
welcome.

Once again thank you all for your feedback and have a good weekend.

David
Kris Kennaway
2007-05-11 21:28:47 UTC
Permalink
Michel Talon
2007-05-12 00:42:09 UTC
Permalink
Kris Kennaway explained in 4 points why a proposal to introduce a new
package system is doomed to failure.
"I think your current proposal falls short on points 2) and 3). In
particular, I don't see where SQLite is necessary to solve any
problems we are currently facing, and your proposal conflicts with
existing work."
Existing work consists of integrating metadata into a Berkeley DB
database, whereas the new project would like to put it in an SQLite
database. Now existing software (portupgrade) has shown what can be
gained by pushing all sorts of unstructured data into a Berkeley database :-(
In this same thread N arguments have been advanced by various people
showing why using something more structured and formal like a SQL
database would be beneficial.

One of the most obvious being that the sqlite database can be edited
as easily as a pure textfile using the sqlite3 program, or can be
accessed programmatically from a lot of languages (C, perl, python,
ruby) using well known and well tested libraries or connectors.

Since sqlite is public domain, there is no licence objection to bringing it
into FreeBSD. Moreover it is a small program which could be very
useful to a lot of users and for a lot of alternative uses.
As a consequence the "high bar of entry" and the "proof of necessity"
seem to me an unreasonably stringent condition.


Because the problem to be solved is not the "collapse under its own
weight" of the proposals, but the collapse of the FreeBSD ports system.
Someone (i think des) has said not long ago that Debian or Debian like
systems were easier to maintain than FreeBSD. This is an understatement.
--
Michel TALON
Bill Moran
2007-05-12 01:31:54 UTC
Permalink
Post by Michel Talon
Kris Kennaway explained in 4 points why a proposal to introduce a new
package system is doomed to failure.
What the hell? You're making like any effort to improve the packaging
system is doomed to failure without SQLite.

Before you even go in to any more of a rant, answer me this one
question:

Why should the FreeBSD project take on the load of importing _yet_
_another_ database library? What's wrong with using bdb, which has
been in the base system for as long as I can remember? It's what
various package managing tools already use with great success.
--
Bill Moran
Collaborative Fusion Inc.

***@collaborativefusion.com
Phone: 412-422-3463x4023

Mike Meyer
2007-05-12 02:01:46 UTC
Permalink
Post by Michel Talon
One of the most obvious being that the sqlite database can be edited
as easily as a pure textfile using the sqlite3 program
Huh? They can? With a pure textfile, if vi is busted, I can use ed. If
ed is also busted, I can use sed. What do I use on an sqlite database
if sqlite3 is busted?

I'm currently reinstalling all the ports on my system, changing
LOCALBASE from /usr/opt to /usr, just to see if this breaks
anything. Adjusting all those pure textfile configs for this is
trivial:

find /etc /usr/etc /usr/X11R6/etc -type f | xargs sed -e 's;/opt;;g' -i .opt

How can I do this "as easily" if I were using sqlite database(s)?

<mike
--
Mike Meyer <***@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
Michel Talon
2007-05-12 09:09:35 UTC
Permalink
Post by Mike Meyer
Post by Michel Talon
One of the most obvious being that the sqlite database can be edited
as easily as a pure textfile using the sqlite3 program
Huh? They can? With a pure textfile, if vi is busted, I can use ed. If
ed is also busted, I can use sed. What do I use on an sqlite database
if sqlite3 is busted?
Answering both you and Bill Moran:

- first i don't suppose sqlite3 is busted, since i suppose it is in the
base system and it works by definition. Your hypothesis amounts to asking:
what do i do to edit my config files if vi and ed are busted? Moreover if
sqlite3 gets really busted i can import a copy and hope it works; it
requires very few libraries and other files, not much more than vi,
plus the sqlite3 library, of course. The combined size of sqlite3
and libsqlite3 is less than 400k.
- second, if i am sql allergic, it takes one command to export the table
to a straight file, each row on a line, each field separated by | or
anything else of my choice. Exactly the same tools that you have
mentioned can edit this file, and then one command loads it back into
the database.
- so what are the benefits? They are that non sql impaired people can
make good use of the power of sql queries to simplify their work. And
this without reducing the possibilities of sql impaired people. Moreover
one can use general tools like graphic sql tools to present the
contents of the database to the end user in a pleasant way if it is
desired. And finally it may be that the transactional properties of
sqlite can be used to gain better reliability.
- is the cost of including sqlite in the base system so high that
the above benefits are insufficient? Personally i don't know, but i
think some discussion is at least in order.
- and finally to answer one of Bill's critiques, why sqlite rather than
a Berkeley database? Precisely because sqlite offers a lot of facilities
that Berkeley db doesn't offer, such as export and import to and from
csv files and self-documentation of the table contents, whereas hand
editing a Berkeley db in fact requires programming and knowledge of the
database api.
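
The export, edit and reload cycle described in the second point can be sketched with Python's standard sqlite3 module. This is only an illustration under assumptions: the "ports" table and its two columns are hypothetical, and the edit step is a plain string replacement standing in for sed.

```python
import sqlite3

SEP = "|"


def export_lines(db: sqlite3.Connection) -> list[str]:
    """One row per line, fields separated by SEP."""
    rows = db.execute("SELECT name, prefix FROM ports ORDER BY name")
    return [SEP.join(row) for row in rows]


def import_lines(db: sqlite3.Connection, lines: list[str]) -> None:
    """Replace the table contents with the edited lines, atomically."""
    rows = [tuple(line.split(SEP)) for line in lines]
    with db:  # commits on success, rolls back if any row is malformed
        db.execute("DELETE FROM ports")
        db.executemany("INSERT INTO ports (name, prefix) VALUES (?, ?)", rows)


# Hypothetical table, made up for this example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ports (name TEXT PRIMARY KEY, prefix TEXT NOT NULL)")
db.executemany(
    "INSERT INTO ports VALUES (?, ?)",
    [("vim", "/usr/opt"), ("zsh", "/usr/opt")],
)
db.commit()

# Same kind of edit as the sed one-liner earlier in the thread: drop "/opt".
edited = [line.replace("/opt", "") for line in export_lines(db)]
import_lines(db, edited)
```

Because the reload runs inside one transaction, a line with the wrong number of fields leaves the previous table contents in place.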

Anyways, i have read that Marc Espie is envisioning using sqlite3 for the
OpenBSD package system, and that he is very satisfied with what he has
seen up to now. If this enters production, perhaps this will confer
BSD legitimacy on such practices ... Seriously, the FreeBSD package
system is in great need of a profound overhaul, pretending it works well
is complete denial of reality. I hope that young people working on
summer code projects will infuse *new* ideas, and not spend their
vacations polishing inadequate tools.
--
Michel TALON
Ivan Voras
2007-05-12 21:25:58 UTC
Permalink
Tom Judge
2007-05-13 23:07:54 UTC
Permalink
Post by Dag-Erling Smørgrav
Seriously, the FreeBSD package system is in great need of a profound
overhaul, pretending it works well is complete denial of reality.
Perhaps, but I seriously doubt that you are the correct person for the
job.
DES
This is exactly the kind of response that will push people away from the
project/community. If everyone that suggests some change that a
respected member of the community did not like and then said member sent
a response like this I would guess that people would stop suggesting any
changes/features.

It is this kind of attitude that will drive new blood away from the
project. I thought the idea was to try to encourage people to make
improvements to the project rather than to drive them away.

BTW I am not saying that any of the ideas that have been discussed in
this thread are good or bad, just that this kind of response seems to be
against the ethos of the project.

Just my 2p

Tom
Sean Bryant
2007-05-14 15:25:11 UTC
Permalink
Post by Dag-Erling Smørgrav
Post by Mike Meyer
You may want to look at some other Linux distros. The packages system
dates to about the same era as rpm/debs. The package system is much,
much more manageable than them. On the other hand, most Linux distros
have moved beyond those tools, to things like yum, apt-get and
up2date. Those incorporate the facilities that rpm and debs are
missing in a higher-level tool, and I think they are slightly more
manageable than freebsd packages.
As far as I know, none of them handle updates from source at all. In
fact, dealing with sources seems to be a noticeable weakness for them.
apt-get --build source <package-name>
DES
I'm just going to interject here, I apologize if this is out of place.
I've been following the threads on SQLite in the base for ports and the
DPS initial ideas threads.

Kris has stated it would be wise to actually list out the problems with
the current ports / packaging systems and I fully agree. Right now
it's a bunch of back and forth bickering with no real solution in sight,
because those proposing solutions don't have a full understanding of what
it takes to change the systems, and those who do are really reluctant to
adopt such changes due to their limited focus and maybe even the
developers' experience.

I propose someone post on the wiki (http://wiki.freebsd.org/) the
current issues with the ports and packaging systems so that everyone can
get a better understanding of the problems that need to be solved. This
could possibly help the SoC student improve the system by giving him a
clearer picture of what's wrong. This could also help lead a more
unified initiative to improve the systems and prevent adding new issues
to the system.

Prototyping a solution when the problems are known is much easier than
stabbing at the problem in the dark.
Sean Bryant
2007-05-14 16:42:49 UTC
Permalink
Post by Andrew Pantyukhin
Post by Sean Bryant
I propose someone post on the wiki (http://wiki.freebsd.org/)
http://wiki.freebsd.org/Upak
I was sort of hoping for something a bit more straightforward like a
list that identifies concrete issues with the ports and packages systems.
Peter Jeremy
2007-05-12 13:11:24 UTC
Permalink
Kris Kennaway
2007-05-12 19:33:02 UTC
Permalink
Kris Kennaway
2007-05-12 21:33:51 UTC
Permalink
Matthew Jacob
2007-05-12 19:39:42 UTC
Permalink
Seriously, the FreeBSD package system is in great need of a profound
overhaul, pretending it works well is complete denial of reality. I
hope that young people working on summer code projects will infuse
*new* ideas, and not spend their vacations polishing inadequate tools.
Hmm? Works fine for me and many others who are more than casual users.

I think Kris has it right when he asks you to state what problems need
to be addressed and *then* how you would address them- not to find a way
to address problems first.

respectfully

-matt
Michel Talon
2007-05-12 21:44:22 UTC
Permalink
Post by Michel Talon
Seriously, the FreeBSD package
system is in great need of a profound overhaul, pretending it works well
is complete denial of reality. I hope that young people working on
summer code projects will infuse *new* ideas, and not spend their
vacations polishing inadequate tools.
I know that this is your belief, but please try to avoid grasping at
straws: there are elements in your argument that are along the lines
of "The FreeBSD package system is broken and needs to be fundamentally
changed. Rewriting it to use SQLite is a fundamental change.
Therefore rewriting it to use SQLite will fix the problems."
Really i don't think at all this way. I think that *perhaps* SQLite
may be marginally better than a Berkeley database for solving part of the
problem, not much more. What i reacted to was the conservatism which
pervades the community as soon as someone emits the idea of using a new tool.
First figure out what specific problems need to be solved, then figure
out how to solve them, not the other way around. So far I have seen
little discussion of how SQLite is necessary and sufficient for fixing
fundamental issues. The argument in favour of SQL seems to boil down
to "It's SQL! You can do more complex queries...if you wanted to".
No, for me the main argument is that SQL is more familiar for many people than
running a perl script to connect to a Berkeley database. I have also heard
that SQLite performs better, but i would have to see it to believe it.
Without a clear demonstration of how this would solve a problem
associated with package management, it is not very compelling and
basically reduces to change for the sake of change.
I think that a lot of changes are necessary, and it seems they will happen. So
*perhaps* it may be beneficial in this sea of changes to consider a minor
change, moving from a more traditional Berkeley database to SQLite.
As I discussed in my email yesterday, there are serious issues to be
solved.
I think some of the issues have nothing to do with the database question.
Some of the issues are entirely trivial to solve. One of the worst offenders
for misbehaviour of the package system is the constant changes in the port
origins and the poor standardisation of the package names. Once it is
clear that these name changes bring nothing to the table but
introduce a lot of confusion both for end users and automated programs,
things will be easier.

It may be that borrowing from Debian the idea of "abstract" dependencies
which can be fulfilled by several concrete packages may also simplify
the dependency problem. For example tomcat may depend on "java" and java
may be fulfilled either by diablo-jdk15 or jdk15. This way when you change
from diablo-jdk15 to jdk15 you don't need to change anything in tomcat.
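
A small sketch of the "abstract dependency" idea, using the tomcat and java example above. The PROVIDES and DEPENDS tables are hypothetical data invented for illustration; this is not how Debian or the ports system actually stores the information.

```python
# Hypothetical data: which abstract names each concrete package provides.
PROVIDES = {
    "diablo-jdk15": {"java"},
    "jdk15": {"java"},
}
# Hypothetical data: tomcat depends on the abstract name only.
DEPENDS = {"tomcat": ["java"]}


def providers(capability, installed):
    """Installed packages that satisfy a dependency name."""
    return sorted(
        pkg for pkg in installed
        if pkg == capability or capability in PROVIDES.get(pkg, set())
    )


def unmet(pkg, installed):
    """Dependencies of pkg that no installed package fulfils."""
    return [dep for dep in DEPENDS.get(pkg, []) if not providers(dep, installed)]
```

Swapping diablo-jdk15 for jdk15 in the installed set changes nothing for tomcat, since either one fulfils "java".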

Another feature that Debian has, and which may happily complete the previous
one, is the specification of necessary dependencies with a version number
in a certain range (this obviously requires a reasonable standardisation of
version numbers, so that comparison of <some package>-0.99 to
<some package>-1.0-rc doesn't depend on arcane rules). This way you don't need
to change dependencies which are in the correct range, even if a more recent
version exists. This mechanism has been imported in NetBSD pkgsrc.
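
To illustrate why version ranges need standardised version numbers, here is a sketch that only accepts purely dotted numeric versions. The parsing rule is an assumption chosen for the example, not the rule used by Debian or pkgsrc.

```python
import re


def parse_version(text):
    """Split a dotted numeric version into integers; refuse anything else."""
    if not re.fullmatch(r"\d+(\.\d+)*", text):
        raise ValueError(f"non-standard version: {text!r}")
    return tuple(int(part) for part in text.split("."))


def in_range(version, low, high):
    """True if low <= version < high, comparing component by component."""
    return parse_version(low) <= parse_version(version) < parse_version(high)
```

With this rule 0.99 sorts before 1.0 without any special cases, while a string such as 1.0-rc is rejected outright instead of being ordered by some arcane convention.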

Another feature which has proven useful in Debian is keeping track of the
packages which have been requested by the end user and those which have been
installed as dependencies. This is the difference between apt-get and
aptitude. Apparently people are very happy to be able to remove not only
a package they have requested, but also all its dependencies (which are
not required by another program) at one stroke. This also helps in case
some big package requires dependency A, but after an upgrade its authors have
changed their mind and require alternative dependency B. With this mechanism,
A disappears after the upgrade, while without it you will have both an upgraded
version of A and B. I have observed on my machine that this is an important
cause of monotonic bloat of the package tree over time.
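
The bookkeeping described above can be sketched as a reachability check: anything installed that cannot be reached from an explicitly requested package is a candidate for removal. The function and the sample data are illustrative assumptions, not aptitude's actual algorithm.

```python
def orphans(requested, depends, installed):
    """Installed packages not reachable from any explicitly requested one."""
    needed = set()
    stack = [p for p in requested if p in installed]
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        # Follow only dependencies that are actually installed.
        stack.extend(d for d in depends.get(pkg, ()) if d in installed)
    return installed - needed


# After the upgrade, "big" depends on B instead of A; A is left over.
leftover = orphans(
    requested={"big"},
    depends={"big": ["B"]},
    installed={"big", "A", "B"},
)
```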

To answer the slowness problem in registering installed packages, one may
think about making use of the INDEX file. In fact all the information that
is necessary to fill the dependency entries is contained in INDEX, and
accessible here in milliseconds with any tool such as awk. It so happens that
the ports system doesn't make any use of the INDEX file and systematically
recomputes the dependencies through recursive make invocations which are very
time consuming. Of course this requires up to date INDEX, or a mechanism to
keep INDEX continually up to date.
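
A sketch of reading dependency information straight out of INDEX lines instead of recursing through make. The lines are '|'-separated, as the excerpts later in this thread show for the first four fields; the column used for run dependencies (deps_field=8) and the sample lines are assumptions and should be checked against a real INDEX file.

```python
def load_index(lines, deps_field=8):
    """Map package name -> list of dependency package names.

    deps_field is the 0-based column holding space-separated run
    dependencies; the default of 8 is an assumption about the layout.
    """
    index = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= deps_field:
            continue  # skip short or truncated lines
        index[fields[0]] = fields[deps_field].split()
    return index


# Hypothetical INDEX-style lines, made up for this example.
SAMPLE = [
    "app-1.0|/usr/ports/misc/app|/usr/local|An app|descr|m@example.org"
    "|misc|libfoo-2.1|libfoo-2.1 libbar-0.3|http://example.org/|||",
    "libfoo-2.1|/usr/ports/devel/libfoo|/usr/local|A lib|descr"
    "|m@example.org|devel|||http://example.org/|||",
]
deps = load_index(SAMPLE)
```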


Part of the registration is also filling the +REQUIRED_BY files of the
dependencies of a package when one installs a package. If this package has a
lot of dependencies this means opening, editing and closing a large number of
files. This is expensive. One may imagine using a database containing the
global dependency information; then +REQUIRED_BY files are no longer necessary,
since the information can be recomputed in very little time. In my
little python experiments, recomputing the complete set of +REQUIRED_BY files
for around 700 ports takes around one second. By the way, topological sorting
the DAG of the whole port tree (> 15 000 ports) takes of the order of 2
seconds, so it is clear that if major performance problems occur, they
cannot be ascribed to such DAG sorting.
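
The two computations mentioned here, deriving the +REQUIRED_BY information and topologically sorting the dependency DAG, might look like this in Python. This is a sketch using the standard graphlib module (Python 3.9 or later), not the experiments referred to above, and the three-package graph in the usage note is invented.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def required_by(depends):
    """Invert {pkg: [deps]} into {dep: sorted list of dependents}."""
    reverse = {pkg: set() for pkg in depends}
    for pkg, deps in depends.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(pkg)
    return {pkg: sorted(users) for pkg, users in reverse.items()}


def install_order(depends):
    """Dependencies first; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(depends).static_order())
```

For a chain such as {"tomcat": ["jdk15"], "jdk15": ["libX11"], "libX11": []}, required_by() reports that libX11 is required by jdk15 and jdk15 by tomcat, and install_order() lists libX11 first and tomcat last.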
Some of them can be solved by improving the storage backend
of the package database to use a database; but this is in progress
using existing tools.
Yes, and i don't buy the idea that using *existing* tools is better than
using the best tool for the job (assuming one can prove what is the best tool,
considering power, familiarity, etc.).
Given that this work is happening (or at least will be happening, I am
not sure when the SoC officially starts), the best thing is for
interested people to work with Garrett to help him achieve the goals
of his project.
Sure. I am convinced this is the reason why several people, including myself
present some ideas in the mailing list now, before Garrett begins working on
his project. Of course after that, he will be in charge, with his mentor, and
i hope they will do something wonderful. As you are well aware, designing
a very good ports system is particularly difficult, unfortunately,
particularly in the FreeBSD context where building from source is considered
fashionable, which makes designing an efficient upgrade system almost
impossible.
Kris
--
Michel TALON
Ivan Voras
2007-05-13 00:23:41 UTC
Permalink
Matthew Seaman
2007-05-13 07:46:17 UTC
Permalink
The problem is that maintaining the INDEX is expensive and/or tricky.
p5-FreeBSD-Portindex comes close but seems to have some wrinkles.
If you'ld just tell me what you perceive the wrinkles to be, then I'd
have a fighting chance at addressing them, which I would be glad to do...

Cheers,

Matthew

--
Dr Matthew J Seaman MA, D.Phil. 7 Priory Courtyard
Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
Kent, CT11 9PW
Kris Kennaway
2007-05-12 22:24:35 UTC
Permalink
Kris Kennaway
2007-05-13 08:44:04 UTC
Permalink
Post by Matthew Seaman
The problem is that maintaining the INDEX is expensive and/or tricky.
p5-FreeBSD-Portindex comes close but seems to have some wrinkles.
If you'ld just tell me what you perceive the wrinkles to be, then I'd
have a fighting chance at addressing them, which I would be glad to do...
I only looked today so I didn't have time to fully investigate things,
which is why you didn't hear from me directly yet :)

Basically there are some differences (extra whitespace, etc) that are
cosmetic but which make validation against the full INDEX build more
difficult, but the major one seems to be that ports that change their
name dynamically (depending on e.g. installed ports detected, or
changes in build options) do not seem to have this reflected in the
incremental index.

Kris
Peter Jeremy
2007-05-13 10:37:57 UTC
Permalink
Matthew Seaman
2007-05-13 11:03:32 UTC
Permalink
Post by Kris Kennaway
Post by Matthew Seaman
The problem is that maintaining the INDEX is expensive and/or tricky.
p5-FreeBSD-Portindex comes close but seems to have some wrinkles.
If you'ld just tell me what you perceive the wrinkles to be, then I'd
have a fighting chance at addressing them, which I would be glad to do...
I only looked today so I didn't have time to fully investigate things,
which is why you didn't hear from me directly yet :)
Basically there are some differences (extra whitespace, etc) that are
cosmetic but which make validation against the full INDEX build more
difficult, but the major one seems to be that ports that change their
name dynamically (depending on e.g. installed ports detected, or
changes in build options) do not seem to have this reflected in the
incremental index.
Extra whitespace I can fix for you -- it's just the COMMENT field which
is affected IIRC. I just copy the string exactly as shown in the port's
Makefile. make index collapses multiple whitespace to single. As you say,
cosmetic. Also I get the sorting 'for free' by using the properties
of BDB btrees. Unfortunately it disagrees somewhat with the collation
order generated by sort.
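
The whitespace fix can be sketched as follows. This is Python for illustration only, not the code in p5-FreeBSD-Portindex; it assumes the COMMENT is the fourth '|'-separated field, as in the INDEX excerpts in this thread, and it also trims leading and trailing blanks.

```python
def squeeze_comment(index_line, field=3):
    """Collapse runs of whitespace in one '|'-separated field."""
    fields = index_line.split("|")
    if len(fields) > field:
        fields[field] = " ".join(fields[field].split())
    return "|".join(fields)
```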

Ports that change their name dynamically are tricky. If it really is an
automatic change without administrative intervention then there's not a
lot I can do -- and I believe such behaviour is held to be a bug by the
ports system. I do use the port directory as the unique key for referring
to any port, whereas make index uses the pkgname when writing out the
INDEX, which causes some differences. An example: games/freeciv. If you
have one of the gtk packages installed (as I do) it will automatically
change package name:

happy-idiot-talk:...ports/games/freeciv:% make -V PKGNAME
freeciv-gtk2-2.0.8_2

This generates a warning about 'duplicate package name' with make index,
(due to a collision with the games/freeciv-gtk2 slave port) and only one
row in the final INDEX. With FreeBSD::Portindex, no errors are generated
at all, and there are entries for both the main and slave ports like so:

happy-idiot-talk:/usr/ports:% grep ^freeciv-gtk2 INDEX-6 | cut -c 1-78
freeciv-gtk2-2.0.8_2|/usr/ports/games/freeciv|/usr/local|Free turn-based multi
freeciv-gtk2-2.0.8_2|/usr/ports/games/freeciv-gtk2|/usr/local|Free turn-based

I can certainly add a check for duplicate PKGNAME and emit warnings. In
order to be sure of getting the canonical INDEX-N you'ld need a system
with no ports installed. Well, other than p5-FreeBSD-Portindex and
dependencies -- none of which suffer from this problem.
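
A check for duplicate PKGNAME entries could be as simple as grouping INDEX lines by their first field. The sketch below is Python for illustration, not the planned FreeBSD::Portindex change, and it reuses the two (truncated) freeciv lines shown above as input.

```python
from collections import defaultdict


def duplicate_pkgnames(lines):
    """Map each package name seen more than once to its port directories."""
    origins = defaultdict(list)
    for line in lines:
        fields = line.split("|")
        if len(fields) >= 2:
            origins[fields[0]].append(fields[1])
    return {name: dirs for name, dirs in origins.items() if len(dirs) > 1}


lines = [
    "freeciv-gtk2-2.0.8_2|/usr/ports/games/freeciv|/usr/local|Free turn-based multi",
    "freeciv-gtk2-2.0.8_2|/usr/ports/games/freeciv-gtk2|/usr/local|Free turn-based",
]
for name, dirs in duplicate_pkgnames(lines).items():
    print(f"warning: duplicate package name {name}: {', '.join(dirs)}")
```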

Where the package name changes due to explicit administrative choice, in
the main that's either due to setting variables in the environment (which
make later picks up), setting variables in the make infrastructure (eg
/etc/make.conf) or using one of those blue and grey options screens, which
changes a Makefile under /var/db/ports.

There's already a facility for scrubbing everything out of the environment
except USER, HOME, PATH, SHELL, TERM and TERMCAP

Changes in well known Makefiles like /etc/make.conf or any Makefiles under
/usr/ports will either trigger a warning message (generally saying you
need to reinitialise the cache, because otherwise it would lead to
rechecking every port, which might be a big waste of time depending on
the nature of the changes to the makefile) or cause any port that includes
that Makefile to be re-checked and its cache entry updated. That will
pick up most of the places where an administrator might make changes
to affect how ports are compiled, although a sufficiently ingenious admin
could still put things in such odd places p5-FreeBSD-Portindex wouldn't
find them...

Tracking changes to OPTIONS settings is a good point though. I need to
implement that.

Cheers,

Matthew

--
Dr Matthew J Seaman MA, D.Phil. 7 Priory Courtyard
Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
Kent, CT11 9PW
Thomas Sparrevohn
2007-05-13 11:58:49 UTC
Permalink
- Ignore the existence of INDEX - which makes computing dependencies
very time consuming
- Fully rebuild INDEX via "make describe" whenever you update any ports
- this takes of the order of an hour
- Find and rebuild the changed bits of INDEX - p5-FreeBSD-Portindex
uses this approach.
- Build a tool that functionally does "make describe" but does it in
bulk much faster (eg by pre-parsing the include files once instead
of 17000 times).
Having played around with using Postgres as a database for ports, I must
stress that it's not a database vs. flat file issue. It is quite easy to build a
reasonable "Ports database"; however it does not help with the real issue, namely
that dependencies and options mean that make has to be run in order
to guarantee that the INDEX file is correct.

The format of the database seems to be a non-debate if there is not a
good answer to how to ensure that only ports that have changed are updated.

At the end of the day, "make based ports" are the only really safe way to manage
ports. However the focus on the indexing side seems misplaced. For example,
make INDEX on this host takes 8-12 minutes, while compiling all installed ports
takes 24 hours. If I "hand build" the dependency structure and run the builds in
parallel, that comes down to 4-5 hours. So let's say we halve the time it takes
to maintain the index: that cuts a minimal amount of time off the entire build
process, and the effort and energy are probably better spent on trying to define
a build sequence that allows ports to build with "make -j x", and with parallel
builds where "-j n" does not work.
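
One way to derive such a build sequence is to group ports into "waves" in which no port depends on another port of the same wave, so each wave can be built in parallel. This is a sketch using Python's graphlib (3.9 or later); the dependency map in the usage note is hypothetical, and a real scheduler would start each port as soon as its own dependencies finish instead of waiting for a whole wave.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_waves(depends):
    """Group ports into waves; ports within a wave are independent."""
    ts = TopologicalSorter(depends)
    ts.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # everything whose deps are done
        waves.append(ready)
        ts.done(*ready)
    return waves
```

For {"app": ["libA", "libB"], "libA": ["libc"], "libB": ["libc"], "libc": []} this yields libc first, then libA and libB together, then app.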

Using XML for INDEX is a very good idea, mainly because it allows "ports"
to interface in an easy way with external tools, e.g. Java frontends,
web browsers, etc. However there are drawbacks. Yet I feel that the
discussion about what tool to use for indexing is completely misplaced
if the only point is that somebody likes SQL better than a directory tree.
Post by Michel Talon
Yes, and i don't buy the idea that using *existing* tools is better than
using the best tool for the job (assuming one can prove what is the best tool,
considering power, familiarity, etc.).
Remind me - we are told that SQL is the answer but what was the question again?
Demonstrate a better tool.
Always the best way ;-)
Matthew Seaman
2007-05-13 17:25:19 UTC
Permalink
Post by Matthew Seaman
Extra whitespace I can fix for you -- it's just the COMMENT field which
is affected IIRC. I just copy the string exactly as shown in the port's
Makefile. make index collapses multiple whitespace to single. As you say,
cosmetic. Also I get the sorting 'for free' by using the properties
of BDB btrees. Unfortunately it disagrees somewhat with the collation
order generated by sort.
Here's the result of crunching multiple spaces in the COMMENT fields:

happy-idiot-talk:/tmp:% diff -C 0 -u bar foo
--- bar Sun May 13 18:12:08 2007
+++ foo Sun May 13 18:12:01 2007
@@ -1402 +1402 @@
-lrzsz-0.12.20_1|Receive/Send files via X/Y/ZMODEM protocol. (unrestrictive)
+lrzsz-0.12.20_1|Receive/Send files via X/Y/ZMODEM protocol. (unrestrictive)
@@ -1476 +1476 @@
-zmtx-zmrx-1.02|Receive/Send files via ZMODEM protocol. (unrestrictive)
+zmtx-zmrx-1.02|Receive/Send files via ZMODEM protocol. (unrestrictive)
@@ -1809,2 +1809,2 @@
-p5-DBI-1.54|The perl5 Database Interface. Required for DBD::* modules
-p5-DBI-1.37_1|The perl5 Database Interface. Required for DBD::* modules
+p5-DBI-1.54|The perl5 Database Interface. Required for DBD::* modules
+p5-DBI-1.37_1|The perl5 Database Interface. Required for DBD::* modules
@@ -1962 +1962 @@
-postgresql-libpq++-4.0_3|C++ interface for PostgreSQL
+postgresql-libpq++-4.0_3|C++ interface for PostgreSQL
@@ -2287 +2287 @@
-vym-1.8.1|VYM (View Your Mind) is a tool to generate and manipulate maps
+vym-1.8.1|VYM (View Your Mind) is a tool to generate and manipulate maps
@@ -2490 +2490 @@
-cvs+ipv6-1.11.17_1|IPv6 enabled cvs. You can use IPv6 connection when using pserver
+cvs+ipv6-1.11.17_1|IPv6 enabled cvs. You can use IPv6 connection when using pserver
@@ -3046 +3046 @@
-newt-0.51.0_3|Not Erik's Windowing Toolkit: console I/O handling library
+newt-0.51.0_3|Not Erik's Windowing Toolkit: console I/O handling library
@@ -4189 +4189 @@
-py24-simpletal-4.1|Stand alone TAL Python implementation to power HTML & XML templates
+py24-simpletal-4.1|Stand alone TAL Python implementation to power HTML & XML templates
@@ -4783 +4783 @@
-vile-9.5n|VI Like Emacs. a vi "workalike", with many additional features
+vile-9.5n|VI Like Emacs. a vi "workalike", with many additional features
@@ -4943 +4943 @@
-vMac-0.1.9.3_1|Emulates a MacPlus machine! Runs MacOS versions up to 7.5.5
+vMac-0.1.9.3_1|Emulates a MacPlus machine! Runs MacOS versions up to 7.5.5
@@ -5582 +5582 @@
-libfov-1.0.2|C library for calculating fields of view on low resolution rasters
+libfov-1.0.2|C library for calculating fields of view on low resolution rasters
@@ -6039 +6039 @@
-xkobo-1.11|Multi-way scrolling shoot 'em up game for X. Strangely addictive
+xkobo-1.11|Multi-way scrolling shoot 'em up game for X. Strangely addictive
@@ -7304 +7304 @@
-ja-mypaedia-fpw-1.4.3_2|An encyclopedia "Mypaedia" (EPWING V1 format)
+ja-mypaedia-fpw-1.4.3_2|An encyclopedia "Mypaedia" (EPWING V1 format)
@@ -9582 +9582 @@
-xless-1.7|An X11 viewer for text files. Useful as an add-on tool for other apps
+xless-1.7|An X11 viewer for text files. Useful as an add-on tool for other apps
@@ -11135 +11135 @@
-sniffit-0.3.7b_2|A packet sniffer program. For educational use
+sniffit-0.3.7b_2|A packet sniffer program. For educational use
@@ -11562 +11562 @@
-cups-samba-6.0|The Common UNIX Printing System: MS Windows client drivers
+cups-samba-6.0|The Common UNIX Printing System: MS Windows client drivers
@@ -11825 +11825 @@
-ru-apache-1.3.37+30.23|The extremely popular Apache http server. Very fast, very clean
+ru-apache-1.3.37+30.23|The extremely popular Apache http server. Very fast, very clean
@@ -12023 +12023 @@
-chrootuid-1.3|A simple wrapper that combines chroot(8) and su(1) into one program
+chrootuid-1.3|A simple wrapper that combines chroot(8) and su(1) into one program
@@ -14936 +14936 @@
-mozex-1.07_5|Mozex allows users of to use external programs for mail, news, etc.
+mozex-1.07_5|Mozex allows users of to use external programs for mail, news, etc.
@@ -15712 +15712 @@
-webreport-1.5|WebReport is a web log statistics program for web hosting sites
+webreport-1.5|WebReport is a web log statistics program for web hosting sites

This is after running the generated INDEX files through:

cut -d '|' -f 1,4 INDEX

Mostly it's the standard 'two spaces after a full stop', but there are a
number of what look to me like mistakes. I can't parse that mozex entry
at all..

Cheers,

Matthew

--
Dr Matthew J Seaman MA, D.Phil. 7 Priory Courtyard
Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
Kent, CT11 9PW
Dag-Erling Smørgrav
2007-05-13 18:42:47 UTC
Permalink
Seriously, the FreeBSD package system is in great need of a profound
overhaul, pretending it works well is complete denial of reality.
Perhaps, but I seriously doubt that you are the correct person for the
job.

DES
--
Dag-Erling Smørgrav - ***@des.no
Benjamin Lutz
2007-05-13 18:42:21 UTC
Permalink
Post by Thomas Sparrevohn
Using XML for INDEX is a very good idea mainly because it allows
"ports" to interface in an easy way to external tools - e.g. java
frontends - web browsers etc, etc. However there are drawbacks - Yet
I feel that the discussion about what tool to use for indexing is
completely misplaced if the only point is that somebody likes SQL
better than a directory tree.
I'd have said that using XML for INDEX is a bad idea, because INDEX can
then no longer be easily processed with any of the tools in the FreeBSD
base system. With the format it uses now, I can easily grep, awk, etc
it. If you need an XML version of INDEX, it's easy to have just these
tools build one for you though.

Not to mention that INDEX is already big enough as it is, imo. I don't
see why it should be bloated even more with redundant information.

Cheers
Benjamin
Kris Kennaway
2007-05-13 20:07:23 UTC
Permalink
Post by Matthew Seaman
Extra whitespace I can fix for you -- it's just the COMMENT field which
is affected IIRC. I just copy the string exactly as shown in the port's
Makefile. make index collapses multiple whitespace to single. As you say,
cosmetic. Also I get the sorting 'for free' by using the properties
of BDB btrees. Unfortunately it disagrees somewhat with the collation
order generated by sort.
cut -d '|' -f 1,4 INDEX
Mostly it's the standard 'two spaces after a full stop', but there are a
number of what look to me like mistakes. I can't parse that mozex entry
at all..
COMMENT= Mozex allows users of ${GEKO} to use external programs for mail, news, etc.

It's supposed to be ${GECKO} I guess...it's another variant that will
change when the user (or a port) changes the value of that variable.

Kris
Kris Kennaway
2007-05-13 20:27:37 UTC
Permalink
Well - naturally, if the only index format were based upon XML it would not
be very practical. However, XML currently seems to take the lead when the
talk is of portability as a data format, and it is very easy to convert to
"pure text". There seems to be a bias towards the SNMP MIB format generally
in FreeBSD, e.g. sysctl etc., which has even worse drawbacks.
But as I said - I very much doubt that the format of the INDEX file and the
on-disk package db structure is the most burning issue for ports. I am sure
that there are optimisations that could improve the current performance
without having to change the structure into SQL. If, however, that is the
target, then XML would be a significantly better candidate, because a proper
XML schema can be used as a middle layer for all the tools, regardless of
the storage structure of the package db etc.
If we introduced a proper abstraction, then people could use SQL / flat
files / existing structures, but the tools would still only need one common
interface, to XML.
FYI, "Using XML" and other buzzword-compliance is not currently on the
table either. Let's all try to maintain some focus, OK?

Kris
Kris Kennaway
2007-05-13 21:04:20 UTC
Permalink
Duane Whitty
2007-05-13 21:20:59 UTC
Permalink
The answer is another INDEX/storage structure
Great, I look forward to your detailed proposal.
Kris
I believe this is closer to what Thomas meant:

but it has a number of drawbacks - however it [is in] no way clear
whether the answer is another INDEX/storage structure

Correct me if I am wrong Thomas.

Duane
Matthew Seaman
2007-05-13 21:39:46 UTC
Permalink
Post by Matthew Seaman
I can certainly add a check for duplicate PKGNAME and emit warnings. In
order to be sure of getting the canonical INDEX-N you'd need a system
with no ports installed. Well, other than p5-FreeBSD-Portindex and
dependencies -- none of which suffer from this problem.
Hmmm, well, I have the first cut at this now. As an added bonus, it
enforces having the port mentioned in the $SUBDIR variable of the
category Makefile before it will add it to the INDEX[*].

Turns out there are at least 6 ports present in the tree but not hooked
up in that way:

happy-idiot-talk:/tmp:% portindex -o INDEX.m |& grep 'not referenced'
FreeBSD::Portindex::Tree:printindex(): /usr/ports/emulators/linux-vmware-toolbox6 is not referenced from the /usr/ports/emulators category -- not added to INDEX
FreeBSD::Portindex::Tree:printindex(): /usr/ports/emulators/vmware-guestd6 is not referenced from the /usr/ports/emulators category -- not added to INDEX
FreeBSD::Portindex::Tree:printindex(): /usr/ports/net-mgmt/nipper is not referenced from the /usr/ports/net-mgmt category -- not added to INDEX
FreeBSD::Portindex::Tree:printindex(): /usr/ports/net/asterisk12-app-ldap is not referenced from the /usr/ports/net category -- not added to INDEX
FreeBSD::Portindex::Tree:printindex(): /usr/ports/x11-fonts/libXfont is not referenced from the /usr/ports/x11-fonts category -- not added to INDEX
FreeBSD::Portindex::Tree:printindex(): /usr/ports/x11-fonts/xfs is not referenced from the /usr/ports/x11-fonts category -- not added to INDEX

as well as a number of duplicate PKGNAMEs -- mostly to do with A4 vs
letter paper size.

Cheers,

Matthew

[*] Should this always be enforced? Hmmm... I think I'll add a
'--strict' option, including that. Being able to add arbitrary ports
into the INDEX can be vaguely useful sometimes.
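
For readers who want to see what such a check amounts to, here is a minimal sketch in Python. It is not portindex's code: the function names are made up for illustration, and it assumes the category Makefile lists ports on plain `SUBDIR += name` lines, ignoring line continuations, conditionals and includes.

```python
import re

# Matches "SUBDIR = a b" and "SUBDIR += a" (assumed, simplified Makefile syntax).
SUBDIR_RE = re.compile(r"^\s*SUBDIR\s*\+?=\s*(.*)$")


def referenced_ports(makefile_text):
    """Return the set of port names listed in SUBDIR assignments."""
    names = set()
    for line in makefile_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = SUBDIR_RE.match(line)
        if match:
            names.update(match.group(1).split())
    return names


def unreferenced(port_dirs, makefile_text):
    """Return port directories that the category Makefile does not mention."""
    return sorted(set(port_dirs) - referenced_ports(makefile_text))
```

A strict mode would skip (and warn about) everything `unreferenced()` returns, while a non-strict mode would only warn.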

--
Dr Matthew J Seaman MA, D.Phil. 7 Priory Courtyard
Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
Kent, CT11 9PW
Kris Kennaway
2007-05-13 21:44:27 UTC
Permalink
Post by Matthew Seaman
I can certainly add a check for duplicate PKGNAME and emit warnings. In
order to be sure of getting the canonical INDEX-N you'd need a system
with no ports installed. Well, other than p5-FreeBSD-Portindex and
dependencies -- none of which suffer from this problem.
Hmmm, well, I have the first cut at this now. As an added bonus, it
enforces having the port mentioned in the $SUBDIR variable of the
category Makefile before it will add it to the INDEX[*].
Turns out there are at least 6 ports present in the tree but not hooked
up in that way:
happy-idiot-talk:/tmp:% portindex -o INDEX.m |& grep 'not referenced'
FreeBSD::Portindex::Tree:printindex(): /usr/ports/emulators/linux-vmware-toolbox6 is not referenced from the /usr/ports/emulators category -- not added to INDEX
FreeBSD::Portindex::Tree:printindex(): /usr/ports/emulators/vmware-guestd6 is not referenced from the /usr/ports/emulators category -- not added to INDEX
FreeBSD::Portindex::Tree:printindex(): /usr/ports/net-mgmt/nipper is not referenced from the /usr/ports/net-mgmt category -- not added to INDEX
FreeBSD::Portindex::Tree:printindex(): /usr/ports/net/asterisk12-app-ldap is not referenced from the /usr/ports/net category -- not added to INDEX
FreeBSD::Portindex::Tree:printindex(): /usr/ports/x11-fonts/libXfont is not referenced from the /usr/ports/x11-fonts category -- not added to INDEX
FreeBSD::Portindex::Tree:printindex(): /usr/ports/x11-fonts/xfs is not referenced from the /usr/ports/x11-fonts category -- not added to INDEX
I think they're all repo-copies in progress, and should indeed not
appear in the index yet.
Post by Matthew Seaman
as well as a number of duplicate PKGNAMEs -- mostly to do with A4 vs
letter paper size.
What I would like is for the incrementally built index to be identical
to the 'make index' version, or at least up to cosmetic differences
that I can post-process away automatically. Then I can set up a
comparison that validates the incremental index over a period of time
to look for remaining corner cases where it gets out of sync.
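
A comparison "up to cosmetic differences" could be sketched roughly as follows. This is only an illustration, assuming the cosmetic differences are limited to runs of whitespace and line order (the two issues mentioned in this thread); the function names are hypothetical.

```python
import re


def normalize_index(lines):
    """Collapse runs of whitespace in each line and sort the result."""
    cleaned = (re.sub(r"\s+", " ", line.strip()) for line in lines)
    return sorted(line for line in cleaned if line)


def index_differences(reference, candidate):
    """Return (only_in_reference, only_in_candidate) after normalizing."""
    ref = set(normalize_index(reference))
    cand = set(normalize_index(candidate))
    return sorted(ref - cand), sorted(cand - ref)
```

Feeding it the lines of the 'make index' output and of the incrementally built INDEX should give two empty lists when the two agree; anything left over is a candidate corner case.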

Kris
Benjamin Lutz
2007-05-13 21:55:02 UTC
Permalink
The on-disk format seems to be the wrong angle on the issue - The
current structure works well - but it has a number of drawbacks -
however it is in no way clear whether the answer is another
INDEX/storage structure
When coming up with ideas what to change INDEX's storage method to, just
keep in mind that

- There is very little flexibility whatsoever in the way data is stored
in the file. Each entry in the INDEX has its 13 or so fields, and
that's it. One of the strengths of XML, self-descriptiveness for very
dynamic data structures, doesn't matter for INDEX. Basically, imho,
using XML for tabular data = bad.

- INDEX exists for speed. Accessing the information in it should be as
fast as possible. I object to any change that increases the time
needed to search for and parse INDEX entries. I've written a little
searching tool (it can be found in ports-mgmt/psearch). If INDEX were to
be converted to XML, just because of that it would be considerably
slower. If psearch then were to use standard XML parsing libs, the
slowdown would probably be at least an order of magnitude.

If there's any change to INDEX's format, it should make access faster,
not slower. A format that allows constant-time random access would be
nice, for example.

- INDEX does not need to be portable. It'll be used on FreeBSD systems
only, and by tools written specifically for the ports system.

The second point is most important here. This whole thread exists
because people consider the existing ports system to be too slow. How
is using XML going to help with that at all?
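
To make the "constant-time random access" idea concrete, here is a rough sketch in Python. It keeps the existing pipe-delimited INDEX untouched and builds a side table of byte offsets keyed by the first field (PKGNAME). The function names are made up, and how such a table would be stored on disk is left open; this only shows the lookup principle.

```python
def build_offset_table(path):
    """Map the first field of every INDEX line to that line's byte offset."""
    offsets = {}
    with open(path, "rb") as fh:
        while True:
            pos = fh.tell()
            line = fh.readline()
            if not line:
                break
            name = line.split(b"|", 1)[0].decode("utf-8", "replace")
            offsets[name] = pos
    return offsets


def lookup(path, offsets, pkgname):
    """Seek straight to one entry and return its fields, or None if absent."""
    pos = offsets.get(pkgname)
    if pos is None:
        return None
    with open(path, "rb") as fh:
        fh.seek(pos)
        line = fh.readline().decode("utf-8", "replace")
    return line.rstrip("\n").split("|")
```

Building the table still costs one pass over the file, but each lookup afterwards is a hash probe plus one seek, and grep and awk keep working on the unchanged text file.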

Cheers
Benjamin
Mike Meyer
2007-05-14 00:49:32 UTC
Permalink
Post by Michel Talon
Post by Mike Meyer
Post by Michel Talon
One of the most obvious being that the sqlite database can be edited
as easily as a pure textfile using the sqlite3 program
Huh? They can? With a pure textfile, if vi is busted, I can use ed. If
ed is also busted, I can use sed. What do I use on an sqlite database
if sqlite3 is busted?
- first i don't suppose sqlite3 is busted, since i suppose it is in the
base system and it works by definition. Your hypothesis is alike, what
do i do to edit my config files if vi and ed are busted? Moreover if
sqlite3 gets really busted i can import a copy and hope it works, it
requires very few libraries and other files, not much more than vi,
plus the sqlite3 library, of course. The combined size of sqlite3
and libsqlite3 is less than 400k.
You missed the point. The claim was "the sqlite database can be edited
as easily as a pure textfile." I claim this is not always true. In
particular, since someone has already mentioned using SQL for system
config file instead of just the package db, if your system has
suffered a major failure such that commands in the base system - like
vi, ed, etc. - are busted, then sqlite (whether it's part of the base
system or not) can equally well be broken. With flat text, there
are lots of tools in the base system that can be used for dealing with
them if one (or more) is broken. By your own admission, if sqlite is
so broken, the only alternative is to get another copy. Under these
circumstances, sqlite can *not* be "edited as easily as pure text
file".
Post by Michel Talon
- second, if i am sql allergic, it takes one command to export the table
to a straight file, each row in a line, each field separated by | or
anything else of my choice. Exactly the same tools that you have
mentioned allow to edit this file, and then one command allows to load
it in the database.
The point of the second question wasn't that some people are allergic
to SQL - the point was that pure text files are different from SQL,
and the two have different sets of strengths and weaknesses. In this case,
a strength of a pure text file is that it's easy to ignore record and
field boundaries when operating on it. This kind of thing is hard to
do in SQL, so that the easiest way to do it may well be the one you
suggested - convert it to flat text, transform it, and convert
back. Once again, sqlite can not be "edited as easily as a pure text
file".
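
The "convert it to flat text, transform it, and convert back" round trip looks roughly like this when sketched with Python's sqlite3 module. The helper names are hypothetical, the table name is assumed to be trusted, and the sketch assumes no field contains the separator or a newline.

```python
import sqlite3


def export_table(conn, table, path, sep="|"):
    """Write every row of `table` as one separator-delimited line."""
    rows = conn.execute(f"SELECT * FROM {table}")
    with open(path, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(sep.join("" if v is None else str(v) for v in row) + "\n")


def import_table(conn, table, path, ncols, sep="|"):
    """Replace the contents of `table` with the lines found in `path`."""
    placeholders = ",".join("?" * ncols)
    with open(path, encoding="utf-8") as fh:
        rows = [line.rstrip("\n").split(sep) for line in fh if line.strip()]
    with conn:  # one transaction: either the whole reload happens or none of it
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
```

Between the two calls the file can be edited with vi, ed, sed or anything else, which is the point being argued: the text tools do the editing, the database only stores the result.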

That said, the first case isn't crucial for the ports/packages
db. Your system will boot and pretty much run fine if that database is
screwed up, so importing a new copy to fix things isn't
unreasonable. I also agree that the need for performance in the ports
db is such that using a binary database of some kind is probably
justified - because speed isn't one of the strengths of pure text
files.

Whether or not SQL brings enough to the table to justify adding
sqlite to the base system when compared to tools that are already
there is another issue.


<mike
--
Mike Meyer <***@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
Dag-Erling Smørgrav
2007-05-14 06:38:16 UTC
Permalink
Post by Tom Judge
Post by Dag-Erling Smørgrav
Seriously, the FreeBSD package system is in great need of a profound
overhaul, pretending it works well is complete denial of reality.
Perhaps, but I seriously doubt that you are the correct person for the
job.
This is exactly the kind of response that will push people away from
the project/community. If everyone that suggests some change that a
respected member of the community did not like and then said member
sent a response like this I would guess that people would stop
suggesting any changes/features.
It is this kind of attitude that will drive new blood away from the
project. I thought the idea was to try to encourage people to make
improvements to the project rather than to drive them away.
I have no intention of driving Michel away, but he does not have the
necessary experience to undertake the project he proposes. There is a
reason why people have been discussing this for ten years without
getting anywhere.

DES
--
Dag-Erling Smørgrav - ***@des.no
Dag-Erling Smørgrav
2007-05-14 07:25:49 UTC
Permalink
Post by Mike Meyer
You missed the point. The claim was "the sqlite database can be edited
as easily as a pure textfile." I claim this is not always true. In
particular, since someone has already mentioned using SQL for system
config file instead of just the package db, if your system has
suffered a major failure such that commands in the base system - like
vi, ed, etc. - are busted, then sqlite (whether it's part of the base
system or not) can equally well be broken.
If your base system is busted, your priority is to fix your base system,
not muck around with packages. They can wait until the base system is
working again.

DES
--
Dag-Erling Smørgrav - ***@des.no
Dag-Erling Smørgrav
2007-05-14 09:07:06 UTC
Permalink
There is a reason why people have been discussing this for ten years
without getting anywhere.
I suspect that is because, by and large, the ports system works ;-)
Not really, it's because fixing it is very hard. Starting from scratch
is easier, but you lose the established base of packages, and more
importantly, of experienced maintainers and users.

DES
--
Dag-Erling Smørgrav - ***@des.no
Mike Meyer
2007-05-14 13:36:52 UTC
Permalink
Post by Dag-Erling Smørgrav
There is a
reason why people have been discussing this for ten years without
getting anywhere.
I suspect that is because, by and large, the ports system works ;-) - Having
played around with a couple of Linux distributions - my impression is that "ports"
offers a much more manageable approach or maybe I am just used to ports ;-)
You may want to look at some other Linux distros. The packages system
dates to about the same era as rpm/debs. The package system is much,
much more manageable than them. On the other hand, most Linux distros
have moved beyond those tools, to things like yum, apt-get and
up2date. Those incorporate the facilities that rpm and debs are
missing in a higher-level tool, and I think they are slightly more
manageable than freebsd packages.

As far as I know, none of them handle updates from source at all. In
fact, dealing with sources seems to be a noticeable weakness for them.

<mike
--
Mike Meyer <***@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
Dag-Erling Smørgrav
2007-05-14 14:35:18 UTC
Permalink
Post by Mike Meyer
You may want to look at some other Linux distros. The packages system
dates to about the same era as rpm/debs. The package system is much,
much more manageable than them. On the other hand, most Linux distros
have moved beyond those tools, to things like yum, apt-get and
up2date. Those incorporate the facilities that rpm and debs are
missing in a higher-level tool, and I think they are slightly more
manageable than freebsd packages.
As far as I know, none of them handle updates from source at all. In
fact, dealing with sources seems to be a noticeable weakness for them.
apt-get --build source <package-name>

DES
--
Dag-Erling Smørgrav - ***@des.no
Andrew Pantyukhin
2007-05-14 16:16:33 UTC
Permalink
Post by Sean Bryant
I propose someone post on the wiki (http://wiki.freebsd.org/)
This might be relevant:
http://wiki.freebsd.org/Upak
Tom Evans
2007-05-14 16:48:18 UTC
Permalink
Peter Jeremy
2007-05-15 10:26:30 UTC
Permalink
Dag-Erling Smørgrav
2007-05-15 10:34:05 UTC
Permalink
[Linux package systems]
Post by Mike Meyer
As far as I know, none of them handle updates from source at all. In
fact, dealing with sources seems to be a noticeable weakness for them.
This pretty much rules them out then.
It would, if it were true. It isn't.

apt-get --build source package_name

DES
--
Dag-Erling Smørgrav - ***@des.no
Mike Meyer
2007-05-15 15:23:40 UTC
Permalink
Post by Dag-Erling Smørgrav
[Linux package systems]
Post by Mike Meyer
As far as I know, none of them handle updates from source at all. In
fact, dealing with sources seems to be a noticeable weakness for them.
This pretty much rules them out then.
It would, if it were true. It isn't.
Except it is.
Post by Dag-Erling Smørgrav
apt-get --build source package_name
That doesn't update from sources, that just builds a package. You're
still stuck updating from packages.

Further, like the rpm command, this doesn't deal with dependencies,
other than to complain if they aren't met. This means that using it to
deal with sources is about as pleasant as using rpm to install binary
packages. Further, there doesn't appear to be anything like make.conf
to make it easy to tailor the build process to meet the user's
requirements.

<mike
--
Mike Meyer <***@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
Tom Evans
2007-05-15 15:39:01 UTC
Permalink
y***@u.washington.edu
2007-05-15 17:45:22 UTC
Permalink
Post by Mike Meyer
Post by Dag-Erling Smørgrav
[Linux package systems]
Post by Mike Meyer
As far as I know, none of them handle updates from source at all. In
fact, dealing with sources seems to be a noticeable weakness for them.
This pretty much rules them out then.
It would, if it were true. It isn't.
Except it is.
Post by Dag-Erling Smørgrav
apt-get --build source package_name
That doesn't update from sources, that just builds a package. You're
still stuck updating from packages.
Further, like the rpm command, this doesn't deal with dependencies,
other than to complain if they aren't met. This means that using it to
deal with sources is about as pleasant as using rpm to install binary
packages. Further, there doesn't appear to be anything like make.conf
to make it easy to tailor the build process to meet the user's
requirements.
<mike
Of course Gentoo does do this [updating from source], being as it is a
rip-off of freebsd ports. I haven't used it since the (fairly) early
days when portage was written as a series of bash scripts. I'm fairly
sure they must have improved it since then - it made portupgrade look
positively snappy. Unsurprisingly, everything was/is controlled by
adding options (mainly USE_FLAGS - eg '+gtk2 -kde') to make.conf.
Tom
Tom,
It's gotten excruciatingly more complex with the introduction of Python, classes, and an increase in USE flags.
-Garrett

Duane Whitty
2007-05-12 02:29:24 UTC
Permalink
Post by David Naylor
Hi,
Thank you all for your responses, it has given me much to think about. I
guess there is consensus that there is room for improvement in the current
pkg system. Attached are some of my initial ideas about what is required and
expected in any (and all future) package systems.
Since I am going away this weekend I have had limited time to work on this
document and as such it is in early stages of development. My objective is
to get a clear understanding and target of what is required for this new
package system.
1) Backwards-compatibility with the ports collection is absolutely
required. This may not be an issue for you, but some past proposals
have included the phrase "rewrite the ports collection to use tool X".
This is pretty clearly a non-starter, unless you also provide a
workable (i.e. mostly automated) migration strategy.
2) Integration with ongoing work is required. e.g. we have two people
working on extending the existing pkg_tools as part of the Google
Summer of Code (including one who is working on integrating the
metadata into a Berkeley DB database, to attempt to solve the
scalability problems we are starting to run into as the typical number
of installed packages on a user system grows), and we're not going to
throw away their work.
3) Dependencies on new code have a high bar for adoption. i.e. if you
propose to bring in new software packages into the base system, you
need to definitively prove that they are necessary for solving a
serious problem.
4) When people hear the phrase "new package system" they take this as
an invitation to pile on feature requests, pet peeves, etc that would
be "great to have" in a new package system. While it helps to be
aware of these ideas, and where appropriate to avoid designing a
system that prevents them from being added, the temptation is to
undergo feature creep: the proposal expands to engulf all possible
features and ends up collapsing under its own weight (also known as
"Second System Syndrome"). Stay focused on a core idea you want to
achieve, and you might avoid this problem which has killed the last N
serious "new package system" projects.
I think your current proposal falls short on points 2) and 3). In
particular, I don't see where SQLite is necessary to solve any
problems we are currently facing, and your proposal conflicts with
existing work.
Kris
Kris,

In your opinion what is the biggest problem(s) the ports system and
the package building system currently face? Is this a common problem,
i.e., is the issue facing building from ports the same as installing
from pre-built packages? I ask this in the context of infrastructure
as opposed to any tools currently being used.

Is it hoped / planned that storing the metadata in a Berkeley DB
database will help with the parallelization of package building?

If only one thing was going to be done to improve the ports system,
not including drafting more volunteers :) , what would you recommend
that one thing be?

Duane
Kris Kennaway
2007-05-12 04:43:01 UTC
Permalink
Ivan Voras
2007-05-12 21:13:34 UTC
Permalink
'Michel Talon'
2007-05-14 08:25:12 UTC
Permalink
converted INDEX into PostgreSQL because I was playing around with making a
message-queue-based approach - and it becomes BIG - The only table structure
difference from the current format was that I was able to track "who is
depending on" a port - which I am pretty sure could be handled in the current
framework - e.g. we could add a file having the depending port names or so
niobe% cp /usr/ports/INDEX-6 .
niobe% sqlite3 index.db
sqlite> CREATE TABLE index6 (
pkgname varchar(1),
path varchar(1),
prefix varchar(1),
comment varchar(1),
descr varchar(1),
maintainer varchar(1),
categories varchar(1),
build_deps varchar(1),
run_deps varchar(1),
website varchar(1),
extract_deps varchar(1),
patch_deps varchar(1),
fetch_deps varchar(1));
sqlite> .import INDEX-6 index6
... completes in less than 2 seconds
sqlite> select * from index6 where path = "/usr/ports/accessibility/atk";
atk-1.12.4|/usr/ports/accessibility/atk|/usr/local|A GNOME accessibility
toolkit
(ATK)|/usr/ports/accessibility/atk/pkg-descr|***@FreeBSD.org|accessibility
devel|gettext-0.14.5_2 glib-2.12.9 libiconv-1.9.2_2 libtool-1.5.22_3
perl-5.8.8 pkg-config-0.21|gettext-0.14.5_2 glib-2.12.9 libiconv-1.9.2_2
perl-5.8.8
pkg-config-0.21|http://developer.gnome.org/projects/gap/||libtool-1.5.22_3|

niobe% ls -lh INDEX-6 index.db
-rw-r--r-- 1 michel lpthe 9,5M 14 mai 10:00 INDEX-6
-rw-r--r-- 1 michel lpthe 12M 14 mai 10:12 index.db

Where is this huge increase in size?
Admittedly, i have not created indexes, etc.
Compare this to the portsdb created by portupgrade from the same INDEX-6

niobe% ls -lh /usr/ports/INDEX-6.db
-rw-r--r-- 1 root wheel 21M 16 fév 13:36 /usr/ports/INDEX-6.db

Surprise, surprise, the BerkeleyDB suddenly appears less glorious.
--
Michel TALON
Thomas Sparrevohn
2007-05-14 11:33:24 UTC
Permalink
Post by 'Michel Talon'
converted INDEX into PostgreSQL because I was playing around with making a
message-queue-based approach - and it becomes BIG - The only table structure
difference from the current format was that I was able to track "who is
depending on" a port - which I am pretty sure could be handled in the current
framework - e.g. we could add a file having the depending port names or so
niobe% cp /usr/ports/INDEX-6 .
niobe% sqlite3 index.db
sqlite> CREATE TABLE index6 (
pkgname varchar(1),
path varchar(1),
prefix varchar(1),
comment varchar(1),
descr varchar(1),
maintainer varchar(1),
categories varchar(1),
build_deps varchar(1),
run_deps varchar(1),
website varchar(1),
extract_deps varchar(1),
patch_deps varchar(1),
fetch_deps varchar(1));
sqlite> .import INDEX-6 index6
... completes in less than 2 seconds
sqlite> select * from index6 where path = "/usr/ports/accessibility/atk";
atk-1.12.4|/usr/ports/accessibility/atk|/usr/local|A GNOME accessibility
toolkit
devel|gettext-0.14.5_2 glib-2.12.9 libiconv-1.9.2_2 libtool-1.5.22_3
perl-5.8.8 pkg-config-0.21|gettext-0.14.5_2 glib-2.12.9 libiconv-1.9.2_2
perl-5.8.8
pkg-config-0.21|http://developer.gnome.org/projects/gap/||libtool-1.5.22_3|
niobe% ls -lh INDEX-6 index.db
-rw-r--r-- 1 michel lpthe 9,5M 14 mai 10:00 INDEX-6
-rw-r--r-- 1 michel lpthe 12M 14 mai 10:12 index.db
Where is this huge increase in size?
Admittedly, i have not created indexes, etc.
Compare this to the portsdb created by portupgrade from the same INDEX-6
niobe% ls -lh /usr/ports/INDEX-6.db
-rw-r--r-- 1 root wheel 21M 16 fév 13:36 /usr/ports/INDEX-6.db
Surprise, surprise, the BerkeleyDB suddenly appears less glorious.
Your table structure does not even fulfill first normal form ;-) - You need to convert that
into independent tables in order to get it into a reasonable normal form
Dag-Erling Smørgrav
2007-05-14 12:01:37 UTC
Permalink
Post by Thomas Sparrevohn
Your table structure does not even fulfill first normal form ;-) -
You need to convert that into independent tables in order to get it into
a reasonable normal form
Yes, the dependency columns violate 1NF, but it's still pretty
impressive - especially the fact that sqlite imported the index file
directly without any form of preprocessing.
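
For illustration, splitting one of the dependency columns out into its own table might look like the sketch below, using Python's sqlite3 module against the index6 table from the session above. The run_dep table and both function names are invented for this example; only the pkgname and run_deps columns come from the thread.

```python
import sqlite3


def normalize_deps(conn):
    """Split the space-separated run_deps column into one row per dependency."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS run_dep (
            pkgname    TEXT NOT NULL,
            depends_on TEXT NOT NULL,
            PRIMARY KEY (pkgname, depends_on)
        );
    """)
    rows = conn.execute("SELECT pkgname, run_deps FROM index6").fetchall()
    pairs = [(pkg, dep) for pkg, deps in rows for dep in (deps or "").split()]
    with conn:
        conn.executemany("INSERT OR IGNORE INTO run_dep VALUES (?, ?)", pairs)


def dependents_of(conn, pkgname):
    """Answer 'who is depending on' a package with a plain indexed query."""
    cur = conn.execute(
        "SELECT pkgname FROM run_dep WHERE depends_on = ? ORDER BY pkgname",
        (pkgname,),
    )
    return [row[0] for row in cur]
```

The same treatment would apply to the build, extract, patch and fetch dependency columns; whether the extra tables are worth it for INDEX is exactly what is being debated here.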

DES
--
Dag-Erling Smørgrav - ***@des.no
Tom Evans
2007-05-14 16:48:38 UTC
Permalink
Rick C. Petty
2007-05-14 21:52:47 UTC
Permalink
Post by 'Michel Talon'
niobe% sqlite3 index.db
sqlite> CREATE TABLE index6 (
pkgname varchar(1),
path varchar(1),
prefix varchar(1),
comment varchar(1),
descr varchar(1),
maintainer varchar(1),
categories varchar(1),
build_deps varchar(1),
run_deps varchar(1),
website varchar(1),
extract_deps varchar(1),
patch_deps varchar(1),
fetch_deps varchar(1));
sqlite> .import INDEX-6 index6
... completes in less than 2 seconds
sqlite> select * from index6 where path = "/usr/ports/accessibility/atk";
atk-1.12.4|/usr/ports/accessibility/atk|/usr/local|A GNOME accessibility
toolkit
devel|gettext-0.14.5_2 glib-2.12.9 libiconv-1.9.2_2 libtool-1.5.22_3
perl-5.8.8 pkg-config-0.21|gettext-0.14.5_2 glib-2.12.9 libiconv-1.9.2_2
perl-5.8.8
pkg-config-0.21|http://developer.gnome.org/projects/gap/||libtool-1.5.22_3|
What this shows me is that sqlite doesn't follow SQL92 standards.
According to the section 6.1 of the standard[1]:

Syntax rule #1 states "VARCHAR is equivalent to CHARACTER VARYING."
Syntax rule #9b states
"If VARYING is specified in <character string type>, then the
length in characters of the character string is variable,
with a minimum length of 0 and a maximum length of the value
of <length>."

So your example should have failed to work correctly. You should have used
something more appropriate, like VARCHAR(255) instead of VARCHAR(1).

If SQLite isn't even standards-compliant, why is anyone considering it? =)
Nitpicky, I know, but it makes me wonder what else they don't follow...
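
For anyone who wants to reproduce the behaviour being discussed: SQLite accepts a declared length such as VARCHAR(1) but does not enforce it, which can be checked with a few lines of Python. The helper name below is made up for the example.

```python
import sqlite3


def stored_length(declared_type, value):
    """Insert `value` into a column of `declared_type`; return the stored length."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE t (v {declared_type})")
    conn.execute("INSERT INTO t VALUES (?)", (value,))
    (length,) = conn.execute("SELECT length(v) FROM t").fetchone()
    conn.close()
    return length
```

Calling `stored_length("varchar(1)", "x" * 300)` gives back 300 rather than 1 or an error, which is why the .import above kept the full field contents.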

-- Rick C. Petty

[1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
Kris Kennaway
2007-05-14 22:06:37 UTC
Permalink
Post by Rick C. Petty
Post by 'Michel Talon'
niobe% sqlite3 index.db
sqlite> CREATE TABLE index6 (
pkgname varchar(1),
path varchar(1),
prefix varchar(1),
comment varchar(1),
descr varchar(1),
maintainer varchar(1),
categories varchar(1),
build_deps varchar(1),
run_deps varchar(1),
website varchar(1),
extract_deps varchar(1),
patch_deps varchar(1),
fetch_deps varchar(1));
sqlite> .import INDEX-6 index6
... completes in less than 2 seconds
sqlite> select * from index6 where path = "/usr/ports/accessibility/atk";
atk-1.12.4|/usr/ports/accessibility/atk|/usr/local|A GNOME accessibility
toolkit
devel|gettext-0.14.5_2 glib-2.12.9 libiconv-1.9.2_2 libtool-1.5.22_3
perl-5.8.8 pkg-config-0.21|gettext-0.14.5_2 glib-2.12.9 libiconv-1.9.2_2
perl-5.8.8
pkg-config-0.21|http://developer.gnome.org/projects/gap/||libtool-1.5.22_3|
What this shows me is that sqlite doesn't follow SQL92 standards.
Syntax rule #1 states "VARCHAR is equivalent to CHARACTER VARYING."
Syntax rule #9b states
"If VARYING is specified in <character string type>, then the
length in characters of the character string is variable,
with a minimum length of 0 and a maximum length of the value
of <length>."
So your example should have failed to work correctly. You should have used
something more appropriate, like VARCHAR(255) instead of VARCHAR(1).
Some of the fields can (and do) have unbounded length.

Kris
Rick C. Petty
2007-05-14 22:26:45 UTC
Post by Kris Kennaway
Some of the fields can (and do) have unbounded length.
Kris
Where is that specified in the SQL spec? Or are you just saying that
SQLite provides this flexibility?

-- Rick C. Petty
Kris Kennaway
2007-05-14 22:45:28 UTC
Post by Rick C. Petty
Post by Kris Kennaway
Some of the fields can (and do) have unbounded length.
Kris
Where is that specified in the SQL spec? Or are you just saying that
SQLite provides this flexibility?
I am saying that some of the fields in INDEX have unbounded length, so
you'd better be prepared to handle it.
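
A rough sketch of what "being prepared" could look like, assuming Python's
standard sqlite3 module and the 13 column names from the CREATE TABLE
quoted earlier in the thread. The helper name load_index, the sample line
and its dependency names are all made up for illustration. Fields are
split on "|" and stored in TEXT columns, so nothing assumes a maximum
field length.

```python
import sqlite3

# Column names follow the CREATE TABLE quoted earlier in the thread.
COLUMNS = ["pkgname", "path", "prefix", "comment", "descr", "maintainer",
           "categories", "build_deps", "run_deps", "website",
           "extract_deps", "patch_deps", "fetch_deps"]


def load_index(conn, lines):
    """Insert pipe-delimited INDEX lines without any per-field length limit."""
    cols = ", ".join(f"{c} TEXT" for c in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS index6 ({cols})")
    marks = ", ".join("?" for _ in COLUMNS)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        conn.execute(f"INSERT INTO index6 VALUES ({marks})", fields)
    conn.commit()


# Synthetic line with a very long run_deps field (dependency names invented).
long_deps = " ".join(f"dep-{i}" for i in range(5000))
sample = "|".join(["atk-1.12.4", "/usr/ports/accessibility/atk", "/usr/local",
                   "A GNOME accessibility toolkit", "", "",
                   "accessibility devel", "", long_deps, "", "", "", ""])

conn = sqlite3.connect(":memory:")
load_index(conn, [sample])
(run_deps,) = conn.execute("SELECT run_deps FROM index6").fetchone()
print(len(run_deps) == len(long_deps))
conn.close()
```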

Kris
'Michel Talon'
2007-05-14 19:39:26 UTC
Post by 'Michel Talon'
Where is this huge increase in size?
Admittedly, i have not created indexes, etc.
^^^^^^^^^^^^^^^^^^^^^^^^^^
Post by 'Michel Talon'
Compare this to the portsdb created by portupgrade from the same INDEX-6
niobe% ls -lh /usr/ports/INDEX-6.db
-rw-r--r-- 1 root wheel 21M 16 fév 13:36 /usr/ports/INDEX-6.db
Surprise, surprise, the BerkeleyDB suddenly appears less glorious.
Your index has no indices, and you wonder why it is smaller?
I am really tired of answering straw men, misrepresentations of my
position, and so on. I don't advocate using XML, Java, Java tools, or
anything of the sort. I am only claiming that SQLite does a better job
than BerkeleyDB for the precise mission that BerkeleyDB seems to be
slated for in the SoC project.

As to the question of indices, I am the one who pointed out that there
are no indices, so there is little merit in inventing this new objection.
Moreover, the objection is completely bogus, because SQLite creates the
index in memory. After the following command
sqlite> create index path_ind on index6(path);
(I have created an index on origins, the only lookup of interest)
the size of the database doesn't change at all! It remains half the size
of the BerkeleyDB. But the memory consumption increases. Here are the
values before and after creation of the index:

USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
michel 1174 0,0 0,2 2984 2324 p2 I+ 21:03 0:00,01 sqlite3
michel 1174 0,0 0,6 6728 6068 p2 S+ 21:03 0:00,34 sqlite3
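
The on-disk side can also be measured directly rather than argued about.
Below is a minimal sketch, assuming Python's standard sqlite3 module and
synthetic rows standing in for INDEX-6 (the scratch file name and the row
contents are made up). It records the database file size before and after
CREATE INDEX, committing and closing the connection each time so that
everything is flushed to disk. For what it is worth, my understanding of
the SQLite documentation is that indexes are stored as B-trees inside the
database file, so the numbers are worth checking on real data.

```python
import os
import sqlite3
import tempfile

# Scratch database with synthetic rows; this is not the real INDEX-6.
db_path = os.path.join(tempfile.mkdtemp(), "index.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE index6 (pkgname TEXT, path TEXT)")
rows = [(f"pkg-{i}", f"/usr/ports/misc/pkg-{i}") for i in range(20000)]
conn.executemany("INSERT INTO index6 VALUES (?, ?)", rows)
conn.commit()
conn.close()
size_before = os.path.getsize(db_path)

# Same index as in the post, created on the path column.
conn = sqlite3.connect(db_path)
conn.execute("CREATE INDEX path_ind ON index6(path)")
conn.commit()
conn.close()
size_after = os.path.getsize(db_path)

print("before:", size_before, "bytes; after:", size_after, "bytes")
```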

By the way, I don't concern myself with the problem of accelerating
package registration or similar stuff. Indeed, most of these problems
can easily be solved by some optimizations in the makefile and by
parallelism.

The true problem, as Kris said, is upgrading an installation in a
reasonable time (that is, not several days) and with *total*
reliability. None of the present FreeBSD tools does that, while it is
commonplace with Debian. This is an extremely difficult problem in the
FreeBSD context, as des said, and I am certainly not able to solve it.
But at least I have researched the question and written some code. It
is perfectly obvious that most of the bikesheds in this thread are due
to perfect ignorance of the subject, which could be remedied by
reading:
http://www.lpthe.jussieu.fr/~talon/freebsdports.html
and the small SQLite documentation at:
http://www.sqlite.org/lang.html
--
Michel TALON